How to use speech to text

Introduction

Web Speech API is one of useful API to translate speech to text. In order to use this API, user should have mic and mic permission on their browser.

Browser Support

if ('webkitSpeechRecognition' in window) {
  console.log('Speech Recognition is avaialable')
} else {
  console.log('Speech Recognition Not Available')
}

Checking browser supports is required

`SpeechRecognition()`

let recognition = new webkitSpeechRecognition();

The SpeechRecognition() interface is not working on recent browser, so create instance with webkitSpeechRecognition()

Methods

recognition.start();
recognition.stop();

Name	Description
start()	Start to record
stop()	Stop to record and return reulst
abort()	Stop to record
There is a difference between `stop()` method and `abort()` method. The `stop()` method returns speaking results.
On the other hands, the `abort()` method stop without any results.

Event Listeners

recognition.onresult = function(event) {
  // results handling
}

// When speech is end
recognition.onspeechend = function() {
  recognition.stop();
}

Events

Name	Description
onresut	Event for result handling. The fring time is different depending on option.
onspeechstart	When the speech that will be used for speech recognition has started
onspeechend	When the speech that will be used for speech recognition has ended.
onstart	When the recognition service has begun to listen to the audio with the intention of recognizing.
onnomatch	GrammarList option is used and when words are matched
onend	When the service has disconnected. The event must always be generated when the session ends no matter the reason for the end
onerror	When a speech recognition error occur

Configuration

Name	Type	Description
interimResults	boolean	Controls whether interim results are returned. When set to true, interim results should be returned
continuous	boolean	When the continuous attribute is set to false, the user agent must return no more than one final result in response to starting recognition
lang	string	Set the language of the recognition. It's using System langauge on default

Examples

HTML

html

<div class="font-sans ">
  <div class="flex">
    <div>
      Status:
    </div>
    <div id="status">
      WAITING
    </div>
  </div>
  <div class="border py-2">
    <div class="flex">
      <div>
        Final:
      </div>
      <div id="final">
      </div>
    </div>
    <div class="flex">
      <div>
        interim:
      </div>
      <div id="interim">
      </div>
    </div>
  </div>
  <div>
    <button id="start-btn">
      Start
    </button>
    <button id="stop-btn">
      Stop
    </button>
  </div>
</div>

Script

// Check browser supports
if ("webkitSpeechRecognition" in window) {
  // webkitSpeechRecognition Instance
  let speechRecognition = new webkitSpeechRecognition();

  // Final transcription will be here
  let finalTranscript = "";

    
  speechRecognition.continuous = true; // Continuse to record until stop button is clicked
  speechRecognition.interimResults = true; // To display test results for  InterimResults, make it true. 

  // Callback Function for the onStart Event
  speechRecognition.onstart = () => {
    document.querySelector("#status").innerHTML = 'START'
  };
  speechRecognition.onerror = () => {
    document.querySelector("#status").innerHTML = 'ERROR'
  };
  speechRecognition.onend = () => {
    document.querySelector("#status").innerHTML = 'WAITING'
  };

  speechRecognition.onresult = (event) => {
    let interimTranscript = "";

    // Loop through the results from the speech recognition object.
    for (let i = event.resultIndex; i < event.results.length; ++i) {
      // Result is consist of two dimention array
      if (event.results[i].isFinal) {
        finalTranscript += event.results[i][0].transcript;
      } else {
        interimTranscript += event.results[i][0].transcript;
      }
    }

    // Update HTML 
    document.querySelector("#final").innerHTML = finalTranscript;
    document.querySelector("#interim").innerHTML = interimTranscript;
  };

  // Start button click event handler
  document.querySelector("#start-btn").onclick = () => {
    // Start the Speech Recognition
    speechRecognition.start();
  };
  // Stop button click event handler
  document.querySelector("#stop-btn").onclick = () => {
    // Stop the Speech Recognition
    speechRecognition.stop();
  };
} else {
  console.log("Speech Recognition Not Available");
}

Copy and Past above code. Clicking the Start button starts to record and the stop button stops recoding

Advanced - Continous on mobile

Unfortunately, it's hard to use continous option on mobile because of security issue. However, there is a small trick to solve this issue.

if ("webkitSpeechRecognition" in window) {
  let speechRecognition = new webkitSpeechRecognition();

  let finalTranscript = "";
  // Flag for whether starts recognition again
  let isContinous = false;

    
  speechRecognition.continuous = false; // It's not working on mobile
  speechRecognition.interimResults = true;

  // Callback Function for the onStart Event
  speechRecognition.onstart = () => {
    document.querySelector("#status").innerHTML = 'START'
  };
  speechRecognition.onerror = () => {
    document.querySelector("#status").innerHTML = 'ERROR'
  };
  speechRecognition.onend = () => {
    document.querySelector("#status").innerHTML = 'WAITING'
  };
  
  speechRecognition.onspeechend = () => {
    // Restart if flag is true
    if(isContinous)
        speechRecognition.start();
  };

  speechRecognition.onresult = (event) => {
    let interimTranscript = "";

    // Loop through the results from the speech recognition object.
    for (let i = event.resultIndex; i < event.results.length; ++i) {
      if (event.results[i].isFinal) {
        finalTranscript += event.results[i][0].transcript;
      } else {
        interimTranscript += event.results[i][0].transcript;
      }
    }

    document.querySelector("#final").innerHTML = finalTranscript;
    document.querySelector("#interim").innerHTML = interimTranscript;
  };

  document.querySelector("#start-btn").onclick = () => {
    speechRecognition.start();
    isContinous = true;
  };

  document.querySelector("#stop-btn").onclick = () => {
    speechRecognition.stop();
    isContinous = false;
  };
} else {
  console.log("Speech Recognition Not Available");
}

The isContious variable restarts again even though recoding is off automatically until user clicks stop button.