How to use speech to text

2023-10-20

Introduction

The Web Speech API is a useful browser API for converting speech to text. To use it, the user needs a microphone and must grant the browser permission to access it.

Browser Support

js
if ('webkitSpeechRecognition' in window) {
  console.log('Speech Recognition is available')
} else {
  console.log('Speech Recognition Not Available')
}

Always check for browser support before using the API, because not every browser implements it.

SpeechRecognition()

js
let recognition = new webkitSpeechRecognition();

Some browsers expose the interface only under its prefixed name, so the unprefixed SpeechRecognition() constructor may be undefined. Create the instance with webkitSpeechRecognition() instead.
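Because the constructor name differs between browsers, a small helper can try both names. The sketch below is one possible way to do that. `getSpeechRecognition` is a made-up helper name, not part of the Web Speech API, and it takes the global object as a parameter so it can also be run outside a browser.

```javascript
// Hypothetical helper: returns the recognition constructor, or null if the
// given global object exposes neither the standard nor the prefixed name.
function getSpeechRecognition(globalObject) {
  return globalObject.SpeechRecognition
    || globalObject.webkitSpeechRecognition
    || null;
}

// In a browser, pass `window`; outside a browser there is nothing to look up.
const RecognitionCtor = typeof window !== 'undefined'
  ? getSpeechRecognition(window)
  : null;

if (RecognitionCtor) {
  const recognition = new RecognitionCtor();
  console.log('Created a recognition instance:', recognition);
} else {
  console.log('Speech Recognition Not Available');
}
```

Trying the standard name first means the code keeps working if a browser later drops the prefix.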

Methods

js
recognition.start();
recognition.stop();
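
| Name | Description |
| --- | --- |
| start() | Starts listening for speech |
| stop() | Stops listening and returns a result for the audio captured so far |
| abort() | Stops listening without returning a result |

The stop() and abort() methods differ in one respect. The stop() method returns a result for what has been spoken so far.
The abort() method, on the other hand, stops without returning any result.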
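As a rough illustration of that difference, a "Done" action would typically call stop() so the pending result is still delivered, while a "Cancel" action would call abort() to discard it. `endSession` and its `keepResult` parameter are made-up names for this sketch.

```javascript
// Hypothetical helper: ends a recognition session.
// keepResult = true  -> stop(): a result may still be delivered to onresult
// keepResult = false -> abort(): the session ends and no result is delivered
function endSession(recognition, keepResult) {
  if (keepResult) {
    recognition.stop();
  } else {
    recognition.abort();
  }
}
```

In a page this could be wired to two buttons, for example calling `endSession(recognition, true)` from a "Done" button and `endSession(recognition, false)` from a "Cancel" button.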

Event Listeners

js
recognition.onresult = function(event) {
  // results handling
}

// Stop recognition once the user has stopped speaking
recognition.onspeechend = function() {
  recognition.stop();
}

Events

| Name | Description |
| --- | --- |
| onresult | Fires when the recognizer returns a result. How often it fires depends on the interimResults and continuous options. |
| onspeechstart | Fires when the speech that will be used for recognition has started. |
| onspeechend | Fires when the speech that will be used for recognition has ended. |
| onstart | Fires when the recognition service has begun listening to the audio with the intention of recognizing it. |
| onnomatch | Fires when the recognizer returns a final result with no recognition hypothesis that meets the confidence threshold. |
| onend | Fires when the service has disconnected. This event is always generated when the session ends, whatever the reason. |
| onerror | Fires when a speech recognition error occurs. |
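The onerror handler receives an event whose `error` property holds a short error code. The sketch below maps a few of those codes to user-facing text. The codes come from the Web Speech API, but the messages and the helper names `describeRecognitionError` and `attachErrorHandler` are assumptions made for this example.

```javascript
// Error codes defined by the Web Speech API; the messages are illustrative only.
const ERROR_MESSAGES = new Map([
  ['no-speech', 'No speech was detected.'],
  ['audio-capture', 'No microphone was found or audio capture failed.'],
  ['not-allowed', 'Microphone permission was denied.'],
  ['network', 'A network error interrupted recognition.'],
  ['aborted', 'Recognition was aborted.'],
]);

// Hypothetical helper: turns an error code into a message for the user.
function describeRecognitionError(code) {
  return ERROR_MESSAGES.get(code) || `Recognition failed (${code}).`;
}

// Hypothetical helper: installs an onerror handler that reports the message
// through the given callback.
function attachErrorHandler(recognition, showMessage) {
  recognition.onerror = (event) => {
    showMessage(describeRecognitionError(event.error));
  };
}
```

Passing the display callback in, rather than touching the DOM directly, keeps the mapping easy to reuse with any status element.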

Configuration

| Name | Type | Description |
| --- | --- | --- |
| interimResults | boolean | Controls whether interim results are returned. When set to true, interim results are returned in addition to final ones. |
| continuous | boolean | When set to false, the user agent returns no more than one final result in response to starting recognition. |
| lang | string | Sets the recognition language. If unset, it defaults to the language of the HTML document (its lang attribute), falling back to the user agent's language setting. |
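These options are plain properties on the recognition instance, so they can be set in one place before calling start(). The following is a minimal sketch; `configureRecognition` is a hypothetical helper, and the defaults it uses are a choice made for this example.

```javascript
// Hypothetical helper: applies the three options to a recognition instance
// and returns the same instance.
function configureRecognition(
  recognition,
  { interimResults = false, continuous = false, lang } = {}
) {
  recognition.interimResults = interimResults;
  recognition.continuous = continuous;
  // Only set lang when a language tag is given; otherwise keep the default.
  if (lang) {
    recognition.lang = lang;
  }
  return recognition;
}
```

In a browser this might be called as `configureRecognition(new webkitSpeechRecognition(), { interimResults: true, lang: 'en-US' })`, where `lang` takes a language tag such as `en-US` or `ko-KR`.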

Examples

HTML

html
<div class="font-sans ">
  <div class="flex">
    <div>
      Status:
    </div>
    <div id="status">
      WAITING
    </div>
  </div>
  <div class="border py-2">
    <div class="flex">
      <div>
        Final:
      </div>
      <div id="final">
      </div>
    </div>
    <div class="flex">
      <div>
        Interim:
      </div>
      <div id="interim">
      </div>
    </div>
  </div>
  <div>
    <button id="start-btn">
      Start
    </button>
    <button id="stop-btn">
      Stop
    </button>
  </div>
</div>

Script

js
// Check for browser support
if ("webkitSpeechRecognition" in window) {
  // Create the webkitSpeechRecognition instance
  let speechRecognition = new webkitSpeechRecognition();

  // The final transcript accumulates here
  let finalTranscript = "";

  speechRecognition.continuous = true; // Keep recording until the Stop button is clicked
  speechRecognition.interimResults = true; // Set to true to receive interim results as well

  // Callback for the start event
  speechRecognition.onstart = () => {
    document.querySelector("#status").innerHTML = 'START'
  };
  speechRecognition.onerror = () => {
    document.querySelector("#status").innerHTML = 'ERROR'
  };
  speechRecognition.onend = () => {
    document.querySelector("#status").innerHTML = 'WAITING'
  };

  speechRecognition.onresult = (event) => {
    let interimTranscript = "";

    // Loop through the results from the speech recognition object.
    for (let i = event.resultIndex; i < event.results.length; ++i) {
      // Each result is a list of alternatives, so event.results is two-dimensional
      if (event.results[i].isFinal) {
        finalTranscript += event.results[i][0].transcript;
      } else {
        interimTranscript += event.results[i][0].transcript;
      }
    }

    // Update HTML 
    document.querySelector("#final").innerHTML = finalTranscript;
    document.querySelector("#interim").innerHTML = interimTranscript;
  };

  // Start button click event handler
  document.querySelector("#start-btn").onclick = () => {
    // Start the Speech Recognition
    speechRecognition.start();
  };
  // Stop button click event handler
  document.querySelector("#stop-btn").onclick = () => {
    // Stop the Speech Recognition
    speechRecognition.stop();
  };
} else {
  console.log("Speech Recognition Not Available");
}

Copy and paste the code above. Clicking the Start button starts recording, and clicking the Stop button stops it.
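The loop inside onresult can also be pulled out into a function that does not touch the page, which makes it easier to reuse and to try with fake data. This is a sketch of that idea; `collectTranscripts` is a made-up name, and it reads only the first alternative of each result, as the example above does.

```javascript
// Hypothetical helper: splits the results of an onresult event into newly
// finalized text and the current interim text.
function collectTranscripts(event) {
  let finalText = '';
  let interimText = '';
  // event.resultIndex is the first result that changed in this event
  for (let i = event.resultIndex; i < event.results.length; ++i) {
    const result = event.results[i];
    // result[0] is the first alternative for this result
    if (result.isFinal) {
      finalText += result[0].transcript;
    } else {
      interimText += result[0].transcript;
    }
  }
  return { finalText, interimText };
}
```

Inside onresult, the returned `finalText` would be appended to `finalTranscript`, and `interimText` would replace the contents of the interim element.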

Advanced - Continuous on mobile

Unfortunately, the continuous option is hard to use on mobile because of browser security restrictions. However, a small trick works around the problem.

js
if ("webkitSpeechRecognition" in window) {
  let speechRecognition = new webkitSpeechRecognition();

  let finalTranscript = "";
  // Flag: whether to start recognition again after it ends
  let isContinous = false;

  speechRecognition.continuous = false; // Continuous mode does not work reliably on mobile
  speechRecognition.interimResults = true;

  // Callback for the start event
  speechRecognition.onstart = () => {
    document.querySelector("#status").innerHTML = 'START'
  };
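  speechRecognition.onerror = (event) => {
    document.querySelector("#status").innerHTML = 'ERROR'
    // Do not keep restarting if microphone permission was denied
    if (event.error === 'not-allowed' || event.error === 'service-not-allowed')
      isContinous = false;
  };
  speechRecognition.onend = () => {
    // Restart here rather than in onspeechend: calling start() while the
    // session is still active throws an InvalidStateError
    if (isContinous)
      speechRecognition.start();
    else
      document.querySelector("#status").innerHTML = 'WAITING'
  };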

  speechRecognition.onresult = (event) => {
    let interimTranscript = "";

    // Loop through the results from the speech recognition object.
    for (let i = event.resultIndex; i < event.results.length; ++i) {
      if (event.results[i].isFinal) {
        finalTranscript += event.results[i][0].transcript;
      } else {
        interimTranscript += event.results[i][0].transcript;
      }
    }

    document.querySelector("#final").innerHTML = finalTranscript;
    document.querySelector("#interim").innerHTML = interimTranscript;
  };

  document.querySelector("#start-btn").onclick = () => {
    speechRecognition.start();
    isContinous = true;
  };

  document.querySelector("#stop-btn").onclick = () => {
    speechRecognition.stop();
    isContinous = false;
  };
} else {
  console.log("Speech Recognition Not Available");
}

While the isContinous flag is true, recognition is restarted each time it ends on its own. This continues until the user clicks the Stop button.
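The same restart idea can be packaged as a small wrapper around any recognition instance. This sketch restarts from the end event, on the assumption that a new session can only be started once the previous one has fully ended. `keepListening` and `shouldRestart` are made-up names.

```javascript
// Hypothetical wrapper: keeps a non-continuous recognizer running by
// restarting it from onend until stop() is called on the wrapper.
function keepListening(recognition) {
  let shouldRestart = false;

  recognition.onend = () => {
    // The session has fully ended here, so starting again is allowed
    if (shouldRestart) {
      recognition.start();
    }
  };

  return {
    start() {
      shouldRestart = true;
      recognition.start();
    },
    stop() {
      // Clear the flag first so the following end event does not restart
      shouldRestart = false;
      recognition.stop();
    },
  };
}
```

The wrapper replaces any existing onend handler on the instance, so status updates would need to be added inside it or registered with addEventListener instead.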

Reference

Free open source project made by Youngjin