Text to Speech in Javascript

Text to speech (also known as tts or speech synthesis) is an offshoot of the Web Speech API which provides distinct areas of functionality, including speech recognition.

Speech Synthesis

Speech synthesis (aka text-to-speech, or tts) has to do with receiving synthesising text contained within an app to speech, and playing it out of a device’s speaker or audio output connection.

The Web Speech API has a main controller interface for this — SpeechSynthesis — plus a number of closely-related interfaces for representing text to be synthesized (known as utterances), voices to be used for the utterance, etc. Again, most OSes have some kind of speech synthesis system, which will be used by the API for this task as available.

In this tutorial, we shall build a simple webpage that uses the Web Speech API to implement the text to speech.

Prerequisites

To get through with this tutorial, you should have:

A basic understanding of HTML & CSS
A basic understanding of Javascript
Code editor of your choice.

Let’s begin …

First, HTML

<!DOCTYPE>
<html lang="en">
  <head>
    <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.0.0-beta1/dist/css/bootstrap.min.css" rel="stylesheet" />
    <link rel="stylesheet" href="index.css" />
    <title>Text to Speech</title>
  </head>
  <body class="container mt-5 bg-secondary">
    <h1 class="text-light">Codeflare Text to Speech</h1>
    <p class="lead text-light mt-4">Select Voice</p>

    <!-- Select Menu for Voice -->
    <select id="voices" class="form-select bg-secondary text-light"></select>

    <!-- Range Slliders for Volume, Rate & Pitch -->
    <div class="d-flex mt-4 text-light">
      <div>
        <p class="lead">Volume</p>
        <input type="range" min="0" max="1" value="1" step="0.1" id="volume" />
        <span id="volume-label" class="ms-2">1</span>
      </div>
      <div class="mx-5">
        <p class="lead">Rate</p>
        <input type="range" min="0.1" max="10" value="1" id="rate" step="0.1" />
        <span id="rate-label" class="ms-2">1</span>
      </div>
      <div>
        <p class="lead">Pitch</p>
        <input type="range" min="0" max="2" value="1" step="0.1" id="pitch" />
        <span id="pitch-label" class="ms-2">1</span>
      </div>
    </div>

    <!-- Text Area  for the User to Type -->
    <textarea class="form-control bg-dark text-light mt-5" cols="30" rows="10" placeholder="Type here..."></textarea>

    <!-- Control Buttons -->
    <div class="mb-5">
      <button id="start" class="btn btn-success mt-5 me-3">Start</button>
      <button id="pause" class="btn btn-warning mt-5 me-3">Pause</button>
      <button id="resume" class="btn btn-info mt-5 me-3">Resume</button>
      <button id="cancel" class="btn btn-danger mt-5 me-3">Stop</button>
    </div>
  </body>
  <script src="./main.js"></script>
</html>

Here, we are using:

Bootstrap to style the web page
A textarea where what we want to be read is typed
Control buttons
A select option where we can select available voices
Range sliders for volume, pitch as well as rate.
We are also setting our language to English because we want the text to read in English.

Text to speech in Javascript

Now, Javascript …

We first need to create an instance of the SpeechSynthesisUtterance class. This instance will be configured with various properties, such as language, volume, text.

// Initialize new SpeechSynthesisUtterance object
let speech = new SpeechSynthesisUtterance();

// Set Speech Language
speech.lang = "en";

let voices = []; // global array of available voices

window.speechSynthesis.onvoiceschanged = () => {
  // Get List of Voices
  voices = window.speechSynthesis.getVoices();

  // Initially set the First Voice in the Array.
  speech.voice = voices[0];

  // Set the Voice Select List. (Set the Index as the value, which we'll use later when the user updates the Voice using the Select Menu.)
  let voiceSelect = document.querySelector("#voices");
  voices.forEach((voice, i) => (voiceSelect.options[i] = new Option(voice.name, i)));
};

document.querySelector("#rate").addEventListener("input", () => {
  // Get rate Value from the input
  const rate = document.querySelector("#rate").value;

  // Set rate property of the SpeechSynthesisUtterance instance
  speech.rate = rate;

  // Update the rate label
  document.querySelector("#rate-label").innerHTML = rate;
});

document.querySelector("#volume").addEventListener("input", () => {
  // Get volume Value from the input
  const volume = document.querySelector("#volume").value;

  // Set volume property of the SpeechSynthesisUtterance instance
  speech.volume = volume;

  // Update the volume label
  document.querySelector("#volume-label").innerHTML = volume;
});

document.querySelector("#pitch").addEventListener("input", () => {
  // Get pitch Value from the input
  const pitch = document.querySelector("#pitch").value;

  // Set pitch property of the SpeechSynthesisUtterance instance
  speech.pitch = pitch;

  // Update the pitch label
  document.querySelector("#pitch-label").innerHTML = pitch;
});

document.querySelector("#voices").addEventListener("change", () => {
  // On Voice change, use the value of the select menu (which is the index of the voice in the global voice array)
  speech.voice = voices[document.querySelector("#voices").value];
});

document.querySelector("#start").addEventListener("click", () => {
  // Set the text property with the value of the textarea
  speech.text = document.querySelector("textarea").value;

  // Start Speaking
  window.speechSynthesis.speak(speech);
});

document.querySelector("#pause").addEventListener("click", () => {
  // Pause the speechSynthesis instance
  window.speechSynthesis.pause();
});

document.querySelector("#resume").addEventListener("click", () => {
  // Resume the paused speechSynthesis instance
  window.speechSynthesis.resume();
});

document.querySelector("#cancel").addEventListener("click", () => {
  // Cancel the speechSynthesis instance
  window.speechSynthesis.cancel();
});

Now, we have our text-to-speech logic fully functioning.

Go ahead and check the Github Repo.

See also …

7 Javascript Shortcut you should use in your next project

Next Aspect Ratio in CSS »

Previous « Dynamically Populate Select Options in React JS

Leave a Comment

Share

Published by

codeflare

3 years ago

Recent Posts

programming

How Facial Recognition Software Works

Facial recognition technology is rapidly changing how we interact with devices, access services, and enhance…

1 day ago

softare development

Why Grok 4 is the AI Game-Changer You Need to Know

Move over ChatGPT, there's a new, significantly upgraded player causing a stir. xAI, Elon Musk's…

1 week ago

softare development

Cloudinary vs. AWS vs. ImageKit.io vs. Cloudflare

Choosing the right asset management service is vital. Cloudinary is frequently mentioned, but how does…

2 weeks ago

softare development

How to Integrate Cloudinary with PHP

Cloudinary is a powerful cloud-based media management platform that allows you to upload, store, manage,…

3 weeks ago

softare development

Trump Extends U.S. TikTok Sale Deadline to September 2025

In a surprising turn of events, former President Donald Trump announced on June 19, 2025,…

4 weeks ago

softare development

Master React Native Flexbox

Flexbox is a powerful layout system in React Native that allows developers to create responsive…

1 month ago

L