Text to speech (also known as tts or speech synthesis) is an offshoot of the Web Speech API which provides distinct areas of functionality, including speech recognition.
Speech synthesis (aka text-to-speech, or tts) has to do with receiving synthesising text contained within an app to speech, and playing it out of a device’s speaker or audio output connection.
The Web Speech API has a main controller interface for this — SpeechSynthesis
— plus a number of closely-related interfaces for representing text to be synthesized (known as utterances), voices to be used for the utterance, etc. Again, most OSes have some kind of speech synthesis system, which will be used by the API for this task as available.
In this tutorial, we shall build a simple webpage that uses the Web Speech API to implement the text to speech.
To get through with this tutorial, you should have:
Let’s begin …
<!DOCTYPE>
<html lang="en">
<head>
<link href="https://cdn.jsdelivr.net/npm/bootstrap@5.0.0-beta1/dist/css/bootstrap.min.css" rel="stylesheet" />
<link rel="stylesheet" href="index.css" />
<title>Text to Speech</title>
</head>
<body class="container mt-5 bg-secondary">
<h1 class="text-light">Codeflare Text to Speech</h1>
<p class="lead text-light mt-4">Select Voice</p>
<!-- Select Menu for Voice -->
<select id="voices" class="form-select bg-secondary text-light"></select>
<!-- Range Slliders for Volume, Rate & Pitch -->
<div class="d-flex mt-4 text-light">
<div>
<p class="lead">Volume</p>
<input type="range" min="0" max="1" value="1" step="0.1" id="volume" />
<span id="volume-label" class="ms-2">1</span>
</div>
<div class="mx-5">
<p class="lead">Rate</p>
<input type="range" min="0.1" max="10" value="1" id="rate" step="0.1" />
<span id="rate-label" class="ms-2">1</span>
</div>
<div>
<p class="lead">Pitch</p>
<input type="range" min="0" max="2" value="1" step="0.1" id="pitch" />
<span id="pitch-label" class="ms-2">1</span>
</div>
</div>
<!-- Text Area for the User to Type -->
<textarea class="form-control bg-dark text-light mt-5" cols="30" rows="10" placeholder="Type here..."></textarea>
<!-- Control Buttons -->
<div class="mb-5">
<button id="start" class="btn btn-success mt-5 me-3">Start</button>
<button id="pause" class="btn btn-warning mt-5 me-3">Pause</button>
<button id="resume" class="btn btn-info mt-5 me-3">Resume</button>
<button id="cancel" class="btn btn-danger mt-5 me-3">Stop</button>
</div>
</body>
<script src="./main.js"></script>
</html>
Here, we are using:
We first need to create an instance of the SpeechSynthesisUtterance
class. This instance will be configured with various properties, such as language, volume, text.
// Initialize new SpeechSynthesisUtterance object
let speech = new SpeechSynthesisUtterance();
// Set Speech Language
speech.lang = "en";
let voices = []; // global array of available voices
window.speechSynthesis.onvoiceschanged = () => {
// Get List of Voices
voices = window.speechSynthesis.getVoices();
// Initially set the First Voice in the Array.
speech.voice = voices[0];
// Set the Voice Select List. (Set the Index as the value, which we'll use later when the user updates the Voice using the Select Menu.)
let voiceSelect = document.querySelector("#voices");
voices.forEach((voice, i) => (voiceSelect.options[i] = new Option(voice.name, i)));
};
document.querySelector("#rate").addEventListener("input", () => {
// Get rate Value from the input
const rate = document.querySelector("#rate").value;
// Set rate property of the SpeechSynthesisUtterance instance
speech.rate = rate;
// Update the rate label
document.querySelector("#rate-label").innerHTML = rate;
});
document.querySelector("#volume").addEventListener("input", () => {
// Get volume Value from the input
const volume = document.querySelector("#volume").value;
// Set volume property of the SpeechSynthesisUtterance instance
speech.volume = volume;
// Update the volume label
document.querySelector("#volume-label").innerHTML = volume;
});
document.querySelector("#pitch").addEventListener("input", () => {
// Get pitch Value from the input
const pitch = document.querySelector("#pitch").value;
// Set pitch property of the SpeechSynthesisUtterance instance
speech.pitch = pitch;
// Update the pitch label
document.querySelector("#pitch-label").innerHTML = pitch;
});
document.querySelector("#voices").addEventListener("change", () => {
// On Voice change, use the value of the select menu (which is the index of the voice in the global voice array)
speech.voice = voices[document.querySelector("#voices").value];
});
document.querySelector("#start").addEventListener("click", () => {
// Set the text property with the value of the textarea
speech.text = document.querySelector("textarea").value;
// Start Speaking
window.speechSynthesis.speak(speech);
});
document.querySelector("#pause").addEventListener("click", () => {
// Pause the speechSynthesis instance
window.speechSynthesis.pause();
});
document.querySelector("#resume").addEventListener("click", () => {
// Resume the paused speechSynthesis instance
window.speechSynthesis.resume();
});
document.querySelector("#cancel").addEventListener("click", () => {
// Cancel the speechSynthesis instance
window.speechSynthesis.cancel();
});
Now, we have our text-to-speech logic fully functioning.
Go ahead and check the Github Repo.
Introduction The Observer Pattern is a design pattern used to manage and notify multiple objects…
Memory management is like housekeeping for your program—it ensures that your application runs smoothly without…
JavaScript has been a developer’s best friend for years, powering everything from simple websites to…
In the digital age, web development plays a crucial role in shaping how individuals interact…
Introduction Handling large amounts of data efficiently can be a challenge for developers, especially when…