ExploreOurteamtrainingCourses (3)

As a society, we’ve become increasingly intrigued by the concept of machines that can talk and listen. From fictional AI systems like HAL 9000 in 2001: A Space Odyssey (“I’m sorry, Dave. I’m afraid I can’t do that.”), to Apple’s Siri ,and Google’s new Assistant, our culture seems inexorably drawn to the idea of digital beings with ears and a voice. Implementing such sophisticated technology may seem far beyond the grasp of a beginner or even more experienced programmer; however, that assumption couldn’t be further from the the truth. Thanks to user friendly APIs found in modern browsers, creating simple speech recognition and speech synthesis programs using JavaScript is actually pretty straightforward. In the following tutorial, I’ll show you how to use JavaScript to access your browser’s speechRecognition and speechSynthesis APIs so that you too can create programs you control with your voice; ones that not only can hear you, but ones that can speak to you as well. Come, let’s have a listen…

Explore JavaScript Courses

To start, it’s typically a good idea to explore which browsers best support the technologies we’re going to be working with. Here’s MDN’s spec sheet for the speechRecognition API: https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecognition#Browser_compatibility. As you can see, it’s pretty much Chrome leading the way; however, Firefox has some capability as well. The same holds true for the speechSynthesis API: https://developer.mozilla.org/en-US/docs/Web/API/SpeechSynthesis#Browser_compatibility. Do note that Microsoft’s Edge browser enjoys some speech synthesis capabilities. David Walsh wrote a nice article on setting up the speechSynthesis API for cross-browser functionality; but, for simplicity’s sake and for the remainder of this tutorial, I’m going to assume the use of Chrome. 

So what does the code look like? A simple example of speech recognition code (written in JavaScript) looks like this (NOTE: You’ll probably have to open a new tab/window by clicking the “Edit in JSFiddle” link AND allowing your the browser access to your computer’s microphone (it should prompt you to do so).):

Try it out. Give the browser access to your computers mic and then try saying a few different words or phrases. If all goes according to plan, you should see what you say being written to the body of the html. If you’re having trouble getting it to work, try:

1) Making sure your computer has a mic and that your browser has access to it.

2) Making sure you have open only 1 application/tab/window that’s using the microphone.

3) Making sure to use Chrome as your browser and loading up the JSFiddle example in a new tab/window.

Okay, so the above example simply writes what the computer heard to the body of the html. That’s pretty interesting, but now let’s do something a bit more fancy; let’s tell the computer to do something for us! In the following example, trying opening up the Fiddle and telling the browser to change the background color of the HTML. The way I programmed it, you’ll have to say this exact phrase:

“Change background color to…” and then say any of the many colors recognized by CSS (e.g., “red,” “blue,” “green,” “yellow,”etc.).

Pretty cool, right? And really not all that difficult to pull off!

Now let’s look at giving the program a voice. I’m going to use Chrome’s default voice (yes, it sounds pretty robotic); but once you get the hang of it, feel free to read up on how to get and use different voices. Here we go; let’s see what it’s got to say:

Hear that? This time, the program audibly confirms that it’s changing the color of the background! Fantastic.

To recap all of this…

Speech Recognition

– The browser’s speechRecognition object has start and stop methods you can use to start and stop listening for audio input.

– The speechRecoginition object can react to for onend and onresult events.

– To get a string/text of what the computer heard, you can pass the onresult event to a function and then reference the event.results[0][0].transcript property.

Speech Synthesis

– The speechSynthesis object has a speak method that you can use to utter new SpeechSynthesisUtterances.

– You can pass a string (or number) value to the SpeechSynthesisUtterance constructor to create words or phrases.

    – pass that whole thing to the speak method and you’ve got a talking computer!

And there you have it. In this tutorial I’ve shown that it’s relatively simple to employ speech recognition and speech synthesis technology in your browser through the use of JavaScript. With this new tool set, my hope is that you’ll explore the wide array of possibilities that now exist. In theory, you can program your browser to execute most if not all of its functions at the sound of your voice. And you can make the browser say pretty much anything you want it to say. As usual, the limit is your imagination; so get out there and say something interesting! Better yet, have your browser say it for you. 😉

Explore JavaScript Courses

Sev Leonard
Sev (sev@thedatascout.com) is a consultant, writer, and technical trainer with 20 years of industry experience. His areas of expertise include analytics and data science, software development, web, and business consulting. http://www.thedatascout.com

Comments