Generate Free, Automatic Live Stream Captions With The Web Speech API

Increasing numbers of live streamer are using live captions for their streams. It’s great and I love the idea of using them to make my stream more accessible. A little research shows a few services that offer the feature. WebCaptioner and PixelChat, for example. I was looking at these until I saw TrostCodes messing around with the Web Speech API. He was making a voice powered image search. It’s slick. Better still, it got me turned onto the API.

The docs make it seem like you have to input all the words you want the API to recognize. That seemed super weird and turns out not to be the case. After cutting away the excess and adding some code to keep the captioning from stopping, we get this:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Web Speech API Captions</title>
</head>
<body>
    <div id="transcript"></div>

    <script>
        let ts = document.getElementById('transcript')

        var SpeechRecognition = SpeechRecognition || webkitSpeechRecognition;
        var recognition = new SpeechRecognition();

        recognition.continuous = true;
        recognition.lang = 'en-US';
        recognition.interimResults = true;
        recognition.maxAlternatives = 1;

        document.addEventListener("DOMContentLoaded", recognition.start());
        
        // Restart if the service stops
        recognition.addEventListener("end", function () { 
            console.log("restarting...");
            recognition.start();
        })

        recognition.onresult = function(event) {
            let isFinal = true 

            for (let i = 0; i < event.results.length; i = i + 1) {
                if (event.results[i].isFinal === false) {
                    isFinal = false
                    ts.innerHTML = event.results[i][0].transcript;
                    break;
                }
            }

            if (isFinal === true) {
                last_result_index = event.results.length - 1;
                ts.innerHTML = event.results[last_result_index][0].transcript;
            }
        }
    </script>
</body>
</html>

As of May, 2021 the API only works with Google Chrome. It also needs to be served from a domain to work effectively. I do this by using the python3’s built-in web server. To use it, run this in the directory with the html file:

python -m http.server

If you’re in python 2, you can use:

python -m SimpleHTTPServer

If you named the file index.html it’ll be available at:

http://localhost:8000/

(Of course, you can serve it with anything that acts as a web host. I just use those because I have python installed and it’s a simple command.)

When you first load the page, you’ll be asked to allow it to access the mic. Click “Allow”, start talking, and you’ll start seeing your speech. It’s kind of amazing.

The last piece of the puzzle is getting the text to show up in OBS. To do this, create a new “Window Capture” source and point it to the Chrome window with your page in it. Crop in to just the text area and you’ll be good to go.

You can’t include the page directly as a Browser Source. It won’t work because there’s no way to allow access to the mic and I don’t think the OBS browser is Chrome based. Instead, open the page directly in Google Chrome then add it to OBS as a Window Capture Source, crop in on the area with the text and you’ll be good to go.

2021-05-9T10:50:49-0400

Random Related Links