Live Transcribing Phone Calls Using Twilio Media Streams and Google Speech-to-Text
By Nathaniel Okenwa ▪ 2019-09-12
https://www.twilio.com/blog/live-transcribing-phone-calls-using-twilio-media-streams-and-google-speech-text
4/21/2021 Live Transcribing Phone Calls using Twilio Media Streams and Google Speech-to-Text
With Twilio Media Streams, you can now extend the capabilities of your Twilio-powered voice
application with real-time access to the raw audio stream of phone calls. For example, we can
build tools that transcribe the speech from a phone call live into a browser window, run
sentiment analysis on the speech from a phone call, or even use voice biometrics to identify
individuals.
This blog post will guide you step-by-step through transcribing speech from a phone call into
text, live in the browser using Twilio and Google Speech-to-Text with Node.js.
If you want to skip the step-by-step instructions, you can clone my GitHub repository and
follow the README to get set up. If you prefer video, check out a video walkthrough
here.
Requirements
Before we can get started, you’ll need to make sure to have:
Installed ngrok
Open your terminal and create a new project folder and create an index.js file.
$ mkdir twilio-streams
$ cd twilio-streams
$ touch index.js
To handle HTTP requests we will use Node's built-in http module and Express. For
WebSocket connections we will be using ws, a lightweight WebSocket library for Node.
Open your index.js file and add the following code to set up your server.
Save and run index.js with node index.js. Open your browser and navigate to
http://localhost:8080. Your browser should show Hello World.
Now that we know HTTP requests are working, let’s test our WebSocket connection. Open your
browser’s console and run this command:
If you go back to the terminal you should see a log saying New Connection Initiated .
First we need to modify our server to handle the WebSocket messages that will be sent from
Twilio when our phone call starts streaming. There are four main message events we want to
listen for: connected, start, media and stop.
Modify your index.js file to log a message when each of these events arrives at our server.
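As a minimal sketch of that logging step, the helper below maps one raw message to the line we want to log. The name describeTwilioEvent is hypothetical, and the sketch assumes each message is a JSON string with an event field and, on start, a top-level streamSid.

```javascript
// Hypothetical helper (not part of any Twilio SDK): turns one raw WebSocket
// message from Twilio into the text we want to log for it.
function describeTwilioEvent(rawMessage) {
  const msg = JSON.parse(rawMessage);
  switch (msg.event) {
    case "connected":
      return "A new call has connected.";
    case "start":
      // Assumption: the start message carries a top-level streamSid.
      return `Starting Media Stream ${msg.streamSid}`;
    case "media":
      return "Receiving Audio...";
    case "stop":
      return "Call Has Ended";
    default:
      return `Unhandled event: ${msg.event}`;
  }
}

// Possible wiring inside index.js, assuming `wss` is your ws server:
// wss.on("connection", ws => {
//   console.log("New Connection Initiated");
//   ws.on("message", message => console.log(describeTwilioEvent(message)));
// });
```

Keeping the switch in a small function like this makes it easy to check the handling of each event without placing a real call.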
Now we need to set up our Twilio number to start streaming audio to our server. We can control
what happens when we call our Twilio number using TwiML. We'll create an HTTP route that will
return TwiML instructing Twilio to stream audio from the call to our server.
server.listen(8080);
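One way to keep that TwiML easy to inspect is to build it in a small function and call it from the route. This is only a sketch: buildStreamTwiml is a hypothetical helper name, and the host value is assumed to come from req.headers.host, as in the route shown later in this post.

```javascript
// Hypothetical helper: returns the TwiML that tells Twilio to open a media
// stream to our WebSocket server, speak a prompt, then keep the call open.
function buildStreamTwiml(host) {
  return [
    "<Response>",
    "  <Start>",
    `    <Stream url="wss://${host}/"/>`,
    "  </Start>",
    "  <Say>I will stream the next 60 seconds of audio through your websocket</Say>",
    '  <Pause length="60" />',
    "</Response>"
  ].join("\n");
}

// Possible use inside the Express route (assumes `app` is your Express app):
// app.post("/", (req, res) => {
//   res.set("Content-Type", "text/xml");
//   res.send(buildStreamTwiml(req.headers.host));
// });
```

Because the stream URL is derived from the incoming request's host, the same code should work unchanged behind whatever public address forwards to your server.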
For Twilio to connect to your local server we need to expose the port to the internet. The
easiest way to do that is using the Twilio CLI. Open a new Terminal to continue.
First let’s buy a phone number. In your terminal run the following command. I have used the
GB country code to buy a mobile number, but feel free to change this for a number local to
you. Hold on to the number’s Friendly Name once the response is returned.
Finally, let's update the phone number to point to our localhost URL. We need to use ngrok to
create a tunnel to our localhost port and expose it to the internet. In a new terminal window
run the following command:
You should get an output with a forwarding address like this. Copy the URL to the clipboard.
Make sure you record the HTTPS URL.
Back in the terminal window where we bought our Twilio number, let's update our phone
number to make an HTTP POST request to our server.
Head over to a new terminal window and run your index.js file. Now call your Twilio phone
number and you should hear the following prompt, “I will stream the next 60 seconds of audio
through your websocket”. The terminal should be logging Receiving Audio…
NOTE: Make sure that you have at least 2 terminals running if your log doesn’t match the
expected response. One running your server (index.js) and one running ngrok.
Run the following command to install the Google Cloud Speech-to-Text client libraries.
First we'll include the Speech Client from the Google Speech-to-Text library, then we will
configure a Transcription Request. In order to get live transcription results, make sure you
set interimResults to true. I have also set the language code to en-GB; feel free to set
yours to a different language region.
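A transcription request along these lines might look like the sketch below. The function name buildTranscriptionRequest is hypothetical. The encoding and sample rate are set on the assumption that the audio is forwarded exactly as Twilio Media Streams delivers it, which is mu-law at 8000 Hz.

```javascript
// Hypothetical helper: builds the streaming request described above.
function buildTranscriptionRequest(languageCode = "en-GB") {
  return {
    config: {
      encoding: "MULAW",     // Twilio Media Streams send mu-law encoded audio
      sampleRateHertz: 8000, // sampled at 8 kHz
      languageCode           // e.g. "en-GB" or "en-US"
    },
    interimResults: true     // emit partial results while the caller is speaking
  };
}

// Possible use with the Google client (requires @google-cloud/speech and
// Google Cloud credentials, so it is left commented out here):
// const speech = require("@google-cloud/speech");
// const client = new speech.SpeechClient();
// const request = buildTranscriptionRequest("en-US");
```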
Now let’s create a new stream to send audio from our server to the Google API. We will call it
the recognizeStream and we will write our audio packets from our phone call to this stream.
When the call has ended we will call .destroy() to end the stream.
// ... earlier lines of index.js omitted ...
        break;
      case "stop":
        console.log(`Call Has Ended`);
        recognizeStream.destroy();
        break;
    }
  });
});

// Handle HTTP requests
app.get("/", (req, res) => res.send("Hello World"));

app.post("/", (req, res) => {
  res.set("Content-Type", "text/xml");

  // Tell Twilio to stream the call audio to our WebSocket server
  res.send(`
    <Response>
      <Start>
        <Stream url="wss://${req.headers.host}/"/>
      </Start>
      <Say>I will stream the next 60 seconds of audio through your websocket</Say>
      <Pause length="60" />
    </Response>
  `);
});

console.log("Listening at Port 8080");
server.listen(8080);
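The media and stop handling described above can be sketched in isolation as follows. forwardCallAudio is a hypothetical helper, and recognizeStream is assumed to be any object with write() and destroy() methods, such as the stream created for the Google API. The sketch also assumes that each media message carries its audio as a base64 string in msg.media.payload, and it decodes that string to a Buffer before writing, which is one option among others.

```javascript
// Hypothetical helper: forwards call audio to the recognition stream and
// tears the stream down when the call ends.
function forwardCallAudio(rawMessage, recognizeStream) {
  const msg = JSON.parse(rawMessage);
  if (msg.event === "media") {
    // Assumption: media.payload holds the audio chunk as base64 text.
    recognizeStream.write(Buffer.from(msg.media.payload, "base64"));
  } else if (msg.event === "stop") {
    recognizeStream.destroy();
  }
}

// Possible wiring inside index.js, assuming `ws` is the Twilio connection:
// ws.on("message", message => forwardCallAudio(message, recognizeStream));
```

Passing the stream in as an argument means the function can be exercised with a simple stand-in object before any Google credentials are configured.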
Restart your server, call your Twilio phone number and start talking down the phone. You
should see interim transcription results begin to appear in your terminal.
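If you want to log only the text of each result, a small helper can pull it out of the response. This is a sketch that assumes each data event from the recognition stream has a results array whose first entry holds an alternatives array with a transcript field. extractTranscript is a hypothetical name.

```javascript
// Hypothetical helper: returns the best transcript from one streaming
// response, or an empty string when the response carries no text.
function extractTranscript(data) {
  const result = data.results && data.results[0];
  const alternative = result && result.alternatives && result.alternatives[0];
  return alternative ? alternative.transcript : "";
}

// Possible use, assuming `recognizeStream` is the stream to the Google API:
// recognizeStream.on("data", data => console.log(extractTranscript(data)));
```

Guarding each level avoids a crash if a response arrives without any results.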
Let's modify our code to broadcast our interim transcription results to all connected clients.
We'll also modify the GET route. Rather than sending 'Hello World', let's send an HTML file.
We will also need Node's path module, so don't forget to require it.
// ... earlier lines of index.js omitted ...
    }
  });

});

// Handle HTTP requests
app.get("/", (req, res) => res.sendFile(path.join(__dirname, "/index.html")));

app.post("/", (req, res) => {
  res.set("Content-Type", "text/xml");

  // Tell Twilio to stream the call audio to our WebSocket server
  res.send(`
    <Response>
      <Start>
        <Stream url="wss://${req.headers.host}/"/>
      </Start>
      <Say>I will stream the next 60 seconds of audio through your websocket</Say>
      <Pause length="60" />
    </Response>
  `);
});

console.log("Listening at Port 8080");
server.listen(8080);
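The broadcast step mentioned above can be sketched on its own like this. broadcastTranscription is a hypothetical helper. It assumes clients is an iterable collection with forEach, such as the clients set kept by a ws server, and that each client exposes readyState and send(). The event name matches the one the web page in this post listens for.

```javascript
// readyState value of an open WebSocket (WebSocket.OPEN).
const OPEN = 1;

// Hypothetical helper: sends the latest interim transcription to every
// client whose connection is open, and returns how many clients received it.
function broadcastTranscription(clients, text) {
  const payload = JSON.stringify({ event: "interim-transcription", text });
  let sent = 0;
  clients.forEach(client => {
    if (client.readyState === OPEN) {
      client.send(payload);
      sent += 1;
    }
  });
  return sent;
}

// Possible use, assuming `wss` is your ws server:
// broadcastTranscription(wss.clients, "hello world");
```

Skipping clients that are not open avoids errors from connections that are still connecting or already closing.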
Let's set up a web page to handle the interim transcriptions and display them in the browser.
<!DOCTYPE html>
<html>
  <head>
    <title>Live Transcription with Twilio Media Streams</title>
  </head>
  <body>
    <h1>Live Transcription with Twilio Media Streams</h1>
    <h3>
      Call your Twilio Number, start talking and watch your words magically
      appear.
    </h3>
    <p id="transcription-container"></p>
    <script>
      document.addEventListener("DOMContentLoaded", event => {
        // Connect to the WebSocket server started by index.js
        const webSocket = new WebSocket("ws://localhost:8080");
        webSocket.onmessage = function(msg) {
          const data = JSON.parse(msg.data);
          // Show the most recent interim transcription
          if (data.event === "interim-transcription") {
            document.getElementById("transcription-container").innerHTML =
              data.text;
          }
        };
      });
    </script>
  </body>
</html>
Restart your server, load localhost:8080 in your browser then give your Twilio phone
number a call and watch your words begin to appear in your browser.
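The page above connects to ws://localhost:8080, which works when you open it on the same machine as the server. If you also want to open the page through a public forwarding address, one option is to derive the WebSocket URL from the page's own location. This is an optional sketch, and webSocketUrlFor is a hypothetical helper name.

```javascript
// Hypothetical helper: picks ws: or wss: to match how the page was loaded
// and reuses the page's host, so no address is hardcoded.
function webSocketUrlFor(location) {
  const scheme = location.protocol === "https:" ? "wss:" : "ws:";
  return `${scheme}//${location.host}`;
}

// Possible use in the page's script:
// const webSocket = new WebSocket(webSocketUrlFor(window.location));
```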
Wrapping up
Congratulations! You can now harness the power of Twilio Media Streams to extend your voice
applications. Now that you have live transcription, try translating the text with Google's
Translate API to create live speech translation, or run sentiment analysis on the audio stream
to work out the emotions behind the speech.
If you have any questions, feedback or just want to show me what you build, feel free to reach
out to me:
Twitter: @chatterboxcoder
GitHub: nokenwa
Email: [email protected]