Live Transcribing Phone Calls Using Twilio Media Streams and Google Speech-to-Text


4/21/2021 Live Transcribing Phone Calls using Twilio Media Streams and Google Speech-to-Text


By Nathaniel Okenwa, 2019-09-12


https://www.twilio.com/blog/live-transcribing-phone-calls-using-twilio-media-streams-and-google-speech-text


With Twilio Media Streams, you can now extend the capabilities of your Twilio-powered voice
application with real-time access to the raw audio stream of phone calls. For example, we can
build tools that transcribe the speech from a phone call live into a browser window, run
sentiment analysis of the speech on a phone call or even use voice biometrics to identify
individuals.

This blog post will guide you step-by-step through transcribing speech from a phone call into
text, live in the browser using Twilio and Google Speech-to-Text with Node.js.

If you want to skip the step-by-step instructions, you can clone my GitHub repository and
follow the README to get set up, or if you prefer video, check out a video walkthrough
here.

Requirements
Before we can get started, you’ll need to make sure to have:

A Free Twilio Account

A Google Cloud Account

Installed ngrok

Installed the Twilio CLI


Setting up the Local Server


Twilio Media Streams use the WebSocket API to live stream the audio from the phone call to
your application. Let’s get started by setting up a server that can handle WebSocket
connections.

Open your terminal, create a new project folder, and create an index.js file inside it.

$ mkdir twilio-streams
$ cd twilio-streams
$ touch index.js

To handle HTTP requests we will use Node's built-in http module and Express. For
WebSocket connections we will use ws, a lightweight WebSocket client and server library for Node.

In the terminal run this command to install ws and Express:

$ npm install ws express

Open your index.js file and add the following code to set up your server.

const WebSocket = require("ws");
const express = require("express");
const app = express();
const server = require("http").createServer(app);
const wss = new WebSocket.Server({ server });

// Handle Web Socket Connection
wss.on("connection", function connection(ws) {
  console.log("New Connection Initiated");
});

//Handle HTTP Request
app.get("/", (req, res) => res.send("Hello World"));

console.log("Listening at Port 8080");
server.listen(8080);


Save and run index.js with node index.js. Open your browser and navigate to
http://localhost:8080. Your browser should show Hello World.

Now that we know HTTP requests are working, let’s test our WebSocket connection. Open your
browser’s console and run this command:

var connection = new WebSocket('ws://localhost:8080')

If you go back to the terminal you should see a log saying New Connection Initiated .

Setting up Phone Calls


Let’s set up our Twilio number to connect to our WebSocket server.

First we need to modify our server to handle the WebSocket messages that will be sent from
Twilio when our phone call starts streaming. There are four main message events we want to
listen for: connected, start, media and stop.


Connected: When Twilio makes a successful WebSocket connection to a server

Start: When Twilio starts streaming Media Packets

Media: Encoded Media Packets (This is the Raw Audio)

Stop: When streaming ends the stop event is sent.
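
To make the four events concrete, here is a minimal sketch of how a handler might summarize each message. The sample messages are hand-written for illustration and only include the fields this tutorial relies on (event, streamSid and media.payload); real messages from Twilio carry additional fields. The function name describeStreamMessage is hypothetical.

```javascript
// Hypothetical helper: turn one raw Media Streams message into a log line.
function describeStreamMessage(raw) {
  const msg = JSON.parse(raw);
  switch (msg.event) {
    case "connected":
      return "A new call has connected.";
    case "start":
      return `Starting Media Stream ${msg.streamSid}`;
    case "media":
      return `Receiving Audio (${msg.media.payload.length} base64 characters)`;
    case "stop":
      return "Call Has Ended";
    default:
      return `Unhandled event: ${msg.event}`;
  }
}

// Hand-written sample messages (illustrative values, not captured from a real call).
const samples = [
  { event: "connected" },
  { event: "start", streamSid: "MZ-example" },
  { event: "media", media: { payload: "/v7+/w==" } },
  { event: "stop" }
];

for (const sample of samples) {
  console.log(describeStreamMessage(JSON.stringify(sample)));
}
```

The default branch is a design choice rather than something the tutorial requires: it keeps an unexpected event visible in the log instead of silently dropping it.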

Modify your index.js file to log a message when each of these events arrives at our server.


const WebSocket = require("ws");
const express = require("express");
const app = express();
const server = require("http").createServer(app);
const wss = new WebSocket.Server({ server });

// Handle Web Socket Connection
wss.on("connection", function connection(ws) {
  console.log("New Connection Initiated");

  // Each message from Twilio is a JSON string with an "event" field
  ws.on("message", function incoming(message) {
    const msg = JSON.parse(message);
    switch (msg.event) {
      case "connected":
        console.log(`A new call has connected.`);
        break;
      case "start":
        console.log(`Starting Media Stream ${msg.streamSid}`);
        break;
      case "media":
        console.log(`Receiving Audio...`);
        break;
      case "stop":
        console.log(`Call Has Ended`);
        break;
    }
  });
});

//Handle HTTP Request
app.get("/", (req, res) => res.send("Hello World"));

console.log("Listening at Port 8080");
server.listen(8080);

Now we need to set up our Twilio number to start streaming audio to our server. We can control
what happens when we call our Twilio number using TwiML. We’ll create an HTTP route that will
return TwiML instructing Twilio to stream audio from the call to our server.

Add the following POST route to your index.js file.

const WebSocket = require("ws");
const express = require("express");
const app = express();
const server = require("http").createServer(app);
const wss = new WebSocket.Server({ server });

// Handle Web Socket Connection
wss.on("connection", function connection(ws) {
  console.log("New Connection Initiated");

  ws.on("message", function incoming(message) {
    const msg = JSON.parse(message);
    switch (msg.event) {
      case "connected":
        console.log(`A new call has connected.`);
        break;
      case "start":
        console.log(`Starting Media Stream ${msg.streamSid}`);
        break;
      case "media":
        console.log(`Receiving Audio...`);
        break;
      case "stop":
        console.log(`Call Has Ended`);
        break;
    }
  });
});

//Handle HTTP Request
app.get("/", (req, res) => res.send("Hello World"));

// Return TwiML that tells Twilio to stream the call audio to this server
app.post("/", (req, res) => {
  res.set("Content-Type", "text/xml");

  res.send(`
    <Response>
      <Start>
        <Stream url="wss://${req.headers.host}/"/>
      </Start>
      <Say>I will stream the next 60 seconds of audio through your websocket</Say>
      <Pause length="60" />
    </Response>
  `);
});

console.log("Listening at Port 8080");
server.listen(8080);
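
The TwiML returned by the POST route can also be produced by a small function, which makes it easier to inspect without placing a call. This is only a sketch: buildStreamTwiml and escapeXml are hypothetical names, and the escaping step is an added precaution (an assumption of this sketch, not part of the original route) for a host value that comes from a request header.

```javascript
// Hypothetical helper: escape characters that are special inside XML attributes.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: build the TwiML that tells Twilio where to stream audio.
function buildStreamTwiml(host, seconds = 60) {
  const url = escapeXml(`wss://${host}/`);
  return [
    "<Response>",
    "  <Start>",
    `    <Stream url="${url}"/>`,
    "  </Start>",
    `  <Say>I will stream the next ${seconds} seconds of audio through your websocket</Say>`,
    `  <Pause length="${seconds}" />`,
    "</Response>"
  ].join("\n");
}

// Placeholder host, in the same style as the ngrok address used in this post.
console.log(buildStreamTwiml("xxxxxxxx.ngrok.io"));
```

Inside the route, the body could then be sent with res.send(buildStreamTwiml(req.headers.host)).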

For Twilio to connect to your local server we need to expose the port to the internet. The
easiest way to do that is using the Twilio CLI. Open a new Terminal to continue.

First let’s buy a phone number. In your terminal run the following command. I have used the
GB country code to buy a mobile number, but feel free to change this for a number local to
you. Hold on to the number’s Friendly Name once the response is returned.

$ twilio phone-numbers:buy:mobile --country-code GB

Finally, let’s update the phone number to point to our localhost URL. We need to use ngrok to
create a tunnel to our localhost port and expose it to the internet. In a new terminal window
run the following command:

$ ngrok http 8080

You should get an output with a forwarding address like this. Copy the HTTPS URL onto the
clipboard.

Forwarding https://xxxxxxxx.ngrok.io -> http://localhost:8080


Back in the terminal window where we bought our Twilio number, let’s update the phone
number so that Twilio makes an HTTP POST request to our server when a call comes in. Use
the HTTPS forwarding URL from ngrok as the voice URL.

Run the following command:

$ twilio phone-numbers:update $TWILIO_NUMBER --voice-url https://xxxxxxxx.ng…

Head over to a new terminal window and run your index.js file. Now call your Twilio phone
number and you should hear the following prompt, “I will stream the next 60 seconds of audio
through your websocket”. The terminal should be logging Receiving Audio…


NOTE: Make sure that you have at least 2 terminals running if your log doesn’t match the
expected response. One running your server (index.js) and one running ngrok.

Transcribing Speech into Text


At this point we have audio from our call streaming to our server. Today, we’ll be using Google
Cloud Platform’s Speech-to-Text API to transcribe the voice data from the phone call.

There is some setup that we need to do before we get started.

1. Install and initialize the Cloud SDK.

2. Set up a new GCP project:

   Create or select a project.

   Enable the Google Speech-to-Text API for that project.

   Create a service account.

   Download a private key as JSON.

3. Set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the file path of
   the JSON file that contains your service account key. This variable only applies to
   your current shell session, so if you open a new session, set the variable again.
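
Because the variable is easy to lose when you open a new shell session, a quick check at startup can save some debugging. The sketch below is an optional addition, not part of the original setup; resolveCredentialsPath is a hypothetical name, and it only confirms that the variable is set and points at an existing file, not that the key itself is valid.

```javascript
const fs = require("fs");

// Hypothetical helper: return the key file path or throw a descriptive error.
function resolveCredentialsPath(env = process.env) {
  const keyPath = env.GOOGLE_APPLICATION_CREDENTIALS;
  if (!keyPath) {
    throw new Error("GOOGLE_APPLICATION_CREDENTIALS is not set in this shell session");
  }
  if (!fs.existsSync(keyPath)) {
    throw new Error(`No key file found at ${keyPath}`);
  }
  return keyPath;
}

// Report the result instead of crashing, so the sketch runs in any shell.
try {
  console.log(`Using service account key at ${resolveCredentialsPath()}`);
} catch (err) {
  console.error(err.message);
}
```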

Run the following command to install the Google Cloud Speech-to-Text client libraries.

$ npm install --save @google-cloud/speech

Now let’s use it in our code.

First we’ll include the Speech client from the Google Speech-to-Text library, then we will
configure a transcription request. To get live transcription results, make sure you
set interimResults to true. I have set the language code to en-GB; feel free to set
yours to a different language or region.

const WebSocket = require("ws");
const express = require("express");
const app = express();
const server = require("http").createServer(app);
const wss = new WebSocket.Server({ server });

//Include Google Speech to Text
const speech = require("@google-cloud/speech");
const client = new speech.SpeechClient();

//Configure Transcription Request
const request = {
  config: {
    encoding: "MULAW",
    sampleRateHertz: 8000,
    languageCode: "en-GB"
  },
  interimResults: true
};

// Handle Web Socket Connection
wss.on("connection", function connection(ws) {
  console.log("New Connection Initiated");

  ws.on("message", function incoming(message) {
    const msg = JSON.parse(message);
    switch (msg.event) {
      case "connected":
        console.log(`A new call has connected.`);
        break;
      case "start":
        console.log(`Starting Media Stream ${msg.streamSid}`);
        break;
      case "media":
        console.log(`Receiving Audio...`);
        break;
      case "stop":
        console.log(`Call Has Ended`);
        break;
    }
  });
});

//Handle HTTP Request
app.get("/", (req, res) => res.send("Hello World"));

app.post("/", (req, res) => {
  res.set("Content-Type", "text/xml");

  res.send(`
    <Response>
      <Start>
        <Stream url="wss://${req.headers.host}/"/>
      </Start>
      <Say>I will stream the next 60 seconds of audio through your websocket</Say>
      <Pause length="60" />
    </Response>
  `);
});

console.log("Listening at Port 8080");
server.listen(8080);
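
The encoding and sampleRateHertz values in the request describe the audio inside each media message. As a rough illustration, assuming each payload is base64-encoded 8-bit mu-law audio sampled at 8000 Hz (which is what the config above declares), the sketch below decodes a payload and works out how much audio it contains. decodePayload is a hypothetical name and the payload is made up rather than taken from a real call.

```javascript
// Assumption: payloads are base64-encoded 8-bit mu-law audio sampled at 8000 Hz,
// matching the encoding and sampleRateHertz values in the request config.
const SAMPLE_RATE_HERTZ = 8000;
const BYTES_PER_SAMPLE = 1;

// Hypothetical helper: decode one payload and report how much audio it holds.
function decodePayload(base64Payload) {
  const audio = Buffer.from(base64Payload, "base64");
  const samples = audio.length / BYTES_PER_SAMPLE;
  const milliseconds = (samples * 1000) / SAMPLE_RATE_HERTZ;
  return { audio, milliseconds };
}

// 160 placeholder bytes stand in for a real payload.
const fakePayload = Buffer.alloc(160, 0xff).toString("base64");
const { audio, milliseconds } = decodePayload(fakePayload);
console.log(`${audio.length} bytes, about ${milliseconds} ms of audio`);
```

At one byte per sample, 8000 bytes correspond to one second of call audio, which is a handy sanity check when logging payload sizes.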

Now let’s create a new stream to send audio from our server to the Google API. We will call it
the recognizeStream and we will write our audio packets from our phone call to this stream.
When the call has ended we will call .destroy() to end the stream.

Edit your code to include these changes.



const WebSocket = require("ws");
const express = require("express");
const app = express();
const server = require("http").createServer(app);
const wss = new WebSocket.Server({ server });

//Include Google Speech to Text
const speech = require("@google-cloud/speech");
const client = new speech.SpeechClient();

//Configure Transcription Request
const request = {
  config: {
    encoding: "MULAW",
    sampleRateHertz: 8000,
    languageCode: "en-GB"
  },
  interimResults: true
};

// Handle Web Socket Connection
wss.on("connection", function connection(ws) {
  console.log("New Connection Initiated");

  // One recognize stream per WebSocket connection
  let recognizeStream = null;

  ws.on("message", function incoming(message) {
    const msg = JSON.parse(message);
    switch (msg.event) {
      case "connected":
        console.log(`A new call has connected.`);

        // Create Stream to the Google Speech to Text API
        recognizeStream = client
          .streamingRecognize(request)
          .on("error", console.error)
          .on("data", data => {
            console.log(data.results[0].alternatives[0].transcript);
          });
        break;
      case "start":
        console.log(`Starting Media Stream ${msg.streamSid}`);
        break;
      case "media":
        // Write Media Packets to the recognize stream
        recognizeStream.write(msg.media.payload);
        break;
      case "stop":
        console.log(`Call Has Ended`);
        recognizeStream.destroy();
        break;
    }
  });
});

//Handle HTTP Request
app.get("/", (req, res) => res.send("Hello World"));

app.post("/", (req, res) => {
  res.set("Content-Type", "text/xml");

  res.send(`
    <Response>
      <Start>
        <Stream url="wss://${req.headers.host}/"/>
      </Start>
      <Say>I will stream the next 60 seconds of audio through your websocket</Say>
      <Pause length="60" />
    </Response>
  `);
});

console.log("Listening at Port 8080");
server.listen(8080);

Restart your server, call your Twilio phone number and start talking down the phone. You
should see interim transcription results begin to appear in your terminal.
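
The data handler reads data.results[0].alternatives[0].transcript directly, which throws if a response arrives without a result. A more defensive version might look like the sketch below. extractTranscript is a hypothetical name, and the isFinal flag is an assumption about the response shape that the tutorial code does not itself use; the sample responses are hand-written.

```javascript
// Hypothetical helper: pull the top transcript out of a streaming response,
// returning null when the response carries no usable result.
function extractTranscript(data) {
  const result = data && data.results && data.results[0];
  const alternative = result && result.alternatives && result.alternatives[0];
  if (!alternative) {
    return null;
  }
  // Assumption: results carry an isFinal flag; a missing flag is treated as interim.
  return { text: alternative.transcript, isFinal: Boolean(result.isFinal) };
}

// Hand-written responses shaped like the ones the tutorial code reads.
console.log(extractTranscript({ results: [] }));
console.log(
  extractTranscript({
    results: [{ alternatives: [{ transcript: "hello world" }], isFinal: true }]
  })
);
```

In the handler, a null return value would simply be skipped instead of logged.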


Sending Live Transcription to the Browser


One of the benefits of using WebSockets is that we can broadcast messages to other clients,
including browsers.

Let’s modify our code to broadcast our interim transcription results to all connected clients.
We’ll also modify the GET route. Rather than sending ‘Hello World’, let’s send an HTML file.
We will also need Node’s built-in path module, so don’t forget to require it.

Modify your index.js file like below.

const WebSocket = require("ws");
const express = require("express");
const app = express();
const server = require("http").createServer(app);
const wss = new WebSocket.Server({ server });
const path = require("path");

//Include Google Speech to Text
const speech = require("@google-cloud/speech");
const client = new speech.SpeechClient();

//Configure Transcription Request
const request = {
  config: {
    encoding: "MULAW",
    sampleRateHertz: 8000,
    languageCode: "en-GB"
  },
  interimResults: true
};

// Handle Web Socket Connection
wss.on("connection", function connection(ws) {
  console.log("New Connection Initiated");

  let recognizeStream = null;

  ws.on("message", function incoming(message) {
    const msg = JSON.parse(message);
    switch (msg.event) {
      case "connected":
        console.log(`A new call has connected.`);
        //Create Stream to the Google Speech to Text API
        recognizeStream = client
          .streamingRecognize(request)
          .on("error", console.error)
          .on("data", data => {
            console.log(data.results[0].alternatives[0].transcript);
            // Broadcast to every open WebSocket client, including browsers.
            // Named socketClient to avoid shadowing the Speech client above.
            wss.clients.forEach(socketClient => {
              if (socketClient.readyState === WebSocket.OPEN) {
                socketClient.send(
                  JSON.stringify({
                    event: "interim-transcription",
                    text: data.results[0].alternatives[0].transcript
                  })
                );
              }
            });
          });
        break;
      case "start":
        console.log(`Starting Media Stream ${msg.streamSid}`);
        break;
      case "media":
        // Write Media Packets to the recognize stream
        recognizeStream.write(msg.media.payload);
        break;
      case "stop":
        console.log(`Call Has Ended`);
        recognizeStream.destroy();
        break;
    }
  });
});

//Handle HTTP Request
app.get("/", (req, res) => res.sendFile(path.join(__dirname, "/index.html")));

app.post("/", (req, res) => {
  res.set("Content-Type", "text/xml");

  res.send(`
    <Response>
      <Start>
        <Stream url="wss://${req.headers.host}/"/>
      </Start>
      <Say>I will stream the next 60 seconds of audio through your websocket</Say>
      <Pause length="60" />
    </Response>
  `);
});

console.log("Listening at Port 8080");
server.listen(8080);
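
The broadcast step inside the data handler can be pulled out into a function of its own, which makes it possible to try without a live call. In this sketch broadcastTranscription is a hypothetical name, the clients are plain objects standing in for real sockets, and the constant 1 is assumed to match WebSocket.OPEN.

```javascript
// Assumption: 1 is the numeric value of the OPEN ready state (WebSocket.OPEN).
const OPEN = 1;

// Hypothetical helper: send one transcription event to every open client
// and return how many clients received it.
function broadcastTranscription(clients, text) {
  const message = JSON.stringify({ event: "interim-transcription", text });
  let delivered = 0;
  for (const client of clients) {
    if (client.readyState === OPEN) {
      client.send(message);
      delivered += 1;
    }
  }
  return delivered;
}

// Fake clients stand in for real sockets so the sketch runs without a server.
const received = [];
const fakeClients = new Set([
  { readyState: 1, send: (m) => received.push(m) },
  { readyState: 3, send: (m) => received.push(m) }
]);
console.log(broadcastTranscription(fakeClients, "hello"), received);
```

With the real server, the call would be broadcastTranscription(wss.clients, transcript) from inside the data handler.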

Let’s set up a web page to handle the interim transcriptions and display them in the browser.

Create a new file, index.html and include the following:


<!DOCTYPE html>
<html>
  <head>
    <title>Live Transcription with Twilio Media Streams</title>
  </head>
  <body>
    <h1>Live Transcription with Twilio Media Streams</h1>
    <h3>
      Call your Twilio Number, start talking and watch your words magically
      appear.
    </h3>
    <p id="transcription-container"></p>
    <script>
      document.addEventListener("DOMContentLoaded", event => {
        // Connect to the same server that receives the call audio
        const webSocket = new WebSocket("ws://localhost:8080");
        webSocket.onmessage = function(msg) {
          const data = JSON.parse(msg.data);
          // Replace the displayed text with the latest interim transcription
          if (data.event === "interim-transcription") {
            document.getElementById("transcription-container").innerHTML =
              data.text;
          }
        };
      });
    </script>
  </body>
</html>

Restart your server, load localhost:8080 in your browser then give your Twilio phone
number a call and watch your words begin to appear in your browser.
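
The page above hard-codes ws://localhost:8080, which is fine for local testing. If you later open the page through another address, one option is to derive the WebSocket URL from the page's own location. The sketch below shows the idea with a hypothetical socketUrlFor function; the location objects are hand-written stand-ins for window.location.

```javascript
// Hypothetical helper: derive the WebSocket URL from the page's own location,
// using wss: when the page itself was loaded over https:.
function socketUrlFor(location) {
  const scheme = location.protocol === "https:" ? "wss:" : "ws:";
  return `${scheme}//${location.host}/`;
}

// Hand-written stand-ins for window.location.
console.log(socketUrlFor({ protocol: "http:", host: "localhost:8080" }));
console.log(socketUrlFor({ protocol: "https:", host: "xxxxxxxx.ngrok.io" }));
```

In the browser the call would be new WebSocket(socketUrlFor(window.location)).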


Wrapping up
Congratulations! You can now harness the power of Twilio media streams to extend your voice
applications. Now that you have live transcription, try translating the text with Google’s
Translate API to create live speech translation or run sentiment analysis on the audio stream to
work out the emotions behind the speech.

If you have any questions, feedback or just want to show me what you build, feel free to reach
out to me:

Twitter: @chatterboxcoder

GitHub: nokenwa

Email: [email protected]
