Working With The Web Audio API Preview
Working with the Web Audio API is the definitive and instructive guide to
understanding and using the Web Audio API.
The Web Audio API provides a powerful and versatile system for controlling
audio on the web. It allows developers to generate sounds, select sources,
add effects, create visualizations and render audio scenes in an immersive
environment.
This book covers all essential features, with easy-to-implement code
examples for every aspect. All the theory behind it is explained, so that
one can understand the design choices as well as the core audio processing
concepts. Advanced concepts are also covered, so that the reader will gain
the skills to build complex audio applications running in the browser.
Aimed at a wide audience of potential students, researchers and coders,
this is a comprehensive guide to the functionality of this industry-standard
tool for creating audio applications for the web.
Joshua Reiss is a Professor with the Centre for Digital Music at Queen Mary
University of London. He has published more than 200 scientific papers,
and co-authored the book Intelligent Music Production and textbook Audio
Effects: Theory, Implementation and Application. At the time of writing, he is
the President of the Audio Engineering Society (AES). He co-founded the
highly successful spin-out company LandR, and recently co-founded
the start-ups Tonz and Nemisindo. His research focuses on
state-of-the-art signal processing techniques for sound design and audio
production. He maintains a popular blog, YouTube channel and Twitter
feed for scientific education and research dissemination.
Contents
2 Oscillators 11
9 OfflineAudioContext 95
Interlude – Audio effects 101
10 Delay 103
11 Filtering 114
12 Waveshaper 130
14 Reverberation 158
Interlude – Multichannel audio 173
Appendix – The Web Audio API interfaces 242
References 245
Index 246
Figures
7.1 The output of Code example 7.1, showing either the
time domain waveform (top) from time domain data or
magnitude spectrum (bottom) from frequency domain data. 80
10.1 Magnitude response of a comb filter using 1 millisecond
delay. 106
10.2 Audio graph for Code example 10.3. 110
10.3 Audio graph for a simple implementation of the
Karplus-Strong algorithm. 112
11.1 Magnitude response for various ideal filters. 117
11.2 Magnitude responses for the Web Audio API’s lowpass
filter, PureData’s lowpass filter and the standard
first-order Butterworth lowpass filter. 122
11.3 Magnitude responses for the Web Audio API’s bandpass
filter, PureData’s bandpass filter and the standard
Butterworth bandpass filter design. 123
11.4 Phase responses for the Web Audio API’s allpass filter and
the standard Butterworth allpass filter design. 124
12.1 The characteristic input/output curve for a
quadratic distortion. 131
12.2 Comparison of hard and soft clipping. 132
12.3 Soft clipping of a sine wave with four different input gains. 134
12.4 Half-wave and full-wave rectification. 135
12.5 The spectrum of a single sine wave before and after
asymmetric distortion has been applied. 136
12.6 The output spectrum after distortion has been applied
for sinusoidal input, comparing two values of distortion
level for soft clipping (left), and comparing soft and hard
clipping (right). 136
12.7 The spectrum of two sinusoids before and after distortion
has been applied. 136
12.8 Output spectrum with aliasing due to distortion, and the
output spectrum after oversampling, lowpass filtering and
downsampling. 140
12.9 The effect of soft clipping on a decaying sinusoid. 141
12.10 How the waveshaping curve is used. For an original
curve over the interval −1 to +1, equally spaced values
are mapped to an array with indices 0 to N−1, and
waveshaping outputs are interpolated between the
array values. 144
12.11 The input/output curve for a bit crusher with bit
depth =3 (eight levels). 146
13.1 Static compression characteristic with make-up gain and
hard or soft knee. When the input signal level is above a
threshold, a further increase in the input level will produce
a smaller change in the output level. 150
13.2 Graph of internal AudioNodes used as part of the
DynamicsCompressorNode processing algorithm. It
implements pre-delay and application of gain reduction. 152
14.1 Reverb is the result of sound waves traveling many
different paths from a source to a listener. 159
14.2 Impulse response of a room. 159
14.3 One block convolution, as implemented in block-based
convolutional reverb. Each block is convolved with the
impulse response h of length N. 162
14.4 Partitioned convolution for real-time artificial reverberation. 163
14.5 Supported input and output channel count possibilities
for mono inputs (left) and stereo inputs (right) with one,
two or four channels in the buffer. 165
15.1 The ChannelMergerNode (left) and ChannelSplitterNode
(right). 180
15.2 Block diagram of the flipper when set to flip the left and
right channels. 182
15.3 Block diagram of ping-pong delay. 183
16.1 Listener and loudspeaker configuration for placing a
sound source using level difference. 186
16.2 Constant-power panning for two channels. On the left
is the gain for each channel, and on the right is the total
power and total gain. 187
16.3 Perceived azimuth angle as a function of level difference. 188
16.4 How panning of a stereo source affects its panning
position. Final panning position versus the panning
applied is plotted for five stereo sources with original
positions between left (p = −1) and right speakers (p = +1). 190
16.5 The effect of stereo enhancement on the panning position
of a source. Width less than 0 moves a source towards the
center, width greater than 0 moves it away from the center.
Panning positions above 1 or below −1 indicate a change
of sign between left and right channels. 193
17.1 Right hand coordinate system. When holding out your
right hand in front of you as shown, the thumb points in
the X direction, the index finger in the Y direction and the
middle finger in the Z direction. 196
17.2 Listener and source in space. 197
17.3 Cone angles for a source in relation to the source
orientation and the listener’s position and orientation. 198
17.4 Calculation of azimuth angle α from source position S,
listener position P, and listener forward direction
F. V is the normalized vector from listener to source, and α is
calculated by projecting V onto the forward direction, i.e. adjacent
(F·V) over hypotenuse (1). 201
17.5 Diagram showing the process of panning a
source using HRTF. 203
17.6 The spatialization procedure used in the PannerNode. 205
18.1 Interaction between an audio worklet node and audio
worklet processor, along with the syntax for how these
components can be created. 212
19.1 Rewriting a buffer to store previous inputs. 237
Code examples
1.1 Hello World Application, generating sound. 5
1.2 Hello World, version 2. 6
1.3 Hello World, version 3. 6
1.4 UserInteraction.html and UserInteraction.js. 8
2.1 Oscillator.html and Oscillator.js. 15
2.2 CustomSquareWave.html, a square wave generator using
the PeriodicWave. 17
2.3 PulseWave.html, a pulse wave generator. 19
2.4 Detune.html and Detune.js. 20
3.1 BufferedNoise.html. Use of an AudioBufferSourceNode to
generate noise. 25
3.2 Playback.html and Playback.js, allowing the user to interact
with AudioBufferSourceNode parameters for a chirp signal
as the buffer source. 31
3.3 BufferedSquareWave.html, showing how a square wave can
be reconstructed using wave table synthesis, similar to a
Fourier series expansion. 33
3.4 Pause.html, showing how playback of an audio buffer can
be paused and resumed by changing the playback rate. 35
3.5 Backwards.html, showing how to simulate playing a buffer
backwards by playing the reverse of that buffer forwards. 37
4.1 DCoffset.html, showing use of the ConstantSourceNode to
add a value to a signal. 39
4.2 ConstantSourceSquareWave.html, which uses a
ConstantSourceNode to change the frequency of a square
wave constructed by summing weighted sinusoids. 41
4.3 NoConstantSourceSquareWave.html, which has the same
functionality as Code example 4.2, but without use of a
ConstantSourceNode. 41
4.4 Grouping.html and Grouping.js. 43
5.1 Beep.html and Beep.js, which demonstrate audio parameter
automation. 55
Resources
Sound files used in the source code were all public domain or Creative
Commons licensed.
Preface
The Web Audio API is the industry-standard tool for creating audio
applications for the web. It provides a powerful and versatile system for
controlling audio on the Web, allowing developers to generate sounds,
select sources, add effects, create visualizations and render audio scenes in
an immersive environment. The Web Audio API is gaining importance and
becoming an essential tool both for many of those whose work focuses on
audio, and those whose work focuses on web programming.
Though small guides and formal specifications exist for the Web
Audio API, there is not yet a detailed book on it, aimed at a wide
audience of potential students, researchers and coders. Also, quite a lot of
the guides are outdated. For instance, many refer to the deprecated
ScriptProcessorNode, and make no mention of the AudioWorkletNode,
which vastly extends the Web Audio API’s functionality.
This book provides a definitive and instructive guide to working with
the Web Audio API. It covers all essential features, with easy-to-implement
code examples for every aspect. All the theory behind it is explained, so that
one can understand the design choices as well as the core audio processing
concepts. Advanced concepts are also covered, so that the reader will gain
the skills to build complex audio applications running in the browser.
Structure
The book is divided into seven sections, with six short
interludes separating the sections, and most sections contain
several chapters. The organization is done in such a way that the book can
be read sequentially. With very few exceptions, features of the Web Audio
API are all introduced and explained in detail in their own chapter before
they are used in a code example in another chapter.
The first section is a single chapter. It gives an overview of the Web
Audio API, why it exists, what it does and how it is structured. It has source
code for a ‘Hello World’ application, the simplest program that does
something using the Web Audio API, and then it extends that to showcase a few
more core features.
The second section concerns how to generate sounds with scheduled
sources. There is a chapter for each scheduled source: oscillators, audio
buffer sources, and constant source nodes.
The third section focuses on audio parameters. It contains two chapters:
one on scheduling and setting these parameters, and then one on connecting
to audio parameters and performing modulation.
Then there is a fourth section on source nodes and destination nodes,
beyond the scheduled sources and the default destination. It has chapters
on analysis and visualization of audio streams, on loading and recording
audio, and on performing offline audio processing.
At this point, the reader now has knowledge of all the main ways in
which audio graphs are constructed and used in the Web Audio API. The
remaining sections focus on performing more specialized functions with
nodes to do common audio processing tasks or to enable arbitrary audio
generation, manipulation and processing.
The fifth section focuses on audio effects, with chapters on delay, filtering,
waveshaping, dynamic range compression and reverberation. Each chapter
introduces background on the effect and details of the associated audio
node, with the exception of the filtering chapter, for which there are two
relevant nodes, the BiquadFilterNode and IIRFilterNode.
A sixth section deals with spatial audio, and consists of three chapters.
The first looks at how multichannel audio is handled in the Web Audio
API, and introduces audio nodes for splitting a multichannel audio stream
and for merging several audio streams into a multichannel audio stream.
Two further chapters in this section address stereo panning and spatial
rendering.
The final section unleashes the full power of the Web Audio API with
audio worklets. The first chapter in this section explains audio worklets
in detail and introduces all of their features with source code examples.
The final chapter in the book revisits many source code examples from
previous chapters, and shows how alternative (and in some ways, better)
implementations can be achieved with the use of audio worklets.
Chapters and sections may be read out of order. For instance, one
may choose to delve into audio effects and multichannel processing,
Chapter 10 to Chapter 17, before exploring the details of audio parameters,
destinations and source nodes, Chapter 5 to Chapter 9. In that case, just a
basic understanding of some nodes and connections from earlier chapters
is necessary to fully understand the examples. Or one may skip Chapter 9
entirely without issue, since the OfflineAudioContext is not used in other
chapters.
Only a very small amount of the full Web Audio API specification is
not covered in this book. This includes some aspects of measuring or
controlling latency, aspects that are not included in the Chrome browser
implementation, such as the MediaStreamTrackAudioSourceNode
(used only by Firefox), and discussion of deprecated features, such as the
ScriptProcessorNode.
Volume.oninput = () => VolumeNode.gain.value = Volume.value

rather than
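(the longer form being contrasted is not shown in this preview; what follows is a plausible sketch of that kind of verbose listener syntax, not the book's actual example)

// a plausible reconstruction, not the book's listing
Volume.addEventListener('input', function () {
  VolumeNode.gain.value = Volume.value
})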
However, variables are usually (but not always) declared in the code
examples. We also make no use of CSS files; our aim is to present working
examples of Web Audio API concepts, not complete and polished
applications.
Acknowledgments
The author is a member of the Centre for Digital Music at Queen Mary
University of London. This visionary research group has promoted adven-
turous research since its inception, and he is grateful for the support and
inspiration that they provide.
Much of the audio-related research that underlies techniques and
algorithms used in the Web Audio API was first published in conventions,
conferences or journal articles from the Audio Engineering Society (AES).
The author is greatly indebted to the AES, which has been promoting
advances in the science and practice of audio engineering since 1948.
The author has worked with Web Audio since 2017. Much of that
work has been in the field of procedural audio, which is essentially
real-time, interactive sound synthesis. It led to the formation of the
company Nemisindo, which provides, among other things, a large online
procedural sound effect generation system based on the Web Audio API. Many
great researchers have worked with the author, either on projects leading to
Nemisindo or as part of the Nemisindo team, including Thomas Vassallo,
Adan Benito, Parham Bahadoran, Jake Lee, Rod Selfridge, Hazar Tez,
Jack Walters, Selim Sheta and Clifford Manasseh.
There is also an amazing community of Web Audio developers, whom
the author knows only through their contributions and discussions online.
Without their work, this book (and the Web Audio API itself) would have
been far weaker and less useful. Many of the examples and insights in the
book are based on their work. The best explanations often lie in their
original contributions, whereas any errors or omissions are due to the author.
Finally, the author dedicates this book to his family: his wife Sabrina,
daughters Eliza and Laura, and parents Chris and Judith.
1
Introducing the Web Audio API
This chapter introduces the Web Audio API. It explains the motivations
behind it, and compares it to other APIs, packages and environments
for audio programming. It gives an overview of key concepts, such as
the audio graph and how connections are made. The AudioContext is
introduced, as well as a few essential nodes and methods that are explored
in more detail in later chapters. A ‘hello world’ application is presented
as a code example, showing perhaps the simplest use of the Web Audio
API to produce sound. We then extend this application to show alterna-
tive approaches to its implementation, coding practices, and how sound is
manipulated in an audio graph.
DOI: 10.4324/9781003221937-1
Figure 1.1 An audio graph inside an AudioContext: a source connected to a destination.
1. AudioBufferSourceNode
2. MediaElementAudioSourceNode
3. MediaStreamAudioSourceNode
4. ConstantSourceNode
5. OscillatorNode
6. BiquadFilterNode
7. ChannelMergerNode
8. ChannelSplitterNode
9. ConvolverNode
10. DelayNode
11. DynamicsCompressorNode
12. GainNode
13. PannerNode
14. StereoPannerNode
15. WaveShaperNode
16. IIRFilterNode
17. AnalyserNode
18. MediaStreamAudioDestinationNode
19. AudioDestinationNode
The first five are all source nodes, defining some audio content and
where it comes from. The last three are all destinations, giving some
output. Everything else is an intermediate node, which processes the audio,
and has inputs and outputs. We will be talking about all of these nodes,
including providing examples, in later sections. We will also introduce the
AudioWorkletNode, which provides the means to design your own audio
node with its own functionality.
To give you a sense of how these nodes might be used, another
graph is shown in Figure 1.2. The idea of this graph is to shape some
noise and add some effects, perhaps to create a boomy explosion.
Figure 1.2 A complex audio routing graph. It applies an envelope, filterbank, distortion and reverb to noise.
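As a rough illustration only (this sketch is not the book's code; the context, noise buffer and impulse response are assumed to be prepared elsewhere), such a graph might be wired up as follows:

// one plausible wiring for the graph in Figure 1.2 (illustrative sketch)
// assumes `context` is a running AudioContext, and `noiseBuffer` and
// `impulseResponse` are AudioBuffers prepared elsewhere
var whiteNoise = new AudioBufferSourceNode(context, {buffer: noiseBuffer})
var envelope = new GainNode(context)
// decaying envelope applied with setValueCurveAtTime, as in the figure
envelope.gain.setValueCurveAtTime(
    new Float32Array([1, 0.5, 0.1, 0]), context.currentTime, 2)
// a filterbank of lowshelf, peaking and highshelf biquads in series
var filters = ['lowshelf', 'peaking', 'peaking', 'highshelf'].map(
    (type) => new BiquadFilterNode(context, {type}))
var shaper = new WaveShaperNode(context,
    {curve: new Float32Array([-0.8, 0, 0.8])}) // soft distortion curve
var reverb = new ConvolverNode(context, {buffer: impulseResponse})
var node = whiteNoise.connect(envelope)
for (const f of filters) node = node.connect(f)
node.connect(shaper).connect(reverb).connect(context.destination)
whiteNoise.start()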
• The Chrome web browser – of all available web browsers, Chrome
has perhaps the most extensive implementation of the Web Audio
API. Almost all features of the Web Audio API are implemented
in other popular browsers (Firefox, Safari, Edge, Opera, and their
mobile device equivalents) but there are enough subtle differences that
applications can’t be guaranteed to work out-of-the-box in another
browser just because they work in Chrome. There are third-party tools
to help ensure cross-browser functionality, but for all the code herein,
we just stick with Chrome.
When running your applications, make sure to have the Developer
Console open in Chrome. That way, you will see any error messages that
appear. You can find the Developer Console by opening the Chrome
Menu in the upper-right-hand corner of the browser window and
selecting More Tools → Developer Tools.
• A source code editor – this could be any text editor designed specifically
for editing source code. Atom, Visual Studio Code and Sublime Text
are popular choices. The author mainly used Atom, but it shouldn’t
really matter which one you use, as long as you can satisfy the next
bullet point.
• An http server package – for most source code editors, this is an
add-on. A lot of the code examples will not run properly without a local
server.
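Any simple static server will do. As one illustration (an assumption on our part; the book does not prescribe a particular server), a few lines of Node.js are enough to serve the examples locally:

// minimal static file server sketch; run with `node server.js`,
// then open http://localhost:8080/YourExample.html (filename illustrative)
const http = require('http')
const fs = require('fs')
const path = require('path')
const types = {'.html': 'text/html', '.js': 'text/javascript'}
http.createServer((req, res) => {
  const file = path.join(__dirname, req.url === '/' ? 'index.html' : req.url)
  fs.readFile(file, (err, data) => {
    if (err) { res.writeHead(404); res.end('Not found'); return }
    res.writeHead(200,
        {'Content-Type': types[path.extname(file)] || 'text/plain'})
    res.end(data)
  })
}).listen(8080)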
<button onclick='context.resume()'>Start</button>
<script>
context = new AudioContext()
Tone = context.createOscillator()
Tone.start()
Tone.connect(context.destination)
</script>
Most browsers will know this is an html file and display it correctly.
Inside <script> and </script> is the JavaScript code that uses
the Web Audio API. A new AudioContext is created. It has a single
OscillatorNode, with default settings. We will introduce the oscillator
node later, but for now it's sufficient to note that it is a source node that
generates a periodic waveform, and with its default values it is a 440 Hz
sine wave.
The oscillator is started, and connected to the destination. Any
audio node’s output can be connected to any other audio node’s input
by using the connect() function. context.destination is an
AudioDestinationNode, sending the audio stream to the default audio
output of the user’s system.
However, the script by itself will not do anything. The audio context is
suspended by default, and needs to be started by some user interaction. We
did this by creating a button on the web page, and having that perform the
line context.resume() when clicked.
This Hello World application does not showcase any intermediate
processing. That is, the source is connected directly to the destination. So let’s
extend it a little bit. In Code example 1.2, we have added a gain node, which
simply multiplies the input by some value, in order to produce the output.
We set that value to 0.25. Now, the source tone is connected to the gain
node and the gain node is connected to the destination. So the oscillator’s
signal level is reduced to one-fourth of its original value.
<button onclick='context.resume()'>Start</button>
<script>
var context = new AudioContext()
var Tone = context.createOscillator()
var Volume = context.createGain()
Volume.gain.value = 0.25
Tone.start()
Tone.connect(Volume)
Volume.connect(context.destination)
</script>
Let’s take a step back now and look at a few lines of code in detail.
Changing the gain of an audio signal is a fundamental operation in audio
applications. createGain is a method of an AudioContext that creates
a GainNode. The GainNode is an AudioNode with one input stream and
one output stream. It has a single parameter, gain, which represents the
amount of gain to apply. Its default value is 1, meaning that the input is
left unchanged.
Alternatively, we could have created a new gain node directly using the
GainNode constructor. This takes as input the context and, optionally,
parameter values. Besides allowing us to set the parameters when created,
it can be slightly more efficient than the createGain method. Also, serial
connections can be combined as A.connect(B).connect(C) …. So we
can rewrite this as in Code example 1.3.
<button onclick='context.resume()'>Start</button>
<script>
var context = new AudioContext()
var Tone = new OscillatorNode(context)
var Volume = new GainNode(context, {gain: 0.25})
Tone.start()
Tone.connect(Volume).connect(context.destination)
</script>
The resulting audio graph, shown in Figure 1.3, is the same for Code
example 1.2 and Code example 1.3.
Figure 1.3 A simple audio graph to generate a tone with reduced volume.
<button onclick='StartStop()'>Start/Stop</button>
<p>Gain</p>
<input type='range' max=1 value=0.1 step=0.01 id='VolumeSlider'>
<span id='VolumeLabel'></span>
<script src='UserInteraction.js'></script>
We now have our first Web Audio API application with user controls.
It allows one to experiment with oscillators, listen to different volume
settings, and disconnect the oscillator at will. For a lot of this, like the
OscillatorNode and disconnect(), we introduced just enough to
show off some functionality. They will be more formally introduced, with
more detail, in later sections.
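UserInteraction.js itself is not included in this preview. The following is a minimal sketch of what it might contain, inferred from the html above and the surrounding description (the logic is an assumption, not the book's listing):

// a plausible reconstruction of UserInteraction.js, not the book's code;
// VolumeSlider and VolumeLabel refer to the elements by their ids
var context = new AudioContext()
var Tone = new OscillatorNode(context)
var VolumeNode = new GainNode(context, {gain: VolumeSlider.value})
Tone.connect(VolumeNode)
Tone.start()
var playing = false
function StartStop() {
  if (!playing) {
    context.resume()
    VolumeNode.connect(context.destination)
  } else {
    VolumeNode.disconnect() // silence by detaching from the destination
  }
  playing = !playing
}
VolumeSlider.oninput = () => {
  VolumeNode.gain.value = VolumeSlider.value
  VolumeLabel.innerHTML = VolumeSlider.value
}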
References
Bristow-Johnson, R., 1994. The equivalence of various methods of computing biquad
coefficients for audio parametric equalizers. Audio Engineering Society Convention 97.
Buffa, M., et al., 2018. Towards an open Web Audio plugin standard. The Web Audio
Conference.
Choi, H., 2018. AudioWorklet: The future of web audio. International Computer Music
Conference.
Chowning, J. M., 1973. The synthesis of complex audio spectra by means of frequency
modulation. Journal of the Audio Engineering Society, 21(7).
Creasey, D. J., 2017. Audio Processes: Musical Analysis, Modification, Synthesis, and
Control. Routledge.
Farnell, A., 2010. Designing Sound. MIT Press.
Gardner, W. G., 1994. Efficient convolution without input/output delay. Audio Engineering
Society Convention 97.
Giannoulis, D., Massberg, M., & Reiss, J. D., 2012. Digital dynamic range compressor
design—a tutorial and analysis. Journal of the Audio Engineering Society, 60(6).
Jaffe, D. A., & Smith, J. O., 1983. Extensions of the Karplus-Strong plucked string algorithm.
Computer Music Journal, 7(2), pp. 56–69.
Jillings, N., Wang, Y., Reiss, J. D., & Stables, R., 2016. JSAP: A plugin standard for the
Web Audio API with intelligent functionality. 141st Audio Engineering Society Convention.
Karplus, K., & Strong, A., 1983. Digital synthesis of plucked string and drum timbres.
Computer Music Journal, 7(2), pp. 43–55.
Massenburg, G., 1972. Parametric equalization. 42nd Audio Engineering Society
Convention.
Pulkki, V., 1997. Virtual sound source positioning using vector base amplitude panning.
Journal of the Audio Engineering Society, 45(6).
Reid, G., 2000. Amplitude Modulation. Sound on Sound.
Reiss, J. D., & McPherson, A., 2014. Audio Effects: Theory, Implementation and
Application. CRC Press.
Stockham, T. G., Jr., 1966. High-speed convolution and correlation. Spring Joint Computer
Conference.
Valimaki, V., & Reiss, J. D., 2016. All about audio equalization: Solutions and frontiers.
Applied Sciences, special issue on Audio Signal Processing, 6(5).