O'Reilly Design for Voice Interfaces
Laura Klein
Design for Voice Interfaces
by Laura Klein
Copyright 2016 O'Reilly Media, Inc. All rights reserved.
Printed in the United States of America.
Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA
95472.
O'Reilly books may be purchased for educational, business, or sales promotional use.
Online editions are also available for most titles (http://safaribooksonline.com). For
more information, contact our corporate/institutional sales department:
800-998-9938 or [email protected].
While the publisher and the author have used good faith efforts to ensure that the
information and instructions contained in this work are accurate, the publisher and
the author disclaim all responsibility for errors or omissions, including without
limitation responsibility for damages resulting from the use of or reliance on this work.
Use of the information and instructions contained in this work is at your own risk. If
any code samples or other technology this work contains or describes is subject to
open source licenses or the intellectual property rights of others, it is your
responsibility to ensure that your use thereof complies with such licenses and/or rights.
978-1-4919-3458-6
[LSI]
CHAPTER 1
Design for Voice Interfaces
Figure 1-1. Radio Rex.
And why would it? We've been dreaming about perfect voice interfaces
since the 1960s, at least. The computer from Star Trek understood
Captain Kirk perfectly and could answer any question he
asked. HAL, the computer from 2001: A Space Odyssey, although not
without one or two fairly significant bugs, was flawless from a
speech input and output perspective.
Unfortunately, reality never started to approach fiction until fairly
recently, and even now there are quite a few technical challenges
that we need to take into consideration when designing voice
interfaces.
Quite a bit of progress was made in the 1990s, and voice recognition
technology improved to the point that people could begin using it
for a very limited number of everyday tasks. One of the first uses for
the technology was in voice dialers, which allowed people to dial up
to ten different phone numbers on their touch-tone phones just by
Conversational Skills
Content and tone are important in all design, but when designing
for speech output, they take on an entirely new meaning. The best
voice interface designs make the user feel like she's having a perfectly
normal dialog, but doing that can be harder than it sounds. Products
that talk don't just need to have good copy; they must have
good conversations. And it's harder for a computer to have a good
conversation than it is for a human.
Tony Sheeder, senior manager of user experience design at Nuance
Communications, has been with the company for more than 14
years and has been working in voice design for longer than that. As
he explains it:
Each voice interaction is a little narrative experience, with a beginning,
middle and an end. Humans just get this and understand the
rules naturally, some more than others. When you go to a party,
you can tell within a very short time whether another person is easy
But what if you thought you wanted to open a new business account
that was tied to your old savings account, and there were several
options to choose from, each with different fee structures and
options? That's a much harder conversation to start, because you
might not even know exactly what to ask for. You might never even
realize that the business plans existed if you didn't know to ask for them.
This sort of discoverability is a serious problem when designing for
open-prompt voice interfaces. When Abi Jones first began designing
for voice, she carried around a phony voice recorder and treated it
like a magic device that could do whatever she wanted it to do. "It
made me realize how hard it was to say what I wanted in the world,"
she says.
Even in voice interfaces that limit inputs and make functionality
extremely discoverable (like IVRs that prompt the user to say specific
words), designers still must deal with a level of unpredictability
in response that is somewhat unusual when designing for screens.
Most of our selections within a visual product are constrained by the
UI. There are buttons or links to click, options to select, sliders to
When to use it? Finite-state, pure voice systems are still useful for certain
products. Because the only input and output methods are voice
and audio, they're going to be handy for products that don't have a
screen. This obviously includes IVR phone systems, but it could also
be a physical device like a screenless wearable (Figure 1-6).
In general, you'll use a finite-state system when your product is simple
enough that it's not worth going for NLP. They're useful for
products for which users can be trained to do a very small number
of tasks. For example, a bedside clock that lets you set alarms doesn't
necessarily need a full NLP system. It just needs to understand preset
commands such as "Set alarm" that users could memorize. The
same is true for the autodialer on a corporate phone system. It's not
handling open-ended queries. It's just recognizing a specific list of
names and directing calls.
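
As a rough illustration of the bedside clock example, the sketch below matches what the user said against a fixed list of preset commands and rejects everything else. It assumes the speech has already been transcribed to text; the command list, the action names, and the functions normalize and recognize are all hypothetical and not taken from any real product.

```python
import re

# Hypothetical preset commands for a bedside alarm clock (assumed for illustration).
COMMANDS = {
    "set alarm": "SET_ALARM",
    "cancel alarm": "CANCEL_ALARM",
    "snooze": "SNOOZE",
}


def normalize(utterance: str) -> str:
    """Lowercase the text, drop punctuation and digits, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z\s]", "", utterance.lower())
    return " ".join(cleaned.split())


def recognize(utterance: str):
    """Return the action for a preset command, or None if it is not on the list."""
    return COMMANDS.get(normalize(utterance))


if __name__ == "__main__":
    for phrase in ["Set alarm.", "  SNOOZE ", "Wake me up at seven"]:
        print(repr(phrase), "->", recognize(phrase))
```

The open-ended request "Wake me up at seven" is not recognized at all. That is the trade-off of this approach: the system stays simple, but users have to learn its vocabulary.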
One of the main problems with finite-state systems (and the reason
so many people hate most IVRs) is that they often require users to
go through a labyrinth of prompts to get to the one thing they want.
If the system tries to handle too much, it can require a huge amount
of investment on the user's part, only to end with having to talk to a
representative or being disconnected.
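
To make the labyrinth concrete, here is a minimal sketch of an IVR modeled as a finite-state machine, counting how many prompts a caller hears before reaching a task. The menu contents, the state names, and the run_call function are invented for illustration, and the input is assumed to be already-recognized keywords.

```python
# Hypothetical IVR menu: each state maps a spoken keyword to the next state.
# States that do not appear as keys are terminal (the task the caller wanted).
MENU = {
    "main": {"billing": "billing", "support": "support"},
    "billing": {"balance": "hear_balance", "payment": "payment"},
    "payment": {"card": "pay_by_card", "bank": "pay_by_bank"},
    "support": {"outage": "report_outage"},
}


def run_call(utterances, start="main"):
    """Walk the menu and return (final_state, prompts_heard)."""
    state = start
    prompts = 0
    for word in utterances:
        if state not in MENU:
            break  # already at a terminal task; nothing more to ask
        prompts += 1  # the caller hears one prompt per menu level visited
        # An unrecognized word leaves the caller in the same state (re-prompt).
        state = MENU[state].get(word, state)
    return state, prompts


if __name__ == "__main__":
    print(run_call(["billing", "payment", "card"]))
    print(run_call(["billing", "paymint", "payment", "card"]))
```

Even in this tiny menu, paying by card takes three prompts, and a single misrecognized word adds a fourth. Every extra menu level or recognition error adds to that count.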
Simple systems that handle just a few predictable tasks that users
might not know how to ask for naturally are good candidates for a
pure voice, finite-state interface. For example, a car's audio system
might be fine for one. There are a limited number of things you
might want from it: play a song, turn up the volume, and so on. The
user interacts with it daily, so they're more likely to use the same
vocabulary for the commands every time. Each command is simple
and discrete, so users won't get trapped. And finally, it's very easy to
recognize and recover from a mistake.
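
The car audio example can be sketched as a handful of discrete commands acting on a small piece of state. Everything here (the AudioState fields, the command phrases, the 0 to 10 volume range) is assumed for illustration. It shows that one opposite command undoes a mistake and that an unknown phrase changes nothing.

```python
from dataclasses import dataclass


@dataclass
class AudioState:
    """Hypothetical car audio state, assumed for illustration."""
    volume: int = 5
    playing: bool = False


def apply_command(state: AudioState, command: str) -> AudioState:
    """Return the new state after one discrete command; unknown commands are ignored."""
    if command == "play":
        return AudioState(state.volume, True)
    if command == "pause":
        return AudioState(state.volume, False)
    if command == "volume up":
        return AudioState(min(state.volume + 1, 10), state.playing)
    if command == "volume down":
        return AudioState(max(state.volume - 1, 0), state.playing)
    return state  # unrecognized phrase: nothing changes, so the user cannot get trapped


if __name__ == "__main__":
    louder = apply_command(AudioState(), "volume up")
    print(louder)
    print(apply_command(louder, "volume down"))  # a mistake is undone with one command
```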
Pure voice: NLP
As soon as the technology improved, many IVR systems moved to
NLP. This means that, when you call a company for help, you might
get a computer asking, "What can I help you with today?" after
which there is a very good possibility that it will recognize what
A little of everything
Many products are moving to multimodal interfaces that combine
voice and physical inputs with screen and audio output. Navigation
apps might be the perfect example of a category of product that
combines all of these elements well.
Users can touch places on the map, scroll around to see what's
nearby, or type in an address using physical input. When driving,
they can simply say the name of a destination; this way, they don't
Resources
The hardest part about becoming a VUI designer right now might
be the lack of classes and resources available to new designers. If
you're serious about voice design, your best bet is to get a solid
grounding in User-Centered Design, good user research techniques,
and information architecture, and then to begin working with a
team with some voice design experience.
Books
Nass, Clifford. Wired for Speech: How Voice Activates and
Advances the Human-Computer Relationship.
Cohen, Michael H., James P. Giangola, and Jennifer Balogh.
Voice User Interface Design.
Organization
The Association for Voice Interaction Design