the ways in which people experience emotions differently, particularly feelings that are similar and may be easily confused. She might delve into how disappointment is distinct from being angry. Or, say, why feeling mellow isn’t the same as satiation. This is supposed to help the writers come up with responses that provide a sense of empathy. Take Assistant’s answer to the phrase “I’m stressed out.” It replies, “You must have a ton on your mind. How can I help?” Says Krettek: “That acknowledgment makes people feel seen and heard. It’s the equivalent of eye contact.”


GOOGLE’S PERSONALITY architects sometimes draw inspiration from unexpected places. Improv, both Coats and Germick tell me, has been one of the most important ones. That’s because the dialogue in improv comedy is meant to facilitate conversation by building on previous lines in a way that encourages participants to keep engaging with one another, a principle known as “yes and.” Germick says almost everyone working on personality at Google has done improv at some point.
You’ll get an example of the “yes and” principle at work if you ask Assistant about its favorite flavor of ice cream. “We wouldn’t say, ‘I do not eat ice cream, I do not have a body,’ ” explains Germick. “We also wouldn’t say, ‘I love chocolate ice cream and I eat it every Tuesday with my sister,’ because that also is not true.” In these situations, the writers look for general answers that invite the user to continue talking. Google responds to the ice cream question, for instance, by saying something like, “You can’t go wrong with Neapolitan, there’s something for everyone.”
But taking the conversation further is still considerably difficult for Assistant. Ask it about a specific flavor within Neapolitan, like vanilla or strawberry, and it gets stumped. Google’s digital helper also struggles with some of the fundamentals of conversation, such as interpreting certain requests that are phrased differently from the questions it’s programmed to understand. And the tools Google can use to understand what a user wants or how he or she may feel are limited. It can’t tell, for instance, whether a person is annoyed, excited or tired from the tone of their voice. And it certainly can’t notice when facial expressions change.
The best characteristic Google currently has to work with is a user’s history. By looking at what a person has previously asked and the features he or she uses most, it can try to avoid sounding repetitive. In the future, Google hopes to make broader observations about a user’s preferences based on how they interact with Assistant. “We’re not totally there yet,” says Germick. “But we’d be able to start to understand, is this a user that likes to joke around more, or is this a user that’s more about business? The holy grail to me is that we can really understand human language to a point where almost anything I can say will be understood,” he says. “Even if there’s an emotional subtext or some sort of idiom.”
Exactly when that will be is unclear. Ask most people working on voice in Silicon Valley, including those at Google, and they will respond with some version of the same pat phrase: “It’s early days,” which roughly translates to “Nobody really knows.”
In the meantime, Google is focusing on the nuances of speech. When Assistant tells you about the weather, it may emphasize words like “mostly.” Or perhaps you’ve noticed the way its voice sounds slightly higher when it says “no” at the start of a sentence. Those seemingly minor inflections are intentional, and they’re probably James Giangola’s doing. As Google’s conversation and persona design lead, he’s an expert in linguistics and prosody, a field that examines the patterns of stress and intonation in language.

Of all the people I meet on Google’s personality team, Giangola is the most engineer-like. He comes to our meeting prepared with notes and talking points that he half-reads to me from behind his laptop. He’s all business but also thrilled to tell me about why voice interaction is so important for technology companies to get right. “The stakes are really high for voice user interfaces, because voice is such a personal marker of social identity,” he says. Like many other Assistant team members I spoke with, Giangola often abbreviates the term voice user interface as VUI, pronounced “vooey,” in conversation. “Helen Keller said blindness separates people from things, and deafness separates people from people,”

A BRIEF HISTORY OF SPEECH RECOGNITION

1952: Bell Labs Audrey
One of the first serious attempts to enable machines to recognize speech, Audrey could recognize spoken numeric digits.

1962: IBM Shoebox
Demonstrated at the World’s Fair in Seattle, it could recognize and respond to 16 spoken words.

1970s: CMU Harpy
Developed at Carnegie Mellon University as part of a five-year DARPA program, it could understand complete sentences and had a 1,011-word vocabulary.

1980s: IBM Tangora
This typewriter could identify spoken words and type them on paper. By the mid-1980s, it had a vocabulary of 20,000 words.

1997: Dragon NaturallySpeaking
The first software product to recognize continuous speech, at a rate of about 100 words per minute.

2011: Apple Siri
Launched alongside the iPhone 4S, Siri finally proved that voice could be useful.
