the ways in which people experience emotions differently, particularly feelings that are similar and may be easily confused. She might delve into how disappointment is distinct from being angry. Or, say, why feeling mellow isn’t the same as satiation. This is supposed to help the writers come up with responses that provide a sense of empathy. Take Assistant’s answer to the phrase “I’m stressed out.” It replies, “You must have a ton on your mind. How can I help?” Says Krettek: “That acknowledgment makes people feel seen and heard. It’s the equivalent of eye contact.”


GOOGLE’S PERSONALITY architects sometimes draw inspiration from unexpected places. Improv, both Coats and Germick tell me, has been one of the most important ones. That’s because the dialogue in improv comedy is meant to facilitate conversation by building on previous lines in a way that encourages participants to keep engaging with one another, a principle known as “yes and.” Germick says almost everyone working on personality at Google has done improv at some point.
You’ll get an example of the “yes and” principle at work if you ask Assistant about its favorite flavor of ice cream. “We wouldn’t say, ‘I do not eat ice cream, I do not have a body,’ ” explains Germick. “We also wouldn’t say, ‘I love chocolate ice cream and I eat it every Tuesday with my sister,’ because that also is not true.” In these situations, the writers look for general answers that invite the user to continue talking. Google responds to the ice cream question, for instance, by saying something like, “You can’t go wrong with Neapolitan, there’s something for everyone.”
But taking the conversation further is still considerably difficult for Assistant. Ask it about a specific flavor within Neapolitan, like vanilla or strawberry, and it gets stumped. Google’s digital helper also struggles with some of the fundamentals of conversation, such as interpreting certain requests that are phrased differently from the questions it’s programmed to understand. And the tools Google can use to understand what a user wants or how he or she may feel are limited. It can’t tell, for instance, whether a person is annoyed, excited or tired from the tone of their voice. And it certainly can’t notice when facial expressions change.
The best characteristic Google currently has to work with is a user’s history. By looking at what a person has previously asked and the features he or she uses most, it can try to avoid sounding repetitive. In the future, Google hopes to make broader observations about a user’s preferences based on how they interact with Assistant. “We’re not totally there yet,” says Germick. “But we’d be able to start to understand, is this a user that likes to joke around more, or is this a user that’s more about business? The holy grail to me is that we can really understand human language to a point where almost anything I can say will be understood,” he says. “Even if there’s an emotional subtext or some sort of idiom.”
Exactly when that will be is unclear. Ask most people working on voice in Silicon Valley, including those at Google, and they will respond with some version of the same pat phrase: “It’s early days,” which roughly translates to “Nobody really knows.”
In the meantime, Google is focusing on the nuances of speech. When Assistant tells you about the weather, it may emphasize words like “mostly.” Or perhaps you’ve noticed the way its voice sounds slightly higher when it says “no” at the start of a sentence. Those seemingly minor inflections are intentional, and they’re probably James Giangola’s doing. As Google’s conversation and persona design lead, he’s an expert in linguistics and prosody, a field that examines the patterns of stress and intonation in language.

Of all the people I meet on Google’s personality team, Giangola is the most engineer-like. He comes to our meeting prepared with notes and talking points that he half-reads to me from behind his laptop. He’s all business but also thrilled to tell me about why voice interaction is so important for technology companies to get right. “The stakes are really high for voice user interfaces, because voice is such a personal marker of social identity,” he says. Like many other Assistant team members I spoke with, Giangola often abbreviates the term voice user interface as VUI, pronounced “vooey,” in conversation. “Helen Keller said blindness separates people from things, and deafness separates people from people,”

A BRIEF HISTORY OF SPEECH RECOGNITION

1952: Bell Labs Audrey
One of the first serious attempts to enable machines to recognize speech, Audrey could recognize spoken numeric digits.

1962: IBM Shoebox
Demonstrated at the World’s Fair in Seattle, it could recognize and respond to 16 spoken words.

1970s: CMU Harpy
Developed at Carnegie Mellon University as part of a five-year DARPA program, it could understand complete sentences and had a 1,011-word vocabulary.

1980s: IBM Tangora
This typewriter could identify spoken words and type them on paper. By the mid-1980s, it had a vocabulary of 20,000 words.

1997: Dragon NaturallySpeaking
The first software product to recognize continuous speech, at a rate of about 100 words per minute.

2011: Apple Siri
Launched alongside the iPhone 4S, Siri finally proved that voice could be useful.
