FORUM
COMMENTARY ON SCIENCE IN THE NEWS FROM THE EXPERTS

Siri Is a Biased Listener
Most popular speech-recognition software has trouble with minority voices
By Claudia Lopez-Lloreda

Claudia Lopez-Lloreda is a freelance science writer and neuroscience graduate student at the University of Pennsylvania.

Illustration by Martin Gee

“Clow-dia,” I say once. Twice. A third time. Defeated, I say the Americanized version of my name: “Claw-dee-ah.” Finally, Siri recognizes it. Having to adapt our way of speaking to interact with speech-recognition technologies is a familiar experience for people whose first language is not English or who do not have conventionally American-sounding names. I have now stopped using Siri, Apple’s voice-based virtual assistant, because of it.
The growth of this tech in the past decade, not just Siri but also Alexa, Cortana and others, has revealed a problem: racial bias. One recent study, published in the Proceedings of the National Academy of Sciences USA, showed that speech-recognition programs are biased against Black speakers. On average, the authors found, all five programs from leading technology companies, including Apple and Microsoft, showed significant race disparities; they were roughly twice as likely to incorrectly transcribe audio from Black speakers as from white speakers.
This effectively censors voices that are not part of the “standard” languages or accents used to create these technologies. “I don’t get to negotiate with these devices unless I adapt my language patterns,” says Halcyon Lawrence, an assistant professor of technical communication and information design at Towson University, who was not part of the study. “That is problematic.” For Lawrence, who has a Trinidad and Tobagonian accent, or for me as a Puerto Rican, part of our identity comes from speaking a particular language, having an accent or using a set of speech forms such as African American Vernacular English (AAVE). Having to change such an integral part of an identity in order to be recognized is inherently cruel.
The inability to be understood also affects other marginalized communities, such as people with visual or movement disabilities who rely on voice recognition and speech-to-text tools, says Allison Koenecke, a computational graduate student and first author of the PNAS study. For someone with a disability who depends on these technologies, being misunderstood could have serious consequences. There are probably many culprits for these disparities, but Koenecke points to the most likely: the data used for training, which come predominantly from white, native speakers of American English. By using databases that are narrow both in the words they contain and in how those words are said, training systems exclude accents and other ways of speaking that have unique linguistic features. Humans, presumably including those who create these technologies, have accent and language biases. For example, research shows that the presence of an accent affects whether jurors find people guilty and whether patients find their doctors competent.
Recognizing these biases would be an important way to avoid building them into the technologies. But developing more inclusive technology takes time, effort and money, and the decision to invest those resources is often market-driven. (Of several companies queried, only Google responded in time for publication; a spokesperson said, in part, “We’ve been working on the challenge of accurately recognizing variations of speech for several years and will continue to do so.”)
Safiya Noble, an associate professor of information studies at the University of California, Los Angeles, admits that it’s a tricky challenge. “Language is contextual,” says Noble, who was not involved in the study. “But that doesn’t mean that companies shouldn’t strive to decrease bias and disparities.” To do this, they need the input of humanists and social scientists who understand how language actually works.
From the tech side, feeding more diverse training data into the programs could close this gap, Koenecke says. Noble adds that tech companies should also test their products more widely and have more diverse workforces so people from different backgrounds and perspectives can directly influence the design of speech technologies. Koenecke suggests that automated speech-recognition companies use the PNAS study as a preliminary benchmark and keep using it to assess their systems over time.
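
Such a benchmark comes down to comparing transcription error rates across groups of speakers; the PNAS authors used word error rate, a standard measure of transcription accuracy. The short Python sketch below shows one hypothetical way a company could track such a gap over time. It is not taken from the study: the sample sentences, group labels and function names are invented for illustration.

# Hypothetical sketch of a fairness check for a speech-to-text system.
# References would be human transcripts; hypotheses would come from the
# recognizer being audited. The data below are invented for illustration.
from collections import defaultdict

def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance (insertions, deletions, substitutions)
    divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table for Levenshtein distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

def group_error_rates(samples):
    """Average WER per speaker group for a list of
    (group, reference_transcript, system_transcript) tuples."""
    rates = defaultdict(list)
    for group, reference, hypothesis in samples:
        rates[group].append(word_error_rate(reference, hypothesis))
    return {group: sum(r) / len(r) for group, r in rates.items()}

# Invented audit set standing in for recordings from two speaker groups.
samples = [
    ("group_a", "turn the kitchen lights off", "turn the kitchen lights off"),
    ("group_a", "set a timer for ten minutes", "set a timer for ten minutes"),
    ("group_b", "turn the kitchen lights off", "turn the chicken lights of"),
    ("group_b", "set a timer for ten minutes", "set a time for ten minute"),
]

for group, rate in sorted(group_error_rates(samples).items()):
    print(f"{group}: WER = {rate:.2f}")
# A persistent gap between groups, tracked release after release, is the
# kind of disparity a benchmark like the PNAS study makes visible.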
In the meantime, many of us will continue to struggle between identity and being understood when interacting with Alexa, Cortana or Siri. But Lawrence chooses identity every time: “I’m not switching,” she says. “I’m not doing it.”

JOIN THE CONVERSATION ONLINE
Visit Scientific American on Facebook and Twitter
or send a letter to the editor: [email protected]

© 2020 Scientific American