Here we demonstrate real-time decoding of perceived and produced speech
from high-density ECoG activity in humans during a task that mimics
natural question-and-answer dialogue. While this task still provides explicit
external cueing and timing to participants, the interactive and goal-
oriented aspects of a question-and-answer paradigm represent a major
step towards more naturalistic applications. During ECoG recording,
participants first listened to a set of prerecorded questions and then
verbally produced a set of answer responses. These data served as input to
train speech detection and decoding models. After training, participants
performed a task in which, during each trial, they listened to a question and
responded aloud with an answer of their choice. Using only neural signals,
we detect when participants are listening or speaking and predict the
identity of each detected utterance using phone-level Viterbi decoding.
Because certain answers are valid responses only to certain questions, we
integrate the question and answer predictions by dynamically updating
the prior probabilities of each answer using the preceding predicted
question likelihoods.
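The context-integration step described above can be sketched as a simple Bayesian reweighting: the predicted question likelihoods induce a prior over answers, which is then combined with the answer likelihoods from the Viterbi decoder. The following is a minimal illustrative sketch, not the authors' implementation; all question/answer names, probabilities, and the uniform within-question answer distribution are hypothetical assumptions.

```python
# Hypothetical question likelihoods from the question decoder.
question_likelihoods = {"q_faucet": 0.7, "q_music": 0.2, "q_feeling": 0.1}

# Hypothetical mapping from each question to its valid answers;
# within a question, answers are assumed equally likely a priori.
valid_answers = {
    "q_faucet": ["hot", "cold"],
    "q_music": ["violin", "piano", "synthesizer"],
    "q_feeling": ["fine", "bad"],
}

def answer_priors(question_likelihoods, valid_answers):
    """Compute p(answer) = sum over q of p(answer | q) * p(q)."""
    priors = {}
    for q, p_q in question_likelihoods.items():
        answers = valid_answers[q]
        for a in answers:
            priors[a] = priors.get(a, 0.0) + p_q / len(answers)
    return priors

priors = answer_priors(question_likelihoods, valid_answers)

# Combine the dynamically updated priors with hypothetical answer
# likelihoods produced by the phone-level decoder, then pick the
# maximum a posteriori answer.
answer_likelihoods = {"hot": 0.30, "cold": 0.20, "violin": 0.10,
                      "piano": 0.20, "synthesizer": 0.10,
                      "fine": 0.05, "bad": 0.05}
posterior = {a: priors.get(a, 0.0) * answer_likelihoods[a]
             for a in answer_likelihoods}
best_answer = max(posterior, key=posterior.get)
```

Because each answer's prior is weighted by the likelihood of its preceding question, an answer that is only valid for an unlikely question is suppressed even if its acoustic-model likelihood is moderately high.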
Participants provided live answers to prerecorded questions, and researchers
used their brain-signal data to train models to understand both what they said
and heard. On average, the software correctly detected questions 76 percent
of the time and the participants' responses at a lower rate of 61 percent.
While it is easy to concoct theories of nefarious uses for this technology,
it shows promise for communication with non-verbal people with injuries or
neurodegenerative disorders.