
How AI Reads Neural Signals to Help Man with ALS Speak

Brain-computer interfaces are a breakthrough technology that could help paralyzed people regain lost functions, such as hand movement. These devices record signals from the brain and decipher the user’s intended action, bypassing the damaged or degraded nerves that would normally carry those brain signals to the muscles that carry out the action.

Since 2006, demonstrations of brain-computer interfaces in humans have focused primarily on restoring arm and hand movements by allowing people to control computer cursors or robotic arms. More recently, researchers have begun developing speech-enabled brain-computer interfaces to restore communication to people who cannot speak.

When a user tries to speak, brain-computer interfaces record the unique brain signals associated with attempted muscle movements during speech, then translate them into words. These words can then be displayed as text on a screen or spoken aloud using text-to-speech software.

I’m a researcher in the Neuroprosthetics Lab at UC Davis, which is part of the BrainGate2 clinical trial. My colleagues and I recently demonstrated a speech brain-computer interface that deciphers the attempted speech of a man with ALS, or amyotrophic lateral sclerosis, also known as Lou Gehrig’s disease. The interface converts neural signals into text with more than 97% accuracy. Key to our system is a set of artificial intelligence language models—artificial neural networks that help interpret natural language.

Recording brain signals

The first step in our brain-computer speech interface is to record brain signals. There are several sources of brain signals, some of which require surgery to record. Surgically implanted recording devices can record high-quality brain signals because they are placed closer to the neurons, resulting in stronger signals with less interference. These neural recording devices include electrode grids placed on the surface of the brain or electrodes implanted directly into the brain tissue.

In our study, we used electrode arrays surgically placed in the speech motor cortex, the part of the brain that controls the muscles involved in speech, of participant Casey Harrell. We recorded neural activity from 256 electrodes as Harrell attempted to speak.
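To give a sense of what this data looks like, here is a minimal sketch of binning activity from 256 electrodes into short time windows and normalizing it before decoding. The bin width, feature choice, and simulated counts are assumptions for illustration, not the study’s actual preprocessing.

```python
import numpy as np

# Hypothetical sketch: neural activity from 256 electrodes, summarized as
# spike counts in short time bins (bin width and feature choice are
# assumptions, not the study's exact pipeline).
n_electrodes = 256
bin_ms = 20                      # assumed bin width
n_bins = 500                     # roughly 10 seconds of attempted speech

rng = np.random.default_rng(0)
spike_counts = rng.poisson(lam=2.0, size=(n_bins, n_electrodes))

# Normalize each electrode so the decoder sees comparable feature scales.
features = (spike_counts - spike_counts.mean(axis=0)) / (spike_counts.std(axis=0) + 1e-6)
print(features.shape)            # (500, 256): time bins x electrodes
```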

A set of 64 electrodes placed in the brain tissue records neural signals.
UC Davis Health

Decoding brain signals

The next challenge is associating complex brain signals with the words the user is trying to say.

One way is to map patterns of neural activity directly to spoken words. This method requires repeatedly recording brain signals corresponding to each word to identify the average relationship between neural activity and specific words. While this strategy works for small dictionaries, as shown in a 2021 study with a 50-word dictionary, it becomes impractical for larger dictionaries. Imagine asking a brain-computer interface user to try to say every word in a dictionary many times—it could take months, and it still wouldn’t work for new words.

Instead, we use an alternative strategy: mapping brain signals onto phonemes, the basic units of sound that make up words. There are 39 phonemes in English, including sounds such as ch, er, oo, and sh, that can be combined to make any word. We can measure the neural activity associated with each phoneme multiple times by simply asking the participant to read a few sentences aloud. By carefully mapping neural activity onto phonemes, we can assemble them into any English word, even ones the system hasn’t been explicitly trained to work with.
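To make the assembly step concrete, here is a minimal sketch that turns decoded phonemes into words using a tiny, illustrative pronunciation dictionary with ARPABET-style symbols. The entries and code are examples for explanation only, not the lexicon used in the study.

```python
# Tiny illustrative pronunciation dictionary (ARPABET-style phoneme symbols).
PRONUNCIATIONS = {
    ("HH", "AH", "L", "OW"): "hello",
    ("W", "ER", "L", "D"): "world",
    ("G", "UH", "D"): "good",
}

def phonemes_to_word(phoneme_seq):
    """Look up a decoded phoneme sequence in the pronunciation lexicon."""
    return PRONUNCIATIONS.get(tuple(phoneme_seq), "<unknown>")

print(phonemes_to_word(["HH", "AH", "L", "OW"]))  # hello
print(phonemes_to_word(["G", "UH", "D"]))         # good
```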

To map brain signals to phonemes, we use advanced machine learning models. These models are particularly well-suited to this task because of their ability to find patterns in large amounts of complex data that humans would otherwise be unable to see. Think of these models as super-intelligent listeners that can pick out important information from noisy brain signals, much like you can focus on a conversation in a crowded room. Using these models, we were able to decipher sequences of phonemes during a speech trial with over 90% accuracy.
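As a rough illustration of this decoding step, the sketch below shows a small recurrent network that takes binned neural features in and produces per-time-step phoneme probabilities. The architecture, layer sizes, and the choice of 39 phonemes plus a silence token are assumptions for illustration, not the network used in the study.

```python
import torch
import torch.nn as nn

N_FEATURES = 256   # electrodes, as in the study
N_PHONEMES = 40    # 39 English phonemes + a "silence" token (assumption)

class PhonemeDecoder(nn.Module):
    """Sketch of a recurrent decoder: neural features in, phoneme probabilities out."""
    def __init__(self, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(N_FEATURES, hidden, num_layers=2, batch_first=True)
        self.readout = nn.Linear(hidden, N_PHONEMES)

    def forward(self, x):                       # x: (batch, time_bins, electrodes)
        h, _ = self.rnn(x)
        return self.readout(h).log_softmax(-1)  # (batch, time_bins, phonemes)

decoder = PhonemeDecoder()
dummy = torch.randn(1, 500, N_FEATURES)         # ~10 s of binned activity
log_probs = decoder(dummy)
print(log_probs.shape)                          # torch.Size([1, 500, 40])
```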

Brain-computer interface uses a clone of Casey Harrell’s voice to read aloud text decoded based on neural activity.

From phonemes to words

Once we have decoded phoneme sequences, we need to turn them into words and sentences. This is difficult, especially if the decoded phoneme sequence is not perfectly accurate. To solve this puzzle, we use two complementary types of machine learning language models.

The first is an n-gram language model, which predicts which word is most likely to follow a set of n preceding words. We trained a 5-gram, or five-word, language model on millions of sentences to predict the probability of a word from the previous four words, capturing local context and common phrases. For example, after “I am very good,” it might suggest “today” as more likely than “potato.” Using this model, we transform our phoneme sequences into the 100 most likely word sequences, each with an associated probability.
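Here is a toy sketch of the n-gram idea: counting how often each word follows a four-word context in a tiny corpus and turning those counts into probabilities. A real 5-gram model is trained on millions of sentences and uses smoothing, which this sketch omits.

```python
from collections import Counter, defaultdict

# Toy corpus; a real model would use millions of sentences.
corpus = [
    "i am very good today".split(),
    "i am very good thanks".split(),
    "i am very good today too".split(),
]

context_counts = Counter()
next_counts = defaultdict(Counter)
for sent in corpus:
    for i in range(len(sent) - 4):
        ctx = tuple(sent[i:i + 4])          # previous four words
        context_counts[ctx] += 1
        next_counts[ctx][sent[i + 4]] += 1  # word that follows them

def p_next(context, word):
    """Estimate P(word | previous four words) from raw counts (no smoothing)."""
    ctx = tuple(context)
    return next_counts[ctx][word] / context_counts[ctx] if context_counts[ctx] else 0.0

ctx = ["i", "am", "very", "good"]
print(p_next(ctx, "today"))   # 0.67 -> likely
print(p_next(ctx, "potato"))  # 0.0  -> unlikely
```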

The second is a large language model, the kind that powers AI chatbots, which predicts which words are most likely to follow others. We use a large language model to refine our choices. These models, trained on vast amounts of diverse text, have a broader understanding of the structure and meaning of language. They help us determine which of our 100 candidate sentences makes the most sense in the larger context.
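The sketch below shows one way to rescore candidate sentences with a causal language model by comparing their average per-token log-likelihoods. It uses the small GPT-2 model from the Hugging Face transformers library purely as a stand-in; it is not necessarily the model used in our system.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# GPT-2 as a small stand-in for a large language model.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sentence_log_likelihood(text):
    """Average per-token log-likelihood of a sentence under the language model."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss   # mean negative log-likelihood
    return -loss.item()

candidates = ["I am very good today.", "I am very good potato."]
best = max(candidates, key=sentence_log_likelihood)
print(best)   # the contextually sensible candidate scores higher
```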

By carefully balancing the probabilities from the n-gram model, the large language model, and our initial phoneme predictions, we can make a highly educated guess about what the brain-computer interface user is trying to say. This multi-step process allows us to cope with uncertainties in phoneme decoding and produce coherent, contextually appropriate sentences.
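Conceptually, that final ranking can be pictured as a weighted combination of log-probabilities from the three sources, as in the hypothetical sketch below; the weights and scores shown are made up for illustration.

```python
# Hypothetical weights for the two language-model terms.
ALPHA, BETA = 1.0, 0.5

def combined_score(candidate):
    """Weighted sum of phoneme-decoder, n-gram, and large-language-model log-probabilities."""
    return (candidate["phoneme_logp"]
            + ALPHA * candidate["ngram_logp"]
            + BETA * candidate["llm_logp"])

candidates = [
    {"text": "I am very good today",  "phoneme_logp": -12.1, "ngram_logp": -8.0,  "llm_logp": -20.5},
    {"text": "I am very good potato", "phoneme_logp": -11.8, "ngram_logp": -14.2, "llm_logp": -31.0},
]

best = max(candidates, key=combined_score)
print(best["text"])   # "I am very good today"
```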

How the University of California, Davis, brain-computer interface reads neural activity and turns it into words.
UC Davis Health

Real-world benefits

In practice, this speech decoding strategy has proven to be incredibly effective. We enabled Casey Harrell, a man with ALS, to “speak” with over 97% accuracy using only his thoughts. This breakthrough solution allows him to easily converse with family and friends for the first time in years, all from the comfort of his own home.

Speech brain-computer interfaces represent a significant step forward in restoring communication. As we improve these devices, they promise to give a voice to those who have lost the ability to speak, reconnecting them with loved ones and the world around them.

However, challenges remain, such as making the technology more accessible, portable, and durable for years of use. Despite these obstacles, speech brain-computer interfaces are a powerful example of how science and technology can come together to solve complex problems and radically improve people’s lives.