TY - JOUR
T1 - Continuous and discrete decoding of overt speech with scalp electroencephalography (EEG)
AU - Craik, Alexander
AU - Dial, Heather
AU - Contreras-Vidal, Jose L.
N1 - Publisher Copyright:
© 2025 The Author(s). Published by IOP Publishing Ltd.
PY - 2025
Y1 - 2025/04/01
AB - Objective. Neurological disorders affecting speech production adversely impact quality of life for over 7 million individuals in the US. Traditional speech interfaces like eye-tracking devices and P300 spellers are slow and unnatural for these patients. An alternative solution, speech brain-computer interfaces (BCIs), directly decodes speech characteristics, offering a more natural communication mechanism. This research explores the feasibility of decoding speech features using non-invasive EEG. Approach. Nine neurologically intact participants were equipped with a 63-channel EEG system with additional sensors to eliminate eye artifacts. Participants read aloud sentences selected for phonetic similarity to the English language. Deep learning models, including Convolutional Neural Networks and Recurrent Neural Networks with and without attention modules, were optimized with a focus on minimizing trainable parameters and utilizing small input window sizes for real-time application. These models were employed for discrete and continuous speech decoding tasks. Main results. Statistically significant participant-independent decoding performance was achieved for discrete classes and continuous characteristics of the produced audio signal. A frequency sub-band analysis highlighted the significance of certain frequency bands (delta, theta, gamma) for decoding performance, and a perturbation analysis was used to identify crucial channels. Assessed channel selection methods did not significantly improve performance, suggesting a distributed representation of speech information encoded in the EEG signals. Leave-One-Out training demonstrated the feasibility of utilizing common speech neural correlates, reducing data collection requirements from individual participants. Significance. These findings contribute significantly to the development of EEG-enabled speech synthesis by demonstrating the feasibility of decoding both discrete and continuous speech features from EEG signals, even in the presence of EMG artifacts. By addressing the challenges of EMG interference and optimizing deep learning models for speech decoding, this study lays a strong foundation for EEG-based speech BCIs.
KW - attention networks
KW - convolutional neural networks
KW - deep learning
KW - electroencephalography (EEG)
KW - electromyography (EMG) removal
KW - recurrent neural networks
KW - speech decoding
UR - http://www.scopus.com/inward/record.url?scp=86000780061&partnerID=8YFLogxK
DO - 10.1088/1741-2552/ad8d0a
M3 - Article
C2 - 39476487
AN - SCOPUS:86000780061
SN - 1741-2560
VL - 22
JO - Journal of Neural Engineering
JF - Journal of Neural Engineering
IS - 2
M1 - 026017
ER -