Catalogue Search | MBRL

Speech synthesis from neural decoding of spoken sentences

by Chang, Edward F.; Anumanchipalli, Gopala K.; Chartier, Josh in 631/378/2629; 631/378/2632/2634; 9/30

2019

Technology that translates neural activity into speech would be transformative for people who are unable to communicate as a result of neurological impairments. Decoding speech from neural activity is challenging because speaking requires very precise and rapid multi-dimensional control of vocal tract articulators. Here we designed a neural decoder that explicitly leverages kinematic and sound representations encoded in human cortical activity to synthesize audible speech. Recurrent neural networks first decoded directly recorded cortical activity into representations of articulatory movement, and then transformed these representations into speech acoustics. In closed vocabulary tests, listeners could readily identify and transcribe speech synthesized from cortical activity. Intermediate articulatory dynamics enhanced performance even with limited data. Decoded articulatory representations were highly conserved across speakers, enabling a component of the decoder to be transferrable across participants. Furthermore, the decoder could synthesize speech when a participant silently mimed sentences. These findings advance the clinical viability of using speech neuroprosthetic technology to restore spoken communication. A neural decoder uses kinematic and sound representations encoded in human cortical activity to synthesize audible sentences, which are readily identified and transcribed by listeners.

Journal Article


Selective cortical representation of attended speaker in multi-talker speech perception

by Chang, Edward F.; Mesgarani, Nima in 631/378/1697; 631/378/2619; 631/378/2649/1723

2012

The neural correlates of how attended speech is internally represented are described, shedding light on the ‘cocktail party problem’.

Heard instinct: The 'cocktail-party problem' (the question of what goes on in our brains when we listen selectively for one person's voice while ignoring many others) has puzzled researchers from various disciplines for years. Using electrophysiological recordings from neurosurgery patients listening to two speakers simultaneously, Nima Mesgarani and Edward Chang determine the neural correlates associated with the internal representation of attended speech. They find that the neural responses in the auditory cortex represent the attended voice robustly, almost as if the second voice were not there. With these patterns established, a simple algorithm trained on various speakers predicts which stimulus a subject is attending to, on the basis of the patterns emerging in the secondary auditory cortex. These results suggest that speech representation in the brain reflects not only the acoustic environment, but also the listener's understanding of these signals. As well as shedding light on a long-standing neurobiological problem, this work may give clues as to how automatic speech recognition might be improved to cope with more than one talker.

Humans possess a remarkable ability to attend to a single speaker’s voice in a multi-talker background 1,2,3. How the auditory system manages to extract intelligible speech under such acoustically complex and adverse listening conditions is not known, and, indeed, it is not clear how attended speech is internally represented 4,5. Here, using multi-electrode surface recordings from the cortex of subjects engaged in a listening task with two simultaneous speakers, we demonstrate that population responses in non-primary human auditory cortex encode critical features of attended speech: speech spectrograms reconstructed based on cortical responses to the mixture of speakers reveal the salient spectral and temporal features of the attended speaker, as if subjects were listening to that speaker alone. A simple classifier trained solely on examples of single speakers can decode both attended words and speaker identity. We find that task performance is well predicted by a rapid increase in attention-modulated neural selectivity across both single-electrode and population-level cortical responses. These findings demonstrate that the cortical representation of speech does not merely reflect the external acoustic environment, but instead gives rise to the perceptual aspects relevant for the listener’s intended goal.

Journal Article


Phonetic Feature Encoding in Human Superior Temporal Gyrus

by Chang, Edward F.; Mesgarani, Nima; Cheung, Connie in Acoustic spectra; Acoustics; Auditory Cortex - anatomy & histology

2014

During speech perception, linguistic elements such as consonants and vowels are extracted from a complex acoustic speech signal. The superior temporal gyrus (STG) participates in high-order auditory processing of speech, but how it encodes phonetic information is poorly understood. We used high-density direct cortical surface recordings in humans while they listened to natural, continuous speech to reveal the STG representation of the entire English phonetic inventory. At single electrodes, we found response selectivity to distinct phonetic features. Encoding of acoustic properties was mediated by a distributed population response. Phonetic features could be directly related to tuning for spectrotemporal acoustic cues, some of which were encoded in a nonlinear fashion or by integration of multiple cues. These findings demonstrate the acoustic-phonetic representation of speech in human STG.

Journal Article


Real-time decoding of question-and-answer speech dialogue using human cortical activity

by Chang, Edward F.; Makin, Joseph G.; Leonard, Matthew K. in 631/378/116/2394; 631/378/2632/2634; 631/378/2649/1594

2019

Natural communication often occurs in dialogue, differentially engaging auditory and sensorimotor brain regions during listening and speaking. However, previous attempts to decode speech directly from the human brain typically consider listening or speaking tasks in isolation. Here, human participants listened to questions and responded aloud with answers while we used high-density electrocorticography (ECoG) recordings to detect when they heard or said an utterance and to then decode the utterance’s identity. Because certain answers were only plausible responses to certain questions, we could dynamically update the prior probabilities of each answer using the decoded question likelihoods as context. We decode produced and perceived utterances with accuracy rates as high as 61% and 76%, respectively (chance is 7% and 20%). Contextual integration of decoded question likelihoods significantly improves answer decoding. These results demonstrate real-time decoding of speech in an interactive, conversational setting, which has important implications for patients who are unable to communicate. Speech neuroprosthetic devices should be capable of restoring a patient’s ability to participate in interactive dialogue. Here, the authors demonstrate that the context of a verbal exchange can be used to enhance neural decoder performance in real time.

Journal Article


A high-performance neuroprosthesis for speech decoding and avatar control

by Chang, Edward F.; Metzger, Sean L.; Anumanchipalli, Gopala K. in 631/378/2632/1663; 631/378/2632/2634; 631/443/376

2023

Speech neuroprostheses have the potential to restore communication to people living with paralysis, but naturalistic speed and expressivity are elusive 1. Here we use high-density surface recordings of the speech cortex in a clinical-trial participant with severe limb and vocal paralysis to achieve high-performance real-time decoding across three complementary speech-related output modalities: text, speech audio and facial-avatar animation. We trained and evaluated deep-learning models using neural data collected as the participant attempted to silently speak sentences. For text, we demonstrate accurate and rapid large-vocabulary decoding with a median rate of 78 words per minute and median word error rate of 25%. For speech audio, we demonstrate intelligible and rapid speech synthesis and personalization to the participant’s pre-injury voice. For facial-avatar animation, we demonstrate the control of virtual orofacial movements for speech and non-speech communicative gestures. The decoders reached high performance with less than two weeks of training. Our findings introduce a multimodal speech-neuroprosthetic approach that has substantial promise to restore full, embodied communication to people living with severe paralysis. A study using high-density surface recordings of the speech cortex in a person with limb and vocal paralysis demonstrates real-time decoding of brain activity into text, speech sounds and orofacial movements.

Journal Article


Functional organization of human sensorimotor cortex for speech articulation

by Chang, Edward F.; Mesgarani, Nima; Bouchard, Kristofer E. in 631/378/2649/1594; Biological and medical sciences; Cerebral Cortex - physiology

2013

Speaking is one of the most complex actions that we perform, but nearly all of us learn to do it effortlessly. Production of fluent speech requires the precise, coordinated movement of multiple articulators (for example, the lips, jaw, tongue and larynx) over rapid time scales. Here we used high-resolution, multi-electrode cortical recordings during the production of consonant-vowel syllables to determine the organization of speech sensorimotor cortex in humans. We found speech-articulator representations that are arranged somatotopically on ventral pre- and post-central gyri, and that partially overlap at individual electrodes. These representations were coordinated temporally as sequences during syllable production. Spatial patterns of cortical activity showed an emergent, population-level representation, which was organized by phonetic features. Over tens of milliseconds, the spatial patterns transitioned between distinct representations for different consonants and vowels. These results reveal the dynamic organization of speech sensorimotor cortex during the generation of multi-articulator movements that underlies our ability to speak.

Multi-electrode cortical recordings during the production of different consonant-vowel syllables reveal distinct speech-articulator representations that are arranged somatotopically, with temporal and spatial patterns of activity across the neural population corresponding to phonetic features and dynamics.

Brain organization for speech: The act of speaking requires precisely timed coordinated movement of the lips, jaw, tongue and larynx. Edward Chang and colleagues have explored the neural basis of this precise motor control. Multi-electrode recordings in human sensorimotor cortex reveal that the region of the brain involved in speech is laid out according to a somatotopic representation of the face and vocal tract, with large populations of cells corresponding to specific phonetic features. Of particular interest is an additional laryngeal representation located at the dorsal-most end of the ventral sensorimotor cortex, apparently absent in non-human primates, that may be a feature developed uniquely for the specialized control of speech.

Journal Article


Human cortical encoding of pitch in tonal and non-tonal languages

by Chang, Edward F.; Wu, Jinsong; Tang, Claire in 631/378/116; 631/378/2619/2618; 631/378/2649/1723

2021

Languages can use a common repertoire of vocal sounds to signify distinct meanings. In tonal languages, such as Mandarin Chinese, pitch contours of syllables distinguish one word from another, whereas in non-tonal languages, such as English, pitch is used to convey intonation. The neural computations underlying language specialization in speech perception are unknown. Here, we use a cross-linguistic approach to address this. Native Mandarin- and English-speaking participants each listened to both Mandarin and English speech, while neural activity was directly recorded from the non-primary auditory cortex. Both groups show language-general coding of speaker-invariant pitch at the single electrode level. At the electrode population level, we find language-specific distribution of cortical tuning parameters in Mandarin speakers only, with enhanced sensitivity to Mandarin tone categories. Our results show that speech perception relies upon a shared cortical auditory feature processing mechanism, which may be tuned to the statistics of a given language. Different languages rely on different vocal sounds to convey meaning. Here the authors show that language-general coding of pitch occurs in the non-primary auditory cortex for both tonal (Mandarin Chinese) and non-tonal (English) languages, with some language specificity on the population level.

Journal Article


Multi-day rhythms modulate seizure risk in epilepsy

by Chang, Edward F.; King-Stephens, David; Kleen, Jonathan K. in 631/1647/1453/1450; 631/378/1689/178; 9/26

2018

Epilepsy is defined by the seemingly random occurrence of spontaneous seizures. The ability to anticipate seizures would enable preventative treatment strategies. A central but unresolved question concerns the relationship of seizure timing to fluctuating rates of interictal epileptiform discharges (here termed interictal epileptiform activity, IEA), a marker of brain irritability observed between seizures by electroencephalography (EEG). Here, in 37 subjects with an implanted brain stimulation device that detects IEA and seizures over years, we find that IEA oscillates with circadian and subject-specific multidien (multi-day) periods. Multidien periodicities, most commonly 20–30 days in duration, are robust and relatively stable for up to 10 years in men and women. We show that seizures occur preferentially during the rising phase of multidien IEA rhythms. Combining phase information from circadian and multidien IEA rhythms provides a novel biomarker for determining relative seizure risk with a large effect size in most subjects. The ability to identify periods of heightened seizure risk could enable new treatments for patients with epilepsy. Here, the authors describe long term EEG recordings from 37 patients which allow them to identify multi-day fluctuations in interictal activity.

Journal Article


Direct mapping of curve-crossing dynamics in IBr by attosecond transient absorption spectroscopy

by Kobayashi, Yuki; Zeng, Tao; Neumark, Daniel M. in Absorption spectroscopy; Adiabatic; Attosecond pulses

2019

The electronic character of photoexcited molecules can abruptly change at avoided crossings and conical intersections. Here, we report direct mapping of the coupled interplay between electrons and nuclei in a prototype molecule, iodine monobromide (IBr), by using attosecond transient absorption spectroscopy. A few-femtosecond visible pulse resonantly excites the B(³Π0⁺), Y(0⁺), and Z(0⁺) states of IBr, and the photodissociation dynamics are tracked with an attosecond extreme-ultraviolet pulse that simultaneously probes the I-4d and Br-3d core-level absorption edges. Direct comparison with quantum mechanical simulations unambiguously identifies the absorption features associated with adiabatic and diabatic channels at the B/Y avoided crossing and concurrent two-photon dissociation processes that involve the Y/Z avoided crossing. The results show clear evidence for rapid switching of valence-electronic character at the avoided crossing.

Journal Article


Major depression during and after the menopausal transition: Study of Women's Health Across the Nation (SWAN)

by Cyranowski, J. M.; Kravitz, H. M.; Chang, Y.-F. in Adult; Adult and adolescent clinical studies; African Americans

2011

It is unclear whether risk for major depression during the menopausal transition or immediately thereafter is increased relative to pre-menopause. We aimed to examine whether the odds of experiencing major depression were greater when women were peri- or post-menopausal compared to when they were pre-menopausal, independent of a history of major depression at study entry and annual measures of vasomotor symptoms (VMS), serum levels of, or changes in, estradiol (E2), follicle-stimulating hormone (FSH) or testosterone (T) and relevant confounders. Participants included the 221 African American and Caucasian women, aged 42-52 years, who were pre-menopausal at entry into the Pittsburgh site of a community-based study of menopause, the Study of Women's Health Across the Nation (SWAN). We conducted the Structured Clinical Interview for DSM-IV Axis I Disorders (SCID) to assess diagnoses of lifetime, annual and current major depression at baseline and at annual follow-ups. Psychosocial and health factors, and blood samples for assay of reproductive hormones, were obtained annually. Women were two to four times more likely to experience a major depressive episode (MDE) when they were peri-menopausal or early post-menopausal. Repeated-measures logistic regression analyses showed that the effect of menopausal status was independent of history of major depression and annually measured upsetting life events, psychotropic medication use, VMS and serum levels of or changes in reproductive hormones. History of major depression was a strong predictor of major depression throughout the study. The risk of major depression is greater for women during and immediately after the menopausal transition than when they are pre-menopausal.

Journal Article
