Catalogue Search | MBRL
10,715 result(s) for "Chang, F."
Speech synthesis from neural decoding of spoken sentences
by Chang, Edward F.; Anumanchipalli, Gopala K.; Chartier, Josh
in 631/378/2629, 631/378/2632/2634, 9/30
2019
Technology that translates neural activity into speech would be transformative for people who are unable to communicate as a result of neurological impairments. Decoding speech from neural activity is challenging because speaking requires very precise and rapid multi-dimensional control of vocal tract articulators. Here we designed a neural decoder that explicitly leverages kinematic and sound representations encoded in human cortical activity to synthesize audible speech. Recurrent neural networks first decoded directly recorded cortical activity into representations of articulatory movement, and then transformed these representations into speech acoustics. In closed vocabulary tests, listeners could readily identify and transcribe speech synthesized from cortical activity. Intermediate articulatory dynamics enhanced performance even with limited data. Decoded articulatory representations were highly conserved across speakers, enabling a component of the decoder to be transferrable across participants. Furthermore, the decoder could synthesize speech when a participant silently mimed sentences. These findings advance the clinical viability of using speech neuroprosthetic technology to restore spoken communication.
A neural decoder uses kinematic and sound representations encoded in human cortical activity to synthesize audible sentences, which are readily identified and transcribed by listeners.
Journal Article
Selective cortical representation of attended speaker in multi-talker speech perception
2012
The neural correlates of how attended speech is internally represented are described, shedding light on the ‘cocktail party problem’.
Heard instinct
The 'cocktail-party problem' — the question of what goes on in our brains when we listen selectively for one person's voice while ignoring many others — has puzzled researchers from various disciplines for years. Using electrophysiological recordings from neurosurgery patients listening to two speakers simultaneously, Nima Mesgarani and Edward Chang determine the neural correlates associated with the internal representation of attended speech. They find that the neural responses in the auditory cortex represent the attended voice robustly, almost as if the second voice were not there. With these patterns established, a simple algorithm trained on various speakers predicts which stimulus a subject is attending to, on the basis of the patterns emerging in the secondary auditory cortex. These results suggest that speech representation in the brain reflects not only the acoustic environment, but also the listener's understanding of these signals. As well as shedding light on a long-standing neurobiological problem, this work may give clues as to how automatic speech recognition might be improved to cope with more than one talker.
Humans possess a remarkable ability to attend to a single speaker’s voice in a multi-talker background [1, 2, 3]. How the auditory system manages to extract intelligible speech under such acoustically complex and adverse listening conditions is not known, and, indeed, it is not clear how attended speech is internally represented [4, 5]. Here, using multi-electrode surface recordings from the cortex of subjects engaged in a listening task with two simultaneous speakers, we demonstrate that population responses in non-primary human auditory cortex encode critical features of attended speech: speech spectrograms reconstructed based on cortical responses to the mixture of speakers reveal the salient spectral and temporal features of the attended speaker, as if subjects were listening to that speaker alone. A simple classifier trained solely on examples of single speakers can decode both attended words and speaker identity. We find that task performance is well predicted by a rapid increase in attention-modulated neural selectivity across both single-electrode and population-level cortical responses. These findings demonstrate that the cortical representation of speech does not merely reflect the external acoustic environment, but instead gives rise to the perceptual aspects relevant for the listener’s intended goal.
Journal Article
Phonetic Feature Encoding in Human Superior Temporal Gyrus
by Chang, Edward F.; Mesgarani, Nima; Cheung, Connie
in Acoustic spectra, Acoustics, Auditory Cortex - anatomy & histology
2014
During speech perception, linguistic elements such as consonants and vowels are extracted from a complex acoustic speech signal. The superior temporal gyrus (STG) participates in high-order auditory processing of speech, but how it encodes phonetic information is poorly understood. We used high-density direct cortical surface recordings in humans while they listened to natural, continuous speech to reveal the STG representation of the entire English phonetic inventory. At single electrodes, we found response selectivity to distinct phonetic features. Encoding of acoustic properties was mediated by a distributed population response. Phonetic features could be directly related to tuning for spectrotemporal acoustic cues, some of which were encoded in a nonlinear fashion or by integration of multiple cues. These findings demonstrate the acoustic-phonetic representation of speech in human STG.
Journal Article
Real-time decoding of question-and-answer speech dialogue using human cortical activity
by Chang, Edward F.; Makin, Joseph G.; Leonard, Matthew K.
in 631/378/116/2394, 631/378/2632/2634, 631/378/2649/1594
2019
Natural communication often occurs in dialogue, differentially engaging auditory and sensorimotor brain regions during listening and speaking. However, previous attempts to decode speech directly from the human brain typically consider listening or speaking tasks in isolation. Here, human participants listened to questions and responded aloud with answers while we used high-density electrocorticography (ECoG) recordings to detect when they heard or said an utterance and to then decode the utterance’s identity. Because certain answers were only plausible responses to certain questions, we could dynamically update the prior probabilities of each answer using the decoded question likelihoods as context. We decode produced and perceived utterances with accuracy rates as high as 61% and 76%, respectively (chance is 7% and 20%). Contextual integration of decoded question likelihoods significantly improves answer decoding. These results demonstrate real-time decoding of speech in an interactive, conversational setting, which has important implications for patients who are unable to communicate.
Speech neuroprosthetic devices should be capable of restoring a patient’s ability to participate in interactive dialogue. Here, the authors demonstrate that the context of a verbal exchange can be used to enhance neural decoder performance in real time.
Journal Article
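The contextual integration described in this abstract can be illustrated with a small sketch. It is not the authors' decoder: the function name, the question and answer labels, and every probability below are hypothetical. It only shows, under those assumptions, how decoded question probabilities can be turned into a prior over answers and combined with answer likelihoods using Bayes' rule.

```python
def integrate_context(question_probs, answer_given_question, answer_likelihoods):
    """Combine decoded question probabilities with answer likelihoods.

    question_probs: dict question -> P(question | neural data)
    answer_given_question: dict question -> dict answer -> P(answer | question)
    answer_likelihoods: dict answer -> likelihood of the neural data given the answer
    Returns a dict answer -> posterior probability.
    """
    # Context-dependent prior over answers: marginalize over the decoded question.
    prior = {}
    for question, p_question in question_probs.items():
        for answer, p_answer in answer_given_question[question].items():
            prior[answer] = prior.get(answer, 0.0) + p_question * p_answer

    # Posterior is proportional to likelihood times prior.
    unnormalized = {
        answer: answer_likelihoods.get(answer, 0.0) * p for answer, p in prior.items()
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("no answer has non-zero posterior mass")
    return {answer: value / total for answer, value in unnormalized.items()}


# Hypothetical toy values, for illustration only.
question_probs = {"how_is_room": 0.9, "which_instrument": 0.1}
answer_given_question = {
    "how_is_room": {"bright": 0.5, "dark": 0.5},
    "which_instrument": {"piano": 0.5, "violin": 0.5},
}
answer_likelihoods = {"bright": 0.3, "dark": 0.2, "piano": 0.4, "violin": 0.1}

posterior = integrate_context(question_probs, answer_given_question, answer_likelihoods)
best_answer = max(posterior, key=posterior.get)
```

In this toy case the answer likelihoods alone would favour "piano", but because the decoded question makes room-related answers far more plausible, the posterior favours "bright" instead.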
A high-performance neuroprosthesis for speech decoding and avatar control
by Chang, Edward F.; Metzger, Sean L.; Anumanchipalli, Gopala K.
in 631/378/2632/1663, 631/378/2632/2634, 631/443/376
2023
Speech neuroprostheses have the potential to restore communication to people living with paralysis, but naturalistic speed and expressivity are elusive [1]. Here we use high-density surface recordings of the speech cortex in a clinical-trial participant with severe limb and vocal paralysis to achieve high-performance real-time decoding across three complementary speech-related output modalities: text, speech audio and facial-avatar animation. We trained and evaluated deep-learning models using neural data collected as the participant attempted to silently speak sentences. For text, we demonstrate accurate and rapid large-vocabulary decoding with a median rate of 78 words per minute and median word error rate of 25%. For speech audio, we demonstrate intelligible and rapid speech synthesis and personalization to the participant’s pre-injury voice. For facial-avatar animation, we demonstrate the control of virtual orofacial movements for speech and non-speech communicative gestures. The decoders reached high performance with less than two weeks of training. Our findings introduce a multimodal speech-neuroprosthetic approach that has substantial promise to restore full, embodied communication to people living with severe paralysis.
A study using high-density surface recordings of the speech cortex in a person with limb and vocal paralysis demonstrates real-time decoding of brain activity into text, speech sounds and orofacial movements.
Journal Article
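The word error rate quoted in this abstract is conventionally defined as the word-level edit distance between a decoded sentence and its reference, divided by the number of reference words. A minimal sketch of that standard definition follows; the function name and the example sentences are hypothetical and are not taken from the study, whose exact scoring procedure is not described in this record.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current.append(
                min(
                    previous[j] + 1,  # deletion of a reference word
                    current[j - 1] + 1,  # insertion of a hypothesis word
                    previous[j - 1] + substitution_cost,  # match or substitution
                )
            )
        previous = current
    return previous[-1] / len(ref_words)


# Hypothetical sentences: one word dropped and one word added relative to the reference.
wer = word_error_rate("i would like some water", "i would like water now")
```

Because insertions count as errors, this ratio can exceed 1 when the hypothesis is much longer than the reference.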
Functional organization of human sensorimotor cortex for speech articulation
by Chang, Edward F.; Mesgarani, Nima; Bouchard, Kristofer E.
in 631/378/2649/1594, Biological and medical sciences, Cerebral Cortex - physiology
2013
Speaking is one of the most complex actions that we perform, but nearly all of us learn to do it effortlessly. Production of fluent speech requires the precise, coordinated movement of multiple articulators (for example, the lips, jaw, tongue and larynx) over rapid time scales. Here we used high-resolution, multi-electrode cortical recordings during the production of consonant-vowel syllables to determine the organization of speech sensorimotor cortex in humans. We found speech-articulator representations that are arranged somatotopically on ventral pre- and post-central gyri, and that partially overlap at individual electrodes. These representations were coordinated temporally as sequences during syllable production. Spatial patterns of cortical activity showed an emergent, population-level representation, which was organized by phonetic features. Over tens of milliseconds, the spatial patterns transitioned between distinct representations for different consonants and vowels. These results reveal the dynamic organization of speech sensorimotor cortex during the generation of multi-articulator movements that underlies our ability to speak.
Multi-electrode cortical recordings during the production of different consonant-vowel syllables reveal distinct speech-articulator representations that are arranged somatotopically, with temporal and spatial patterns of activity across the neural population corresponding to phonetic features and dynamics.
Brain organization for speech
The act of speaking requires precisely timed coordinated movement of the lips, jaw, tongue and larynx. Edward Chang and colleagues have explored the neural basis of this precise motor control. Multi-electrode recordings in human sensorimotor cortex reveal that the region of the brain involved in speech is laid out according to a somatotopic representation of the face and vocal tract, with large populations of cells corresponding to specific phonetic features. Of particular interest is an additional laryngeal representation located at the dorsal-most end of the ventral sensorimotor cortex, apparently absent in non-human primates, that may be a feature developed uniquely for the specialized control of speech.
Journal Article
Human cortical encoding of pitch in tonal and non-tonal languages
by Chang, Edward F.; Wu, Jinsong; Tang, Claire
in 631/378/116, 631/378/2619/2618, 631/378/2649/1723
2021
Languages can use a common repertoire of vocal sounds to signify distinct meanings. In tonal languages, such as Mandarin Chinese, pitch contours of syllables distinguish one word from another, whereas in non-tonal languages, such as English, pitch is used to convey intonation. The neural computations underlying language specialization in speech perception are unknown. Here, we use a cross-linguistic approach to address this. Native Mandarin- and English-speaking participants each listened to both Mandarin and English speech, while neural activity was directly recorded from the non-primary auditory cortex. Both groups show language-general coding of speaker-invariant pitch at the single electrode level. At the electrode population level, we find language-specific distribution of cortical tuning parameters in Mandarin speakers only, with enhanced sensitivity to Mandarin tone categories. Our results show that speech perception relies upon a shared cortical auditory feature processing mechanism, which may be tuned to the statistics of a given language.
Different languages rely on different vocal sounds to convey meaning. Here the authors show that language-general coding of pitch occurs in the non-primary auditory cortex for both tonal (Mandarin Chinese) and non-tonal (English) languages, with some language specificity on the population level.
Journal Article
Multi-day rhythms modulate seizure risk in epilepsy
by Chang, Edward F.; King-Stephens, David; Kleen, Jonathan K.
in 631/1647/1453/1450, 631/378/1689/178, 9/26
2018
Epilepsy is defined by the seemingly random occurrence of spontaneous seizures. The ability to anticipate seizures would enable preventative treatment strategies. A central but unresolved question concerns the relationship of seizure timing to fluctuating rates of interictal epileptiform discharges (here termed interictal epileptiform activity, IEA), a marker of brain irritability observed between seizures by electroencephalography (EEG). Here, in 37 subjects with an implanted brain stimulation device that detects IEA and seizures over years, we find that IEA oscillates with circadian and subject-specific multidien (multi-day) periods. Multidien periodicities, most commonly 20–30 days in duration, are robust and relatively stable for up to 10 years in men and women. We show that seizures occur preferentially during the rising phase of multidien IEA rhythms. Combining phase information from circadian and multidien IEA rhythms provides a novel biomarker for determining relative seizure risk with a large effect size in most subjects.
The ability to identify periods of heightened seizure risk could enable new treatments for patients with epilepsy. Here, the authors describe long term EEG recordings from 37 patients which allow them to identify multi-day fluctuations in interictal activity.
Journal Article
Direct mapping of curve-crossing dynamics in IBr by attosecond transient absorption spectroscopy
by Kobayashi, Yuki; Zeng, Tao; Neumark, Daniel M.
in Absorption spectroscopy, Adiabatic, Attosecond pulses
2019
The electronic character of photoexcited molecules can abruptly change at avoided crossings and conical intersections. Here, we report direct mapping of the coupled interplay between electrons and nuclei in a prototype molecule, iodine monobromide (IBr), by using attosecond transient absorption spectroscopy. A few-femtosecond visible pulse resonantly excites the B(³Π₀⁺), Y(0⁺), and Z(0⁺) states of IBr, and the photodissociation dynamics are tracked with an attosecond extreme-ultraviolet pulse that simultaneously probes the I-4d and Br-3d core-level absorption edges. Direct comparison with quantum mechanical simulations unambiguously identifies the absorption features associated with adiabatic and diabatic channels at the B/Y avoided crossing and concurrent two-photon dissociation processes that involve the Y/Z avoided crossing. The results show clear evidence for rapid switching of valence-electronic character at the avoided crossing.
Journal Article
Major depression during and after the menopausal transition: Study of Women's Health Across the Nation (SWAN)
by Cyranowski, J. M.; Kravitz, H. M.; Chang, Y.-F.
in Adult, Adult and adolescent clinical studies, African Americans
2011
It is unclear whether risk for major depression during the menopausal transition or immediately thereafter is increased relative to pre-menopause. We aimed to examine whether the odds of experiencing major depression were greater when women were peri- or post-menopausal compared to when they were pre-menopausal, independent of a history of major depression at study entry and annual measures of vasomotor symptoms (VMS), serum levels of, or changes in, estradiol (E2), follicular stimulating hormone (FSH) or testosterone (T) and relevant confounders.
Participants included the 221 African American and Caucasian women, aged 42-52 years, who were pre-menopausal at entry into the Pittsburgh site of a community-based study of menopause, the Study of Women's Health Across the Nation (SWAN). We conducted the Structured Clinical Interview for DSM-IV Axis I Disorders (SCID) to assess diagnoses of lifetime, annual and current major depression at baseline and at annual follow-ups. Psychosocial and health factors, and blood samples for assay of reproductive hormones, were obtained annually.
Women were two to four times more likely to experience a major depressive episode (MDE) when they were peri-menopausal or early post-menopausal. Repeated-measures logistic regression analyses showed that the effect of menopausal status was independent of history of major depression and annually measured upsetting life events, psychotropic medication use, VMS and serum levels of or changes in reproductive hormones. History of major depression was a strong predictor of major depression throughout the study.
The risk of major depression is greater for women during and immediately after the menopausal transition than when they are pre-menopausal.
Journal Article