Catalogue Search | MBRL
Explore the vast range of titles available.
13,620 result(s) for "Speech Acoustics"
Speech synthesis from neural decoding of spoken sentences
by Chang, Edward F.; Anumanchipalli, Gopala K.; Chartier, Josh
in 631/378/2629; 631/378/2632/2634; 9/30
2019
Technology that translates neural activity into speech would be transformative for people who are unable to communicate as a result of neurological impairments. Decoding speech from neural activity is challenging because speaking requires very precise and rapid multi-dimensional control of vocal tract articulators. Here we designed a neural decoder that explicitly leverages kinematic and sound representations encoded in human cortical activity to synthesize audible speech. Recurrent neural networks first decoded directly recorded cortical activity into representations of articulatory movement, and then transformed these representations into speech acoustics. In closed vocabulary tests, listeners could readily identify and transcribe speech synthesized from cortical activity. Intermediate articulatory dynamics enhanced performance even with limited data. Decoded articulatory representations were highly conserved across speakers, enabling a component of the decoder to be transferrable across participants. Furthermore, the decoder could synthesize speech when a participant silently mimed sentences. These findings advance the clinical viability of using speech neuroprosthetic technology to restore spoken communication.
A neural decoder uses kinematic and sound representations encoded in human cortical activity to synthesize audible sentences, which are readily identified and transcribed by listeners.
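The pipeline described above (cortical activity decoded into articulatory kinematics, then transformed into acoustics) can be illustrated with a minimal two-stage recurrent sketch. This is not the authors' implementation; the layer sizes, feature dimensions, and module names below are hypothetical.

```python
# Illustrative two-stage recurrent decoder: cortical activity -> articulatory
# kinematics -> acoustic features. All dimensions and layer sizes are hypothetical.
import torch
import torch.nn as nn

class ArticulatoryDecoder(nn.Module):
    """Stage 1: map recorded cortical features to articulator kinematics."""
    def __init__(self, n_electrodes=256, n_articulatory=33, hidden=100):
        super().__init__()
        self.rnn = nn.LSTM(n_electrodes, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_articulatory)

    def forward(self, ecog):                  # (batch, time, n_electrodes)
        h, _ = self.rnn(ecog)
        return self.out(h)                    # (batch, time, n_articulatory)

class AcousticDecoder(nn.Module):
    """Stage 2: map articulator kinematics to acoustic features for a synthesizer."""
    def __init__(self, n_articulatory=33, n_acoustic=32, hidden=100):
        super().__init__()
        self.rnn = nn.LSTM(n_articulatory, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_acoustic)

    def forward(self, kinematics):
        h, _ = self.rnn(kinematics)
        return self.out(h)                    # acoustic features; a vocoder would render audio

# Usage: chain the two stages on a dummy trial of 500 time steps.
ecog = torch.randn(1, 500, 256)
kinematics = ArticulatoryDecoder()(ecog)
acoustics = AcousticDecoder()(kinematics)
```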
Journal Article
Phonetic Feature Encoding in Human Superior Temporal Gyrus
by Chang, Edward F.; Mesgarani, Nima; Cheung, Connie
in Acoustic spectra; Acoustics; Auditory Cortex - anatomy & histology
2014
During speech perception, linguistic elements such as consonants and vowels are extracted from a complex acoustic speech signal. The superior temporal gyrus (STG) participates in high-order auditory processing of speech, but how it encodes phonetic information is poorly understood. We used high-density direct cortical surface recordings in humans while they listened to natural, continuous speech to reveal the STG representation of the entire English phonetic inventory. At single electrodes, we found response selectivity to distinct phonetic features. Encoding of acoustic properties was mediated by a distributed population response. Phonetic features could be directly related to tuning for spectrotemporal acoustic cues, some of which were encoded in a nonlinear fashion or by integration of multiple cues. These findings demonstrate the acoustic-phonetic representation of speech in human STG.
Journal Article
Brain-informed speech separation (BISS) for enhancement of target speaker in multitalker speech perception
by Herrero, Jose; Mehta, Ashesh D.; O’Sullivan, James
in Accuracy; Acoustic Stimulation; Acoustics
2020
• Multi-talker speech perception is challenging for people with hearing loss.
• Automatic speech separation cannot help without first identifying the target speaker.
• We used the brain signal of listeners to jointly identify and extract target speech.
• This method eliminates the need for separating sound sources or knowing their number.
• We show the efficacy of this method in both normal and hearing impaired subjects.
Hearing-impaired people often struggle to follow the speech stream of an individual talker in noisy environments. Recent studies show that the brain tracks attended speech and that the attended talker can be decoded from neural data on a single-trial level. This raises the possibility of “neuro-steered” hearing devices in which the brain-decoded intention of a hearing-impaired listener is used to enhance the voice of the attended speaker from a speech separation front-end. So far, methods that use this paradigm have focused on optimizing the brain decoding and the acoustic speech separation independently. In this work, we propose a novel framework called brain-informed speech separation (BISS) in which the information about the attended speech, as decoded from the subject’s brain, is directly used to perform speech separation in the front-end. We present a deep learning model that uses neural data to extract the clean audio signal that a listener is attending to from a multi-talker speech mixture. We show that the framework can be applied successfully to the decoded output from either invasive intracranial electroencephalography (iEEG) or non-invasive electroencephalography (EEG) recordings from hearing-impaired subjects. It also results in improved speech separation, even in scenes with background noise. The generalization capability of the system renders it a perfect candidate for neuro-steered hearing-assistive devices.
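As a rough illustration of the idea (not the authors' model), the sketch below conditions a mask-estimating separation network on a brain-decoded envelope of the attended talker. The envelope input, network shapes, and names are all assumptions.

```python
# Minimal sketch: condition a mask-estimating separation network on a
# brain-decoded envelope of the attended talker. Shapes and names are hypothetical.
import torch
import torch.nn as nn

class BrainInformedSeparator(nn.Module):
    def __init__(self, n_freq=257, hidden=256):
        super().__init__()
        # Input per frame: mixture magnitude spectrum + 1 decoded envelope sample.
        self.rnn = nn.LSTM(n_freq + 1, hidden, num_layers=2, batch_first=True)
        self.mask = nn.Sequential(nn.Linear(hidden, n_freq), nn.Sigmoid())

    def forward(self, mix_spec, attended_envelope):
        # mix_spec: (batch, time, n_freq); attended_envelope: (batch, time)
        x = torch.cat([mix_spec, attended_envelope.unsqueeze(-1)], dim=-1)
        h, _ = self.rnn(x)
        return self.mask(h) * mix_spec        # masked spectrogram of the target talker

mix = torch.rand(1, 300, 257)                 # dummy mixture spectrogram
env = torch.rand(1, 300)                      # stand-in for an EEG/iEEG-decoded envelope
target_est = BrainInformedSeparator()(mix, env)
```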
Journal Article
A Method for Measuring the Pitch Frequency of Speech Signals for the Systems of Acoustic Speech Analysis
2019
We developed a new method for measuring the pitch frequency of speech signals with improved noise immunity. The method protects against intense background noise by frequency selection of the voiced segments of the speech signal using a comb-filter scheme with interperiod accumulation. The efficiency of the method is analyzed both theoretically and experimentally with the help of a multichannel frequency meter intended for acoustic speech analysis. It is shown that, for a signal-to-noise ratio of 10 dB and higher, the error of the method does not exceed 2%.
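A minimal sketch of comb-style pitch estimation follows, assuming frame-wise processing and a 60–400 Hz search range. It illustrates the interperiod-accumulation idea rather than the authors' exact algorithm; the frame length and parameters are hypothetical.

```python
# Simple sketch of pitch estimation by comb-style period scoring (numpy only).
# Not the authors' algorithm; frame length and search range are illustrative.
import numpy as np

def estimate_pitch(frame, sr, f_min=60.0, f_max=400.0):
    """Return the pitch (Hz) of a voiced frame by accumulating normalized
    correlations of the signal with copies of itself delayed by 1, 2, 3 periods."""
    frame = frame - frame.mean()
    best_f0, best_score = 0.0, -np.inf
    for lag in range(int(sr / f_max), int(sr / f_min) + 1):
        score = 0.0
        for k in (1, 2, 3):                   # interperiod (comb-like) accumulation
            d = k * lag
            if d >= len(frame):
                break
            a, b = frame[:-d], frame[d:]
            score += np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
        if score > best_score:
            best_score, best_f0 = score, sr / lag
    return best_f0

sr = 16000
t = np.arange(int(0.04 * sr)) / sr
frame = np.sin(2 * np.pi * 150 * t) + 0.3 * np.random.randn(len(t))  # noisy 150 Hz tone
print(estimate_pitch(frame, sr))              # ~150 Hz
```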
Journal Article
Effect of voicing and articulation manner on aerosol particle emission during human speech
2020
Previously, we demonstrated a strong correlation between the amplitude of human speech and the emission rate of micron-scale expiratory aerosol particles, which are believed to play a role in respiratory disease transmission. To further those findings, here we systematically investigate the effect of different 'phones' (the basic sound units of speech) on the emission of particles from the human respiratory tract during speech. We measured the respiratory particle emission rates of 56 healthy human volunteers voicing specific phones, both in isolation and in the context of a standard spoken text. We found that certain phones are associated with significantly higher particle production; for example, the vowel /i/ ("need," "sea") produces more particles than /ɑ/ ("saw," "hot") or /u/ ("blue," "mood"), while disyllabic words including voiced plosive consonants (e.g., /d/, /b/, /g/) yield more particles than words with voiceless fricatives (e.g., /s/, /h/, /f/). These trends for discrete phones and words were corroborated by the time-resolved particle emission rates as volunteers read aloud from a standard text passage that incorporates a broad range of the phones present in spoken English. Our measurements showed that particle emission rates were positively correlated with the vowel content of a phrase; conversely, particle emission decreased during phrases with a high fraction of voiceless fricatives. Our particle emission data is broadly consistent with prior measurements of the egressive airflow rate associated with the vocalization of various phones that differ in voicing and articulation. These results suggest that airborne transmission of respiratory pathogens via speech aerosol particles could be modulated by specific phonetic characteristics of the language spoken by a given human population, along with other, more frequently considered epidemiological variables.
Journal Article
Vowel Acoustics in Parkinson's Disease and Multiple Sclerosis: Comparison of Clear, Loud, and Slow Speaking Conditions
2013
Purpose: The impact of clear speech, increased vocal intensity, and rate reduction on acoustic characteristics of vowels was compared in speakers with Parkinson's disease (PD), speakers with multiple sclerosis (MS), and healthy controls. Method: Speakers read sentences in habitual, clear, loud, and slow conditions. Variations in clarity, intensity, and rate were stimulated using magnitude production. Formant frequency values for peripheral and nonperipheral vowels were obtained at 20%, 50%, and 80% of vowel duration to derive static and dynamic acoustic measures. Intensity and duration measures were obtained. Results: Rate was maximally reduced in the slow condition, and vocal intensity was maximized in the loud condition. The clear condition also yielded a reduced articulatory rate and increased intensity, although less than for the slow or loud conditions. Overall, the clear condition had the most consistent impact on vowel spectral characteristics. Spectral and temporal distinctiveness for peripheral-nonperipheral vowel pairs was largely similar across conditions. Conclusions: Clear speech maximized peripheral and nonperipheral vowel space areas for speakers with PD and MS while also reducing rate and increasing vocal intensity. These results suggest that a speech style focused on increasing articulatory amplitude yields the most robust changes in vowel segmental articulation.
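Vowel space areas of the kind compared above are commonly computed from corner-vowel formant values; a minimal sketch using the shoelace formula follows. The formant numbers are purely illustrative and are not data from the study.

```python
# Minimal sketch: quadrilateral vowel space area from corner-vowel formants
# (F1, F2 in Hz) via the shoelace formula. Example values are illustrative only.
def vowel_space_area(corners):
    """corners: list of (F1, F2) pairs for /i/, /ae/, /a/, /u/ in order."""
    area = 0.0
    n = len(corners)
    for i in range(n):
        x1, y1 = corners[i]
        x2, y2 = corners[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0                    # area in Hz^2

# Hypothetical formant midpoint values for one speaker:
corners = [(300, 2300), (660, 1700), (750, 1100), (350, 900)]   # /i/, /ae/, /a/, /u/
print(vowel_space_area(corners))
```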
Journal Article
Speaker-normalized sound representations in the human auditory cortex
2019
The acoustic dimensions that distinguish speech sounds (like the vowel differences in “boot” and “boat”) also differentiate speakers’ voices. Therefore, listeners must normalize across speakers without losing linguistic information. Past behavioral work suggests an important role for auditory contrast enhancement in normalization: preceding context affects listeners’ perception of subsequent speech sounds. Here, using intracranial electrocorticography in humans, we investigate whether and how such context effects arise in auditory cortex. Participants identified speech sounds that were preceded by phrases from two different speakers whose voices differed along the same acoustic dimension as target words (the lowest resonance of the vocal tract). In every participant, target vowels evoke a speaker-dependent neural response that is consistent with the listener’s perception, and which follows from a contrast enhancement model. Auditory cortex processing thus displays a critical feature of normalization, allowing listeners to extract meaningful content from the voices of diverse speakers.
Our perception of a speech sound tends to remain stable despite variation in people’s vocal characteristics. Here, by measuring neural activity as people listened to speech from different voices, the authors provide evidence for speaker normalization processes in the human auditory cortex.
Journal Article
Emotional tones of voice affect the acoustics and perception of Mandarin tones
by Lee, Chao-Yang; Young, Shuenn-Tsong; Chang, Hui-Shan
in Acknowledgment; Acoustic analysis; Acoustic properties
2023
Lexical tones and emotions are conveyed by a similar set of acoustic parameters; therefore, listeners of tonal languages face the challenge of processing lexical tones and emotions in the acoustic signal concurrently. This study examined how emotions affect the acoustics and perception of Mandarin tones. In Experiment 1, Mandarin tones were produced by professional actors with angry, fear, happy, sad, and neutral tones of voice. Acoustic analyses on mean F0, F0 range, mean amplitude, and duration were conducted on syllables excised from a carrier phrase. The results showed that emotions affect Mandarin tone acoustics to different degrees depending on specific Mandarin tones and specific emotions. In Experiment 2, selected syllables from Experiment 1 were presented in isolation or in context. Listeners were asked to identify the Mandarin tones and emotions of the syllables. The results showed that emotions affect Mandarin tone identification to a greater extent than Mandarin tones affect emotion recognition. Both Mandarin tones and emotions were identified more accurately in syllables presented with the carrier phrase, but the carrier phrase affected Mandarin tone identification and emotion recognition to different degrees. These findings suggest that lexical tones and emotions interact in complex but systematic ways.
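The four acoustic measures named in the abstract (mean F0, F0 range, mean amplitude, duration) are straightforward to compute per syllable once an F0 track is available. The sketch below assumes a pre-extracted F0 contour and uses hypothetical stand-in inputs; it is not the authors' analysis code.

```python
# Minimal sketch of the four per-syllable acoustic measures, computed from a
# waveform and a pre-extracted F0 track. Inputs below are hypothetical stand-ins.
import numpy as np

def syllable_measures(samples, sr, f0_track):
    """samples: mono waveform of one syllable; f0_track: F0 (Hz) per frame,
    with 0 marking unvoiced frames."""
    voiced = f0_track[f0_track > 0]
    return {
        "mean_f0_hz": float(voiced.mean()) if voiced.size else 0.0,
        "f0_range_hz": float(voiced.max() - voiced.min()) if voiced.size else 0.0,
        "mean_amplitude_db": float(20 * np.log10(np.sqrt(np.mean(samples ** 2)) + 1e-12)),
        "duration_ms": 1000 * len(samples) / sr,
    }

sr = 16000
samples = 0.1 * np.random.randn(int(0.4 * sr))      # stand-in for a recorded syllable
f0_track = np.linspace(220, 120, 40)                 # stand-in falling-tone F0 contour
print(syllable_measures(samples, sr, f0_track))
```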
Journal Article
Adaptation of the human auditory cortex to changing background noise
by Khalighinejad, Bahar; Mesgarani, Nima; Mehta, Ashesh D.
in 631/378/2619; 631/378/2619/2618; 9/26
2019
Speech communication in real-world environments requires adaptation to changing acoustic conditions. How the human auditory cortex adapts as a new noise source appears in or disappears from the acoustic scene remains unclear. Here, we directly measured neural activity in the auditory cortex of six human subjects as they listened to speech with abruptly changing background noises. We report rapid and selective suppression of acoustic features of noise in the neural responses. This suppression results in enhanced representation and perception of speech acoustic features. The degree of adaptation to different background noises varies across neural sites and is predictable from the tuning properties and speech specificity of the sites. Moreover, adaptation to background noise is unaffected by the attentional focus of the listener. The convergence of these neural and perceptual effects reveals the intrinsic dynamic mechanisms that enable a listener to filter out irrelevant sound sources in a changing acoustic scene.
How does the auditory system allow for accurate speech perception against changes in background noise? Here, using neural activity in the auditory cortex as people listen to speech, the authors provide evidence that background noise is selectively suppressed to enhance representations of speech.
Journal Article
Impact of Clear, Loud, and Slow Speech on Scaled Intelligibility and Speech Severity in Parkinson's Disease and Multiple Sclerosis
2014
Purpose: The perceptual consequences of rate reduction, increased vocal intensity, and clear speech were studied in speakers with multiple sclerosis (MS), Parkinson's disease (PD), and healthy controls. Method: Seventy-eight speakers read sentences in habitual, clear, loud, and slow conditions. Sentences were equated for peak amplitude and mixed with multitalker babble for presentation to listeners. Using a computerized visual analog scale, listeners judged intelligibility or speech severity as operationally defined in Sussman and Tjaden (2012). Results: Loud and clear but not slow conditions improved intelligibility relative to the habitual condition. With the exception of the loud condition for the PD group, speech severity did not improve above habitual and was reduced relative to habitual in some instances. Intelligibility and speech severity were strongly related, but relationships for disordered speakers were weaker in clear and slow conditions versus habitual. Conclusions: Both clear and loud speech show promise for improving intelligibility and maintaining or improving speech severity in multitalker babble for speakers with mild dysarthria secondary to MS or PD, at least as these perceptual constructs were defined and measured in this study. Although scaled intelligibility and speech severity overlap, the metrics further appear to have some separate value in documenting treatment-related speech changes.
Journal Article