Catalogue Search | MBRL

Generative Spoken Dialogue Language Modeling

by Nguyen, Tu Anh , Kharitonov, Eugene , Tomasello, Paden in Computation and Language , Computer Science , Conversation

2023

We introduce dGSLM, the first “textless” model able to generate audio samples of naturalistic spoken dialogues. It uses recent work on unsupervised spoken unit discovery coupled with a dual-tower transformer architecture with cross-attention trained on 2000 hours of two-channel raw conversational audio (Fisher dataset) without any text or labels. We show that our model is able to generate speech, laughter, and other paralinguistic signals in the two channels simultaneously and reproduces more naturalistic and fluid turn-taking compared to a text-based cascaded model.

Journal Article

Low-Resource Self-Supervised Learning with SSL-Enhanced TTS

by Nguyen, Tu Anh , Dupoux, Emmanuel , Lee, Hung-yi in Performance degradation , Self-supervised learning , Speech processing

2024

Self-supervised learning (SSL) techniques have achieved remarkable results in various speech processing tasks. Nonetheless, a significant challenge remains in reducing the reliance on vast amounts of speech data for pre-training. This paper proposes to address this challenge by leveraging synthetic speech to augment a low-resource pre-training corpus. We construct a high-quality text-to-speech (TTS) system with limited resources using SSL features and generate a large synthetic corpus for pre-training. Experimental results demonstrate that our proposed approach effectively reduces the demand for speech data by 90% with only slight performance degradation. To the best of our knowledge, this is the first work aiming to enhance low-resource self-supervised learning in speech processing.

Paper

Retrieve-and-Fill for Scenario-based Task-Oriented Semantic Parsing

by Livshits, Aleksandr , Zotov, Alexander , Desai, Shrey in Coders , Modules , Semantics

2022

Task-oriented semantic parsing models have achieved strong results in recent years, but unfortunately do not strike an appealing balance between model size, runtime latency, and cross-domain generalizability. We tackle this problem by introducing scenario-based semantic parsing: a variant of the original task which first requires disambiguating an utterance's "scenario" (an intent-slot template with variable leaf spans) before generating its frame, complete with ontology and utterance tokens. This formulation enables us to isolate coarse-grained and fine-grained aspects of the task, each of which we solve with off-the-shelf neural modules, also optimizing for the axes outlined above. Concretely, we create a Retrieve-and-Fill (RAF) architecture composed of (1) a retrieval module which ranks the best scenario given an utterance and (2) a filling module which imputes spans into the scenario to create the frame. Our model is modular, differentiable, interpretable, and allows us to garner extra supervision from scenarios. RAF achieves strong results in high-resource, low-resource, and multilingual settings, outperforming recent approaches by wide margins despite using base pre-trained encoders, small sequence lengths, and parallel decoding.

Paper

Generative Spoken Dialogue Language Modeling

by Tomasello, Paden , Nguyen, Tu Anh , Kharitonov, Eugene

2022

We introduce dGSLM, the first "textless" model able to generate audio samples of naturalistic spoken dialogues. It uses recent work on unsupervised spoken unit discovery coupled with a dual-tower transformer architecture with cross-attention trained on 2000 hours of two-channel raw conversational audio (Fisher dataset) without any text or labels. We show that our model is able to generate speech, laughter and other paralinguistic signals in the two channels simultaneously and reproduces more naturalistic and fluid turn-taking compared to a text-based cascaded model.

Paper

Scaling Speech Technology to 1,000+ Languages

by Shi, Bowen , Vyas, Apoorv , Kundu, Sayani in Automatic speech recognition , Information dissemination , Languages

2023

Expanding the language coverage of speech technology has the potential to improve access to information for many more people. However, current speech technology is restricted to about one hundred languages, which is a small fraction of the over 7,000 languages spoken around the world. The Massively Multilingual Speech (MMS) project increases the number of supported languages by 10-40x, depending on the task. The main ingredients are a new dataset based on readings of publicly available religious texts and effectively leveraging self-supervised learning. We built pre-trained wav2vec 2.0 models covering 1,406 languages, a single multilingual automatic speech recognition model for 1,107 languages, speech synthesis models for the same number of languages, as well as a language identification model for 4,017 languages. Experiments show that our multilingual speech recognition model more than halves the word error rate of Whisper on 54 languages of the FLEURS benchmark while being trained on a small fraction of the labeled data.

Paper

textless-lib: a Library for Textless Spoken Language Processing

by Tomasello, Paden , Kharitonov, Eugene , Nguyen, Tu Anh in Libraries , Natural language processing , Speech

2022

Textless spoken language processing research aims to extend the applicability of the standard NLP toolset to spoken language and to languages with few or no textual resources. In this paper, we introduce textless-lib, a PyTorch-based library aimed at facilitating research in this area. We describe the building blocks that the library provides and demonstrate its usability by discussing three different use-case examples: (i) speaker probing, (ii) speech resynthesis and compression, and (iii) speech continuation. We believe that textless-lib substantially simplifies research in the textless setting and will be handy not only for speech researchers but also for the NLP community at large. The code, documentation, and pre-trained models are available at https://github.com/facebookresearch/textlesslib/ .

Paper

STOP: A dataset for Spoken Task Oriented Semantic Parsing

by Zettlemoyer, Luke , Dupoux, Emmanuel , Nguyen, Tu Anh in Audio data , Automatic speech recognition , Benchmarks

2022

End-to-end spoken language understanding (SLU) predicts intent directly from audio using a single model. It promises to improve the performance of assistant systems by leveraging acoustic information lost in the intermediate textual representation and preventing cascading errors from Automatic Speech Recognition (ASR). Further, having one unified model has efficiency advantages when deploying assistant systems on-device. However, the limited number of public audio datasets with semantic parse labels hinders research progress in this area. In this paper, we release the Spoken Task-Oriented semantic Parsing (STOP) dataset, the largest and most complex SLU dataset to be publicly available. Additionally, we define low-resource splits to establish a benchmark for improving SLU when limited labeled data is available. Furthermore, in addition to the human-recorded audio, we are releasing a TTS-generated version to benchmark the performance for low-resource domain adaptation of end-to-end SLU systems. Initial experiments show end-to-end SLU models performing slightly worse than their cascaded counterparts, which we hope encourages future work in this direction.

Paper

Evaluation of Motor & Cognitive Milestones in Preterm and Full Term Neonates With Hyperbilirubinemia

by Younis, Noheir Abdelhady , Aziz, Samia Samy , Elkahky, Ahmed in Infants , Preterm newborns , Hyperbilirubinemia

2020

Background: Hyperbilirubinemia is the most common cause of neonatal admission during the first week of life, so its effects on development should be followed up. Objective: To detect the possible occurrence of motor and mental delay in infants as a complication of neonatal hyperbilirubinemia. Subjects and method: A prospective longitudinal case-control study used the Bayley Scales III to evaluate and follow up motor and mental developmental parameters in two groups. The case group (group I) was subdivided into group Ia, comprising 55 jaundiced full-term neonates, and group Ib, comprising 54 jaundiced preterm neonates; these 109 jaundiced neonates were admitted to the Neonatal Intensive Care Unit, New Cairo Hospital. The control group (group II) comprised 52 non-jaundiced full-term neonates attending the Health Center of the Ministry of Health in New Cairo. Results: Cases had lower Bayley III mental assessment scores than controls, with a statistically significant difference between the two groups. In the case group, there was a significant negative correlation between total serum bilirubin level and Bayley mental assessment scores at the 2nd, 4th, 6th, 9th, and 12th months. However, there was no statistically significant difference between the two groups in mental assessment at 18 months of age. The study also showed a significant difference between cases and controls in raw scores on the Bayley motor scales, with cases achieving lower motor assessment scores than controls. There was a statistically significant negative correlation between total serum bilirubin and Bayley motor assessment scores at the 2nd, 4th, 6th, 9th, 12th, and 18th months. These findings indicate a significant relationship between neonatal hyperbilirubinemia and later developmental delays (motor and mental) in infancy (P < 0.05).
Conclusion: Infants with neonatal jaundice should be followed up for motor and mental skills during infancy, as early identification of developmental delay can help prevent later developmental problems through interventional programs.

Journal Article
