Catalogue Search | MBRL

Audio source separation and speech enhancement

by Vincent, Emmanuel (Research scientist), editor; Virtanen, Tuomas, editor; Gannot, Sharon, editor in Speech processing systems; Automatic speech recognition.

Book

Automatic speech recognition for under-resourced languages: A survey

by Tanja Schultz, Etienne Barnard, Alexey Karpov

2014

Speech processing for under-resourced languages is an active field of research, which has experienced significant progress during the past decade. We propose, in this paper, a survey that focuses on automatic speech recognition (ASR) for these languages. Under-resourced languages and the challenges associated with them are first defined. The main part of the paper is a literature review of the recent (last 8 years) contributions made in ASR for under-resourced languages. Examples of past projects and future trends when dealing with under-resourced languages are also presented. We believe that this paper will be a good starting point for anyone interested in initiating research in (or operational development of) ASR for one or several under-resourced languages. It should be clear, however, that many of the issues and approaches presented here apply to speech technology in general (text-to-speech synthesis for instance).

Journal Article

Speech perception and spoken word recognition

by Gaskell, M. Gareth, editor; Mirkovic, Jelena, editor in Speech perception; Word recognition; Psycholinguistics.

Book

Automatic Speech Recognition Method Based on Deep Learning Approaches for Uzbek Language

by Khujayarov, Ilyos; Cho, Jinsoo; Djuraev, Oybek in Acknowledgment; Acoustics; Attention

2022

Communication has been an important aspect of human life, civilization, and globalization for thousands of years. Biometric analysis, education, security, healthcare, and smart cities are only a few examples of speech recognition applications. Most studies have mainly concentrated on English, Spanish, Japanese, or Chinese, disregarding other low-resource languages, such as Uzbek, leaving their analysis open. In this paper, we propose an End-To-End Deep Neural Network-Hidden Markov Model speech recognition model and a hybrid Connectionist Temporal Classification (CTC)-attention network for the Uzbek language and its dialects. The proposed approach reduces training time and improves speech recognition accuracy by effectively using CTC objective function in attention model training. We evaluated the linguistic and lay-native speaker performances on the Uzbek language dataset, which was collected as a part of this study. Experimental results show that the proposed model achieved a word error rate of 14.3% using 207 h of recordings as an Uzbek language training dataset.

Journal Article

Analyzing emotion in spontaneous speech

by Chakraborty, Rupayan, author; Pandharipande, Meghna, author; Kopparapu, Sunil Kumar, author in Automatic speech recognition; Human-computer interaction.

Book

Automatized analysis of children’s exposure to child-directed speech in preschool settings: Validation and application

by Justice, Laura M.; Chaparro-Moreno, Leidy Johana; Lin, Tzu-Jung in Acknowledgment; Adult; Adults

2020

The present study explored whether a tool for automatic detection and recognition of interactions and child-directed speech (CDS) in preschool classrooms could be developed, validated, and applied to non-coded video recordings representing children’s classroom experiences. Using first-person video recordings collected by 13 preschool children during a morning in their classrooms, we extracted high-level audiovisual features from recordings using automatic speech recognition and computer vision services from a cloud computing provider. Using manual coding for interactions and transcriptions of CDS as reference, we trained and tested supervised classifiers and linear mappings to measure five variables of interest. We show that the supervised classifiers trained with speech activity, proximity, and high-level facial features achieve adequate accuracy in detecting interactions. Furthermore, in combination with an automatic speech recognition service, the supervised classifier achieved error rates for CDS measures that are in line with other open-source automatic decoding tools in early childhood settings. Finally, we demonstrate our tool’s applicability by using it to automatically code and transcribe children’s interactions and CDS exposure vertically within a classroom day (morning to afternoon) and horizontally over time (fall to winter). Developing and scaling tools for automatized capture of children’s interactions with others in the preschool classroom, as well as exposure to CDS, may revolutionize scientific efforts to identify precise mechanisms that foster young children’s language development.

Journal Article

Dragon NaturallySpeaking for dummies

by Diamond, Stephanie, author in Dragon NaturallySpeaking; Speech processing systems; Speech processing systems Computer programs.

Command your computer, surf the web, create reports, and more, all with your voice! Dragon NaturallySpeaking is a speech recognition program that lets you dictate into any Windows application, so you can access documents, write e-mails, and even update Facebook using only your voice.

Book

Challenges and Limitations in Speech Recognition Technology: A Critical Review of Speech Signal Processing Algorithms, Tools and Systems

by Basak, Sneha; Gite, Shilpa; Pradhan, Biswajeet in Algorithms; Automatic speech recognition; Emotion recognition

2023

Speech recognition systems have become a unique human-computer interaction (HCI) family. Speech is one of the most naturally developed human abilities; speech signal processing opens up a transparent and hand-free computation experience. This paper aims to present a retrospective yet modern approach to the world of speech recognition systems. The development journey of ASR (Automatic Speech Recognition) has seen quite a few milestones and breakthrough technologies that have been highlighted in this paper. A step-by-step rundown of the fundamental stages in developing speech recognition systems has been presented, along with a brief discussion of various modern-day developments and applications in this domain. This review paper aims to summarize and provide a beginning point for those starting in the vast field of speech signal processing. Since speech recognition has a vast potential in various industries like telecommunication, emotion recognition, healthcare, etc., this review would be helpful to researchers who aim at exploring more applications that society can quickly adopt in future years of evolution.

Journal Article

Voice user interface design : moving from GUI to mixed modal interaction

by Dasgupta, Ritwik, author in Automatic speech recognition; Speech processing systems; Human-computer interaction.

Book

Brain-to-text: decoding spoken phrases from phone representations in the brain

by Heger, Dominic; de Pesters, Adriana; Schalk, Gerwin in Brain; Brain activity; Brain research

2015

It has long been speculated whether communication between humans and machines based on natural speech related cortical activity is possible. Over the past decade, studies have suggested that it is feasible to recognize isolated aspects of speech from neural signals, such as auditory features, phones or one of a few isolated words. However, until now it remained an unsolved challenge to decode continuously spoken speech from the neural substrate associated with speech and language processing. Here, we show for the first time that continuously spoken speech can be decoded into the expressed words from intracranial electrocorticographic (ECoG) recordings. Specifically, we implemented a system, which we call Brain-To-Text, that models single phones, employs techniques from automatic speech recognition (ASR), and thereby transforms brain activity while speaking into the corresponding textual representation. Our results demonstrate that our system can achieve word error rates as low as 25% and phone error rates below 50%. Additionally, our approach contributes to the current understanding of the neural basis of continuous speech production by identifying those cortical regions that hold substantial information about individual phones. In conclusion, the Brain-To-Text system described in this paper represents an important step toward human-machine communication based on imagined speech.

Journal Article
