Catalogue Search | MBRL
Explore the vast range of titles available.
2,898 result(s) for "Lipreading"
Audio-Visual Speech and Gesture Recognition by Sensors of Mobile Devices
by Ivanko, Denis; Ryumin, Dmitry; Ryumina, Elena
in Accuracy; Acoustics; audio-visual speech recognition
2023
Audio-visual speech recognition (AVSR) is one of the most promising solutions for reliable speech recognition, particularly when audio is corrupted by noise. Additional visual information can be used for both automatic lip-reading and gesture recognition. Hand gestures are a form of non-verbal communication and can be used as a very important part of modern human–computer interaction systems. Currently, audio and video modalities are easily accessible by sensors of mobile devices. However, there is no out-of-the-box solution for automatic audio-visual speech and gesture recognition. This study introduces two deep neural network-based model architectures: one for AVSR and one for gesture recognition. The main novelty regarding audio-visual speech recognition lies in fine-tuning strategies for both visual and acoustic features and in the proposed end-to-end model, which considers three modality fusion approaches: prediction-level, feature-level, and model-level. The main novelty in gesture recognition lies in a unique set of spatio-temporal features, including those that consider lip articulation information. As there are no available datasets for the combined task, we evaluated our methods on two different large-scale corpora—LRW and AUTSL—and outperformed existing methods on both audio-visual speech recognition and gesture recognition tasks. We achieved AVSR accuracy for the LRW dataset equal to 98.76% and gesture recognition rate for the AUTSL dataset equal to 98.56%. The results obtained demonstrate not only the high performance of the proposed methodology, but also the fundamental possibility of recognizing audio-visual speech and gestures by sensors of mobile devices.
Journal Article
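The three modality fusion approaches named in the abstract above (prediction-level, feature-level, and model-level) can be illustrated with a toy NumPy sketch. All weights, dimensions, and the elementwise combination in the model-level case are invented for illustration; this is not the authors' architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy per-modality feature vectors for one sample (stand-ins for the
# outputs of trained audio and visual sub-networks).
audio_feat = rng.standard_normal(64)
visual_feat = rng.standard_normal(64)
n_classes = 10

# Hypothetical linear classifier heads.
W_audio = rng.standard_normal((n_classes, 64))
W_visual = rng.standard_normal((n_classes, 64))
W_joint = rng.standard_normal((n_classes, 128))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# 1) Prediction-level (late) fusion: run each modality through its own
#    classifier, then average the class probabilities.
pred_fusion = 0.5 * (softmax(W_audio @ audio_feat) + softmax(W_visual @ visual_feat))

# 2) Feature-level (early) fusion: concatenate the feature vectors and
#    classify the joint representation once.
feat_fusion = softmax(W_joint @ np.concatenate([audio_feat, visual_feat]))

# 3) Model-level fusion: exchange/combine intermediate hidden states of
#    the sub-models (here a toy elementwise sum) before a shared head.
h_audio = np.tanh(audio_feat)
h_visual = np.tanh(visual_feat)
model_fusion = softmax(W_audio @ (h_audio + h_visual))

# Each strategy yields a probability distribution over classes.
print(pred_fusion.argmax(), feat_fusion.argmax(), model_fusion.argmax())
```

The practical trade-off the sketch hints at: late fusion keeps the modalities independent (robust when one stream is corrupted), while early and model-level fusion let the classifier exploit cross-modal correlations.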
SpeakingFaces: A Large-Scale Multimodal Dataset of Voice Commands with Visual and Thermal Video Streams
2021
We present SpeakingFaces, a publicly available large-scale multimodal dataset developed to support machine learning research in contexts that utilize a combination of thermal, visual, and audio data streams; examples include human–computer interaction, biometric authentication, recognition systems, domain transfer, and speech recognition. SpeakingFaces comprises aligned high-resolution thermal and visual-spectrum image streams of fully-framed faces synchronized with audio recordings of each subject speaking approximately 100 imperative phrases. Data were collected from 142 subjects, yielding over 13,000 instances of synchronized data (∼3.8 TB). For technical validation, we demonstrate two baseline examples. The first baseline shows classification by gender, utilizing different combinations of the three data streams in both clean and noisy environments. The second example consists of thermal-to-visual facial image translation, as an instance of domain transfer.
Journal Article
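The core bookkeeping behind a dataset like the one above — pairing thermal, visual, and audio recordings per (subject, phrase) and keeping only fully synchronized instances — can be sketched generically. The filename convention here is invented for illustration; the released dataset's actual layout may differ.

```python
import re
from collections import defaultdict

# Hypothetical filename convention: <modality>_sub<ID>_phr<ID>.<ext>
files = [
    "visual_sub001_phr042.mp4",
    "thermal_sub001_phr042.mp4",
    "audio_sub001_phr042.wav",
    "visual_sub002_phr007.mp4",  # missing thermal and audio partners
]

pattern = re.compile(r"(visual|thermal|audio)_sub(\d+)_phr(\d+)\.\w+")

# Group files by (subject, phrase) key, recording which modality each covers.
index = defaultdict(dict)
for name in files:
    m = pattern.fullmatch(name)
    if m:
        modality, sub, phr = m.group(1), int(m.group(2)), int(m.group(3))
        index[(sub, phr)][modality] = name

# Keep only fully synchronized instances (all three modalities present).
complete = {k: v for k, v in index.items() if len(v) == 3}
print(sorted(complete))  # [(1, 42)]
```

Filtering to complete triples up front is what lets downstream baselines (e.g. the gender classifier over stream combinations) assume every instance carries all three streams.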
Influence of surgical and N95 face masks on speech perception and listening effort in noise
2021
Daily-life conversation relies on speech perception in quiet and noise. Because of the COVID-19 pandemic, face masks have become mandatory in many situations. Acoustic attenuation of sound pressure by the mask tissue reduces speech perception ability, especially in noisy situations. Masks also can impede the process of speech comprehension by concealing the movements of the mouth, interfering with lip reading. In this prospective observational, cross-sectional study including 17 participants with normal hearing, we measured the influence of acoustic attenuation caused by medical face masks (mouth and nose protection) according to EN 14683 and of N95 masks according to EN 1149 on the speech recognition threshold and listening effort in various types of background noise. Averaged over all noise signals, a surgical mask significantly reduced the speech perception threshold in noise by 1.6 dB (95% confidence interval [CI], 1.0, 2.1) and an N95 mask reduced it significantly by 2.7 dB (95% CI, 2.2, 3.2). Use of a surgical mask did not significantly increase the 50% listening effort signal-to-noise ratio (increase of 0.58 dB; 95% CI, 0.4, 1.5), but use of an N95 mask did so significantly, by 2.2 dB (95% CI, 1.2, 3.1). In acoustic measures, mask tissue reduced amplitudes by up to 8 dB at frequencies above 1 kHz, whereas no reduction was observed below 1 kHz. We conclude that face masks reduce speech perception and increase listening effort in different noise signals. Together with additional interference because of impeded lip reading, the compound effect of face masks could have a relevant impact on daily life communication even in those with normal hearing.
Journal Article