Catalogue Search | MBRL
30 result(s) for "Caplier, Alice"
The Natural Statistics of Audiovisual Speech
by Ghazanfar, Asif A.; Chandrasekaran, Chandramouli; Trubanova, Andrea
in Brain; Colleges & universities; Computer Science
2009
Humans, like other animals, are exposed to a continuous stream of signals, which are dynamic, multimodal, extended, and time-varying in nature. This complex input space must be transduced and sampled by our sensory systems and transmitted to the brain, where it can guide the selection of appropriate actions. To simplify this process, it has been suggested that the brain exploits statistical regularities in the stimulus space. Tests of this idea have largely been confined to unimodal signals and natural scenes. One important class of multisensory signals for which a quantitative input-space characterization is unavailable is human speech. We do not understand what signals our brain has to actively piece together from an audiovisual speech stream to arrive at a percept, versus what is already embedded in the signal structure of the stream itself. In essence, we do not have a clear understanding of the natural statistics of audiovisual speech. In the present study, we identified the following major statistical features of audiovisual speech. First, we observed robust correlations and close temporal correspondence between the area of the mouth opening and the acoustic envelope. Second, we found the strongest correlation between the area of the mouth opening and vocal tract resonances. Third, we observed that both the area of the mouth opening and the voice envelope are temporally modulated in the 2-7 Hz frequency range. Finally, we show that the timing of mouth movements relative to the onset of the voice is consistently between 100 and 300 ms. We interpret these data in the context of recent neural theories of speech, which suggest that speech communication is a reciprocally coupled, multisensory event whereby the outputs of the signaler are matched to the neural processes of the receiver.
Journal Article
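The correlation analysis this abstract describes lends itself to a short sketch. The Python fragment below is a minimal illustration rather than the authors' code: it estimates the lagged correlation between a mouth-opening-area signal and the acoustic envelope, and isolates the 2-7 Hz modulation band the study reports. The sampling rate, filter order, and function names are assumptions.

```python
# Hypothetical sketch of the mouth-area / envelope analysis; not the paper's code.
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

FS = 100.0  # assumed common sampling rate (Hz) for both signals

def acoustic_envelope(audio, audio_fs, out_fs=FS):
    """Amplitude envelope via the Hilbert transform, resampled to out_fs."""
    env = np.abs(hilbert(audio))
    n_out = int(len(env) * out_fs / audio_fs)
    return np.interp(np.linspace(0, len(env) - 1, n_out),
                     np.arange(len(env)), env)

def bandpass_2_7hz(x, fs=FS):
    """Isolate the 2-7 Hz modulation range reported for mouth and voice."""
    b, a = butter(4, [2.0 / (fs / 2), 7.0 / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

def lagged_correlation(mouth_area, envelope, max_lag_ms=500, fs=FS):
    """Best correlation and lag in ms; positive lag = mouth leads voice."""
    m = (mouth_area - mouth_area.mean()) / mouth_area.std()
    e = (envelope - envelope.mean()) / envelope.std()
    max_lag = int(max_lag_ms / 1000 * fs)
    best_r, best_lag = -1.0, 0
    for k in range(-max_lag, max_lag + 1):
        a = m[max(k, 0): len(m) + min(k, 0)]
        b = e[max(-k, 0): len(e) + min(-k, 0)]
        r = np.corrcoef(a, b)[0, 1]
        if r > best_r:
            best_r, best_lag = r, k
    return best_r, best_lag / fs * 1000.0
```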
Time- and Resource-Efficient Time-to-Collision Forecasting for Indoor Pedestrian Obstacles Avoidance
2021
As demanding vision-based tasks such as object detection and monocular depth estimation make their way into real-time applications, and as more lightweight solutions for autonomous vehicle navigation systems emerge, obstacle detection and collision prediction remain two very challenging tasks for small embedded devices such as drones. We propose a novel, lightweight, and time-efficient vision-based solution that predicts Time-to-Collision from a monocular video camera embedded in a smartglasses device, as a module of a navigation system for visually impaired pedestrians. It consists of two modules: a static data extractor, a convolutional neural network that predicts the obstacle position and distance, and a dynamic data extractor that stacks the obstacle data from multiple frames and predicts the Time-to-Collision with a simple fully connected neural network. This paper focuses on the Time-to-Collision network's ability to adapt to new sceneries with different types of obstacles with supervised learning.
Journal Article
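The two-module split in the abstract above maps naturally onto a small network sketch. The PyTorch code below is a hypothetical illustration of that design, not the paper's architecture: the layer sizes, the (x, y, distance) descriptor, and the five-frame stack are all assumptions.

```python
# Hypothetical two-module Time-to-Collision sketch; sizes are illustrative.
import torch
import torch.nn as nn

class StaticExtractor(nn.Module):
    """Per-frame CNN: image -> (x, y, distance) obstacle descriptor."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 3)  # assumed descriptor: x, y, distance

    def forward(self, frame):
        return self.head(self.features(frame).flatten(1))

class DynamicExtractor(nn.Module):
    """Stacks K per-frame descriptors and regresses Time-to-Collision."""
    def __init__(self, k_frames=5):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 * k_frames, 64), nn.ReLU(),
            nn.Linear(64, 1),  # scalar Time-to-Collision estimate
        )

    def forward(self, descriptor_stack):  # shape: (batch, K, 3)
        return self.mlp(descriptor_stack.flatten(1))
```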
Computational Analysis of Correlations between Image Aesthetic and Image Naturalness in the Relation with Image Quality
2022
The main purpose of this paper is the study of the correlations between Image Aesthetic (IA) and Image Naturalness (IN) and the analysis of the influence of IA and IN on Image Quality (IQ) in different contexts. The first contribution is a study of the potential relationships between IA and IN. For that study, two sub-questions are considered. The first one is to test the idea that IA and IN are not correlated with each other. The second one concerns the influence of IA and IN features on Image Naturalness Assessment (INA) and Image Aesthetic Assessment (IAA), respectively. Secondly, it is clear that IQ is related to IA and IN, but the exact influence of IA and IN on IQ has not been evaluated. Besides that, the impact of context on those influences has not been clarified, so the second contribution is to investigate the influence of IA and IN on IQ in different contexts. The results obtained from rigorous experiments prove that although there are moderate and weak correlations between IA and IN, they are still two different components of IQ. It also appears that viewers' IQ perception is affected by some contextual factors, and that the influence of IA and IN on IQ depends on the considered context.
Journal Article
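For readers curious about the mechanics of such a correlation study, here is a minimal sketch: given per-image IA and IN scores, it computes the Pearson and Spearman correlations the abstract alludes to. The score arrays are synthetic placeholders, not data from the paper.

```python
# Minimal sketch of an IA/IN correlation check on placeholder data.
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(0)
ia_scores = rng.uniform(1, 5, size=200)              # placeholder IA ratings
in_scores = 0.3 * ia_scores + rng.normal(0, 1, 200)  # weakly related IN ratings

r, p_r = pearsonr(ia_scores, in_scores)       # linear correlation
rho, p_rho = spearmanr(ia_scores, in_scores)  # rank (monotonic) correlation
print(f"Pearson r = {r:.2f} (p = {p_r:.3g}), Spearman rho = {rho:.2f}")
```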
Image Aesthetic Assessment Based on Image Classification and Region Segmentation
by Ladret, Patricia; Nguyen, Huu-Tuan; Caplier, Alice
in Classification; close-up image; color saliency map
2020
The main goal of this paper is to study Image Aesthetic Assessment (IAA), i.e., labelling images as having high or low aesthetic quality. The main contributions concern three points. Firstly, following the idea that photos in different categories (human, flower, animal, landscape, …) are taken with different photographic rules, image aesthetics should be evaluated differently for each image category. Large field images and close-up images are two typical categories of images with opposite photographic rules, so we investigate the intuition that a prior Large field/Close-up Image Classification (LCIC) step might improve the performance of IAA. Secondly, when a viewer looks at a photo, some regions receive more attention than others. Those regions are defined as Regions Of Interest (ROI), and it might be worthwhile to identify them before IAA. The question "Is it worthwhile to extract some ROIs before IAA?" is considered by studying Region Of Interest Extraction (ROIE) before investigating IAA based on each feature set (global image features, ROI features, and background features). Based on the answers, a new IAA model is proposed. The last point is a comparison between the efficiency of handcrafted and learned features for the purpose of IAA.
Journal Article
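The pipeline the abstract outlines (classify, extract ROIs, then assess) can be sketched as follows. Everything here is a placeholder standing in for the paper's components: the gradient heuristic for LCIC, the centre crop for ROIE, and the category-specific `scorers` callables are all assumptions.

```python
# Hypothetical classify -> ROI -> assess pipeline; all heuristics are placeholders.
import numpy as np

def classify_large_field_vs_close_up(image):
    """Placeholder LCIC: close-ups approximated by low overall gradient energy."""
    gy, gx = np.gradient(image.mean(axis=2))  # assumes an H x W x 3 array
    return "close_up" if np.hypot(gx, gy).mean() < 10.0 else "large_field"

def extract_roi(image):
    """Placeholder ROIE: a centre crop standing in for saliency-based extraction."""
    h, w = image.shape[:2]
    return image[h // 4: 3 * h // 4, w // 4: 3 * w // 4]

def assess_aesthetics(image, scorers):
    """Route global + ROI features to a category-specific high/low scorer."""
    category = classify_large_field_vs_close_up(image)
    roi = extract_roi(image)
    feats = np.concatenate([image.mean(axis=(0, 1)), roi.mean(axis=(0, 1))])
    return scorers[category](feats)  # each category has its own photo rules
```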
Lip contour segmentation and tracking compliant with lip-reading application constraints
by Girondel, Vincent; Caplier, Alice; Stillittano, Sébastien
in Algorithms; Applied sciences; Artificial intelligence
2013
We propose to use both active contours and parametric models for lip contour extraction and tracking. In the first image, jumping snakes are used to detect key points on the outer and inner contours. These points initialize a lip parametric model composed of several cubic curves suited to mouth deformations. Driven by a combined luminance and chrominance gradient, the initial model is optimized and precisely locked onto the lip contours. On subsequent images, the segmentation is based on the mouth bounding box and key-point tracking. Quantitative and qualitative evaluations show the effectiveness of the algorithm for lip-reading applications.
Journal Article
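A small sketch can show how a cubic-curve lip model of the kind described above is fitted once key points are available. The key-point coordinates and the plain least-squares fit below are illustrative assumptions, not the paper's optimization over luminance and chrominance gradients.

```python
# Minimal sketch: fit one cubic segment of a lip model to detected key points.
import numpy as np

def fit_cubic_segment(xs, ys):
    """Least-squares cubic y = ax^3 + bx^2 + cx + d through key points."""
    return np.polyfit(xs, ys, deg=3)

def sample_segment(coeffs, x_start, x_end, n=50):
    """Densely sample the fitted cubic to draw or track the contour."""
    xs = np.linspace(x_start, x_end, n)
    return xs, np.polyval(coeffs, xs)

# Illustrative outer upper-lip key points (two corners, two mid points).
xs = np.array([10.0, 30.0, 50.0, 70.0])
ys = np.array([40.0, 28.0, 30.0, 41.0])
coeffs = fit_cubic_segment(xs, ys)
contour_x, contour_y = sample_segment(coeffs, xs[0], xs[-1])
```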
Cued Speech Gesture Recognition: A First Prototype Based on Early Reduction
2007
Cued Speech is a specific linguistic code for hearing-impaired people, based on both lip reading and manual gestures. In the context of THIMP (Telephony for the Hearing-IMpaired Project), we work on automatic Cued Speech translation. In this paper, we address only the problem of automatic Cued Speech manual gesture recognition. Such a gesture recognition problem is quite common from a theoretical point of view, but we approach it with respect to its particularities in order to derive an original method. This method is essentially built around a bio-inspired step called early reduction. Prior to a complete analysis of each image of a sequence, the early reduction process automatically extracts a restricted number of key images that summarize the whole sequence. Only the key images are then studied from a temporal point of view, with much lighter computation than processing the complete sequence.
Journal Article
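As a rough illustration of the early-reduction idea, the sketch below keeps only low-motion frames of a gesture sequence as key images. The frame-difference score and threshold are assumptions standing in for the paper's bio-inspired filtering.

```python
# Hypothetical early-reduction-style key-image selection; not the paper's filter.
import numpy as np

def key_images(frames, motion_threshold=2.0):
    """frames: list of grayscale arrays; returns indices of key images."""
    keys = [0]  # always keep the first frame
    for i in range(1, len(frames)):
        motion = np.abs(frames[i].astype(float) - frames[i - 1]).mean()
        # A stable (low-motion) frame is taken to summarize the current gesture.
        if motion < motion_threshold and i - keys[-1] > 1:
            keys.append(i)
    return keys
```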
Using retina modelling to characterize blinking: comparison between EOG and video analysis
by Charbonnier, Sylvie; Picot, Antoine; Vu, Ngoc-Son
in Blinking; Communications Engineering; Computer Science
2012
A large number of car crashes are caused by drowsiness every year. The analysis of eye blinks provides reliable information about drowsiness. This paper studies the relation between electrooculogram (EOG) and video analysis for blink detection and characterization. An original method to detect and characterize blinks from video analysis is presented. The method uses several filters based on human retina modelling. An illumination-robust filter is first used to normalize illumination variations in the video input. Then, Outer and Inner Plexiform Layer filters are used to extract energy signals from the eye area. Eye detection combines gradient and projection methods, which makes it possible to detect even closed eyes. The illumination-robust filter also makes it possible to detect the eyes in night conditions, without external lighting. The video analysis extracts two signals from the eye area, measuring the quantity of static edges and moving edges. Blinks are then detected and characterized from these two signals. A comparison between the features extracted from the EOG and the same features extracted from the video analysis is performed on a database of 14 different people. This study shows that some blink features extracted from the video are highly correlated with their EOG equivalents: the duration, the duration at 50%, the frequency, the percentage of eye closure at 80%, and the amplitude-velocity ratio. The influence of the frame rate on the accuracy of the extracted features is also studied and highlights the need for a high-frame-rate camera to detect and characterize blinks accurately from video analysis.
Journal Article
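The final step of such a pipeline, turning the two edge-energy signals into blink detections, can be sketched roughly as follows. The burst threshold, the normalization, and the 30 fps default are assumptions, not the paper's calibrated values.

```python
# Hypothetical blink detection from static/moving edge-energy signals.
import numpy as np

def detect_blinks(static_edges, moving_edges, fps=30, burst_thresh=0.5):
    """Return (start_frame, duration_ms) pairs for detected blinks."""
    moving = (moving_edges - moving_edges.mean()) / moving_edges.std()
    active = moving > burst_thresh  # frames with strong eyelid motion
    blinks, i = [], 0
    while i < len(active):
        if active[i]:
            j = i
            while j < len(active) and active[j]:
                j += 1  # extend to the end of the moving-edge burst
            blinks.append((i, (j - i) / fps * 1000.0))  # duration in ms
            i = j
        else:
            i += 1
    return blinks
```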
Image and Video for Hearing Impaired People
by Beautemps, Denis; Akarun, Lale; Aran, Oya
in Biometrics; Computer Science; Computer Vision and Pattern Recognition
2007
We present a global overview of image- and video-processing-based methods that support the communication of hearing-impaired people. Two directions of communication have to be considered: from a hearing person to a hearing-impaired person, and vice versa. In this paper, we first describe Sign Language (SL) and Cued Speech (CS), two different languages used by the deaf community. Secondly, we present existing tools that employ SL and CS video processing and recognition for automatic communication between deaf people and hearing people. Thirdly, we present existing tools for the reverse communication, from hearing people to deaf people, which involve SL and CS video synthesis.
Journal Article