Catalogue Search | MBRL

Multimodal Emotion Recognition on RAVDESS Dataset Using Transfer Learning

by Callejas, Zoraida , Fernández-Martínez, Fernando , Kleinlein, Ricardo in Accuracy , audio–visual emotion recognition , computational paralinguistics

2021

Emotion Recognition is attracting the attention of the research community due to the multiple areas where it can be applied, such as in healthcare or in road safety systems. In this paper, we propose a multimodal emotion recognition system that relies on speech and facial information. For the speech-based modality, we evaluated several transfer-learning techniques, more specifically, embedding extraction and Fine-Tuning. The best accuracy results were achieved when we fine-tuned the CNN-14 of the PANNs framework, confirming that the training was more robust when it did not start from scratch and the tasks were similar. Regarding the facial emotion recognizers, we propose a framework that consists of a pre-trained Spatial Transformer Network on saliency maps and facial images followed by a bi-LSTM with an attention mechanism. The error analysis reported that the frame-based systems could present some problems when they were used directly to solve a video-based task despite the domain adaptation, which opens a new line of research to discover new ways to correct this mismatch and take advantage of the embedded knowledge of these pre-trained models. Finally, from the combination of these two modalities with a late fusion strategy, we achieved 80.08% accuracy on the RAVDESS dataset on a subject-wise 5-CV evaluation, classifying eight emotions. The results revealed that these modalities carry relevant information to detect users’ emotional state and their combination enables improvement of system performance.

Journal Article

Share this book

Add to My Shelf

Emotion Recognition Using Different Sensors, Emotion Models, Methods and Datasets: A Comprehensive Review

by Li, Jinsong , Li, Xingguang , Cai, Yujian in Algorithms , Autism , Blood

2023

In recent years, the rapid development of sensors and information technology has made it possible for machines to recognize and analyze human emotions. Emotion recognition is an important research direction in various fields. Human emotions have many manifestations. Therefore, emotion recognition can be realized by analyzing facial expressions, speech, behavior, or physiological signals. These signals are collected by different sensors. Correct recognition of human emotions can promote the development of affective computing. Most existing emotion recognition surveys only focus on a single sensor. Therefore, it is more important to compare different sensors or unimodality and multimodality. In this survey, we collect and review more than 200 papers on emotion recognition by literature research methods. We categorize these papers according to different innovations. These articles mainly focus on the methods and datasets used for emotion recognition with different sensors. This survey also provides application examples and developments in emotion recognition. Furthermore, this survey compares the advantages and disadvantages of different sensors for emotion recognition. The proposed survey can help researchers gain a better understanding of existing emotion recognition systems, thus facilitating the selection of suitable sensors, algorithms, and datasets.

Journal Article

Share this book

Add to My Shelf

Multi-modal emotion recognition using EEG and speech signals

by Yang, Yan , Zhang, Xiaolei , Wang, Qian in Algorithms , Anechoic chambers , Artificial neural networks

2022

Automatic Emotion Recognition (AER) is critical for naturalistic Human–Machine Interactions (HMI). Emotions can be detected through both external behaviors, e.g., tone of voice and internal physiological signals, e.g., electroencephalogram (EEG). In this paper, we first constructed a multi-modal emotion database, named Multi-modal Emotion Database with four modalities (MED4). MED4 consists of synchronously recorded signals of participants’ EEG, photoplethysmography, speech and facial images when they were influenced by video stimuli designed to induce happy, sad, angry and neutral emotions. The experiment was performed with 32 participants in two environment conditions, a research lab with natural noises and an anechoic chamber. Four baseline algorithms were developed to verify the database and the performances of AER methods, Identification-vector + Probabilistic Linear Discriminant Analysis (I-vector + PLDA), Temporal Convolutional Network (TCN), Extreme Learning Machine (ELM) and Multi-Layer Perception Network (MLP). Furthermore, two fusion strategies on feature-level and decision-level respectively were designed to utilize both external and internal information of human status. The results showed that EEG signals generate higher accuracy in emotion recognition than that of speech signals (achieving 88.92% in anechoic room and 89.70% in natural noisy room vs 64.67% and 58.92% respectively). Fusion strategies that combine speech and EEG signals can improve overall accuracy of emotion recognition by 25.92% when compared to speech and 1.67% when compared to EEG in anechoic room and 31.74% and 0.96% in natural noisy room. Fusion methods also enhance the robustness of AER in the noisy environment. The MED4 database will be made publicly available, in order to encourage researchers all over the world to develop and validate various advanced methods for AER. •A database, MED4, with multi-modality and multi-environment, was constructed under controlled conditions for four types of emotions.•EEG data with variable-length were tested and shown to be more efficient in emotion recognition.•Compared to speech signals, EEG signals contain more emotional information and are more robust to noise.•The decision-level fusion method of WAVER outperforms the other methods in both accuracy and noise-robustness for emotion recognition.

Journal Article

Share this book

Add to My Shelf

A survey of state-of-the-art approaches for emotion recognition in text

by Menai Mohamed El Bachir , Alswaidan Nourah in Data mining , Distance learning , Emotion recognition

2020

Emotion recognition in text is an important natural language processing (NLP) task whose solution can benefit several applications in different fields, including data mining, e-learning, information filtering systems, human–computer interaction, and psychology. Explicit emotion recognition in text is the most addressed problem in the literature. The solution to this problem is mainly based on identifying keywords. Implicit emotion recognition is the most challenging problem to solve because such emotion is typically hidden within the text, and thus, its solution requires an understanding of the context. There are four main approaches for implicit emotion recognition in text: rule-based approaches, classical learning-based approaches, deep learning approaches, and hybrid approaches. In this paper, we critically survey the state-of-the-art research for explicit and implicit emotion recognition in text. We present the different approaches found in the literature, detail their main features, discuss their advantages and limitations, and compare them within tables. This study shows that hybrid approaches and learning-based approaches that utilize traditional text representation with distributed word representation outperform the other approaches on benchmark corpora. This paper also identifies the sets of features that lead to the best-performing approaches; highlights the impacts of simple NLP tasks, such as part-of-speech tagging and parsing, on the performances of these approaches; and indicates some open problems.

Journal Article

Share this book

Add to My Shelf

A Survey of Deep Learning-Based Multimodal Emotion Recognition: Speech, Text, and Face

by Tang, Chuangao , Zong, Yuan , Li, Sunan in Algorithms , Data collection , Datasets

2023

Multimodal emotion recognition (MER) refers to the identification and understanding of human emotional states by combining different signals, including—but not limited to—text, speech, and face cues. MER plays a crucial role in the human–computer interaction (HCI) domain. With the recent progression of deep learning technologies and the increasing availability of multimodal datasets, the MER domain has witnessed considerable development, resulting in numerous significant research breakthroughs. However, a conspicuous absence of thorough and focused reviews on these deep learning-based MER achievements is observed. This survey aims to bridge this gap by providing a comprehensive overview of the recent advancements in MER based on deep learning. For an orderly exposition, this paper first outlines a meticulous analysis of the current multimodal datasets, emphasizing their advantages and constraints. Subsequently, we thoroughly scrutinize diverse methods for multimodal emotional feature extraction, highlighting the merits and demerits of each method. Moreover, we perform an exhaustive analysis of various MER algorithms, with particular focus on the model-agnostic fusion methods (including early fusion, late fusion, and hybrid fusion) and fusion based on intermediate layers of deep models (encompassing simple concatenation fusion, utterance-level interaction fusion, and fine-grained interaction fusion). We assess the strengths and weaknesses of these fusion strategies, providing guidance to researchers to help them select the most suitable techniques for their studies. In summary, this survey aims to provide a thorough and insightful review of the field of deep learning-based MER. It is intended as a valuable guide to aid researchers in furthering the evolution of this dynamic and impactful field.

Journal Article

Share this book

Add to My Shelf

Emotion recognition for human–computer interaction using high-level descriptors

by Gared, Fikreselam , Singh, Sukhdev , Sharma, Preeti in 639/166 , 692/700 , Algorithms

2024

Recent research has focused extensively on employing Deep Learning (DL) techniques, particularly Convolutional Neural Networks (CNN), for Speech Emotion Recognition (SER). This study addresses the burgeoning interest in leveraging DL for SER, specifically focusing on Punjabi language speakers. The paper presents a novel approach to constructing and preprocessing a labeled speech corpus using diverse social media sources. By utilizing spectrograms as the primary feature representation, the proposed algorithm effectively learns discriminative patterns for emotion recognition. The method is evaluated on a custom dataset derived from various Punjabi media sources, including films and web series. Results demonstrate that the proposed approach achieves an accuracy of 69%, surpassing traditional methods like decision trees, Naïve Bayes, and random forests, which achieved accuracies of 49%, 52%, and 61% respectively. Thus, the proposed method improves accuracy in recognizing emotions from Punjabi speech signals.

Journal Article

Share this book

Add to My Shelf

Facial Emotion Recognition Using Conventional Machine Learning and Deep Learning Methods: Current Achievements, Analysis and Remaining Challenges

by Khan, Amjad Rehman in Automation , Classification , Communication

2022

Facial emotion recognition (FER) is an emerging and significant research area in the pattern recognition domain. In daily life, the role of non-verbal communication is significant, and in overall communication, its involvement is around 55% to 93%. Facial emotion analysis is efficiently used in surveillance videos, expression analysis, gesture recognition, smart homes, computer games, depression treatment, patient monitoring, anxiety, detecting lies, psychoanalysis, paralinguistic communication, detecting operator fatigue and robotics. In this paper, we present a detailed review on FER. The literature is collected from different reputable research published during the current decade. This review is based on conventional machine learning (ML) and various deep learning (DL) approaches. Further, different FER datasets for evaluation metrics that are publicly available are discussed and compared with benchmark results. This paper provides a holistic review of FER using traditional ML and DL methods to highlight the future gap in this domain for new researchers. Finally, this review work is a guidebook and very helpful for young researchers in the FER area, providing a general understating and basic knowledge of the current state-of-the-art methods, and to experienced researchers looking for productive directions for future work.

Journal Article

Share this book

Add to My Shelf

A Review on Speech Emotion Recognition Using Deep Learning and Attention Mechanism

by Lieskovská, Eva , Jakubec, Maroš , Jarina, Roman in Accuracy , Aircraft , Artificial neural networks

2021

Emotions are an integral part of human interactions and are significant factors in determining user satisfaction or customer opinion. speech emotion recognition (SER) modules also play an important role in the development of human–computer interaction (HCI) applications. A tremendous number of SER systems have been developed over the last decades. Attention-based deep neural networks (DNNs) have been shown as suitable tools for mining information that is unevenly time distributed in multimedia content. The attention mechanism has been recently incorporated in DNN architectures to emphasise also emotional salient information. This paper provides a review of the recent development in SER and also examines the impact of various attention mechanisms on SER performance. Overall comparison of the system accuracies is performed on a widely used IEMOCAP benchmark database.

Journal Article

Share this book

Add to My Shelf

How Can Research on Artificial Empathy Be Enhanced by Applying Deepfakes?

by Huang, Chih-Wei , Li, Yu-Chuan Jack , Rahmanti, Annisa Ristya in Acknowledgment , Artificial , Artificial intelligence

2022

We propose the idea of using an open data set of doctor-patient interactions to develop artificial empathy based on facial emotion recognition. Facial emotion recognition allows a doctor to analyze patients' emotions, so that they can reach out to their patients through empathic care. However, face recognition data sets are often difficult to acquire; many researchers struggle with small samples of face recognition data sets. Further, sharing medical images or videos has not been possible, as this approach may violate patient privacy. The use of deepfake technology is a promising approach to deidentifying video recordings of patients’ clinical encounters. Such technology can revolutionize the implementation of facial emotion recognition by replacing a patient's face in an image or video with an unrecognizable face—one with a facial expression that is similar to that of the original. This technology will further enhance the potential use of artificial empathy in helping doctors provide empathic care to achieve good doctor-patient therapeutic relationships, and this may result in better patient satisfaction and adherence to treatment.

Journal Article

Share this book

Add to My Shelf

A Proposal for Multimodal Emotion Recognition Using Aural Transformers and Action Units on RAVDESS Dataset

by Callejas, Zoraida , Fernández-Martínez, Fernando , Kleinlein, Ricardo in Accuracy , audio-visual emotion recognition , computational paralinguistics

2022

Emotion recognition is attracting the attention of the research community due to its multiple applications in different fields, such as medicine or autonomous driving. In this paper, we proposed an automatic emotion recognizer system that consisted of a speech emotion recognizer (SER) and a facial emotion recognizer (FER). For the SER, we evaluated a pre-trained xlsr-Wav2Vec2.0 transformer using two transfer-learning techniques: embedding extraction and fine-tuning. The best accuracy results were achieved when we fine-tuned the whole model by appending a multilayer perceptron on top of it, confirming that the training was more robust when it did not start from scratch and the previous knowledge of the network was similar to the task to adapt. Regarding the facial emotion recognizer, we extracted the Action Units of the videos and compared the performance between employing static models against sequential models. Results showed that sequential models beat static models by a narrow difference. Error analysis reported that the visual systems could improve with a detector of high-emotional load frames, which opened a new line of research to discover new ways to learn from videos. Finally, combining these two modalities with a late fusion strategy, we achieved 86.70% accuracy on the RAVDESS dataset on a subject-wise 5-CV evaluation, classifying eight emotions. Results demonstrated that these modalities carried relevant information to detect users’ emotional state and their combination allowed to improve the final system performance.

Journal Article

Share this book

Add to My Shelf

Language Selector

MBRLGlobalSearch

Language Selector

Catalogue Search | MBRL

Search Results Heading

Explore the vast range of titles available.

MBRLSearchResults

MBRLHappinessMeter