Catalogue Search | MBRL
Search Results Heading
Explore the vast range of titles available.
MBRLSearchResults
-
DisciplineDiscipline
-
Is Peer ReviewedIs Peer Reviewed
-
Item TypeItem Type
-
SubjectSubject
-
YearFrom:-To:
-
More FiltersMore FiltersSourceLanguage
Done
Filters
Reset
11,982
result(s) for
"Emotion recognition"
Sort by:
Multimodal Emotion Recognition on RAVDESS Dataset Using Transfer Learning
by
Callejas, Zoraida
,
Fernández-Martínez, Fernando
,
Kleinlein, Ricardo
in
Accuracy
,
audio–visual emotion recognition
,
computational paralinguistics
2021
Emotion Recognition is attracting the attention of the research community due to the multiple areas where it can be applied, such as in healthcare or in road safety systems. In this paper, we propose a multimodal emotion recognition system that relies on speech and facial information. For the speech-based modality, we evaluated several transfer-learning techniques, more specifically, embedding extraction and Fine-Tuning. The best accuracy results were achieved when we fine-tuned the CNN-14 of the PANNs framework, confirming that the training was more robust when it did not start from scratch and the tasks were similar. Regarding the facial emotion recognizers, we propose a framework that consists of a pre-trained Spatial Transformer Network on saliency maps and facial images followed by a bi-LSTM with an attention mechanism. The error analysis reported that the frame-based systems could present some problems when they were used directly to solve a video-based task despite the domain adaptation, which opens a new line of research to discover new ways to correct this mismatch and take advantage of the embedded knowledge of these pre-trained models. Finally, from the combination of these two modalities with a late fusion strategy, we achieved 80.08% accuracy on the RAVDESS dataset on a subject-wise 5-CV evaluation, classifying eight emotions. The results revealed that these modalities carry relevant information to detect users’ emotional state and their combination enables improvement of system performance.
Journal Article
Emotion Recognition Using Different Sensors, Emotion Models, Methods and Datasets: A Comprehensive Review
2023
In recent years, the rapid development of sensors and information technology has made it possible for machines to recognize and analyze human emotions. Emotion recognition is an important research direction in various fields. Human emotions have many manifestations. Therefore, emotion recognition can be realized by analyzing facial expressions, speech, behavior, or physiological signals. These signals are collected by different sensors. Correct recognition of human emotions can promote the development of affective computing. Most existing emotion recognition surveys only focus on a single sensor. Therefore, it is more important to compare different sensors or unimodality and multimodality. In this survey, we collect and review more than 200 papers on emotion recognition by literature research methods. We categorize these papers according to different innovations. These articles mainly focus on the methods and datasets used for emotion recognition with different sensors. This survey also provides application examples and developments in emotion recognition. Furthermore, this survey compares the advantages and disadvantages of different sensors for emotion recognition. The proposed survey can help researchers gain a better understanding of existing emotion recognition systems, thus facilitating the selection of suitable sensors, algorithms, and datasets.
Journal Article
A Proposal for Multimodal Emotion Recognition Using Aural Transformers and Action Units on RAVDESS Dataset
by
Callejas, Zoraida
,
Fernández-Martínez, Fernando
,
Kleinlein, Ricardo
in
Accuracy
,
audio-visual emotion recognition
,
computational paralinguistics
2022
Emotion recognition is attracting the attention of the research community due to its multiple applications in different fields, such as medicine or autonomous driving. In this paper, we proposed an automatic emotion recognizer system that consisted of a speech emotion recognizer (SER) and a facial emotion recognizer (FER). For the SER, we evaluated a pre-trained xlsr-Wav2Vec2.0 transformer using two transfer-learning techniques: embedding extraction and fine-tuning. The best accuracy results were achieved when we fine-tuned the whole model by appending a multilayer perceptron on top of it, confirming that the training was more robust when it did not start from scratch and the previous knowledge of the network was similar to the task to adapt. Regarding the facial emotion recognizer, we extracted the Action Units of the videos and compared the performance between employing static models against sequential models. Results showed that sequential models beat static models by a narrow difference. Error analysis reported that the visual systems could improve with a detector of high-emotional load frames, which opened a new line of research to discover new ways to learn from videos. Finally, combining these two modalities with a late fusion strategy, we achieved 86.70% accuracy on the RAVDESS dataset on a subject-wise 5-CV evaluation, classifying eight emotions. Results demonstrated that these modalities carried relevant information to detect users’ emotional state and their combination allowed to improve the final system performance.
Journal Article
A survey of state-of-the-art approaches for emotion recognition in text
by
Menai Mohamed El Bachir
,
Alswaidan Nourah
in
Data mining
,
Distance learning
,
Emotion recognition
2020
Emotion recognition in text is an important natural language processing (NLP) task whose solution can benefit several applications in different fields, including data mining, e-learning, information filtering systems, human–computer interaction, and psychology. Explicit emotion recognition in text is the most addressed problem in the literature. The solution to this problem is mainly based on identifying keywords. Implicit emotion recognition is the most challenging problem to solve because such emotion is typically hidden within the text, and thus, its solution requires an understanding of the context. There are four main approaches for implicit emotion recognition in text: rule-based approaches, classical learning-based approaches, deep learning approaches, and hybrid approaches. In this paper, we critically survey the state-of-the-art research for explicit and implicit emotion recognition in text. We present the different approaches found in the literature, detail their main features, discuss their advantages and limitations, and compare them within tables. This study shows that hybrid approaches and learning-based approaches that utilize traditional text representation with distributed word representation outperform the other approaches on benchmark corpora. This paper also identifies the sets of features that lead to the best-performing approaches; highlights the impacts of simple NLP tasks, such as part-of-speech tagging and parsing, on the performances of these approaches; and indicates some open problems.
Journal Article
Facial emotion recognition based real-time learner engagement detection system in online learning context using deep learning models
2023
The dramatic impact of the COVID-19 pandemic has resulted in the closure of physical classrooms and teaching methods being shifted to the online medium.To make the online learning environment more interactive, just like traditional offline classrooms, it is essential to ensure the proper engagement of students during online learning sessions.This paper proposes a deep learning-based approach using facial emotions to detect the real-time engagement of online learners. This is done by analysing the students’ facial expressions to classify their emotions throughout the online learning session. The facial emotion recognition information is used to calculate the engagement index (EI) to predict two engagement states “Engaged” and “Disengaged”. Different deep learning models such as Inception-V3, VGG19 and ResNet-50 are evaluated and compared to get the best predictive classification model for real-time engagement detection. Varied benchmarked datasets such as FER-2013, CK+ and RAF-DB are used to gauge the overall performance and accuracy of the proposed system. Experimental results showed that the proposed system achieves an accuracy of 89.11%, 90.14% and 92.32% for Inception-V3, VGG19 and ResNet-50, respectively, on benchmarked datasets and our own created dataset. ResNet-50 outperforms all others with an accuracy of 92.3% for facial emotions classification in real-time learning scenarios.
Journal Article
Emotion recognition for human–computer interaction using high-level descriptors
2024
Recent research has focused extensively on employing Deep Learning (DL) techniques, particularly Convolutional Neural Networks (CNN), for Speech Emotion Recognition (SER). This study addresses the burgeoning interest in leveraging DL for SER, specifically focusing on Punjabi language speakers. The paper presents a novel approach to constructing and preprocessing a labeled speech corpus using diverse social media sources. By utilizing spectrograms as the primary feature representation, the proposed algorithm effectively learns discriminative patterns for emotion recognition. The method is evaluated on a custom dataset derived from various Punjabi media sources, including films and web series. Results demonstrate that the proposed approach achieves an accuracy of 69%, surpassing traditional methods like decision trees, Naïve Bayes, and random forests, which achieved accuracies of 49%, 52%, and 61% respectively. Thus, the proposed method improves accuracy in recognizing emotions from Punjabi speech signals.
Journal Article
A Survey of Deep Learning-Based Multimodal Emotion Recognition: Speech, Text, and Face
2023
Multimodal emotion recognition (MER) refers to the identification and understanding of human emotional states by combining different signals, including—but not limited to—text, speech, and face cues. MER plays a crucial role in the human–computer interaction (HCI) domain. With the recent progression of deep learning technologies and the increasing availability of multimodal datasets, the MER domain has witnessed considerable development, resulting in numerous significant research breakthroughs. However, a conspicuous absence of thorough and focused reviews on these deep learning-based MER achievements is observed. This survey aims to bridge this gap by providing a comprehensive overview of the recent advancements in MER based on deep learning. For an orderly exposition, this paper first outlines a meticulous analysis of the current multimodal datasets, emphasizing their advantages and constraints. Subsequently, we thoroughly scrutinize diverse methods for multimodal emotional feature extraction, highlighting the merits and demerits of each method. Moreover, we perform an exhaustive analysis of various MER algorithms, with particular focus on the model-agnostic fusion methods (including early fusion, late fusion, and hybrid fusion) and fusion based on intermediate layers of deep models (encompassing simple concatenation fusion, utterance-level interaction fusion, and fine-grained interaction fusion). We assess the strengths and weaknesses of these fusion strategies, providing guidance to researchers to help them select the most suitable techniques for their studies. In summary, this survey aims to provide a thorough and insightful review of the field of deep learning-based MER. It is intended as a valuable guide to aid researchers in furthering the evolution of this dynamic and impactful field.
Journal Article
Facial Emotion Recognition Using Conventional Machine Learning and Deep Learning Methods: Current Achievements, Analysis and Remaining Challenges
2022
Facial emotion recognition (FER) is an emerging and significant research area in the pattern recognition domain. In daily life, the role of non-verbal communication is significant, and in overall communication, its involvement is around 55% to 93%. Facial emotion analysis is efficiently used in surveillance videos, expression analysis, gesture recognition, smart homes, computer games, depression treatment, patient monitoring, anxiety, detecting lies, psychoanalysis, paralinguistic communication, detecting operator fatigue and robotics. In this paper, we present a detailed review on FER. The literature is collected from different reputable research published during the current decade. This review is based on conventional machine learning (ML) and various deep learning (DL) approaches. Further, different FER datasets for evaluation metrics that are publicly available are discussed and compared with benchmark results. This paper provides a holistic review of FER using traditional ML and DL methods to highlight the future gap in this domain for new researchers. Finally, this review work is a guidebook and very helpful for young researchers in the FER area, providing a general understating and basic knowledge of the current state-of-the-art methods, and to experienced researchers looking for productive directions for future work.
Journal Article
A deep learning approach to analyse stress by using voice and body posture
by
Pham, Hoang
,
Gupta, Sumita
,
Majumdar, Rana
in
Algorithms
,
Application of Soft Computing
,
Artificial Intelligence
2025
In the current scenario, where we can see young people struggling for their careers, they are even fighting a battle with their stress and tension. None of their work is done without stress to complete their task and compete with others. To overcome stress, one should have good emotional intelligence to cope with emotions and any upcoming stress. But at some point, due to lack of guidance, some people don’t know how to analyze the situations and how to handle them without taking the stress and end up with anxiety, depression, disappointment, suicide, heart attack, stroke etc. Due to the advancement of Human–Computer Interaction (HCI), medical science has leveled up to another peak. Machine Learning and Deep Learning played a major role in such interactions and predictions. Many applications have been developed in past years based on machine learning and deep learning. One of those applications is related to psychology and is still in research. These applications can be used for emotion and stress analysis among people, especially youngsters. Research in this field is being conducted using various verbal and non-verbal parameters. This paper addresses the research problem of improving emotion recognition accuracy and robustness to better analyze and manage stress. The primary objective is to develop an advanced Emotion Recognition System (ERS) that leverages deep learning algorithms to analyses both verbal and non-verbal cues—specifically, speech and body posture, including facial expressions. We have further integrated it with the Flask web framework to make an Emotion Recognition System that takes input in the form of video and audio to analyze Emotions and Stress. We have also compared our proposed ERS with existing ones and found that our ERS gives better results.
Journal Article
MemoCMT: multimodal emotion recognition using cross-modal transformer-based feature fusion
2025
Speech emotion recognition has seen a surge in transformer models, which excel at understanding the overall message by analyzing long-term patterns in speech. However, these models come at a computational cost. In contrast, convolutional neural networks are faster but struggle with capturing these long-range relationships. Our proposed system,
MemoCMT
, tackles this challenge using a novel “cross-modal transformer” (
CMT
). This
CMT
can effectively analyze local and global speech features and their corresponding text. To boost efficiency,
MemoCMT
leverages recent advancements in pre-trained models:
HuBERT
extracts meaningful features from the audio, while
BERT
analyzes the text. The core innovation lies in how the
CMT
component utilizes and integrates these audio and text features. After this integration, different fusion techniques are applied before final emotion classification. Experiments show that
MemoCMT
achieves impressive performance, with the
CMT
using min aggregation achieving the highest unweighted accuracy (
UW-Acc
) of 81.33% and 91.93%, and weighted accuracy (
W-Acc
) of 81.85% and 91.84% respectively on benchmark IEMOCAP and ESD corpora. The results of our system demonstrate the generalization capacity and robustness for real-world industrial applications. Moreover, the implementation details of
MemoCMT
are publicly available at
https://github.com/tpnam0901/MemoCMT/
for reproducibility purposes.
Journal Article