7,657 result(s) for "Automated methods"
Combining Fixed-Weight ArcFace Loss and Vision Transformer for Facial Expression Recognition
In recent years, deep learning has demonstrated remarkable capability in the broad domain of feature learning. Although the facial expression recognition task has been extensively studied, it still faces many challenges such as significant variations within the same category of expressions. ArcFace loss is a widely adopted function designed to enhance inter-class separability and improve recognition performance. However, it does not explicitly constrain the angular distribution between class centers. Therefore, this study introduces a weight-constrained ArcFace loss and integrates it into the Vision Transformer (ViT) framework. This approach not only alleviates implicit biases induced by imbalanced data distributions but also significantly reduces computational overhead by stabilizing weight optimization. In the experiments, the authors evaluated the proposed approach against standard ArcFace loss, classical loss functions, and various network structures on the RAF-DB and FER2013 datasets. Comprehensive experimental results demonstrate that the proposed approach not only improves recognition accuracy but also achieves higher computational efficiency.
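A minimal PyTorch sketch of an ArcFace-style angular-margin loss with fixed, normalized class-center weights, in the spirit of the weight-constrained loss described above. The class name, feature dimension, scale, and margin are illustrative assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FixedWeightArcFace(nn.Module):
    """ArcFace-style margin loss whose class centers are frozen after initialization."""
    def __init__(self, feat_dim, num_classes, scale=64.0, margin=0.5):
        super().__init__()
        # Class centers are L2-normalized once and registered as a buffer (not trained).
        w = F.normalize(torch.randn(num_classes, feat_dim), dim=1)
        self.register_buffer("weight", w)
        self.scale, self.margin = scale, margin

    def forward(self, features, labels):
        # Cosine similarity between normalized features and the fixed class centers.
        cosine = F.linear(F.normalize(features, dim=1), self.weight).clamp(-1 + 1e-7, 1 - 1e-7)
        theta = torch.acos(cosine)
        # Add the angular margin only to the target-class logits.
        target = F.one_hot(labels, cosine.size(1)).bool()
        logits = torch.where(target, torch.cos(theta + self.margin), cosine)
        return F.cross_entropy(self.scale * logits, labels)

# Usage with hypothetical ViT embeddings of dimension 768 and 7 expression classes:
loss = FixedWeightArcFace(768, 7)(torch.randn(8, 768), torch.randint(0, 7, (8,)))
```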
Res-RBG Facial Expression Recognition in Image Sequences Based on Dual Neural Networks
Facial expressions involve dynamic changes, and facial expression recognition based on static images struggles to capture the temporal information inherent in these dynamic changes. The resultant degradation in real-world performance critically impedes the integration of facial expression recognition systems into intelligent sensing applications. Therefore, this paper proposes a facial expression recognition method for image sequences based on the fusion of dual neural networks (ResNet and residual bidirectional GRU—Res-RBG). The model proposed in this paper achieves recognition accuracies of 98.10% and 88.64% on the CK+ and Oulu-CASIA datasets, respectively. Moreover, the model has a parameter size of only 64.20 M. Compared to existing methods for image sequence-based facial expression recognition, the approach presented in this paper demonstrates certain advantages, indicating strong potential for future edge sensor deployment.
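A minimal sketch of a CNN-plus-bidirectional-GRU pipeline for expression recognition over image sequences, illustrating the general dual-network idea above; the resnet18 backbone and hidden size are assumptions, not the Res-RBG configuration.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class SequenceFER(nn.Module):
    def __init__(self, num_classes=7, hidden=256):
        super().__init__()
        backbone = resnet18(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])   # drop the final fc layer
        self.gru = nn.GRU(512, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, clips):                                # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).flatten(1)     # per-frame features: (B*T, 512)
        seq, _ = self.gru(feats.view(b, t, -1))              # temporal modeling: (B, T, 2*hidden)
        return self.fc(seq[:, -1])                           # classify from the last time step

logits = SequenceFER()(torch.randn(1, 8, 3, 224, 224))
```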
Facial Landmark-Driven Keypoint Feature Extraction for Robust Facial Expression Recognition
Facial expression recognition (FER) is a core technology that enables computers to understand and react to human emotions. In particular, the use of face alignment algorithms as a preprocessing step in image-based FER is important for accurately normalizing face images in terms of scale, rotation, and translation to improve FER accuracy. Recently, FER studies have been actively leveraging feature maps computed by face alignment networks to enhance FER performance. However, previous studies were limited in their ability to effectively apply information from specific facial regions that are important for FER, as they either only used facial landmarks during the preprocessing step or relied solely on the feature maps from the face alignment networks. In this paper, we propose the use of Keypoint Features extracted from feature maps at the coordinates of facial landmarks. To effectively utilize Keypoint Features, we further propose a Keypoint Feature regularization method using landmark perturbation for robustness, and an attention mechanism that emphasizes all Keypoint Features using representative Keypoint Features derived from a nasal base landmark, which carries information about the whole face, to improve performance. We performed experiments on the AffectNet, RAF-DB, and FERPlus datasets using a simply designed network to validate the effectiveness of the proposed method. As a result, the proposed method achieved accuracies of 68.17% on AffectNet-7, 64.87% on AffectNet-8, 93.16% on RAF-DB, and 91.44% on FERPlus. Furthermore, the network pretrained on AffectNet-8 achieved improved accuracies of 94.04% on RAF-DB and 91.66% on FERPlus. These results demonstrate that the proposed Keypoint Features can achieve comparable results to those of the existing methods, highlighting their potential for enhancing FER performance through the effective utilization of key facial region features.
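A minimal sketch of sampling per-landmark features from a feature map, the core "Keypoint Feature" idea described above. The tensors are placeholders, and the landmark perturbation is shown as simple Gaussian jitter, which is only an assumption about the regularization.

```python
import torch
import torch.nn.functional as F

def keypoint_features(feature_map, landmarks, jitter_std=0.0):
    """feature_map: (B, C, H, W); landmarks: (B, K, 2) as (x, y) in [-1, 1]."""
    if jitter_std > 0:                        # landmark-perturbation regularization (assumed form)
        landmarks = landmarks + jitter_std * torch.randn_like(landmarks)
    grid = landmarks.unsqueeze(2)             # (B, K, 1, 2) sampling grid
    sampled = F.grid_sample(feature_map, grid, align_corners=False)   # (B, C, K, 1)
    return sampled.squeeze(-1).transpose(1, 2)                        # (B, K, C): one feature per landmark

feats = keypoint_features(torch.randn(2, 64, 56, 56), torch.rand(2, 68, 2) * 2 - 1, jitter_std=0.02)
```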
A Student Facial Expression Recognition Model Based on Multi-Scale and Deep Fine-Grained Feature Attention Enhancement
In smart classroom environments, accurately recognizing students’ facial expressions is crucial for teachers to efficiently assess students’ learning states, adjust teaching strategies in a timely manner, and enhance teaching quality and effectiveness. In this paper, we propose a student facial expression recognition model based on multi-scale and deep fine-grained feature attention enhancement (SFER-MDFAE) to address the issues of inaccurate facial feature extraction and poor robustness of facial expression recognition in smart classroom scenarios. Firstly, we construct a novel multi-scale dual-pooling feature aggregation module to capture and fuse facial information at different scales, thereby obtaining a comprehensive representation of key facial features; secondly, we design a key region-oriented attention mechanism to focus more on the nuances of facial expressions, further enhancing the representation of multi-scale deep fine-grained features; finally, the fusion of multi-scale and deep fine-grained attention-enhanced features is used to obtain richer and more accurate key facial information and realize accurate facial expression recognition. The experimental results demonstrate that the proposed SFER-MDFAE outperforms the existing state-of-the-art methods, achieving an accuracy of 76.18% on FER2013, 92.75% on FERPlus, 92.93% on RAF-DB, 67.86% on AffectNet, and 93.74% on the real smart classroom facial expression dataset (SCFED). These results validate the effectiveness of the proposed method.
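A minimal sketch of a multi-scale dual-pooling aggregation step, loosely following the description above: average and max pooling are applied at several scales and the results are concatenated. The scales and the final projection are assumptions, not the exact SFER-MDFAE module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualPoolAggregation(nn.Module):
    def __init__(self, channels, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        # Project the concatenated avg+max descriptors back to `channels` dimensions.
        self.proj = nn.Linear(2 * channels * sum(s * s for s in scales), channels)

    def forward(self, x):                                    # x: (B, C, H, W)
        pooled = []
        for s in self.scales:
            pooled.append(F.adaptive_avg_pool2d(x, s).flatten(1))   # average-pooled descriptor
            pooled.append(F.adaptive_max_pool2d(x, s).flatten(1))   # max-pooled descriptor
        return self.proj(torch.cat(pooled, dim=1))           # fused multi-scale feature: (B, C)

out = DualPoolAggregation(128)(torch.randn(2, 128, 28, 28))
```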
Facial Expression Recognition-You Only Look Once-Neighborhood Coordinate Attention Mamba: Facial Expression Detection and Classification Based on Neighbor and Coordinates Attention Mechanism
In studying the joint object detection and classification problem for facial expression recognition (FER) using the YOLOX framework, we introduce a novel feature extractor, called neighborhood coordinate attention Mamba (NCAMamba), to substitute for the original feature extractor in the Feature Pyramid Network (FPN). NCAMamba combines the background information reduction capabilities of Mamba, the local neighborhood relationship understanding of neighborhood attention, and the directional relationship understanding of coordinate attention. The resulting FER-YOLO-NCAMamba model, when applied to two unaligned FER benchmark datasets, RAF-DB and SFEW, obtains significantly improved mean average precision (mAP) scores compared with those of other state-of-the-art methods. Moreover, ablation studies show that the NCA module is relatively more important than the Visual State Space (VSS) module, a variant of Mamba for image processing. Visualization studies using the Grad-CAM method reveal that regions around the nose tip are critical to recognizing an expression: if the attended region is too large, the prediction tends to be erroneous, while a small, focused region leads to correct recognition, which may explain why FER on unaligned faces is such a challenging problem.
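A minimal sketch of only the coordinate-attention ingredient mentioned above (direction-aware pooling along height and width followed by channel reweighting); the Mamba and neighborhood-attention components are omitted, and the reduction ratio is an assumption.

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        mid = max(channels // reduction, 8)
        self.shared = nn.Sequential(nn.Conv2d(channels, mid, 1), nn.BatchNorm2d(mid), nn.ReLU())
        self.attn_h = nn.Conv2d(mid, channels, 1)
        self.attn_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x):                                         # x: (B, C, H, W)
        b, c, h, w = x.shape
        pooled_h = x.mean(dim=3, keepdim=True)                    # pool along width:  (B, C, H, 1)
        pooled_w = x.mean(dim=2, keepdim=True).transpose(2, 3)    # pool along height: (B, C, W, 1)
        y = self.shared(torch.cat([pooled_h, pooled_w], dim=2))   # shared 1x1 transform
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.attn_h(y_h))                     # attention over rows:    (B, C, H, 1)
        a_w = torch.sigmoid(self.attn_w(y_w.transpose(2, 3)))     # attention over columns: (B, C, 1, W)
        return x * a_h * a_w

out = CoordinateAttention(64)(torch.randn(2, 64, 32, 32))
```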
Enhanced Hybrid Vision Transformer with Multi-Scale Feature Integration and Patch Dropping for Facial Expression Recognition
Convolutional neural networks (CNNs) have made significant progress in the field of facial expression recognition (FER). However, due to challenges such as occlusion, lighting variations, and changes in head pose, facial expression recognition in real-world environments remains highly challenging. At the same time, methods solely based on CNN heavily rely on local spatial features, lack global information, and struggle to balance the relationship between computational complexity and recognition accuracy. Consequently, the CNN-based models still fall short in their ability to address FER adequately. To address these issues, we propose a lightweight facial expression recognition method based on a hybrid vision transformer. This method captures multi-scale facial features through an improved attention module, achieving richer feature integration, enhancing the network’s perception of key facial expression regions, and improving feature extraction capabilities. Additionally, to further enhance the model’s performance, we have designed the patch dropping (PD) module. This module aims to emulate the attention allocation mechanism of the human visual system for local features, guiding the network to focus on the most discriminative features, reducing the influence of irrelevant features, and intuitively lowering computational costs. Extensive experiments demonstrate that our approach significantly outperforms other methods, achieving an accuracy of 86.51% on RAF-DB and nearly 70% on FER2013, with a model size of only 3.64 MB. These results demonstrate that our method provides a new perspective for the field of facial expression recognition.
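A minimal sketch of a patch-dropping step: patch tokens are ranked by a saliency score and only the top-k most discriminative tokens are kept. Scoring patches by their attention weight from the class token is an assumption about how the PD module might rank them.

```python
import torch

def drop_patches(tokens, cls_attention, keep_ratio=0.7):
    """tokens: (B, N, D) patch tokens; cls_attention: (B, N) saliency score per patch."""
    b, n, d = tokens.shape
    k = max(int(n * keep_ratio), 1)
    idx = cls_attention.topk(k, dim=1).indices                     # indices of the most salient patches
    return tokens.gather(1, idx.unsqueeze(-1).expand(b, k, d))     # kept tokens: (B, k, D)

kept = drop_patches(torch.randn(2, 196, 256), torch.rand(2, 196), keep_ratio=0.5)
```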
TriCAFFNet: A Tri-Cross-Attention Transformer with a Multi-Feature Fusion Network for Facial Expression Recognition
In recent years, significant progress has been made in facial expression recognition methods. However, tasks related to facial expression recognition in real environments still require further research. This paper proposes a tri-cross-attention transformer with a multi-feature fusion network (TriCAFFNet) to improve facial expression recognition performance under challenging conditions. By combining LBP (Local Binary Pattern) features, HOG (Histogram of Oriented Gradients) features, landmark features, and CNN (convolutional neural network) features from facial images, the model is provided with a rich input to improve its ability to discern subtle differences between images. Additionally, tri-cross-attention blocks are designed to facilitate information exchange between the different features, enabling them to guide one another toward salient cues. Extensive experiments on several widely used datasets show that TriCAFFNet achieves state-of-the-art (SOTA) performance, with 92.17% on RAF-DB, 67.40% on AffectNet (7 cls), and 63.49% on AffectNet (8 cls).
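A minimal sketch of a single cross-attention exchange between two feature streams (e.g., CNN tokens attending to LBP/HOG tokens), illustrating the kind of mutual guidance described above; TriCAFFNet's actual tri-cross topology and dimensions are not reproduced here.

```python
import torch
import torch.nn as nn

class CrossAttentionBlock(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, query_feats, context_feats):
        # query_feats attends to context_feats; a residual connection keeps the original stream.
        attended, _ = self.attn(query_feats, context_feats, context_feats)
        return self.norm(query_feats + attended)

cnn_tokens, lbp_tokens = torch.randn(2, 49, 256), torch.randn(2, 49, 256)
fused = CrossAttentionBlock()(cnn_tokens, lbp_tokens)
```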
Enhanced AlexNet with Gabor and Local Binary Pattern Features for Improved Facial Emotion Recognition
Facial emotion recognition (FER) is vital for improving human–machine interactions, serving as the foundation for AI systems that integrate cognitive and emotional intelligence. This helps bridge the gap between mechanical processes and human emotions, enhancing machine engagement with humans. Considering the constraints of low hardware specifications often encountered in real-world applications, this study leverages recent advances in deep learning to propose an enhanced model for FER. The model effectively utilizes texture information from faces through Gabor and Local Binary Pattern (LBP) feature extraction techniques. By integrating these features into a specially modified AlexNet architecture, our approach not only classifies facial emotions more accurately but also demonstrates significant improvements in performance and adaptability under various operational conditions. To validate the effectiveness of our proposed model, we conducted evaluations using the FER2013 and RAF-DB benchmark datasets, where it achieved impressive accuracies of 98.10% and 93.34% for the two datasets, with standard deviations of 1.63% and 3.62%, respectively. On the FER-2013 dataset, the model attained a precision of 98.2%, a recall of 97.9%, and an F1-score of 98.0%. Meanwhile, for the other dataset, it achieved a precision of 93.54%, a recall of 93.12%, and an F1-score of 93.34%. These results underscore the model’s robustness and its capability to deliver high-precision emotion recognition, making it an ideal solution for deployment in environments where hardware limitations are a critical concern.
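A minimal sketch of extracting Gabor and LBP texture descriptors from a grayscale face crop before feeding them to a CNN; the kernel parameters and histogram size are illustrative assumptions, not the paper's settings.

```python
import cv2
import numpy as np
from skimage.feature import local_binary_pattern

def gabor_lbp_features(gray_face):
    """gray_face: 2-D uint8 array (a cropped grayscale face)."""
    gabor_responses = []
    for theta in np.arange(0, np.pi, np.pi / 4):                  # 4 orientations
        # ksize=(9, 9), sigma=2.0, theta, lambda=8.0, gamma=0.5 (assumed values)
        kernel = cv2.getGaborKernel((9, 9), 2.0, theta, 8.0, 0.5)
        gabor_responses.append(cv2.filter2D(gray_face, cv2.CV_32F, kernel))
    # Uniform LBP with 8 neighbors, radius 1, summarized as a normalized histogram.
    lbp = local_binary_pattern(gray_face, P=8, R=1, method="uniform")
    lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
    return np.stack(gabor_responses), lbp_hist                    # texture maps + texture histogram

maps, hist = gabor_lbp_features(np.random.randint(0, 255, (48, 48), dtype=np.uint8))
```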
Enhancing Facial Expression Recognition through Light Field Cameras
In this paper, we study facial expression recognition (FER) using three modalities obtained from a light field camera: sub-aperture (SA), depth map, and all-in-focus (AiF) images. Our objective is to construct a more comprehensive and effective FER system by investigating multimodal fusion strategies. For this purpose, we employ EfficientNetV2-S, pre-trained on AffectNet, as our primary convolutional neural network. This model, combined with a BiGRU, is used to process SA images. We evaluate various fusion techniques at both decision and feature levels to assess their effectiveness in enhancing FER accuracy. Our findings show that the model using SA images surpasses state-of-the-art performance, achieving 88.13% ± 7.42% accuracy under the subject-specific evaluation protocol and 91.88% ± 3.25% under the subject-independent evaluation protocol. These results highlight our model’s potential in enhancing FER accuracy and robustness, outperforming existing methods. Furthermore, our multimodal fusion approach, integrating SA, AiF, and depth images, demonstrates substantial improvements over unimodal models. The decision-level fusion strategy, particularly using average weights, proved most effective, achieving 90.13% ± 4.95% accuracy under the subject-specific evaluation protocol and 93.33% ± 4.92% under the subject-independent evaluation protocol. This approach leverages the complementary strengths of each modality, resulting in a more comprehensive and accurate FER system.
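A minimal sketch of decision-level fusion with average weights, as described above: each modality (e.g., SA, AiF, depth) produces class probabilities and the final prediction averages them. The per-modality models themselves are placeholders.

```python
import torch
import torch.nn.functional as F

def average_fusion(logits_per_modality):
    """logits_per_modality: list of (B, num_classes) tensors, one per modality."""
    probs = [F.softmax(logits, dim=1) for logits in logits_per_modality]
    fused = torch.stack(probs).mean(dim=0)          # equal weights across modalities
    return fused.argmax(dim=1), fused               # predicted class and fused probabilities

sa, aif, depth = (torch.randn(4, 7) for _ in range(3))
pred, probs = average_fusion([sa, aif, depth])
```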
On-the-move heterogeneous face recognition in frequency and spatial domain using sparse representation
Heterogeneity of a probe image is one of the most complex challenges faced by researchers and implementers of current surveillance systems, owing to the existence of multiple cameras working in different spectral ranges within a single surveillance setup. This paper proposes two different approaches, spatial sparse representation (SSR) and frequency sparse representation (FSR), to recognize on-the-move heterogeneous face images with a database of a single sample per person (SSPP). The SCface database, with five visual and two infrared (IR) cameras, is taken as the benchmark for the experiments, and the results are further confirmed using the CASIA NIR-VIS 2.0 face database with 17,580 visual and IR images. Similarly, comparisons are performed for different scenarios, such as varying distance from the camera, varying face image size, and various visual and infrared (IR) modalities. A least-squares minimization approach is used to match face images, as it keeps the recognition process simple. A side-by-side comparison of both proposed approaches with classical state-of-the-art methods, namely principal component analysis (PCA), kernel Fisher analysis (KFA), and coupled kernel embedding (CKE), as well as the more recent low-rank preserving projection via graph regularized reconstruction (LRPP-GRR) method, is also presented. Experimental results suggest that the proposed approaches achieve superior performance.
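A minimal sketch of least-squares matching of a probe face against a single-sample-per-person gallery: the probe is represented as a linear combination of gallery columns and assigned to the identity with the smallest per-identity reconstruction residual. This illustrates the matching step only, not the SSR/FSR feature construction.

```python
import numpy as np

def match_probe(gallery, probe):
    """gallery: (d, n) matrix, one column per enrolled person; probe: (d,) feature vector."""
    coeffs, *_ = np.linalg.lstsq(gallery, probe, rcond=None)      # least-squares representation
    residuals = [np.linalg.norm(probe - gallery[:, i] * coeffs[i])
                 for i in range(gallery.shape[1])]                # residual using each identity alone
    return int(np.argmin(residuals))                              # index of the best-matching identity

rng = np.random.default_rng(0)
gallery = rng.standard_normal((1024, 50))                         # 50 people, one feature vector each
identity = match_probe(gallery, gallery[:, 7] + 0.01 * rng.standard_normal(1024))
```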