Search Results
37 results for "Shou, Zhaoyu"
Enhanced human pose estimation via feature-enriched HRNet in smart classroom scenarios
Human pose estimation (HPE) is crucial for analyzing student behavioral dynamics and developing instructional evaluations in smart classrooms. However, in complex scenarios such as densely distributed students, existing methods often face challenges in keypoint feature extraction and localization accuracy. To address these issues, we propose a feature-enhanced high-resolution network (FE-HRNet) for human pose estimation. The model first incorporates Res2Net modules into the backbone network, constructing a hierarchical residual connection structure that achieves fine-grained multi-scale feature representation and effectively expands the network's receptive field. Second, we embed a multi-scale convolution attention (MSCA) module, which captures spatial context at multiple scales through multi-branch depth-wise stripe convolutions and combines them with channel attention to adaptively enhance key features, significantly improving keypoint localization. Finally, experimental results on the COCO public dataset and our custom-developed smart classroom pose (SCP) dataset validate that the proposed method delivers superior pose estimation performance in complex scenarios. The code is available at https://github.com/ldxguet/FEHRNet.
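The abstract does not spell out the MSCA internals, so the following is a minimal PyTorch sketch in the spirit of SegNeXt-style multi-scale convolutional attention: multi-branch depth-wise stripe convolutions whose summed output, after a 1x1 channel-mixing projection, reweights the input features. The kernel sizes, branch count, and final projection are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class MSCA(nn.Module):
    """Hedged sketch of a multi-scale convolutional attention module.

    Stripe (1xk and kx1) depth-wise convolutions at several scales gather
    spatial context; their sum becomes an attention map that reweights the
    input feature map. Kernel sizes follow a SegNeXt-style design and are
    illustrative assumptions.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.conv0 = nn.Conv2d(channels, channels, 5, padding=2, groups=channels)
        self.branches = nn.ModuleList()
        for k in (7, 11, 21):  # assumed branch kernel sizes
            self.branches.append(nn.Sequential(
                nn.Conv2d(channels, channels, (1, k), padding=(0, k // 2), groups=channels),
                nn.Conv2d(channels, channels, (k, 1), padding=(k // 2, 0), groups=channels),
            ))
        self.proj = nn.Conv2d(channels, channels, 1)  # channel mixing

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn = self.conv0(x)
        attn = attn + sum(branch(attn) for branch in self.branches)
        attn = self.proj(attn)
        return attn * x  # reweight the input features


x = torch.randn(1, 64, 48, 36)    # e.g., one HRNet branch feature map
print(MSCA(64)(x).shape)          # torch.Size([1, 64, 48, 36])
```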
Dynamic Spatial-Temporal Memory Augmentation Network for Traffic Prediction
Traffic flow prediction plays a crucial role in the development of smart cities. However, existing studies face challenges in effectively capturing spatio-temporal contexts, handling hierarchical temporal features, and understanding spatial heterogeneity. To better manage the spatio-temporal correlations inherent in traffic flow, we present a novel model called Dynamic Spatio-Temporal Memory-Augmented Network (DSTMAN). First, we design three spatio-temporal embeddings to capture dynamic spatio-temporal contexts and encode the unique characteristics of time units and spatial states. Second, these three spatio-temporal components are integrated to form a multi-scale spatio-temporal block, which effectively extracts hierarchical spatio-temporal dependencies. Finally, we introduce a meta-memory node bank to construct an adaptive neighborhood graph, implicitly representing spatial relationships and enhancing the learning of spatial heterogeneity through a secondary memory mechanism. Evaluation on four public datasets, including METR-LA and PEMS-BAY, demonstrates that the proposed model outperforms benchmark models such as MTGNN, DCRNN, and AGCRN. On the METR-LA dataset, our model reduces MAE by 4% compared to MTGNN, 6.9% compared to DCRNN, and 5.8% compared to AGCRN, confirming its efficacy in traffic flow prediction.
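The meta-memory node bank is described only at a high level. Below is a hedged PyTorch sketch of one common realization of the idea: learnable node queries attend over a shared memory of prototypes, and the retrieved embeddings induce an adaptive adjacency matrix. The slot count, embedding size, and softmax normalization are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MemoryNodeBank(nn.Module):
    """Hedged sketch of a memory node bank producing an adaptive graph."""

    def __init__(self, num_nodes: int, mem_slots: int = 20, dim: int = 32):
        super().__init__()
        self.node_queries = nn.Parameter(torch.randn(num_nodes, dim))
        self.memory = nn.Parameter(torch.randn(mem_slots, dim))

    def forward(self) -> torch.Tensor:
        # Each node attends over the shared memory slots.
        scores = F.softmax(self.node_queries @ self.memory.t(), dim=-1)
        node_emb = scores @ self.memory                 # (num_nodes, dim)
        # Adaptive adjacency from pairwise node similarity.
        return F.softmax(F.relu(node_emb @ node_emb.t()), dim=-1)


adj = MemoryNodeBank(num_nodes=207)()   # METR-LA has 207 sensors
print(adj.shape)                        # torch.Size([207, 207])
```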
A Student Facial Expression Recognition Model Based on Multi-Scale and Deep Fine-Grained Feature Attention Enhancement
In smart classroom environments, accurately recognizing students' facial expressions is crucial for teachers to efficiently assess students' learning states, adjust teaching strategies in a timely manner, and enhance teaching quality and effectiveness. In this paper, we propose a student facial expression recognition model based on multi-scale and deep fine-grained feature attention enhancement (SFER-MDFAE) to address the inaccurate facial feature extraction and poor robustness of facial expression recognition in smart classroom scenarios. First, we construct a novel multi-scale dual-pooling feature aggregation module to capture and fuse facial information at different scales, thereby obtaining a comprehensive representation of key facial features. Second, we design a key-region-oriented attention mechanism that focuses on the nuances of facial expressions, further enhancing the representation of multi-scale deep fine-grained features. Finally, we fuse the multi-scale and deep fine-grained attention-enhanced features to obtain richer, more accurate key facial information and achieve accurate facial expression recognition. The experimental results demonstrate that the proposed SFER-MDFAE outperforms existing state-of-the-art methods, achieving an accuracy of 76.18% on FER2013, 92.75% on FERPlus, 92.93% on RAF-DB, 67.86% on AffectNet, and 93.74% on the real smart classroom facial expression dataset (SCFED). These results validate the effectiveness of the proposed method.
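The dual-pooling aggregation module is not specified in detail; here is a minimal sketch of one plausible reading, where average pooling keeps holistic facial information and max pooling keeps the most salient responses at each scale, after which the pooled maps are upsampled and fused. The scales and fusion scheme are assumptions.

```python
import torch
import torch.nn as nn

class DualPoolAggregation(nn.Module):
    """Hedged sketch of multi-scale dual-pooling feature aggregation."""

    def __init__(self, channels: int, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.fuse = nn.Conv2d(channels * len(scales), channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        feats = []
        for s in self.scales:
            avg = nn.functional.adaptive_avg_pool2d(x, (h // s, w // s))
            mx = nn.functional.adaptive_max_pool2d(x, (h // s, w // s))
            pooled = avg + mx  # dual pooling at this scale
            feats.append(nn.functional.interpolate(
                pooled, size=(h, w), mode="bilinear", align_corners=False))
        return self.fuse(torch.cat(feats, dim=1))


x = torch.randn(2, 128, 28, 28)            # face feature map
print(DualPoolAggregation(128)(x).shape)   # torch.Size([2, 128, 28, 28])
```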
Predicting Student Performance in Online Learning: A Multidimensional Time-Series Data Analysis Approach
As an emerging teaching method, online learning is becoming increasingly popular among learners. However, one of its major drawbacks is the lack of effective communication and feedback, which raises the risk of students failing or dropping out. In response to this challenge, this paper proposes a student performance prediction model based on multidimensional time-series data analysis. By jointly considering students' learning behaviors, assessment scores, and demographic information, the model extracts the characteristics of students' learning behaviors and captures the connections among multiple features, better revealing the impact of multiple factors on student performance. The model helps teachers individualize education for students at different proficiency levels and identifies at-risk students as early as possible so that teachers can intervene in a timely manner. In experiments on the Open University Learning Analytics Dataset (OULAD), the model achieved 74% accuracy and a 73% F1 score in a four-category prediction task, and 99.08% accuracy and a 99.08% F1 score in an early risk prediction task. Compared with the benchmark models, the proposed model performs better in both multi-class prediction and early prediction.
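As a concrete illustration of combining time-series behavior data with static attributes, here is a hedged sketch: a recurrent encoder summarizes weekly activity features, demographics are concatenated before a four-class head (OULAD's outcomes are distinction/pass/fail/withdrawn). The feature sizes and GRU encoder are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class PerformancePredictor(nn.Module):
    """Hedged sketch of a multidimensional time-series performance model."""

    def __init__(self, seq_feats: int, static_feats: int, classes: int = 4):
        super().__init__()
        self.encoder = nn.GRU(seq_feats, 64, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(64 + static_feats, 64), nn.ReLU(), nn.Linear(64, classes)
        )

    def forward(self, seq: torch.Tensor, static: torch.Tensor) -> torch.Tensor:
        _, h = self.encoder(seq)            # h: (1, batch, 64)
        return self.head(torch.cat([h[-1], static], dim=-1))


seq = torch.randn(8, 30, 12)    # 8 students, 30 weeks, 12 behavior features
static = torch.randn(8, 5)      # 5 demographic features (assumed)
print(PerformancePredictor(12, 5)(seq, static).shape)  # torch.Size([8, 4])
```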
A learning behavior classification model based on classroom meta-action sequences
The individually adaptive interpretation of students' learning behaviors is a vital link in instructional process interventions. Accurately recognizing learning behaviors and judging the completeness of classroom meta-action sequences are essential for such interpretation. This paper proposes a learning behavior classification model based on classroom meta-action sequences (ConvTran-Fibo-CA-Enhanced). The model employs the Fibonacci sequence for positional encoding to augment the positional attributes of classroom meta-action sequences. It also integrates channel attention and data augmentation techniques to improve the model's ability to comprehend these sequences, thereby increasing the accuracy of learning behavior classification and the reliability of completeness verification for classroom meta-action sequences. Experimental results show that the proposed model outperforms baseline models both on public human activity recognition datasets and on smart classroom datasets for learning behavior classification and meta-action sequence completeness judgment.
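The abstract states only that Fibonacci numbers drive the positional encoding. Below is a hedged sketch of one way to do that: each position's (log-scaled) Fibonacci value sets sinusoidal phases, so positional gaps grow non-uniformly along the sequence. This exact formulation is an assumption for illustration.

```python
import torch

def fibonacci_positional_encoding(seq_len: int, dim: int) -> torch.Tensor:
    """Hedged sketch: sinusoidal encoding driven by Fibonacci positions."""
    fib = [1.0, 1.0]
    while len(fib) < seq_len:
        fib.append(fib[-1] + fib[-2])
    # Log-scale to keep trig arguments from overflowing for long sequences.
    pos = torch.tensor(fib[:seq_len]).log().unsqueeze(1)        # (seq_len, 1)
    freq = torch.pow(10000.0, -torch.arange(0, dim, 2) / dim)   # (dim/2,)
    enc = torch.zeros(seq_len, dim)
    enc[:, 0::2] = torch.sin(pos * freq)
    enc[:, 1::2] = torch.cos(pos * freq)
    return enc  # added to the meta-action sequence embeddings


print(fibonacci_positional_encoding(16, 8).shape)  # torch.Size([16, 8])
```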
A Dynamic Position Embedding-Based Model for Student Classroom Complete Meta-Action Recognition
The precise recognition of complete classroom meta-actions is a crucial challenge for the individually adaptive interpretation of student behavior, given the intricacy of these actions. This paper proposes a dynamic position embedding-based model for student classroom complete meta-action recognition (DPE-SAR) built on the Video Swin Transformer. The model uses a dynamic position embedding technique to perform conditional positional encoding and incorporates a deep convolutional network to improve the parsing of the spatial structure of meta-actions. The full attention mechanism of ViT3D extracts the latent spatial features of actions and captures the global spatial-temporal information of meta-actions. In evaluations on public datasets and smart classroom meta-action recognition datasets, the proposed model outperforms baseline models, confirming its superiority in meta-action recognition.
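Conditional positional encoding for video tokens is commonly realized with a depth-wise 3D convolution over the token grid, in the style of CPVT; here is a minimal hedged sketch of that idea. The kernel size and residual form are assumptions, not the paper's exact module.

```python
import torch
import torch.nn as nn

class DynamicPositionEmbedding3D(nn.Module):
    """Hedged sketch of conditional positional encoding for video tokens.

    A depth-wise 3D convolution over the (time, height, width) token grid
    generates position information conditioned on content, rather than
    reading from a fixed position table.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.dw_conv = nn.Conv3d(dim, dim, 3, padding=1, groups=dim)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, dim, T, H, W) video feature volume
        return tokens + self.dw_conv(tokens)  # content-conditioned positions


x = torch.randn(2, 96, 8, 14, 14)  # e.g., Video-Swin-sized stage-1 tokens
print(DynamicPositionEmbedding3D(96)(x).shape)  # same shape as input
```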
Characterization of Students' Active Thinking States Based on Improved Bloom Classification Algorithm and Cognitive Diagnostic Model
A student's active thinking state directly affects their learning experience in the classroom. To help teachers understand students' active thinking states in real time, this study constructs a model that characterizes these states. The main research objectives are as follows: (1) to achieve accurate classification of the cognitive levels of in-class exercises; (2) to effectively quantify students' active thinking states by analyzing the correlation between student cognitive levels and exercise cognitive levels. The research methods are as follows: First, LSTM and Chinese-RoBERTa-wwm models are integrated to extract sequential and semantic information from plain text, while TBCC extracts the semantic features of code text, allowing comprehensive determination of the cognitive level of exercises. Second, a cognitive diagnosis model, the QRCDM, is adopted to evaluate students' real-time cognitive levels with respect to knowledge points. Finally, the cognitive levels of exercises and students are fed into a self-attention network, their correlation is analyzed, and the thinking activity state is generated as a state representation. The proposed text classification model outperforms baseline models in ACC, micro-F1, and macro-F1 on two Chinese exercise datasets containing mixed code text, with the highest ACC, micro-F1, and macro-F1 reaching 0.7004, 0.6941, and 0.6912, respectively, demonstrating its effectiveness in classifying the cognitive level of exercises. The accuracy of the thinking activity state characterization model reaches 61.54%, which exceeds the random baseline and verifies the model's feasibility.
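The final correlation step can be pictured as attention over a two-token sequence built from the exercise's cognitive-level vector and the student's diagnosed cognitive-level vector. The sketch below is a hedged illustration; the dimensions, pooling, and number of thinking-state classes are assumptions.

```python
import torch
import torch.nn as nn

class ThinkingStateHead(nn.Module):
    """Hedged sketch: correlating exercise and student cognitive levels."""

    def __init__(self, dim: int = 32, states: int = 3):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.classifier = nn.Linear(dim, states)

    def forward(self, exercise_level: torch.Tensor,
                student_level: torch.Tensor) -> torch.Tensor:
        # Treat the two level vectors as a two-token sequence.
        seq = torch.stack([exercise_level, student_level], dim=1)  # (B, 2, dim)
        out, _ = self.attn(seq, seq, seq)
        return self.classifier(out.mean(dim=1))  # (B, states)


ex, st = torch.randn(4, 32), torch.randn(4, 32)
print(ThinkingStateHead()(ex, st).shape)  # torch.Size([4, 3])
```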
CTIFERK: A Thermal Infrared Facial Expression Recognition Model with Kolmogorov–Arnold Networks for Smart Classrooms
Accurate recognition of student emotions in smart classrooms is vital for understanding learning states. Visible-light facial expression recognition is often affected by illumination changes, making thermal infrared imaging a promising alternative due to the robust symmetry of facial temperature distributions. This paper proposes CTIFERK, a thermal infrared facial expression recognition model integrating Kolmogorov–Arnold Networks (KANs). By incorporating multiple KAN layers, CTIFERK enhances feature extraction and fitting capability. It also balances pooling-layer information from the MobileViT backbone to preserve symmetrical facial features, improving recognition accuracy. Experiments on the Tufts Face Database, the IRIS Database, and the self-constructed GUET thermalface dataset show that CTIFERK achieves accuracies of 81.82%, 82.19%, and 65.22%, respectively, outperforming baseline models. These results validate CTIFERK's effectiveness and superiority for thermal infrared expression recognition in smart classrooms, enabling reliable emotion monitoring.
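The defining trait of a KAN layer is that every input-output edge carries its own learnable 1-D function rather than a fixed activation after a linear map. The sketch below approximates those edge functions with a small radial-basis expansion; it is a simplified stand-in for the spline parameterization used in full KAN implementations, so all details here are assumptions.

```python
import torch
import torch.nn as nn

class KANLayer(nn.Module):
    """Hedged sketch of a KAN-style layer with RBF edge functions."""

    def __init__(self, in_dim: int, out_dim: int, num_basis: int = 8):
        super().__init__()
        # Fixed Gaussian centers; one coefficient set per (input, output) edge.
        self.centers = nn.Parameter(torch.linspace(-2, 2, num_basis),
                                    requires_grad=False)
        self.coeff = nn.Parameter(torch.randn(in_dim, out_dim, num_basis) * 0.1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_dim) -> Gaussian RBF features per input dimension.
        phi = torch.exp(-(x.unsqueeze(-1) - self.centers) ** 2)  # (B, in, K)
        # Sum each edge's learned 1-D function over inputs: (B, out)
        return torch.einsum("bik,iok->bo", phi, self.coeff)


feats = torch.randn(4, 64)            # e.g., pooled MobileViT features
print(KANLayer(64, 7)(feats).shape)   # 7 expression classes (assumed)
```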
Educational QA System-Oriented Answer Selection Model Based on Focus Fusion of Multi-Perspective Word Matching
Question-answering systems have become an important tool for learning and knowledge acquisition. However, current answer selection models often represent features with whole sentences, which neglects individual words and loses important information. To address this challenge, this paper proposes a novel answer selection model based on the focus fusion of multi-perspective word matching. First, according to the different combination relationships between sentences, a word-level focus distribution is obtained from the serial, parallel, and transfer matching perspectives. Then, the sentence's key position information is inferred from its focus distribution. Finally, a key-information alignment method fuses the focus distributions of the perspectives, yielding a match score for each candidate answer to the question. Experimental results show that the proposed model significantly outperforms a fine-tuned Transformer encoder based on contextual embeddings, with increases of 4.07% and 5.51% in MAP and 1.63% and 4.86% in MRR, respectively.
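As a generic instance of word-level matching focus, the hedged sketch below scores each answer word by its best cosine match against the question's words and normalizes the scores into a distribution. The paper's three perspectives (serial, parallel, transfer) would each derive such a distribution differently; this shows only the shared word-matching core.

```python
import torch
import torch.nn.functional as F

def focus_distribution(question: torch.Tensor, answer: torch.Tensor) -> torch.Tensor:
    """Hedged sketch of a word-level matching focus distribution."""
    q = F.normalize(question, dim=-1)   # (Lq, d) question word embeddings
    a = F.normalize(answer, dim=-1)     # (La, d) answer word embeddings
    sim = a @ q.t()                     # (La, Lq) cosine similarities
    best = sim.max(dim=-1).values       # best question match per answer word
    return F.softmax(best, dim=-1)      # focus over answer words


q = torch.randn(6, 300)    # 6 question words, 300-d embeddings (assumed)
a = torch.randn(10, 300)   # 10 answer words
print(focus_distribution(q, a).sum())  # ≈ tensor(1.)
```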
A Gaze Estimation Method Based on Spatial and Channel Reconstructed ResNet Combined with Multi-Clue Fusion
The complexity of the factors influencing online learning makes it difficult to characterize learning concentration, and accurately estimating students' gaze points while they watch instructional videos is a critical scientific challenge in assessing and enhancing the attentiveness of online learners. However, current appearance-based gaze estimation models lack a focus on extracting essential features and fail to effectively model the spatio-temporal relationships among the head, face, and eye regions, which limits their ability to achieve low angular errors. This paper proposes an appearance-based gaze estimation model (RSP-MCGaze). The model constructs a feature extraction backbone for gaze estimation (ResNetSC) by integrating ResNet and SCConv, enhancing the extraction of important features while reducing spatial and channel redundancy. On top of the ResNetSC backbone, video gaze estimation is further optimized by jointly locating the head, eyes, and face. Experimental results demonstrate that our model significantly outperforms existing baseline models on public datasets, confirming the superiority of our method for gaze estimation. The model achieves a detection error of 9.86 on the Gaze360 dataset and 7.11 on the detectable-face subset of Gaze360.
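Gaze360-style benchmarks conventionally report the mean angle between predicted and ground-truth 3-D gaze vectors, which is presumably the "detection error" cited above. A minimal sketch of that metric:

```python
import torch
import torch.nn.functional as F

def angular_error_degrees(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Hedged sketch of the standard gaze angular-error metric (degrees)."""
    pred = F.normalize(pred, dim=-1)
    target = F.normalize(target, dim=-1)
    cos = (pred * target).sum(dim=-1).clamp(-1.0, 1.0)
    return torch.rad2deg(torch.acos(cos))


p = torch.tensor([[0.0, 0.0, 1.0]])   # predicted 3-D gaze vector
t = torch.tensor([[0.0, 1.0, 1.0]])   # ground-truth gaze vector
print(angular_error_degrees(p, t))    # tensor([45.])
```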