Catalogue Search | MBRL
Search Results
Explore the vast range of titles available.
74 result(s) for "deep fine-grained features"
A Student Facial Expression Recognition Model Based on Multi-Scale and Deep Fine-Grained Feature Attention Enhancement
2024
In smart classroom environments, accurately recognizing students’ facial expressions is crucial for teachers to efficiently assess students’ learning states, adjust teaching strategies in a timely manner, and enhance teaching quality and effectiveness. In this paper, we propose a student facial expression recognition model based on multi-scale and deep fine-grained feature attention enhancement (SFER-MDFAE) to address the inaccurate facial feature extraction and poor robustness of facial expression recognition in smart classroom scenarios. Firstly, we construct a novel multi-scale dual-pooling feature aggregation module to capture and fuse facial information at different scales, thereby obtaining a comprehensive representation of key facial features. Secondly, we design a key region-oriented attention mechanism that focuses on the nuances of facial expressions, further enhancing the representation of multi-scale deep fine-grained features. Finally, the fusion of multi-scale and deep fine-grained attention-enhanced features yields richer and more accurate facial key information and enables accurate facial expression recognition. The experimental results demonstrate that the proposed SFER-MDFAE outperforms existing state-of-the-art methods, achieving an accuracy of 76.18% on FER2013, 92.75% on FERPlus, 92.93% on RAF-DB, 67.86% on AffectNet, and 93.74% on the real smart classroom facial expression dataset (SCFED). These results validate the effectiveness of the proposed method.
Journal Article
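The multi-scale dual-pooling aggregation described in the abstract above can be illustrated with a small sketch: for each scale, average- and max-pooled responses over grid regions are fused and the results concatenated. The grid partitioning, additive fusion, and shapes below are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def dual_pool_aggregate(feature_map, scales=(1, 2, 4)):
    """Hypothetical multi-scale dual-pooling aggregation: for each
    scale, partition the map into a grid, fuse average- and
    max-pooled region responses, and concatenate across scales."""
    c, h, w = feature_map.shape
    pooled = []
    for s in scales:
        hs, ws = h // s, w // s          # region size at this scale
        for i in range(s):
            for j in range(s):
                region = feature_map[:, i*hs:(i+1)*hs, j*ws:(j+1)*ws]
                avg = region.mean(axis=(1, 2))   # average pooling
                mx = region.max(axis=(1, 2))     # max pooling
                pooled.append(avg + mx)          # dual-pooling fusion
    return np.concatenate(pooled)                # multi-scale descriptor

feat = np.random.rand(8, 16, 16)
desc = dual_pool_aggregate(feat)
# descriptor length: 8 channels * (1 + 4 + 16) regions = 168
```

The concatenated descriptor grows with the number of grid cells, so in practice such a module is usually followed by a projection back to a fixed dimension.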
A Spatial Feature-Enhanced Attention Neural Network with High-Order Pooling Representation for Application in Pest and Disease Recognition
2022
With the development of advanced information and intelligence technologies, precision agriculture has become an effective solution for monitoring and preventing crop pests and diseases. However, pest and disease recognition in precision agriculture applications is essentially a fine-grained image classification task, which aims to learn effective discriminative features that can identify subtle differences among visually similar samples. This remains challenging for existing standard models, which suffer from oversized parameters and low accuracy. Therefore, in this paper, we propose a feature-enhanced attention neural network (Fe-Net) to handle fine-grained image recognition of crop pests and diseases in innovative agronomy practices. The model is built on an improved CSP-stage backbone network, which offers massive channel-shuffled features in various dimensions and sizes. Then, a spatial feature-enhanced attention module is added to exploit the spatial interrelationships between different semantic regions. Finally, the proposed Fe-Net employs a higher-order pooling module to mine more highly representative features by computing the square root of the covariance matrix of elements. The whole architecture is efficiently trained end-to-end without additional manipulation. In comparative experiments on the CropDP-181 dataset, the proposed Fe-Net achieves a Top-1 accuracy of 85.29% with an average recognition time of only 71 ms, outperforming existing methods. Further experimental evidence demonstrates that our approach strikes a balance between the model’s performance and its parameter count, making it suitable for practical deployment in precision agriculture applications.
Journal Article
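The higher-order pooling step the Fe-Net abstract mentions (the square root of a covariance matrix of feature elements) can be sketched as second-order pooling over feature vectors, with the matrix square root taken via eigendecomposition. The shapes and the eigendecomposition route are assumptions; the paper's actual module may use an iterative approximation.

```python
import numpy as np

def covariance_sqrt_pool(features):
    """Second-order pooling sketch: covariance of feature vectors
    followed by a matrix square root (features: locations x channels)."""
    x = features - features.mean(axis=0, keepdims=True)
    cov = x.T @ x / (x.shape[0] - 1)          # channel covariance
    # matrix square root via eigendecomposition (cov is symmetric PSD)
    vals, vecs = np.linalg.eigh(cov)
    vals = np.clip(vals, 0.0, None)           # guard tiny negative eigenvalues
    return vecs @ np.diag(np.sqrt(vals)) @ vecs.T

f = np.random.rand(64, 16)    # 64 spatial locations, 16 channels
s = covariance_sqrt_pool(f)   # 16 x 16 symmetric matrix; s @ s ≈ covariance
```

The square root normalizes the eigenvalue spectrum of the covariance, which is commonly credited with stabilizing second-order representations.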
Improve the Security of Industrial Control System: A Fine-Grained Classification Method for DoS Attacks on Modbus/TCP
2023
With the rapid development of technology, growing volumes of malicious traffic have had a negative impact on industrial systems. The Modbus protocol plays an important role in the communications of Industrial Control Systems (ICS), but it is vulnerable to Denial of Service (DoS) attacks. Traditional detection methods do not perform well on fine-grained detection tasks, which could help locate the targets of attacks and prevent damage. Considering the temporal locality and high dimensionality of malicious traffic, this paper proposes a neural network architecture named MODLSTM, which consists of three parts: input preprocessing, feature recoding, and traffic classification. By virtue of this design, MODLSTM can form continuous stream semantics from fragmented packets, discover latent low-dimensional features, and finally classify traffic at a fine-grained level. To test the model’s performance, experiments were conducted on industrial and public datasets, where the model performed excellently in comparison with previous work (accuracy increased by 0.71% and 0.07%, respectively). The results show that the proposed method detects Modbus-related DoS attacks more effectively than other works. It could help build a reliable firewall to address a variety of malicious traffic in diverse situations, especially industrial environments.
Journal Article
IGINet: integrating geometric information to enhance inter-modal interaction for fine-grained image captioning
by Hossain, Md. Shamim; Gu, Naijie; Huang, Zhangjin
in Datasets; Encoders-Decoders; Feature extraction
2025
Image captioning aims to generate captions that accurately describe objects, their attributes, and the relationships or interactions within the scene depicted in an image. Traditional attention-based models often struggle to capture higher-order interactions and fail to account for the geometric and positional relationships among visual objects. To address these limitations, we propose a geometric information-driven network called IGINet, which introduces a novel attention mechanism, GeoAtt, to enhance image captioning from two key perspectives. First, GeoAtt employs low-rank bilinear pooling to selectively harness visual information and enable multimodal reasoning, effectively capturing inter-modal interactions through spatial and channel-wise attention distributions. Second, to improve geometric representation capabilities, we propose an innovative approach for incorporating normalized geometric features directly into the attention mechanism. The extracted features are freely available from the Mendeley Data repository at https://data.mendeley.com/preview/sf238jg557. This integration enables the generation of attention maps that focus on the most relevant image regions during captioning, ensuring precise and context-rich descriptions. The GeoAtt module integrates smoothly into LSTM encoder-decoder frameworks, resulting in notable improvements in performance and efficiency. Extensive experiments on the MSCOCO benchmark dataset demonstrate that our approach substantially improves captioning performance, achieving competitive results compared to contemporary methods. Notably, the BLEU-4 score of 39.9 is a state-of-the-art result among CNN-LSTM-based single-model approaches. The code for our implementation is publicly available at https://github.com/shamimsareem/ImageCaptioning.
Journal Article
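The low-rank bilinear pooling that the IGINet abstract describes can be sketched in a few lines: both modalities are projected into a shared low-rank space, fused by an elementwise product, and scored into attention weights. The projection matrices `U`, `V`, the scoring vector `w`, and the `tanh` nonlinearity below are illustrative placeholders, not IGINet's learned parameters.

```python
import numpy as np

def low_rank_bilinear_attention(visual, query, U, V, w):
    """Low-rank bilinear attention sketch: project each modality to a
    rank-r space, fuse elementwise, score regions, and attend."""
    # visual: (regions, dv), query: (dq,)
    joint = np.tanh(visual @ U) * np.tanh(query @ V)  # (regions, r)
    scores = joint @ w                                # (regions,)
    e = np.exp(scores - scores.max())                 # stable softmax
    alpha = e / e.sum()                               # attention weights
    return alpha @ visual                             # attended feature

rng = np.random.default_rng(0)
vis, q = rng.random((5, 8)), rng.random(6)
U, V, w = rng.random((8, 4)), rng.random((6, 4)), rng.random(4)
att = low_rank_bilinear_attention(vis, q, U, V, w)   # (8,) vector
```

The low-rank factorization keeps the parameter count at `(dv + dq + 1) * r` instead of the `dv * dq` a full bilinear map would need, which is the usual motivation for this form.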
Multilayer feature fusion with parallel convolutional block for fine-grained image classification
2022
Fine-grained image classification aims to classify image subclasses within a given category. It is a challenging task due to similar features, varying poses, and background interference in the images. A key issue in fine-grained image classification is accurately extracting the discriminative regions of images. This paper proposes a multilayer feature fusion (MFF) network with a parallel convolutional block (PCB) mechanism to solve this problem. We use the bilinear matrix product to mix feature matrices from different layers and then feed them to the fully connected layer and the softmax function. In addition, the original convolutional blocks are replaced by the proposed PCB, which has more effective residual connections for extracting regions of interest (ROI) and uses parallel convolutions with kernels of different sizes. Experimental results on three publicly available fine-grained datasets demonstrate the effectiveness of the proposed model. Quantitative and visualized results show that our model achieves higher classification precision than state-of-the-art models. Our classification accuracy reaches 87.1%, 91.4%, and 93.4% on CUB-200-2011, FGVC Aircraft, and Stanford Cars, respectively.
Journal Article
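The bilinear matrix product used above to mix two layers' feature matrices can be sketched as an outer-product pooling over spatial locations. The signed square root and L2 normalization below are a common post-processing step for bilinear features, added here as an assumption rather than taken from the MFF paper.

```python
import numpy as np

def bilinear_fusion(feat_a, feat_b):
    """Mix two layers' features with a bilinear matrix product over
    shared spatial locations, then flatten for a classifier head."""
    # feat_a: (ca, n), feat_b: (cb, n) — n spatial locations each
    bmap = feat_a @ feat_b.T / feat_a.shape[1]   # (ca, cb) bilinear map
    v = bmap.reshape(-1)                          # flatten for FC layer
    # signed square root + L2 normalization (common for bilinear features)
    v = np.sign(v) * np.sqrt(np.abs(v))
    return v / (np.linalg.norm(v) + 1e-12)

a = np.random.rand(4, 49)     # e.g. a 4-channel layer over a 7x7 grid
b = np.random.rand(6, 49)     # a 6-channel layer over the same grid
vec = bilinear_fusion(a, b)   # fused descriptor of length 4 * 6 = 24
```

The fused dimension is the product of the two channel counts, which is why bilinear models often pair this step with dimensionality reduction before the softmax layer.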
EnNet: Enhanced Interactive Information Network with Zero-Order Optimization
2024
Interactive image segmentation greatly accelerates the creation of high-quality annotated image datasets, which are the pillars of deep learning applications. However, existing methods suffer from insignificant interaction information and excessively high optimization costs, resulting in unexpected segmentation outcomes and increased computational burden. To address these issues, this paper focuses on mining interactive information from both the network architecture and the optimization procedure. In terms of network architecture, the issue arises from two sources: the weakly representative features of interactive regions in each layer, and interaction information weakened by the network's hierarchical structure. The paper therefore proposes a network called EnNet, which addresses both issues by employing attention mechanisms to integrate user interaction information across the entire image and by incorporating interaction information twice in a coarse-to-fine design. In terms of optimization, this paper proposes using zero-order optimization during the first four iterations of training, which reduces computational overhead with only a minimal reduction in accuracy. Experimental results on the GrabCut, Berkeley, DAVIS, and SBD datasets validate the effectiveness of the proposed method, with our approach achieving an average NOC@90 that surpasses RITM by 0.35.
Journal Article
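Zero-order (zeroth-order) optimization, as mentioned in the EnNet abstract, estimates gradients from loss evaluations alone via random finite differences. The sketch below is a generic illustration of the technique on a toy quadratic, not EnNet's training code; step sizes and sample counts are arbitrary choices.

```python
import numpy as np

def zo_gradient(loss_fn, params, mu=1e-3, n_samples=8, rng=None):
    """Zeroth-order gradient estimate: average directional finite
    differences along random Gaussian directions (no backprop)."""
    rng = np.random.default_rng() if rng is None else rng
    f0 = loss_fn(params)
    grad = np.zeros_like(params)
    for _ in range(n_samples):
        u = rng.standard_normal(params.shape)
        grad += (loss_fn(params + mu * u) - f0) / mu * u
    return grad / n_samples

# usage: minimize a simple quadratic using only loss evaluations
loss = lambda p: float(np.sum(p ** 2))
p = np.ones(4)
rng = np.random.default_rng(1)
for _ in range(200):
    p -= 0.05 * zo_gradient(loss, p, rng=rng)
# p is driven close to the minimizer at the origin
```

Each estimate costs `n_samples + 1` loss evaluations, so the method trades gradient computation for extra forward passes, which is the overhead/accuracy trade-off the abstract alludes to.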
An Efficient Fine-Grained Recognition Method Enhanced by Res2Net Based on Dynamic Sparse Attention
2025
Fine-grained recognition tasks face significant challenges in differentiating subtle, class-specific details against cluttered backgrounds. This paper presents an efficient architecture built upon the Res2Net backbone, significantly enhanced by a dynamic Sparse Attention mechanism. The core approach leverages the inherent multi-scale representation power of Res2Net to capture discriminative patterns across different granularities. Crucially, the integrated Sparse Attention module operates dynamically, selectively amplifying the most informative features while attenuating irrelevant background noise and redundant details. This combined strategy substantially improves the model’s ability to focus on pivotal regions critical for accurate classification. Furthermore, strategic architectural optimizations are applied throughout to minimize computational complexity, resulting in a model that demands significantly fewer parameters and exhibits faster inference times. Extensive evaluations on benchmark datasets demonstrate the effectiveness of the proposed method. It achieves a modest but consistent accuracy gain over strong baselines (approximately 2%) while simultaneously reducing model size by around 30% and inference latency by about 20%, proving highly effective for practical fine-grained recognition applications requiring both high accuracy and operational efficiency.
Journal Article
SD-FINE: Lightweight Object Detection Method for Critical Equipment in Substations
2025
The safe and stable operation of critical substation equipment is paramount to the power system, and its intelligent inspection relies on highly efficient and accurate object detection technology. However, the demanding requirements for both accuracy and efficiency in complex environments pose significant challenges for lightweight models. To address this, this paper proposes SD-FINE, a lightweight object detection technique specifically designed for detecting critical substation equipment. Specifically, we introduce a novel Fine-grained Distribution Refinement (FDR) approach, which fundamentally transforms the bounding box regression process in DETR from predicting coordinates to iteratively optimizing edge probability distributions. Central to the new FDR is an adaptive weight function learning mechanism that learns weights for these distributions. This mechanism is designed to enhance the model’s perception of equipment location information within complex substation environments. Additionally, this paper develops a new Efficient Hybrid Encoder that provides adaptive scale weighting for feature information at different scales during cross-scale feature fusion, enabling more flexible and efficient lightweight feature extraction. Experimental validation on a critical substation equipment detection dataset demonstrates that SD-FINE achieves an accuracy of 93.1% while remaining lightweight. It outperforms mainstream object detection networks across various metrics, providing an efficient and reliable detection solution for intelligent substation inspection.
Journal Article
Learning optimal image representations through noise injection for fine-grained search
2025
In recent years, fine-grained image search has been an area of interest within the computer vision community. Many current works follow deep feature learning paradigms, which generally exploit a pre-trained convolutional layer’s activations as representations and learn a low-dimensional embedding. This embedding is usually learned by defining loss functions based on local structure, such as triplet loss. However, triplet loss requires an expensive sampling strategy. Softmax-based loss (when the problem is treated as a classification task) is faster than triplet loss but suffers from early saturation. To this end, a novel approach is proposed to enhance fine-grained representation learning by injecting noise into both the input and the features. At the input, the image is perturbed with noise, and the objective is to reduce the distance between the L2-normalized features of the input image and its noisy version in the embedding space, relative to other instances. Concurrently, noise injection in the features acts as a regularizer, facilitating the acquisition of generalized features and mitigating overfitting. The proposed approach is tested on three public datasets: Oxford Flower-17, CUB-200-2011, and Cars-196, and achieves better retrieval results than existing methods. We also tested our approach in the zero-shot setting and obtained favorable results compared to prior methods on Cars-196 and CUB-200-2011.
Journal Article
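The input-side objective described above, pulling together the L2-normalized embeddings of an image and its noisy copy relative to other instances, can be sketched as a softmax over cosine similarities. The stand-in encoder, the Gaussian noise model, and the softmax form below are assumptions for illustration, not the paper's exact loss.

```python
import numpy as np

def noise_alignment_loss(embed_fn, x, others, sigma=0.1, rng=None):
    """Sketch: treat the noisy view of x as the positive and the other
    instances as negatives in a softmax over embedding similarities."""
    rng = np.random.default_rng(0) if rng is None else rng
    def norm(v):
        return v / (np.linalg.norm(v) + 1e-12)   # L2 normalization
    z = norm(embed_fn(x))
    z_noisy = norm(embed_fn(x + sigma * rng.standard_normal(x.shape)))
    z_others = [norm(embed_fn(o)) for o in others]
    # similarities: the noisy view first, then the other instances
    sims = np.array([z @ z_noisy] + [z @ zo for zo in z_others])
    logp = sims[0] - np.log(np.exp(sims).sum())  # log-softmax of positive
    return -logp   # smaller when x and its noisy view agree

embed = lambda v: np.tanh(v)          # hypothetical stand-in encoder
x = np.random.rand(8)
others = [np.random.rand(8) for _ in range(4)]
loss_val = noise_alignment_loss(embed, x, others)
```

Minimizing this value drives the clean and noisy embeddings together while pushing the other instances away, which is the "relative to other instances" part of the abstract's objective.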
Transformer attention fusion for fine grained medical image classification
2025
Fine-grained visual classification is fundamental for medical image applications because it detects minor lesions. Diabetic retinopathy (DR) is a preventable cause of blindness that requires exact and timely diagnosis to prevent vision damage. The challenges automated DR classification systems face include irregular lesions, uneven distributions between image classes, and inconsistent image quality, all of which reduce diagnostic accuracy during early detection. Our solution to these problems is MSCAS-Net (Multi-Scale Cross and Self-Attention Network), which uses the Swin Transformer as its backbone. It extracts features at three resolutions (12 × 12, 24 × 24, 48 × 48), allowing it to detect both subtle local features and global elements. The model uses self-attention mechanisms to improve spatial connections within individual scales and cross-attention to automatically match feature patterns across scales, thereby developing a comprehensive information structure. This dual attention mechanism makes the model better at detecting significant lesions. MSCAS-Net achieves the best performance on the APTOS, DDR, and IDRID benchmarks, reaching accuracies of 93.8%, 89.80%, and 86.70%, respectively. The model handles imbalanced datasets and inconsistent image quality without needing data augmentation because it learns stable features. MSCAS-Net demonstrates a breakthrough in automated DR diagnostics, combining high diagnostic precision with interpretability to serve as an efficient AI-powered clinical decision support system. This research demonstrates how fine-grained visual classification methods benefit the detection and treatment of DR in its early stages.
Journal Article