Catalogue Search | MBRL

Residual Shuffle Attention Network for Image Super-Resolution

by Li, Zhiwei; Zhang, Yaping; Yang, Yuwei in image processing, image super resolution, Shuffle attention

2021

To improve the accuracy of super-resolution networks while reducing the number of model parameters, this paper modifies the RCAB module of RCAN and builds a reconstruction network, RSAN, that improves both the quality and the efficiency of image super-resolution reconstruction. Replacing the original channel attention module with the more efficient and lightweight shuffle attention mainly reduces the number of parameters and secondarily improves accuracy. Replacing part of the ordinary convolutions in RCAN with split convolutions mainly improves accuracy and secondarily reduces feature redundancy and parameters. Experimental results show that RSAN not only achieves better subjective visual quality and objective quantitative scores, but also reduces the number of network parameters and improves network efficiency to a certain extent.

Journal Article
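The parameter saving from swapping RCAN's channel attention for shuffle attention can be estimated with simple arithmetic. The sketch below assumes RCAN's usual 64 feature channels and reduction ratio 16, and an SA-Net-style shuffle attention unit whose scale, shift and GroupNorm parameters are shared across groups; the group count G = 8 and both helper names are hypothetical choices for illustration, not values or code reported by the RSAN authors.

```python
def channel_attention_params(channels: int, reduction: int) -> int:
    """Parameter count of an RCAN-style channel attention (CA) layer.

    Layout assumed: global average pooling, 1x1 conv C -> C/r (with bias),
    ReLU, 1x1 conv C/r -> C (with bias), sigmoid.
    """
    hidden = channels // reduction
    squeeze = channels * hidden + hidden     # first 1x1 conv: weights + biases
    excite = hidden * channels + channels    # second 1x1 conv: weights + biases
    return squeeze + excite


def shuffle_attention_params(channels: int, groups: int) -> int:
    """Parameter count of an SA-Net-style shuffle attention unit (assumed layout).

    The C channels are split into `groups` groups, and each group into two
    halves of C / (2 * groups) channels. Parameters are shared across groups:
      channel branch: one scale + one shift per sub-channel   -> 2 * sub
      spatial branch: GroupNorm weight + bias, scale + shift  -> 4 * sub
    """
    if channels % (2 * groups) != 0:
        raise ValueError("channels must be divisible by 2 * groups")
    sub = channels // (2 * groups)
    return 6 * sub


# RCAN defaults: 64 feature channels, reduction ratio 16; G = 8 is hypothetical.
print(channel_attention_params(64, 16))   # 580
print(shuffle_attention_params(64, 8))    # 24
```

Per attention layer this is 580 versus 24 parameters. RCAN stacks many residual blocks, so the saving accumulates, although the two 3x3 convolutions in each block (64 x 64 x 9 weights each) still dominate the total.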

Hybrid attention transformer integrated YOLOV8 for fruit ripeness detection

by Tang, Jianyin; Shao, ChangShun; Yu, Zhenglin in 639/166/988, 639/705/1042, EIoU

2025

The complexity of outdoor orchard environments, especially changes in light intensity and the shadows cast by fruit clusters, makes it difficult to identify and classify mature fruit. To address these problems, this paper proposes an innovative fruit recognition model, HAT-YOLOV8, that combines the advantages of the Hybrid Attention Transformer (HAT) and the YOLOV8 deep learning algorithm. The model improves its ability to capture complex dependencies by integrating the Shuffle Attention (SA) module while maintaining low computational complexity. In addition, during the feature fusion stage, the HAT module is integrated into TopDownLayer2 to enhance the capture of long-range dependencies and the recovery of detailed information in the input data. To evaluate the similarity between the predicted box and the ground-truth bounding box more accurately, the paper uses the EIoU loss function instead of CIoU, which improves detection accuracy and accelerates model convergence. For evaluation, experiments were conducted on a dataset containing five fruit varieties, each classified into three maturity levels. The results show that HAT-YOLOV8 improved mAP by 11%, 10.2%, 7.6%, and 7.8% on the test set, with an overall mAP of 88.9%. In addition, the model demonstrates strong generalization, indicating its potential for fruit recognition, maturity assessment, and automated fruit picking.

Journal Article
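The switch from CIoU to EIoU changes how box shape is penalized: CIoU adds an aspect-ratio consistency term, whereas EIoU penalizes the width and height differences directly, each normalized by the smallest enclosing box. The following is a minimal sketch of the commonly cited EIoU definition on a single pair of (x1, y1, x2, y2) boxes; it is not the HAT-YOLOV8 authors' code, the `eiou_loss` name is hypothetical, and a training implementation would operate on batched tensors with autograd.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1, y2 > y1


def eiou_loss(pred: Box, target: Box, eps: float = 1e-9) -> float:
    """EIoU loss: 1 - IoU + center-distance term + width term + height term."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection over union.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    iou = inter / (pw * ph + tw * th - inter + eps)

    # Width, height and squared diagonal of the smallest enclosing box.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    diag2 = cw ** 2 + ch ** 2

    # Squared distance between the two box centers.
    rho2 = ((px1 + px2) / 2 - (tx1 + tx2) / 2) ** 2 + ((py1 + py2) / 2 - (ty1 + ty2) / 2) ** 2

    return (1.0 - iou
            + rho2 / (diag2 + eps)
            + (pw - tw) ** 2 / (cw ** 2 + eps)
            + (ph - th) ** 2 / (ch ** 2 + eps))


print(eiou_loss((0, 0, 1, 1), (2, 0, 3, 1)))  # about 1.4: no overlap, centers 2 apart
```

For a prediction of the right size that is merely shifted, only the IoU and center terms contribute; the width and height terms activate as soon as the box dimensions disagree.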

Shuffle Attention-Based Pavement-Sealed Crack Distress Detection

by Yuan, Bo; Li, Wei; Zhao, Kaiyue in Automation, Concrete pavements, Cracks

2024

To improve the detection of pavement-sealed cracks and ensure the long-term stability of pavement performance, a novel shuffle attention-based pavement-sealed crack detection method is proposed. The method consists of three essential components: the feature extraction network, the detection head, and the Wise Intersection over Union loss function. The shuffle attention module is integrated into both the feature extraction network and the detection head, where it captures high-dimensional semantic information about pavement-sealed cracks by combining spatial and channel attention in parallel. The two-way detection head with multi-scale feature fusion efficiently combines contextual information for pavement-sealed crack detection. In addition, the Wise Intersection over Union loss function dynamically adjusts the gradient gain, improving the accuracy of bounding box fitting and coverage area. Experimental results show that the proposed method achieves higher mAP@0.5 (98.02%), recall (0.9768), and F1-score (0.9680) than state-of-the-art one-stage methods, improvements of 0.81%, 1.8%, and 2.79%, respectively.

Journal Article
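"Combining spatial and channel attention in parallel" refers to the structure of the shuffle attention unit: channels are split into groups, each group is split into a channel-attention half and a spatial-attention half, and a final channel shuffle mixes the two. The sketch below follows the widely used SA-Net formulation on plain nested lists (one sample, no batch dimension) so it runs without a deep learning framework. The pavement-crack paper may use a variant, and all function and parameter names here are hypothetical.

```python
import math
from typing import List

FeatureMap = List[List[float]]   # one H x W channel
Tensor = List[FeatureMap]        # C channels, batch dimension omitted


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def channel_shuffle(channels: list, groups: int) -> list:
    """Reshape C -> (groups, C // groups), transpose, and flatten."""
    n = len(channels) // groups
    return [channels[g * n + i] for i in range(n) for g in range(groups)]


def shuffle_attention(x: Tensor, groups: int,
                      cw: List[float], cb: List[float],      # channel-branch scale/shift
                      sw: List[float], sb: List[float],      # spatial-branch scale/shift
                      gn_w: List[float], gn_b: List[float],  # GroupNorm affine parameters
                      eps: float = 1e-5) -> Tensor:
    half = len(x) // (2 * groups)  # channels per branch in each group
    out: Tensor = []
    for g in range(groups):
        group = x[g * 2 * half:(g + 1) * 2 * half]
        xc, xs = group[:half], group[half:]

        # Channel branch: global average pool -> scale/shift -> one sigmoid gate per channel.
        for k, fmap in enumerate(xc):
            s = sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
            gate = sigmoid(cw[k] * s + cb[k])
            out.append([[v * gate for v in row] for row in fmap])

        # Spatial branch: per-channel GroupNorm -> scale/shift -> one sigmoid gate per position.
        for k, fmap in enumerate(xs):
            vals = [v for row in fmap for v in row]
            mean = sum(vals) / len(vals)
            std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals) + eps)
            out.append([[v * sigmoid(sw[k] * (gn_w[k] * (v - mean) / std + gn_b[k]) + sb[k])
                         for v in row] for row in fmap])

    # Interleave channels so information flows between the two branches.
    return channel_shuffle(out, 2)


# 4 channels of 2 x 2, one group; scales at 0 and shifts at 1, so every gate is sigmoid(1).
x = [[[float(c + i + j) for j in range(2)] for i in range(2)] for c in range(4)]
y = shuffle_attention(x, 1, [0.0] * 2, [1.0] * 2, [0.0] * 2, [1.0] * 2, [1.0] * 2, [0.0] * 2)
```

With zero scales the module reduces to a constant gate followed by the shuffle, which reorders the four input channels as 0, 2, 1, 3; nonzero scales make the gates depend on the pooled statistics and on each spatial position.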

An explainable deep learning model for diabetic foot ulcer classification using swin transformer and efficient multi-scale attention-driven network

by Karthik, R.; K, Suganthi; Ajay, Armaano in 639/166/985, 692/700, Algorithms

2025

Diabetic Foot Ulcer (DFU) is a severe complication of diabetes mellitus that creates significant health and socio-economic challenges for the diagnosed individual. Severe cases of DFU can lead to lower-limb amputation, and diagnosis is a complex and costly process that poses challenges for medical professionals. Manual identification of DFU is particularly difficult because ulcers vary widely in appearance, so many cases go undiagnosed. Deep Learning (DL) methods offer an efficient, automated approach that can facilitate timely treatment and improve patient outcomes. This research proposes a novel feature fusion-based model with two parallel tracks for efficient feature extraction. The first track uses the Swin transformer, which captures long-range dependencies through shifted windows and self-attention. The second track is the Efficient Multi-Scale Attention-Driven Network (EMADN), which uses Light-weight Multi-scale Deformable Shuffle (LMDS) and Global Dilated Attention (GDA) blocks to extract local features efficiently. These blocks dynamically adjust kernel sizes and use attention modules, enabling effective feature extraction. To the best of our knowledge, this is the first work to report a dual-track architecture for DFU classification that combines the Swin transformer and EMADN networks. The feature maps from both networks are concatenated and refined with shuffle attention at a reduced computational cost. The work also incorporates Grad-CAM-based Explainable Artificial Intelligence (XAI) to visualize and interpret the network's decisions. On the DFUC-2021 dataset, the proposed model outperformed existing works and pre-trained CNN architectures, achieving an accuracy of 78.79% and a macro F1-score of 80%.

Journal Article
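The model is reported with two metrics, accuracy and macro F1-score, that weight classes differently. Accuracy counts every sample equally, whereas macro F1 computes an F1-score per class and averages them with equal weight, so on an imbalanced dataset the two can diverge in either direction. This minimal sketch computes both from a confusion matrix; the function name and the 2-class matrix in the usage line are hypothetical and are not DFUC-2021 results.

```python
from typing import List, Tuple


def accuracy_and_macro_f1(cm: List[List[int]]) -> Tuple[float, float]:
    """cm[i][j] counts samples of true class i predicted as class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[k][k] for k in range(n)) / total
    f1_scores = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp   # predicted k, actually another class
        fn = sum(cm[k]) - tp                        # actually k, predicted another class
        denom = 2 * tp + fp + fn
        # 2TP / (2TP + FP + FN) equals the harmonic mean of precision and recall.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    # Macro averaging: every class counts equally, however many samples it has.
    return accuracy, sum(f1_scores) / n


acc, macro_f1 = accuracy_and_macro_f1([[8, 2], [1, 9]])
print(round(acc, 4), round(macro_f1, 4))  # 0.85 0.8496
```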

Rock image classification based on improved EfficientNet

by Zhang, Zhaoshuo; Jin, Siyi; Bai, Kai in 639/705/117, 704/2151/431, Accuracy

2025

Rock image classification plays a crucial role in geological exploration, mineral resource development, and environmental monitoring. However, rock images often exhibit high intra-class similarity and low inter-class variation, which makes accurate classification challenging. In addition, existing models often have a large number of parameters. To address these issues, we propose an enhanced rock classification model based on EfficientNet-B0. First, the DiffuseMix algorithm is applied to the training set to increase data diversity. Second, the shuffle attention mechanism is integrated into the backbone network to enhance feature extraction while reducing model parameters. Finally, the Lion optimizer is used for training, improving both the accuracy and the stability of the model. Experimental results show that the improved model achieves an accuracy of 94.02% on the test set, outperforming ResNet50, MobileNetV3, and SwinTransformer by 8.02%, 9.35%, and 5.02%, respectively. The model's parameter count is also significantly reduced, to 3.38 million. The proposed model therefore reduces the number of parameters while maintaining high recognition accuracy, providing a novel solution for intelligent rock recognition.

Journal Article
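The Lion optimizer differs from Adam-style methods in that each coordinate moves by a fixed step of the learning rate in the direction given by the sign of an interpolated momentum. The following is a minimal sketch of one Lion step on a flat list of parameters, following the published update rule; the function name is hypothetical, and the learning rate, betas and weight decay are placeholder values, since the abstract does not state the rock-classification paper's settings.

```python
from typing import List, Tuple


def sign(v: float) -> float:
    return 1.0 if v > 0 else (-1.0 if v < 0 else 0.0)


def lion_step(params: List[float], grads: List[float], momentum: List[float],
              lr: float = 1e-4, beta1: float = 0.9, beta2: float = 0.99,
              weight_decay: float = 0.0) -> Tuple[List[float], List[float]]:
    """One Lion update over a flat list of parameters; returns (params, momentum)."""
    new_params, new_momentum = [], []
    for p, g, m in zip(params, grads, momentum):
        # Direction: sign of an interpolation between the momentum and the current
        # gradient, so each coordinate moves by exactly lr (or 0), whatever |g| is.
        update = sign(beta1 * m + (1.0 - beta1) * g)
        # Decoupled weight decay, applied alongside the sign update.
        new_params.append(p - lr * (update + weight_decay * p))
        # The stored momentum is tracked with a separate, slower rate beta2.
        new_momentum.append(beta2 * m + (1.0 - beta2) * g)
    return new_params, new_momentum


params, momentum = lion_step([1.0, -2.0, 0.5], [0.3, -5.0, 0.0], [0.0, 0.0, 0.0], lr=0.1)
print(params)  # approximately [0.9, -1.9, 0.5]
```

The gradients 0.3 and -5.0 produce steps of the same size, which illustrates why Lion is usually run with a smaller learning rate than AdamW. Lion also stores only one momentum buffer per parameter, compared with two for Adam.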

Feature enhanced cascading attention network for lightweight image super-resolution

by Shen, Ying; Liu, Hongwei; Huang, Feng in 639/705/117, 639/705/794, Convolution neural network

2025

Attention mechanisms have been introduced into image restoration to exploit deep-level information by capturing feature dependencies. However, existing attention mechanisms often have limited perceptual capability and are ill-suited to low-power devices with constrained computational resources. We therefore propose a feature enhanced cascading attention network (FECAN) built around a novel feature enhanced cascading attention (FECA) mechanism, which consists of enhanced shuffle attention (ESA) and multi-scale large separable kernel attention (MLSKA). ESA enhances high-frequency texture features in the feature maps, and MLSKA extracts them further. Rich, fine-grained high-frequency information is extracted and fused from multiple perceptual layers, improving super-resolution (SR) performance. To validate FECAN's effectiveness, we evaluate it at different complexities by stacking different numbers of high-frequency enhancement modules (HFEM) that contain FECA. Extensive experiments on benchmark datasets show that FECAN outperforms state-of-the-art lightweight SR networks in both objective evaluation metrics and subjective visual quality. Specifically, at ×4 scale with a 121K model size, FECAN achieves a 0.07 dB higher average peak signal-to-noise ratio (PSNR) than the second-ranked MAN-tiny, while using approximately 19% fewer parameters and 20% fewer FLOPs. This demonstrates a better trade-off between SR performance and model complexity.

Journal Article
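Because PSNR is logarithmic in the mean squared error (MSE), a gain reported in decibels can be converted into a relative change in MSE, which makes small margins such as 0.07 dB easier to interpret. The sketch below defines PSNR on flattened images and shows that a 0.07 dB gain corresponds to roughly 1.6% lower MSE. The helper names are hypothetical, and SR benchmarks typically compute PSNR on the luminance (Y) channel with border cropping, which this sketch omits.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], test: Sequence[float], max_value: float = 255.0) -> float:
    """PSNR in dB between two equally sized, flattened images."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(delta_db: float) -> float:
    """Ratio MSE_new / MSE_old implied by a PSNR improvement of delta_db."""
    return 10.0 ** (-delta_db / 10.0)


print(round(mse_ratio_for_gain(0.07), 4))  # 0.984
```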

SiamPKHT: Hyperspectral Siamese Tracking Based on Pyramid Shuffle Attention and Knowledge Distillation

by Zhang, Shoujin; Qian, Kun; Wang, Shiqing in Algorithms, hyperspectral video, knowledge distillation

2023

Hyperspectral images provide a wealth of spectral and spatial information, offering significant advantages for object tracking. However, Siamese trackers cannot fully exploit spectral features because few hyperspectral videos are available, and the high dimensionality of hyperspectral images complicates model training. To address these issues, this article proposes a hyperspectral object tracking (HOT) algorithm called SiamPKHT, which builds on the SiamCAR model by incorporating pyramid shuffle attention (PSA) and knowledge distillation (KD). First, the PSA module uses pyramid convolutions to extract multiscale features, and shuffle attention captures relationships between channels and spatial positions, yielding features with stronger classification performance. Second, KD is introduced under the guidance of a pre-trained RGB tracking model to address overfitting in HOT. Experiments on the HOT2022 data show that SiamPKHT outperforms the baseline method (SiamCAR) and other state-of-the-art HOT algorithms. It also meets real-time requirements, running at 43 frames per second.

Journal Article
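The abstract does not specify which distillation loss SiamPKHT uses to transfer knowledge from the pre-trained RGB tracker. As one common form, the sketch below shows the classic soft-target loss: the KL divergence between temperature-softened teacher and student distributions over classification logits, scaled by T squared. The paper may instead distill feature maps or response maps. The function names and the temperature value are assumptions, and in practice this term would be added to the ordinary tracking loss.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float) -> List[float]:
    scaled = [z / temperature for z in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: List[float], teacher_logits: List[float],
                      temperature: float = 4.0) -> float:
    """KL(teacher || student) on temperature-softened distributions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * kl


print(distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0]) > 0)  # True
```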

Adaptive signal recognition in mines based on deep learning

by Wang, Mingbo; Wang, Anyi; Rong, Yi in 639/166/987, 639/705/117, Accuracy

2025

To address the challenges of low recognition accuracy and high system complexity arising from the coexistence of multiple wireless communication technologies and severe signal interference in the complex wireless environment of coal mines, this paper proposes a deep learning-based adaptive signal recognition method. By incorporating grouped residual convolution and channel shuffling techniques, the proposed method significantly reduces the number of model parameters (37% fewer than the original WaveNet) while utilizing dilated causal convolution to capture long-range dependencies in the signal, thereby enhancing the model’s ability to discriminate multipath interference features. The introduction of a dynamic channel attention mechanism facilitates adaptive adjustment of feature weights, emphasizing key features while suppressing noise interference, thereby improving recognition accuracy. Experimental results demonstrate that the Group Residual Shuffle Attention WaveNet achieves average recognition rates of 93.2% and 94.5% on the public dataset (RML2016.10a) and a simulated dataset, respectively, outperforming other methods (such as CTDNN) by more than 1.5% in recognition accuracy, while improving inference speed by over 14%. The proposed method performs well on general datasets and effectively adapts to complex signal recognition tasks in mine environments, providing an efficient and reliable solution for intelligent mine communication.

Journal Article
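Dilated causal convolution is how WaveNet-style models capture long-range dependencies in a signal cheaply. Each output sample reads only the current and earlier inputs, and spacing the kernel taps `dilation` samples apart lets a stack of layers with doubling dilations cover a long history. The sketch below is a single-channel, pure-Python illustration. The function names are hypothetical, and the kernel size 2 and dilations 1 to 512 follow the original WaveNet layout rather than any configuration reported for the mine-signal model.

```python
from typing import List


def dilated_causal_conv1d(x: List[float], weights: List[float], dilation: int) -> List[float]:
    """y[t] = sum_i weights[i] * x[t - i * dilation], with x treated as 0 before t = 0.

    Output t only reads inputs at t, t - d, t - 2d, ..., so no future samples leak in.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(weights):
            idx = t - i * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


def receptive_field(kernel_size: int, dilations: List[int]) -> int:
    """Receptive field of a stack of causal convolutions with the given dilations."""
    return 1 + (kernel_size - 1) * sum(dilations)


print(dilated_causal_conv1d([1.0, 0.0, 0.0, 0.0, 0.0], [1.0, 1.0], dilation=2))  # [1.0, 0.0, 1.0, 0.0, 0.0]
print(receptive_field(2, [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]))  # 1024
```

Ten layers with kernel size 2 thus see 1024 past samples. Covering the same span with undilated convolutions of kernel size 2 would take 1023 layers.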

CSSA-YOLO: Cross-Scale Spatiotemporal Attention Network for Fine-Grained Behavior Recognition in Classroom Environments

by Cheng, Yuhua; Zhou, Liuchen; Guan, Xiqiang in Accuracy, Algorithms, Artificial intelligence

2025

Under a student-centered educational paradigm, project-based learning (PBL) assessment requires accurate identification of classroom behaviors to support effective teaching evaluation and personalized learning strategies. The growing use of visual and multi-modal sensors in smart classrooms makes it possible to continuously capture rich behavioral data. However, lighting variations, occlusions, and diverse behaviors complicate sensor-based behavior analysis. To address these issues, we introduce CSSA-YOLO, a novel detection network that incorporates cross-scale feature optimization. First, we build a C2fs module that uses hierarchical window attention to capture spatiotemporal dependencies in small-scale actions such as hand-raising. Second, a Shuffle Attention mechanism is integrated into the neck to suppress interference from complex backgrounds, helping the model focus on relevant features. Finally, to further improve detection of small targets and behaviors with complex boundaries, we use the WIoU loss function, which dynamically weights gradients to improve the localization accuracy of occluded targets. Experiments on the SCB03-S dataset show that CSSA-YOLO outperforms traditional methods, achieving an mAP50 of 76.0%, 1.2% higher than YOLOv8m, particularly in complex-background and occlusion scenarios. It also runs at 78.31 FPS, meeting the requirements for real-time application. This study offers a reliable solution for precise behavior recognition in classroom settings, supporting the development of intelligent education systems.

Journal Article
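WIoU "dynamically weights gradients" through two factors. A distance-based attention term scales the IoU loss by the normalized center offset (version 1), and a non-monotonic focusing coefficient (version 3) down-weights boxes whose IoU loss is far above or far below the running average. The following is a minimal scalar sketch of those two pieces, not the CSSA-YOLO code. The function names are hypothetical, and the values alpha = 1.9 and delta = 3 are assumed defaults.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def wiou_v1(pred: Box, target: Box) -> float:
    """WIoU v1: IoU loss scaled by a distance-based attention factor R >= 1."""
    l_iou = 1.0 - iou(pred, target)
    # Size of the smallest enclosing box. In training this denominator is detached
    # from the graph; that has no effect in this gradient-free sketch.
    wg = max(pred[2], target[2]) - min(pred[0], target[0])
    hg = max(pred[3], target[3]) - min(pred[1], target[1])
    dx = (pred[0] + pred[2]) / 2 - (target[0] + target[2]) / 2
    dy = (pred[1] + pred[3]) / 2 - (target[1] + target[3]) / 2
    r = math.exp((dx * dx + dy * dy) / (wg * wg + hg * hg))
    return r * l_iou


def wiou_v3_focus(l_iou: float, mean_l_iou: float, alpha: float = 1.9, delta: float = 3.0) -> float:
    """Non-monotonic focusing coefficient; WIoU v3 multiplies it with the v1 loss."""
    beta = l_iou / mean_l_iou  # outlier degree relative to the running mean IoU loss
    return beta / (delta * alpha ** (beta - delta))


print(wiou_v1((0, 0, 1, 1), (2, 0, 3, 1)))  # exp(0.4), about 1.4918
```

The focusing coefficient equals 1 when the outlier degree equals delta and shrinks for both very easy and very poor boxes, so low-quality (for example, heavily occluded or mislabeled) examples contribute smaller gradients.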

Aero-YOLO: An Efficient Vehicle and Pedestrian Detection Algorithm Based on Unmanned Aerial Imagery

by Li, Jun; Li, Zhongheng; Shao, Yifan in Accuracy, Aerial photography, Aerial targets

2024

The cost-effectiveness, compact size, and inherent flexibility of UAV technology have attracted significant attention. Using sensors, UAVs capture ground targets, offering a new perspective for aerial target detection and data collection. However, traditional UAV aerial image recognition techniques have several drawbacks: limited payload capacity results in insufficient computing power, small target sizes in images lead to low recognition accuracy, and dense target arrangements cause missed detections. To address these challenges, this study proposes Aero-YOLO, a lightweight UAV image target detection method based on YOLOv8. The approach replaces the original Conv module with GSConv and substitutes the C2f module with C3 to reduce model parameters, extend the receptive field, and improve computational efficiency. The CoordAtt and shuffle attention mechanisms are also introduced to enhance feature extraction, which is particularly beneficial for detecting small vehicles from a UAV perspective. Finally, three new parameter specifications for YOLOv8 are proposed to meet the requirements of different application scenarios. Experiments were conducted on the UAV-ROD and VisDrone2019 datasets. The results show that the proposed algorithm improves the accuracy and speed of vehicle and pedestrian detection and performs robustly across various angles, heights, and imaging conditions.

Journal Article
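Replacing Conv with GSConv reduces parameters because only half of the output channels come from a dense convolution; the other half comes from a cheap depthwise convolution on that result, followed by a channel shuffle. The sketch below counts convolution weights for both layouts, ignoring biases and BatchNorm. It assumes the commonly published GSConv structure with a 5x5 depthwise kernel; the helper names and the 128-channel, 3x3 example are hypothetical and not taken from Aero-YOLO.

```python
def conv_params(c_in: int, c_out: int, k: int, groups: int = 1) -> int:
    """Weights of a k x k convolution (bias and BatchNorm ignored)."""
    return (c_in // groups) * c_out * k * k


def gsconv_params(c_in: int, c_out: int, k: int, dw_k: int = 5) -> int:
    """GSConv, assumed layout: dense k x k conv to c_out / 2 channels, a depthwise
    dw_k x dw_k conv on that output, concatenation, then a channel shuffle
    (the shuffle has no parameters)."""
    half = c_out // 2
    dense = conv_params(c_in, half, k)
    depthwise = conv_params(half, half, dw_k, groups=half)
    return dense + depthwise


print(conv_params(128, 128, 3))    # 147456
print(gsconv_params(128, 128, 3))  # 75328, about 51% of the standard conv
```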
