Catalogue Search | MBRL

Underwater object detection algorithm based on attention mechanism and cross-stage partial fast spatial pyramidal pooling

by Zhou, Zhuang; Xuanyuan, Zhe; Yan, Jinghui in ACFP-YOLO, attention, SPPFCSPC

2022

Routine object detection algorithms struggle in complex underwater environments, where captured images are often blurred and have cluttered backgrounds. This makes feature extraction difficult and leads to missed detections. To improve both the accuracy and the real-time performance of underwater object detection, an improved YOLOv7 model is proposed. Built on the single-stage detector YOLOv7, the model incorporates the CBAM attention mechanism, which weights and enhances the feature information of detection targets in both the spatial and the channel dimensions. This captures the local relevance of feature information and focuses the model on target features, improving detection accuracy. The model also uses the SPPFCSPC module, which reduces computational cost while keeping the receptive field unchanged, improving inference speed. Extensive comparison and ablation experiments show that the proposed ACFP-YOLO model achieves higher detection accuracy than EfficientDet, Faster R-CNN, SSD, YOLOv3, YOLOv4, YOLOv5 and the latest YOLOv7 model, and offers clear advantages for object detection in complex underwater environments.
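
The claim that SPPFCSPC cuts computation while keeping the receptive field unchanged rests on the SPPF idea: chaining 5×5 stride-1 max poolings reproduces the parallel 5×5, 9×9 and 13×13 poolings of the older SPP design. The plain-Python sketch below checks that equivalence on a 2D grid. It is not the authors' module: the kernel sizes (5, 9, 13) are the usual YOLO defaults and are assumed here, the convolutions, CSP split and concatenation of the full SPPFCSPC block are omitted, and all function names are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def max_pool_same(x: Grid, k: int) -> Grid:
    """Stride-1 k x k max pooling with 'same' output size.

    Clipping the window at the border is equivalent to padding with -inf,
    so padding can never win the max.
    """
    r = k // 2
    h, w = len(x), len(x[0])
    return [
        [
            max(
                x[a][b]
                for a in range(max(0, i - r), min(h, i + r + 1))
                for b in range(max(0, j - r), min(w, j + r + 1))
            )
            for j in range(w)
        ]
        for i in range(h)
    ]


def spp_branches(x: Grid) -> List[Grid]:
    """SPP style: three independent poolings with kernels 5, 9 and 13."""
    return [max_pool_same(x, k) for k in (5, 9, 13)]


def sppf_branches(x: Grid) -> List[Grid]:
    """SPPF style: one 5x5 pooling applied three times in series."""
    p1 = max_pool_same(x, 5)   # 5x5 neighbourhood
    p2 = max_pool_same(p1, 5)  # same result as a 9x9 neighbourhood
    p3 = max_pool_same(p2, 5)  # same result as a 13x13 neighbourhood
    return [p1, p2, p3]


if __name__ == "__main__":
    grid = [[float((i * 7 + j * 13) % 17) for j in range(16)] for i in range(16)]
    print(spp_branches(grid) == sppf_branches(grid))  # True
```

Per output position, the serial form examines 3 × 25 = 75 window elements instead of 25 + 81 + 169 = 275 for the parallel form, which is one plausible source of the speed-up reported above.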

Journal Article

YOLOv8-MU: An Improved YOLOv8 Underwater Detector Based on a Large Kernel Block and a Multi-Branch Reparameterization Module

by Zhuang, Xiting; Zhang, Jian; Zhang, Yiwen in Accuracy, Algorithms, Analysis

2024

Underwater visual detection technology is crucial for marine exploration and monitoring. Given the growing demand for accurate underwater target recognition, this study introduces an innovative architecture, YOLOv8-MU, which significantly enhances the detection accuracy. This model incorporates the large kernel block (LarK block) from UniRepLKNet to optimize the backbone network, achieving a broader receptive field without increasing the model’s depth. Additionally, the integration of C2fSTR, which combines the Swin transformer with the C2f module, and the SPPFCSPC_EMA module, which blends Cross-Stage Partial Fast Spatial Pyramid Pooling (SPPFCSPC) with attention mechanisms, notably improves the detection accuracy and robustness for various biological targets. A fusion block from DAMO-YOLO further enhances the multi-scale feature extraction capabilities in the model’s neck. Moreover, the adoption of the MPDIoU loss function, designed around the vertex distance, effectively addresses the challenges of localization accuracy and boundary clarity in underwater organism detection. The experimental results on the URPC2019 dataset indicate that YOLOv8-MU achieves an mAP@0.5 of 78.4%, showing an improvement of 4.0% over the original YOLOv8 model. Additionally, on the URPC2020 dataset, it achieves 80.9%, and, on the Aquarium dataset, it reaches 75.5%, surpassing other models, including YOLOv5 and YOLOv8n, thus confirming the wide applicability and generalization capabilities of our proposed improved model architecture. Furthermore, an evaluation on the improved URPC2019 dataset demonstrates leading performance (SOTA), with an mAP@0.5 of 88.1%, further verifying its superiority on this dataset. These results highlight the model’s broad applicability and generalization capabilities across various underwater datasets.
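
The MPDIoU loss mentioned above augments IoU with the distances between matching box corners. A minimal plain-Python sketch follows, assuming the published MPDIoU definition in which the squared top-left and bottom-right corner distances are each divided by the squared input-image diagonal (w² + h²) and subtracted from IoU, with the loss taken as 1 - MPDIoU. The (x1, y1, x2, y2) box format and the function names are assumptions for illustration, not YOLOv8-MU's code.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2): top-left, bottom-right


def iou(a: Box, b: Box) -> float:
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def mpdiou_loss(pred: Box, gt: Box, img_w: float, img_h: float) -> float:
    """1 - MPDIoU for one predicted and one ground-truth box."""
    d1_sq = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2  # top-left corners
    d2_sq = (pred[2] - gt[2]) ** 2 + (pred[3] - gt[3]) ** 2  # bottom-right corners
    diag_sq = img_w ** 2 + img_h ** 2  # squared diagonal of the input image
    mpdiou = iou(pred, gt) - d1_sq / diag_sq - d2_sq / diag_sq
    return 1.0 - mpdiou


if __name__ == "__main__":
    gt = (0.0, 0.0, 10.0, 10.0)
    near_miss = (12.0, 0.0, 22.0, 10.0)  # no overlap, just to the right
    far_miss = (60.0, 60.0, 70.0, 70.0)  # no overlap, far away
    # Plain IoU is 0 for both; the corner terms still rank the near miss as better.
    print(mpdiou_loss(near_miss, gt, 100, 100) < mpdiou_loss(far_miss, gt, 100, 100))
```

For two 10×10 boxes offset by 2 pixels in a 100×100 image, IoU is 80/120 and the corner terms add 8/20000 to the loss. When boxes do not overlap at all, the corner terms still distinguish near misses from far misses, whereas plain IoU is 0 for both.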

Journal Article

Small traffic sign recognition method based on improved YOLOv7

by Meng, Bo; Shi, Weida in 639/705/1042, 639/705/258, Algorithms

2025

As autonomous and assisted driving technologies progress rapidly, traffic sign recognition is becoming increasingly important. The detection accuracy of current traffic sign recognition algorithms remains suboptimal, particularly for small traffic signs against complex backgrounds and under poor lighting, which frequently leads to detection errors. This paper introduces an enhanced method for small traffic sign recognition based on an improved YOLOv7. First, the Spatial Pyramid Pooling Fast and Cross-Stage Partial Connection (SPPFCSPC) structure is used to improve feature extraction for small targets. Next, a Shuffle Attention-CARAFE (S-CARAFE) up-sampling operator is designed; S-CARAFE refocuses on key features within the input data, enriching detail and improving feature recombination. Finally, the Normalized Wasserstein Distance (NWD) is introduced to address the sensitivity of the traditional IoU measure to small traffic signs. Experimental results show that the mAP@0.5 and mAP@0.5:0.95 of the model trained on the TT100K dataset increase by 3.48% and 2.29%, respectively. Compared with similar algorithms, the proposed method improves mAP@50 and mAP@50:95 by 2.61% and 2.12%, respectively. The improvements are further validated on small targets in the CCTSDB dataset and on a sorted foreign traffic sign dataset, showing better recognition of small traffic signs across varying environments and thereby advancing the traffic sign recognition capability of autonomous driving systems.
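
NWD replaces IoU by modelling each box as a 2D Gaussian. The sketch below assumes the formulation from the original NWD work on tiny-object detection: a box (cx, cy, w, h) becomes N((cx, cy), diag(w²/4, h²/4)), the squared 2-Wasserstein distance between two such Gaussians reduces to a squared Euclidean distance over (cx, cy, w/2, h/2), and NWD = exp(-sqrt(W2²) / C) for a dataset-dependent constant C. The value C = 12.8 below is purely illustrative, the function names are hypothetical, and this is not the S-CARAFE authors' implementation.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h)


def nwd(a: Box, b: Box, c: float) -> float:
    """Normalized Wasserstein Distance between boxes modelled as 2D Gaussians.

    Each box becomes N((cx, cy), diag(w^2 / 4, h^2 / 4)); for such Gaussians the
    squared 2-Wasserstein distance is a plain squared Euclidean distance.
    """
    w2_sq = ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
             + ((a[2] - b[2]) / 2) ** 2 + ((a[3] - b[3]) / 2) ** 2)
    return math.exp(-math.sqrt(w2_sq) / c)


def iou_cxcywh(a: Box, b: Box) -> float:
    """Ordinary IoU for (cx, cy, w, h) boxes, for comparison."""
    iw = max(0.0, min(a[0] + a[2] / 2, b[0] + b[2] / 2) - max(a[0] - a[2] / 2, b[0] - b[2] / 2))
    ih = max(0.0, min(a[1] + a[3] / 2, b[1] + b[3] / 2) - max(a[1] - a[3] / 2, b[1] - b[3] / 2))
    inter = iw * ih
    return inter / (a[2] * a[3] + b[2] * b[3] - inter)


if __name__ == "__main__":
    C = 12.8  # illustrative constant only; in practice it is tuned per dataset
    tiny_a, tiny_b = (10, 10, 4, 4), (12, 10, 4, 4)    # 4x4 box shifted 2 px
    big_a, big_b = (50, 50, 40, 40), (52, 50, 40, 40)  # 40x40 box shifted 2 px
    print(f"IoU: tiny={iou_cxcywh(tiny_a, tiny_b):.3f}, big={iou_cxcywh(big_a, big_b):.3f}")
    print(f"NWD: tiny={nwd(tiny_a, tiny_b, C):.3f}, big={nwd(big_a, big_b, C):.3f}")
```

Shifting a 4×4 box by 2 pixels drops IoU to 1/3, while the same shift on a 40×40 box leaves IoU near 0.90. NWD depends only on the absolute offset, so it gives both pairs the same score (about 0.855 with this C), which is why it suits small traffic signs.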

Journal Article

SE-Lightweight YOLO: Higher Accuracy in YOLO Detection for Vehicle Inspection

by Zhao, Xinyue; Song, Yunsheng; Niu, Chengwen in Accuracy, Algorithms, attention mechanism

2023

Against the backdrop of ongoing urbanization, traffic congestion and accidents are becoming increasingly prominent, calling for urgent, practical interventions to improve the efficiency and safety of transportation systems. A key challenge is achieving real-time vehicle monitoring, flow management, and traffic safety control within the transportation infrastructure in order to reduce congestion, optimize road use, and curb accidents. In response, this study applies deep-learning-based computer vision to vehicle detection and tracking. The resulting recognition outputs give traffic management actionable insights for optimizing traffic flow and signal control through real-time data analysis. The proposed SE-Lightweight YOLO algorithm achieves 95.7% accuracy in vehicle recognition and may serve as a reference for urban traffic management, laying groundwork for a more efficient, safer, and more streamlined transportation system. Existing vehicle type recognition methods still need higher recognition and detection accuracy and faster detection speed, among other improvements. In this paper, we modify the YOLOv7 framework as follows: we add the SE (squeeze-and-excitation) attention mechanism to the backbone, which improves results by 1.2% over the original YOLOv7, and we replace the SPPCSPC module with the SPPFCSPC module, which enhances the model's feature extraction. We then apply SE-Lightweight YOLO to traffic monitoring, where it can assist transportation personnel and support the collection of transportation big data. This research therefore has good application prospects.
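
The SE (squeeze-and-excitation) block added to the backbone reweights channels using a global summary of each channel. A plain-Python sketch follows, assuming the standard squeeze (global average pooling), excitation (two fully connected layers with a ReLU, then a sigmoid) and scale steps. Biases are dropped, the reduction ratio is implied by the shape of w1, the weights in the example are arbitrary rather than learned, and the names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]
Tensor3 = List[Matrix]  # [channels][height][width]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def se_block(x: Tensor3, w1: Matrix, w2: Matrix) -> Tensor3:
    """Squeeze-and-Excitation without biases.

    w1: [C // r][C] reduction weights, w2: [C][C // r] expansion weights.
    """
    # Squeeze: global average pooling gives one number per channel.
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: FC -> ReLU -> FC -> sigmoid yields a gate in (0, 1) per channel.
    hidden = [max(0.0, sum(wi * zi for wi, zi in zip(row, z))) for row in w1]
    gates = [sigmoid(sum(wi * hi for wi, hi in zip(row, hidden))) for row in w2]
    # Scale: multiply every activation of channel c by gates[c].
    return [[[v * g for v in row] for row in ch] for ch, g in zip(x, gates)]


if __name__ == "__main__":
    x = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]  # C=2, H=W=2
    w1 = [[1.0, 1.0]]      # C // r = 1 (r = 2), arbitrary weights
    w2 = [[1.0], [-1.0]]
    print(se_block(x, w1, w2))
```

In a trained network w1 and w2 are learned, so the gates come to emphasize whichever channels help the detection task and suppress the rest.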

Journal Article

Improved YOLOv7-based steel surface defect detection algorithm

by Han, Xiaowei; Hao, Yan; Yin, Biao in Accuracy, Algorithms, Datasets

2024

To address the limited small-object detection ability and weak generalization of the YOLOv7 algorithm, this paper proposes an improved YOLOv7-based algorithm for steel surface defect detection. First, a Transformer-InceptionDWConvolution (TI) module is designed, combining the Transformer module with InceptionDWConvolution to strengthen the network's ability to detect small objects. Second, the spatial pyramid pooling fast cross-stage partial channel (SPPFCSPC) structure is introduced to improve training performance. Third, a global attention mechanism (GAM) is designed to optimize the network structure, suppress irrelevant information in defect images, and improve the detection of small defects. In addition, the Mish function is used as the activation function of the feature extraction network to improve the model's generalization and feature extraction ability. Finally, a minimum point distance intersection over union (MPDIoU) loss is adopted as the localization loss, addressing the mismatch in direction between predicted and ground-truth boxes under the complete intersection over union (CIoU) loss. Experimental results show that, compared with the original algorithm, the improved YOLOv7 model raises mean average precision (mAP) by 6% on the Northeastern University Defect Detection (NEU-DET) dataset and by 2.6% on the VOC2012 dataset. These results indicate that the proposed algorithm effectively improves the detection of small steel surface defects.
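
Mish, the activation adopted in the feature extraction network, is defined as x · tanh(softplus(x)). A minimal Python version follows; softplus is written in a numerically stable form so that large inputs do not overflow math.exp. The abstract does not say which activation Mish replaces, so no comparison with the original network is implied.

```python
import math


def softplus(x: float) -> float:
    """log(1 + e^x), rearranged so that large |x| cannot overflow."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


if __name__ == "__main__":
    for v in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(f"mish({v:+.1f}) = {mish(v):+.4f}")
```

Unlike ReLU, Mish is smooth and lets small negative values through (mish(-1) is about -0.303), while behaving almost like the identity for large positive inputs.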

Journal Article

Enhanced YOLOv8s with Multi-Teacher Distillation for Steel Cord Ply Defect Detection

by Huang, Guangzhan; Zhang, Xinlong; Long, Rui in Accuracy, Algorithms, Assembly lines

2026

To improve detection accuracy for color-sensitive and small-target defects in steel cord ply, this paper introduces an improved YOLOv8s algorithm using multi-teacher stepwise hierarchical knowledge distillation for better adaptation across production lines. The improvements include: replacing the initial backbone convolutional layer with RGBV grouped convolution to enhance color feature extraction; substituting the SPPF module with SPPFCSPC-LSKA to improve multi-scale perception; and optimizing bounding box accuracy with the WIoU loss function. The multi-teacher distillation approach first transfers color feature learning using an RGBV-only teacher, then multi-scale feature learning with an SPPFCSPC-LSKA-only teacher. Experimental results show the improved model achieved 90.4% precision, 92.0% recall, 91.2% F1-score, and 97.2% mAP@0.5, surpassing the baseline YOLOv8s by 1.9, 2.2, 2.1, and 3.4 percentage points, respectively. The proposed model also achieves an inference time of 3.9 ms, representing a 1.0 ms reduction compared to the baseline. On a smaller dataset from another production line, single-teacher distillation increased precision, recall, F1-score, and mAP@0.5 to 84.6%, 82.0%, 83.3%, and 88.8%, respectively, albeit with an increase in inference time. The multi-teacher strategy further increased metrics to 97.5% precision, 88.8% recall, 92.9% F1-score, and 94.3% mAP@0.5, providing additional gains over single-teacher distillation while maintaining the same parameter count of 11.127 M and achieving a faster inference time of 4.1 ms on the target production line.
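
The reported F1-scores can be checked against precision and recall, since F1 is their harmonic mean, F1 = 2PR / (P + R). The short Python check below recomputes F1 from the abstract's precision and recall figures. The baseline precision and recall (88.5% and 89.8%) are derived by subtracting the reported 1.9- and 2.2-point gains, which assumes those gains are exact to one decimal.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in percent)."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) taken from the abstract, in percent.
REPORTED = {
    "improved YOLOv8s": (90.4, 92.0, 91.2),
    "baseline YOLOv8s (derived)": (90.4 - 1.9, 92.0 - 2.2, 91.2 - 2.1),
    "single-teacher, second line": (84.6, 82.0, 83.3),
    "multi-teacher, second line": (97.5, 88.8, 92.9),
}

if __name__ == "__main__":
    for name, (p, r, reported) in REPORTED.items():
        print(f"{name}: recomputed F1 = {f1(p, r):.1f}, reported {reported:.1f}")
```

Each recomputed value rounds to the reported one, including the baseline F1 of 89.1% implied by the 2.1-point gain.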

Journal Article

Space to depth convolution bundled with coordinate attention for detecting surface defects

by Liu, Gang; Yu, Haoyang; Wang, Lei in Accuracy, Algorithms, Channels

2024

Surface defects on steel plates are unavoidable during industrial production because of complex manufacturing processes, and they typically have irregular shapes, random positions, and varying sizes. Detecting these defects reliably is therefore crucial for producing high-quality products. In this paper, an improved network based on You Only Look Once version 5 (YOLOv5) is proposed for detecting surface defects on steel plates. First, Space to Depth Convolution (SPD-Conv) transfers feature information from the spatial dimensions into the depth (channel) dimension, which helps preserve discriminative feature information as fully as possible during down-sampling. Next, the coordinate attention mechanism is embedded into the bottleneck of the C3 modules to increase the weights of important feature channels, helping the network capture the most important information from different channels after the SPD-Conv operations. Finally, the Spatial Pyramid Pooling Faster module is replaced by the Spatial Pyramid Pooling Fully Connected Spatial Pyramid Convolution module to further enhance feature expression and achieve efficient multi-scale feature fusion. Experimental results on the NEU-DET dataset show that, compared with YOLOv5, mAP increases from 51.7% to 61.4% and mAP50 from 87.0% to 92.6%. At the same time, a frame rate of 250 FPS indicates that the model retains good real-time performance. The proposed algorithm performs strongly and may in future also be used to recognize surface defects on aluminum plates, plastic plates, armor plates, and similar materials.
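
SPD-Conv's first step, space-to-depth, moves every 2×2 spatial block into the channel dimension instead of discarding pixels as a stride-2 convolution or pooling would. The plain-Python sketch below shows only that rearrangement on [channel][row][column] lists. In SPD-Conv it is followed by a non-strided convolution, which is omitted here; the order in which sub-maps are stacked along the channel axis is an arbitrary choice, and the names are hypothetical.

```python
from typing import List

Tensor3 = List[List[List[float]]]  # [channels][height][width]


def space_to_depth(x: Tensor3, scale: int = 2) -> Tensor3:
    """[C][H][W] -> [C * scale^2][H / scale][W / scale]; no value is discarded."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    if h % scale or w % scale:
        raise ValueError("H and W must be divisible by scale")
    out: Tensor3 = []
    # One sub-map per (row offset, column offset) pair, for every input channel.
    for di in range(scale):
        for dj in range(scale):
            for ch in range(c):
                out.append([[x[ch][i][j] for j in range(dj, w, scale)]
                            for i in range(di, h, scale)])
    return out


if __name__ == "__main__":
    x = [[[float(4 * i + j) for j in range(4)] for i in range(4)]]  # 1 x 4 x 4
    for sub_map in space_to_depth(x):
        print(sub_map)
```

A 1×4×4 input becomes 4×2×2 with all 16 values retained, whereas a stride-2 sample would keep only the first sub-map.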

Journal Article

Study on an Improved YOLOv7-Based Algorithm for Human Head Detection

by Yan, Weidong; Wu, Dong; Wang, Jingli in Accuracy, Algorithms, Artificial neural networks

2025

In response to the decreased accuracy in person detection caused by densely populated areas and mutual occlusions in public spaces, a human head-detection approach is employed to assist in detecting individuals. To address key issues in dense scenes—such as poor feature extraction, rough label assignment, and inefficient pooling—we improved the YOLOv7 network in three aspects: adding attention mechanisms, enhancing the receptive field, and applying multi-scale feature fusion. First, a large amount of surveillance video data from crowded public spaces was collected to compile a head-detection dataset. Then, based on YOLOv7, the network was optimized as follows: (1) a CBAM attention module was added to the neck section; (2) a Gaussian receptive field-based label-assignment strategy was implemented at the junction between the original feature-fusion module and the detection head; (3) the SPPFCSPC module was used to replace the multi-space pyramid pooling. By seamlessly uniting CBAM, RFLAGauss, and SPPFCSPC, we establish a novel collaborative optimization framework. Finally, experimental comparisons revealed that the improved model’s accuracy increased from 92.4% to 94.4%; recall improved from 90.5% to 93.9%; and inference speed increased from 87.2 frames per second to 94.2 frames per second. Compared with single-stage object-detection models such as YOLOv7 and YOLOv8, the model demonstrated superior accuracy and inference speed. Its inference speed also significantly outperforms that of Faster R-CNN, Mask R-CNN, DINOv2, and RT-DETRv2, markedly enhancing both small-object (head) detection performance and efficiency.
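
CBAM, added to the neck here, applies channel attention and then spatial attention. The plain-Python sketch below follows the standard CBAM formulation: a shared MLP applied to average- and max-pooled channel descriptors, and a k×k convolution over channel-wise mean and max maps for the spatial gate (k = 7 in the original CBAM paper; any odd k works here). Biases are omitted, the weights are passed in rather than learned, and the names are hypothetical; it is not the authors' code.

```python
import math
from typing import List

Matrix = List[List[float]]
Tensor3 = List[Matrix]  # [channels][height][width]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def shared_mlp(v: List[float], w1: Matrix, w2: Matrix) -> List[float]:
    """Two fully connected layers with a ReLU in between (no biases)."""
    hidden = [max(0.0, sum(a * b for a, b in zip(row, v))) for row in w1]
    return [sum(a * b for a, b in zip(row, hidden)) for row in w2]


def channel_attention(x: Tensor3, w1: Matrix, w2: Matrix) -> Tensor3:
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]
    mx = [max(map(max, ch)) for ch in x]
    # The same MLP processes both descriptors; outputs are summed before the sigmoid.
    gates = [sigmoid(a + m) for a, m in zip(shared_mlp(avg, w1, w2), shared_mlp(mx, w1, w2))]
    return [[[v * g for v in row] for row in ch] for ch, g in zip(x, gates)]


def spatial_attention(x: Tensor3, kernel: List[Matrix]) -> Tensor3:
    """kernel has shape [2][k][k]: one k x k filter for the mean map, one for the max map."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    k = len(kernel[0])
    r = k // 2
    # Two maps pooled across channels at every pixel: mean and max.
    maps = [
        [[sum(x[ch][i][j] for ch in range(c)) / c for j in range(w)] for i in range(h)],
        [[max(x[ch][i][j] for ch in range(c)) for j in range(w)] for i in range(h)],
    ]
    gate = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for m in range(2):
                for a in range(k):
                    for b in range(k):
                        ii, jj = i + a - r, j + b - r
                        if 0 <= ii < h and 0 <= jj < w:  # zero padding outside the map
                            s += kernel[m][a][b] * maps[m][ii][jj]
            gate[i][j] = sigmoid(s)
    return [[[x[ch][i][j] * gate[i][j] for j in range(w)] for i in range(h)] for ch in range(c)]


def cbam(x: Tensor3, w1: Matrix, w2: Matrix, kernel: List[Matrix]) -> Tensor3:
    """CBAM: channel attention first, then spatial attention."""
    return spatial_attention(channel_attention(x, w1, w2), kernel)
```

With all weights set to zero, both gates equal sigmoid(0) = 0.5, so every activation is scaled by 0.25; learned weights make the gates depend on the input.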

Journal Article

Research on Ship-Engine-Room-Equipment Detection Based on Deep Learning

by Chen, Ruoshui; Zhang, Jundong; Shen, Haosheng in Accuracy, Automation, Data augmentation

2024

Visual monitoring of ship-engine-room equipment is an essential component of ship-cabin intelligence. To address issues such as imbalanced quantities across equipment categories and severe occlusion, this paper improves YOLOv8-M. First, the SPPFCSPC module is introduced to enhance the feature extraction capability of the backbone network. Next, the neck network is improved to form GCFPN, which strengthens feature fusion, and the Dynamic Head module, which incorporates deformable convolution, is introduced into the detection head to further improve performance. Finally, the Focal-EIoU loss is adopted, and class-wise data augmentation is used to mitigate the impact of dataset imbalance. The method is evaluated on the ship cabin equipment dataset and on the public MS COCO 2017 dataset. Compared with YOLOv8-M, the mAP50 of GCD-YOLOv8 improves by 2.6% and 0.4%, respectively.
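
Focal-EIoU extends IoU loss with separate penalties for centre distance, width difference and height difference, then reweights the result by IoU^gamma. The sketch below assumes the published Focal-EIoU formulation: L_EIoU = 1 - IoU + rho²(b, b_gt)/c² + (w - w_gt)²/C_w² + (h - h_gt)²/C_h², where c, C_w and C_h are the diagonal, width and height of the smallest enclosing box, and L_Focal-EIoU = IoU^gamma · L_EIoU. The default gamma = 0.5, the eps guard and the (x1, y1, x2, y2) box format are assumptions, and the function name is hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def focal_eiou_loss(pred: Box, gt: Box, gamma: float = 0.5, eps: float = 1e-9) -> float:
    pw, ph = pred[2] - pred[0], pred[3] - pred[1]
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    # Intersection over union.
    iw = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    ih = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = iw * ih
    iou = inter / (pw * ph + gw * gh - inter + eps)
    # Width and height of the smallest box enclosing both boxes.
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])
    # Squared distance between the two box centres.
    rho2 = ((pred[0] + pred[2] - gt[0] - gt[2]) ** 2
            + (pred[1] + pred[3] - gt[1] - gt[3]) ** 2) / 4
    eiou = (1.0 - iou
            + rho2 / (cw ** 2 + ch ** 2 + eps)   # centre-distance penalty
            + (pw - gw) ** 2 / (cw ** 2 + eps)   # width penalty
            + (ph - gh) ** 2 / (ch ** 2 + eps))  # height penalty
    # Focal reweighting by IoU^gamma.
    return iou ** gamma * eiou


if __name__ == "__main__":
    print(focal_eiou_loss((0, 0, 10, 10), (5, 0, 15, 10)))
```

Because IoU^gamma is small for poorly overlapping boxes, the focal factor shifts the regression effort toward boxes that already overlap well; one consequence of this weighting is that a box with no overlap contributes zero loss.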

Journal Article

Comparative Performance Analysis of YOLOv10-Based Models with CBAM and SPPFCSPC for Body Condition Score Assessment in Beef Cattle

by Utaminingrum, F.; Ariadi, F.; Atmoko, B.A. in Accuracy, Automation, Beef

2025

Body condition score (BCS) assessment serves as a critical metric for evaluating the health, nutritional status, and overall well-being of beef cattle, playing a pivotal role in herd management and productivity optimization. Traditional manual BCS assessment methods are inherently subjective, labor-intensive, and impractical for large-scale operations, thereby necessitating an automated and data-driven approach. This study investigates the performance of YOLOv10-based deep learning models, incorporating the convolutional block attention module (CBAM) and spatial pyramid pooling-fast cross-stage partial connections (SPPFCSPC) to enhance feature extraction, classification accuracy, and computational efficiency in BCS estimation. A total of 432 annotated images representing five BCS categories (1–5) were used for model training and evaluation. The models were assessed using precision, recall, and F1 Score, with expert-labeled ground truth ensuring robustness. Results show that the YOLOv10x variant achieved the highest classification accuracy of 88.2%, highlighting its superior detection capability. YOLOv10m exhibited a balanced trade-off between accuracy and computational efficiency, achieving an F1 Score of 79.2%. The integration of CBAM improved precision but slightly reduced recall, whereas SPPFCSPC enhanced recall at the expense of increased computational complexity. Notably, YOLOv10n achieved the fastest inference time of 1.0 ms but with a lower accuracy of 82.4%, underscoring the trade-off between model depth and real-time applicability. These findings validate the effectiveness of attention-based and multi-scale feature learning strategies for improving the automation of BCS classification in beef cattle.
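
Precision, recall and F1 for a five-class problem such as BCS 1 to 5 can be computed per class from a confusion matrix and then averaged. The sketch below assumes macro-averaging (the unweighted mean of per-class scores) and treats each detection as a plain classification, ignoring the IoU matching that detector evaluation normally adds; the abstract does not state how its scores were aggregated. The 5×5 counts in the usage example are made up for illustration only, and the names are hypothetical.

```python
from typing import List, Tuple

Metrics = Tuple[float, float, float]  # (precision, recall, F1)


def per_class_metrics(cm: List[List[int]]) -> List[Metrics]:
    """cm[t][p] counts samples whose true class is t and predicted class is p."""
    n = len(cm)
    out: List[Metrics] = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[t][k] for t in range(n)) - tp  # other classes predicted as k
        fn = sum(cm[k]) - tp                       # class k predicted as something else
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        out.append((p, r, f))
    return out


def macro_average(per_class: List[Metrics]) -> Metrics:
    """Unweighted mean of per-class precision, recall and F1."""
    n = len(per_class)
    return (sum(m[0] for m in per_class) / n,
            sum(m[1] for m in per_class) / n,
            sum(m[2] for m in per_class) / n)


if __name__ == "__main__":
    # Made-up counts for BCS classes 1..5 (rows: true class, columns: predicted).
    cm = [
        [8, 2, 0, 0, 0],
        [1, 9, 2, 0, 0],
        [0, 2, 10, 1, 0],
        [0, 0, 2, 7, 1],
        [0, 0, 0, 1, 6],
    ]
    for bcs, (p, r, f) in enumerate(per_class_metrics(cm), start=1):
        print(f"BCS {bcs}: P={p:.2f} R={r:.2f} F1={f:.2f}")
    print("macro P/R/F1 = %.2f / %.2f / %.2f" % macro_average(per_class_metrics(cm)))
```

A gain in precision with a loss in recall, as reported for CBAM, corresponds to fewer off-diagonal counts in a class's column (false positives) but more in its row (false negatives).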

Journal Article
