Catalogue Search | MBRL
Explore the vast range of titles available.
113 result(s) for "YOLOv10"
The YOLO Framework: A Comprehensive Review of Evolution, Applications, and Benchmarks in Object Detection
2024
This paper provides a comprehensive review of the YOLO (You Only Look Once) framework up to its latest version, YOLO 11. As a state-of-the-art model for object detection, YOLO has revolutionized the field by achieving an optimal balance between speed and accuracy. The review traces the evolution of YOLO variants, highlighting key architectural improvements, performance benchmarks, and applications in domains such as healthcare, autonomous vehicles, and robotics. It also evaluates the framework’s strengths and limitations in practical scenarios, addressing challenges like small object detection, environmental variability, and computational constraints. By synthesizing findings from recent research, this work identifies critical gaps in the literature and outlines future directions to enhance YOLO’s adaptability, robustness, and integration into emerging technologies. This review provides researchers and practitioners with valuable insights to drive innovation in object detection and related applications.
Journal Article
Traditional village preservation status evaluation and optimization based on YOLOv10 and random forest model—take the Tibetan-Qiang region in northwest Sichuan as an example
by Wang, Yi; Ao, Yuxin; Huang, Zining
in Preservation, Random forest model, Tibetan-Qiang region
2026
This study proposes a framework that automates the evaluation of traditional village preservation status and analyzes the major influential factors (MIFs) of that status and their influencing mechanisms through the YOLOv10 and Random Forest models, taking the Tibetan-Qiang region of northwest Sichuan as the study area. The framework applies the YOLOv10 model to satellite maps to comprehensively detect the preservation status of houses in the traditional villages, and calculates a preservation score for each village as an evaluation of its preservation status. Further, the feature importance of the Random Forest model filters the MIFs of village preservation status from multiple environmental factors, and SHAP values resolve the influencing intensity of the MIFs on the preservation status. Finally, targeted preservation strategies are proposed for villages with poor preservation status. The contribution of this framework is that it saves the cost of traditional field research and significantly improves the efficiency and scope of the evaluations. The results also fill the gap in evaluating the preservation status of traditional villages in the Tibetan-Qiang region and analyzing their influencing mechanisms, and support decision makers in proposing more targeted optimization strategies.
Journal Article
D³-YOLOv10: Improved YOLOv10-Based Lightweight Tomato Detection Algorithm Under Facility Scenario
2024
Accurate and efficient tomato detection is one of the key techniques for intelligent automatic picking in the area of precision agriculture. However, under the facility scenario, existing detection algorithms still have challenging problems such as weak feature extraction ability for occlusion conditions and different fruit sizes, low accuracy on edge location, and heavy model parameters. To address these problems, this paper proposes D³-YOLOv10, a lightweight YOLOv10-based detection framework. Initially, a compact dynamic faster network (DyFasterNet) was developed, where multiple adaptive convolution kernels are aggregated to extract local effective features for fruit size adaption. Additionally, the deformable large kernel attention mechanism (D-LKA) was designed for the terminal phase of the neck network by adaptively adjusting the receptive field to focus on irregular tomato deformations and occlusions. Then, to further improve detection boundary accuracy and convergence, a dynamic FM-WIoU regression loss with a scaling factor was proposed. Finally, a knowledge distillation scheme using semantic frequency prompts was developed to optimize the model for lightweight deployment in practical applications. We evaluated the proposed framework using a self-made tomato dataset and designed a two-stage category balancing method based on diffusion models to address the sample class-imbalanced issue. The experimental results demonstrated that the D³-YOLOv10 model achieved an mAP@0.5 of 91.8%, with a substantial reduction of 54.0% in parameters and 64.9% in FLOPs, compared to the benchmark model. Meanwhile, the detection speed of 80.1 FPS more effectively meets the demand for real-time tomato detection. This study can effectively contribute to the advancement of smart agriculture research on the detection of fruit targets.
Journal Article
Effectiveness of traditional augmentation methods for rebar counting using UAV imagery with Faster R-CNN and YOLOv10-based transformer architectures
2025
Accurate inspection of Reinforced Concrete (RC) structures requires precise rebar counting. Although deep-learning object detectors can extract this information from drone imagery, their effectiveness depends on large, diverse, and well-labeled datasets. Image augmentation can increase data variability, yet its impact on Unmanned Aerial Vehicle (UAV)-based rebar counting has been underexplored. This study systematically evaluates ten augmentation methods—brightness, contrast, perspective, rotation, scale, shearing, translation, blurring, a probabilistic sampling policy, and a sum-of-techniques composition—using Faster R-CNN and YOLOv10 across six backbones (ResNet-101, ResNet-152, MobileNetV3; ViT, PVT, Swin Transformer). Performance is reported using AP50, AP50:95, and exact-count accuracy. Results show that augmentation efficacy is both architecture- and metric-dependent. The best test-set configuration is YOLOv10–PVT with shearing, which achieves AP50 = 87.71%, AP50:95 = 68.53%, and rebar-count accuracy = 86.27%—improvements of +5.92, +9.07, and +5.99 percentage points, respectively, over the original PVT baseline. A probabilistic sampling policy provides consistent, policy-level gains over the original data and approaches the best single transform (especially with a magnitude ramp), whereas indiscriminately applying the sum-of-techniques composition does not reliably outperform the top single augmentation.
Journal Article
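Geometric augmentations such as the shearing that tops the ranking in the rebar-counting study above must transform the box labels along with the pixels. A minimal sketch of that label update for a horizontal shear (illustrative only, not the paper's pipeline):

```python
def shear_box(box, k):
    """Apply a horizontal shear x' = x + k*y to an axis-aligned box
    (x1, y1, x2, y2) and return the tightest axis-aligned box that
    encloses the four sheared corners, so the label stays consistent
    with the sheared image."""
    x1, y1, x2, y2 = box
    corners = [(x1, y1), (x2, y1), (x1, y2), (x2, y2)]
    sheared = [(x + k * y, y) for x, y in corners]
    xs = [p[0] for p in sheared]
    ys = [p[1] for p in sheared]
    return (min(xs), min(ys), max(xs), max(ys))

# A shear of k=0.5 widens the enclosing box of a 10x10 rebar region:
print(shear_box((10, 10, 20, 20), 0.5))  # (15.0, 10, 30.0, 20)
```

Note that the re-enclosed box is larger than the sheared quadrilateral itself, one reason aggressive geometric augmentation can degrade tight-localization metrics like AP50:95.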
Automated non-PPE detection on construction sites using YOLOv10 and transformer architectures for surveillance and body worn cameras with benchmark datasets
2025
Ensuring proper Personal Protective Equipment (PPE) compliance is crucial for maintaining worker safety and reducing accident risks on construction sites. Previous research has explored various object detection methodologies for automated monitoring of non-PPE compliance; however, achieving higher accuracy and computational efficiency remains critical for practical real-time applications. Addressing this challenge, the current study presents an extensive evaluation of You Only Look Once version 10 (YOLOv10)-based object detection models designed specifically to detect essential PPE items such as helmets, masks, vests, gloves, and shoes. The analysis was conducted using an extensive dataset compiled from multiple sources, including surveillance cameras, body-worn camera footage, and publicly available benchmark datasets, to ensure a thorough evaluation under realistic conditions. Experimental outcomes revealed that the Swin Transformer-based YOLOv10 model delivered the best overall performance, achieving AP50 scores of 92.4% for non-helmet, 88.17% for non-mask, 87.17% for non-vest, 85.36% for non-glove, and 83.48% for non-shoes, with an overall average AP50 of 87.32%. Additionally, these findings underscored the superior performance of transformer-based architectures compared to traditional detection methods across multiple backbone configurations. The paper concludes by discussing the practical implications, potential limitations, and broader applicability of the YOLOv10-based approach, while also highlighting opportunities and directions for future advancements.
Journal Article
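The overall average AP50 of 87.32% reported in the PPE study above is consistent with an unweighted mean of the five per-class scores, which can be checked directly:

```python
# Per-class AP50 scores reported for the Swin Transformer-based
# YOLOv10 model in the PPE-compliance study.
ap50 = {
    "non-helmet": 92.4,
    "non-mask": 88.17,
    "non-vest": 87.17,
    "non-glove": 85.36,
    "non-shoes": 83.48,
}

# The overall figure is the plain unweighted mean of the class scores.
mean_ap50 = sum(ap50.values()) / len(ap50)
print(round(mean_ap50, 2))  # 87.32, matching the reported average
```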
BGF-YOLOv10: Small Object Detection Algorithm from Unmanned Aerial Vehicle Perspective Based on Improved YOLOv10
2024
With the rapid development of deep learning, unmanned aerial vehicles (UAVs) have acquired intelligent perception capabilities, demonstrating efficient data collection across various fields. In UAV perspective scenarios, captured images often contain small and unevenly distributed objects, and are typically high-resolution. This makes object detection in UAV imagery more challenging compared to conventional detection tasks. To address this issue, we propose a lightweight object detection algorithm, BGF-YOLOv10, specifically designed for small object detection, based on an improved version of YOLOv10n. First, we introduce a novel YOLOv10 architecture tailored for small objects, incorporating BoTNet, variants of C2f and C3 in the backbone, along with an additional small object detection head, to enhance detection performance for small objects. Second, we embed GhostConv into both the backbone and head, effectively reducing the number of parameters by nearly half. Finally, we insert a Patch Expanding Layer module in the neck to restore the feature spatial resolution. Experimental results on the VisDrone-DET2019 and UAVDT datasets demonstrate that our method significantly improves detection accuracy compared to YOLO series networks. Moreover, when compared to other state-of-the-art networks, our approach achieves a substantial reduction in the number of parameters.
Journal Article
A Novel YOLOv10-Based Algorithm for Accurate Steel Surface Defect Detection
2025
To address challenges like manual processes, complicated detection methods, high false alarm rates, and frequent errors in identifying defects on steel surfaces, this research presents an innovative detection system, YOLOv10n-SFDC. The study focuses on the complex dependencies between parameters used for defect detection, particularly the interplay between feature extraction, fusion, and bounding box regression, which often leads to inefficiencies in traditional methods. YOLOv10n-SFDC incorporates advanced elements such as the DualConv module, SlimFusionCSP module, and Shape-IoU loss function, improving feature extraction, fusion, and bounding box regression to enhance accuracy. Testing on the NEU-DET dataset shows that YOLOv10n-SFDC achieves a mean average precision (mAP) of 85.5% at an Intersection over Union (IoU) threshold of 0.5, a 6.3 percentage point improvement over the baseline YOLOv10. The system uses only 2.67 million parameters, demonstrating efficiency. It excels in identifying complex defects like ’rolled in scale’ and ’inclusion’. Compared to SSD and Fast R-CNN, YOLOv10n-SFDC outperforms these models in accuracy while maintaining a lightweight architecture. This system excels in automated inspection for industrial environments, offering rapid, precise defect detection. YOLOv10n-SFDC emerges as a reliable solution for the continuous monitoring and quality assurance of steel surfaces, improving the reliability and efficiency of steel manufacturing processes.
Journal Article
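The mAP figure in the steel-defect study above is computed at an IoU threshold of 0.5; Shape-IoU extends this plain overlap ratio with shape- and scale-aware penalty terms, but the quantity underneath is standard Intersection over Union. A minimal reference implementation (not the paper's Shape-IoU loss):

```python
def iou(a, b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp to zero so disjoint boxes contribute no intersection area.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

print(iou((0, 0, 2, 2), (1, 0, 3, 2)))  # 0.333..., half-overlapping boxes
```

A prediction counts as a true positive for mAP@0.5 when its IoU with a ground-truth box exceeds 0.5.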
YOLO-CBD: Classroom Behavior Detection Method Based on Behavior Feature Extraction and Aggregation
by Wang, Peng; Zhang, Xiaopei; Peng, Shuyun
in Accuracy, Algorithms, classroom behavior detection
2025
Classroom behavior can effectively reflect learning states, and thus classroom behavior detection is crucial for improving teaching methods and enhancing teaching quality. To address issues such as severe occlusions and large scale variations in student behavior detection, this paper proposes a classroom behavior detection model, named YOLO-CBD (YOLOv10s Classroom Behavior Detection). Firstly, BiFormer attention is introduced to redesign the Efficientv2 network, leading to a novel backbone network for efficient feature extraction of student classroom behaviors. The proposed attention module enables accurate localization in densely populated student settings. Secondly, a novel feature aggregation module is designed to replace the basic C2f module in the YOLOv10s neck network and to enhance the capability to detect occluded targets effectively. Additionally, a feature pyramid network with efficient feature fusion is constructed to address inconsistencies among features of different scales. Finally, the Wise-IoU loss function is incorporated to handle sample imbalance issues. Experimental results show that, compared to the baseline model, YOLO-CBD improves precision by 5.7%, recall by 3.7%, and mAP50 by 3.5%, achieving effective classroom behavior detection.
Journal Article
Vehicle detection in drone aerial views based on lightweight OSD-YOLOv10
2025
To address the challenges of low performance in vehicle image detection from UAV aerial imagery, difficulties in small target feature extraction, and the large parameter size of existing models, we propose the OSD-YOLOv10 algorithm, an enhanced version based on YOLOv10n. The proposed algorithm incorporates several key innovations: First, we employ online convolutional reparameterization to construct the OCRConv module and design a lightweight feature extraction structure, SPCC, to replace the conventional C2f module, thereby reducing computational load and parameter count. Second, we integrate an efficient dual-layer feed-forward hybrid attention module to enhance the model’s feature extraction capabilities. We also construct a dual small-target detection layer that combines shallow and ultra-shallow features to improve small-target detection. Finally, we introduce the DySample dynamic upsampling module to enhance feature fusion in the neck network from a point sampling perspective. Extensive experiments on the VisDrone-DET2019 and UAVDT datasets demonstrate that OSD-YOLOv10 achieves a 40.7% reduction in parameter count and a 3.6% decrease in floating-point operations, while improving accuracy and mean average precision by 1.3% and 1.6%, respectively. Compared to other YOLO series and lightweight models, OSD-YOLOv10 exhibits superior detection accuracy and lower computational complexity, achieving an optimal balance between high accuracy and low resource consumption. These advancements make it particularly suitable for deployment in UAV onboard hardware for vehicle target detection tasks. Code will be available online (https://github.com/Z76y/OSD-YOLO).
Journal Article
Defect Detection in GIS X-Ray Images Based on Improved YOLOv10
2025
Timely and accurate detection of internal defects in Gas-Insulated Switchgear (GIS) with X-ray imaging is critical for power system reliability. However, automated detection faces significant challenges from small, low-contrast defects and complex background structures. This paper proposes an enhanced object-detection model based on the lightweight YOLOv10n framework, specifically optimized for this task. Key improvements include adopting the Normalized Wasserstein Distance (NWD) loss function for small object localization, integrating Monte Carlo attention (MCAttn) and Parallelized Patch-Aware (PPA) attention to enhance feature extraction, and designing a GFPN-inspired neck for improved multi-scale feature fusion. The model was rigorously evaluated on a custom GIS X-ray dataset. The final model achieved a mean Average Precision (mAP) of 0.674 (IoU 0.5:0.95), representing a 5.0 percentage point improvement over the YOLOv10n baseline and surpassing other comparative models. Qualitative results also confirmed the model’s enhanced capability in detecting challenging small and low-contrast defects. This study presents an effective approach for automated GIS defect detection, with significant potential to enhance power grid maintenance efficiency and safety.
Journal Article
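The NWD loss adopted in the GIS study above replaces IoU with a similarity obtained by modelling each box as a 2-D Gaussian; for small objects this stays informative even when boxes barely overlap. A sketch following the closed form used in the NWD literature; the normalizing constant `C` is dataset-dependent, and the default here is illustrative:

```python
import math

def nwd(a, b, c=12.8):
    """Normalized Wasserstein Distance between boxes given as
    (cx, cy, w, h). Each box is modelled as a 2-D Gaussian; the
    squared 2-Wasserstein distance between the Gaussians reduces to
    the sum of squared center and half-size differences, which is
    then mapped to (0, 1] via an exponential with constant c."""
    w2_sq = ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
             + ((a[2] - b[2]) / 2) ** 2 + ((a[3] - b[3]) / 2) ** 2)
    return math.exp(-math.sqrt(w2_sq) / c)

# Two tiny 2x2 boxes offset by 3 px still get a graded similarity,
# whereas their IoU would already be exactly zero.
print(nwd((0, 0, 2, 2), (3, 0, 2, 2)))
```

Identical boxes give a similarity of exactly 1.0, and the score decays smoothly with center distance, unlike IoU, which collapses to 0 as soon as small boxes stop overlapping.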