Catalogue Search | MBRL

A Large Kernel Convolutional Neural Network with a Noise Transfer Mechanism for Real-Time Semantic Segmentation

by Tang, Xing , Liu, Jinhang , Du, Yuhe in Accuracy , computer vision , Design

2025

In semantic segmentation tasks, large kernels and Atrous convolution have been utilized to increase the receptive field, enabling models to achieve competitive performance with fewer parameters. However, due to the fixed size of kernel functions, networks incorporating large convolutional kernels are limited in adaptively capturing multi-scale features and fail to effectively leverage global contextual information. To address this issue, we combine Atrous convolution with large kernel convolution, using different dilation rates to compensate for the single-scale receptive field limitation of large kernels. Simultaneously, we employ a dynamic selection mechanism to adaptively highlight the most important spatial features based on global information. Additionally, to enhance the model’s ability to fit the true label distribution, we propose a Multi-Scale Contextual Noise Transfer Matrix (NTM), which uses high-order consistency information from neighborhood representations to estimate NTM and correct supervision signals, thereby improving the model’s generalization capability. Extensive experiments conducted on Cityscapes, ADE20K, and COCO-Stuff-10K demonstrate that this approach achieves a new state-of-the-art balance between speed and accuracy. Specifically, LKNTNet achieves 80.05% mIoU on Cityscapes with an inference speed of 80.7 FPS and 42.7% mIoU on ADE20K with an inference speed of 143.6 FPS.
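The interplay between kernel size and dilation rate described above can be made concrete with a short calculation. The sketch below is only illustrative: the function names and the example kernel sizes and dilation rates are assumptions, not values taken from LKNTNet.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span (in pixels, along one axis) covered by a single atrous convolution."""
    if kernel_size < 1 or dilation < 1:
        raise ValueError("kernel_size and dilation must be positive")
    # A dilation of d leaves d - 1 gaps between neighbouring kernel taps.
    return dilation * (kernel_size - 1) + 1


def stacked_receptive_field(layers) -> int:
    """Receptive field of stride-1 convolutions applied in sequence.

    `layers` is an iterable of (kernel_size, dilation) pairs.
    """
    receptive_field = 1
    for kernel_size, dilation in layers:
        receptive_field += effective_kernel_size(kernel_size, dilation) - 1
    return receptive_field


# Hypothetical configuration: one 7x7 kernel evaluated at three dilation rates.
spans = [effective_kernel_size(7, d) for d in (1, 2, 3)]  # [7, 13, 19]
```

Running the same kernel at several dilation rates gives several spans from one set of weights, which is one way to read the abstract's point about compensating for a single-scale receptive field.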

Journal Article

MS-YOLOv8-Based Object Detection Method for Pavement Diseases

by Lin, Ciyun , Han, Zhibin , Liu, Anqi in Accuracy , Algorithms , Cracks

2024

Detection of pavement diseases is crucial for road maintenance. Traditional methods are costly, time-consuming, and less accurate. This paper introduces an enhanced pavement disease recognition algorithm, MS-YOLOv8, which modifies the YOLOv8 model by incorporating three novel mechanisms to improve detection accuracy and adaptability to varied pavement conditions. The Deformable Large Kernel Attention (DLKA) mechanism adjusts convolution kernels dynamically, adapting to multi-scale targets. The Large Separable Kernel Attention (LSKA) enhances the SPPF feature extractor, boosting multi-scale feature extraction capabilities. Additionally, Multi-Scale Dilated Attention in the network’s neck performs Spatially Weighted Dilated Convolution (SWDA) across different dilation rates, enhancing background distinction and detection precision. Experimental results show that MS-YOLOv8 increases background classification accuracy by 6%, overall precision by 1.9%, and mAP by 1.4%, with specific disease detection mAP up by 2.9%. Our model maintains comparable detection speeds. This method offers a significant reference for automatic road defect detection.

Journal Article

Supervised pre-stack seismic reflection pattern analysis based on physics-attribute guidance and active learning data augmentation

by Wang, Yaojun , Feng, Qingyu , Chen, Yuxi

2025

Seismic reflection pattern analysis, or seismic facies analysis, is crucial for subsurface reservoir prediction. Supervised pre-stack reflection pattern analysis using well-logging data can fully utilize the abundant reservoir information in pre-stack data and provide clearer physical interpretations than unsupervised methods. However, pre-stack seismic data are high-dimensional, and well-logging data labels are limited. Traditional convolutional neural network-based approaches face challenges in capturing long-range dependencies across different angle gathers in pre-stack seismic data due to the limitations of their receptive fields. Additionally, existing data augmentation methods lack constraints and physical guidance. To tackle these problems, we introduce a supervised pre-stack seismic reflection pattern analysis method based on the ConvNext network that incorporates physics guidance and active learning for label augmentation. The ConvNext model incorporates the large-kernel attention mechanism, enhancing the model's sensitivity to stratigraphic and spatial features in pre-stack seismic data. To reduce model ambiguity, we develop a label augmentation algorithm that combines active learning and physical attributes. Experiments on synthetic and real data demonstrate that our method outperforms traditional approaches in pre-stack seismic reflection analysis.

Journal Article

BL-YOLOv8: An Improved Road Defect Detection Model Based on YOLOv8

by Wang, Xueqiu , Jia, Zemeng , Li, Zijian in Accuracy , Algorithms , BiFPN

2023

Road defect detection is a crucial task for promptly repairing road damage and ensuring road safety. Traditional manual detection methods are inefficient and costly. To overcome this issue, we propose an enhanced road defect detection algorithm called BL-YOLOv8, which is based on YOLOv8s. In this study, we optimized the YOLOv8s model by reconstructing its neck structure through the integration of the BiFPN concept. This optimization reduces the model’s parameters, computational load, and overall size. Furthermore, to enhance the model’s operation, we optimized the feature pyramid layer by introducing the SimSPPF module, which improves its speed. Moreover, we introduced LSK-attention, a dynamic large convolutional kernel attention mechanism, to expand the model’s receptive field and enhance the accuracy of object detection. Finally, we compared the enhanced YOLOv8 model with other existing models to validate the effectiveness of our proposed improvements. The experimental results confirmed the effective recognition of road defects by the improved YOLOv8 algorithm. In comparison to the original model, an improvement of 3.3% in average precision mAP@0.5 was observed. Moreover, a reduction of 29.92% in parameter volume and a decrease of 11.45% in computational load were achieved. This proposed approach can serve as a valuable reference for the development of automatic road defect detection methods.

Journal Article

Dual-Path Large Kernel Learning and Its Applications in Single-Image Super-Resolution

by Su, Zhen , Kou, Qiqi , Cheng, Deqiang in Algorithms , Artificial intelligence , Comparative analysis

2024

To enhance the performance of super-resolution models, neural networks frequently employ module stacking. However, this approach inevitably results in an excessive proliferation of parameter counts and information redundancy, ultimately constraining the deployment of these models on mobile devices. To surmount this limitation, this study introduces the application of Dual-path Large Kernel Learning (DLKL) to the task of image super-resolution. Within the DLKL framework, we harness a multiscale large kernel decomposition technique to efficiently establish long-range dependencies among pixels. This network not only maintains excellent performance but also significantly mitigates the parameter burden, achieving an optimal balance between network performance and efficiency. When compared with other prevalent algorithms, DLKL exhibits remarkable proficiency in generating images with sharper textures and structures that are more akin to natural ones. It is particularly noteworthy that on the challenging texture dataset Urban100, the network proposed in this study achieved a significant improvement in Peak Signal-to-Noise Ratio (PSNR) for the ×4 upscaling task, with an increase of 0.32 dB and 0.19 dB compared with the state-of-the-art HAFRN and MICU networks, respectively. This remarkable result not only validates the effectiveness of the present model in complex image super-resolution tasks but also highlights its superior performance and unique advantages in the field.
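To put the reported PSNR gains in perspective, the sketch below computes PSNR from mean squared error and converts a dB gain into the implied MSE ratio. It assumes 8-bit pixel values passed as flat sequences; published super-resolution results typically evaluate on the luminance channel with border cropping, which this minimal version does not attempt. The function names are illustrative.

```python
import math


def psnr(reference, restored, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio (dB) between two equal-length pixel sequences."""
    if len(reference) != len(restored) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db: float) -> float:
    """Factor by which MSE shrinks when PSNR rises by `gain_db` (same peak value)."""
    return 10.0 ** (-gain_db / 10.0)
```

Because PSNR is logarithmic, a gain of 0.32 dB corresponds to an MSE ratio of roughly 0.93, that is, about 7% lower mean squared error at the same peak value.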

Journal Article

Towards more accurate object detection via encoding reinforcement and multi-channel enhancement

by Jumahong, Huxidan , Li, Shuangyong , Wang, Weina in Accuracy , Coding , Computer vision

2025

Existing object detection networks typically apply small-kernel convolutions, which can extract sufficient features for recognizing targets but model long-range dependencies poorly and have small receptive fields. This paper proposes an object detection network with a structure featuring large kernel convolutions and multiple channels. Firstly, the encoding reinforcement module using large kernel convolutions is designed to enlarge the receptive field and improve global feature extraction. Then, the channel enhancement module is constructed to enhance structural information learning. In addition, the encoding reinforcement and channel enhancement are designed in a lightweight way. Finally, the WIOU loss function is introduced to enhance the model’s robustness in poor-quality datasets. In the experiments, the proposed model can achieve optimal performance with similar parameters or computational complexity to existing CNN-based lightweight models.

Journal Article

SwinCLNet: a robust framework for brain tumor segmentation via shifted window attention and cross-scale fusion

by Noh, Wonjong , Jin, Seyong , Moon, Hyeonjoon in 3D-U-Net , 631/114 , 631/67

2025

Despite significant breakthroughs in deep learning, brain tumor segmentation remains a challenging task due to the unclear tumor borders and the high degree of accuracy required. To overcome these concerns, we propose a new segmentation model, SwinCLNet, which integrates window-based multi-head self-attention, shifted window multi-head self-attention, cross-scale dual fusion, and residual large-kernel attention into the 3D U-Net architecture. First, the encoder employs the window-based multi-head and shifted window multi-head self-attention modules to capture rich contextual information. Second, the decoder employs the cross-scale dual fusion module, which precisely complements tumor boundary representation by fusing these enhanced features. Third, the SwinCLNet employs the residual large-kernel attention module over skip connections, using large-kernel attention to expand the receptive field and capture long-range spatial dependencies. Testing using the BraTS2023 and 2024 datasets demonstrated that the proposed SwinCLNet model has excellent performance in terms of the Dice score and Hausdorff distance for all brain tumor segmentation areas. In particular, the proposed model increased the average Dice score by approximately 4.53% and reduced the Hausdorff distance 95th percentile by approximately 30.89% compared with the average of benchmark models. These data demonstrate that the SwinCLNet model is particularly efficient in the difficult tumor core and enhancing tumor regions.
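The Dice score used to report these results measures the overlap between a predicted and a reference mask. A minimal sketch follows, assuming binary masks flattened to sequences of 0/1 values; the function name and the convention of returning 1.0 when both masks are empty are assumptions, and benchmark evaluation code may handle that case differently.

```python
def dice_score(pred, target) -> float:
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total


# One of two predicted voxels overlaps one of two reference voxels.
example = dice_score([1, 1, 0, 0], [1, 0, 1, 0])  # 0.5
```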

Journal Article

Smartphone screen surface defect detection using dynamic large separable kernel attention and multi-scale feature bi-directional path aggregation network

by Bao, Nengsheng , Long, Huadiao , Huang, Yi in 639/166 , 639/301 , 639/705

2025

In the smartphone manufacturing industry, detecting cover glass defects is crucial to product quality. To address this, this paper proposes DY-YOLO, an enhanced YOLOv8-based model for defect detection on smartphone cover glass. The model improves the accuracy and efficiency of detecting defects on cover glass surfaces in complex production environments. Specifically, the proposed Dynamic-Large Separable Kernel Attention (Dynamic-LSKA) module effectively suppresses interference from complex backgrounds, such as glass reflections, thereby reducing false detections. DY-YOLO integrates several innovations: the Dynamic-LSKA module for enhanced multi-scale perception, the Dynamic-C2f module for enhanced feature extraction, and the Advanced Screening Feature Bidirectional Path Aggregation Network (HSF-BPAN) for efficient fusion of advanced screening features. Additionally, DySample is used as a lightweight dynamic up-sampler to reduce computational cost. Extensive evaluations were conducted using two public benchmarks, Mobile Phone Screen Surface Defect Dataset (MSD) and Smartphone Screen Glass Dataset (SSGD). Results demonstrate that, compared to the baseline model, the proposed method achieves improvements of 1% and 0.6% in mAP@0.5 and mAP@0.5:0.95, respectively, on MSD, reaching 99.3% and 70.9%. On SSGD, the improvements are 4.8% and 2.6%, reaching 46% and 20.2%, respectively, surpassing the state-of-the-art methods in detection accuracy. Moreover, DY-YOLO achieves an excellent balance between performance and efficiency. With a parameter count comparable to the baseline but 33.3% lower computational cost, the model achieves an inference speed of 121.8 FPS, demonstrating its strong potential for real-time edge deployment on production lines. These results confirm the model’s effectiveness and potential for industrial applications.

Journal Article

YOLOv8-MCDE for lightweight detection of small instruments in complex backgrounds from inspection robots’ perspective

by Ling, Ding , Shi, Qingwu , Jiang, Tianyue in 639/166/987 , 639/166/988 , Accuracy

2025

This paper addresses the challenges of equipment inspection in complex substation environments by proposing a lightweight small object detection algorithm, YOLOv8-MCDE, specifically designed for instrument recognition and suitable for deployment on inspection robots. Through model structure optimization, the proposed method significantly enhances both the small object detection performance and real-time efficiency of instrument detection on edge computing devices. YOLOv8-MCDE adopts the lightweight MobileNetV3 architecture as its backbone, effectively reducing model complexity and improving operational efficiency. The neck integrates a CNN-based Cross-scale Feature Fusion (CCFF) algorithm, which further lowers computational overhead while enhancing detection capability for small objects. In addition, a Deformable Large Kernel Attention (D-LKA) mechanism is integrated to increase the model’s sensitivity to small objects within complex backgrounds. The conventional CIOU loss function is also replaced with the more efficient EIOU loss function, significantly improving bounding box localization accuracy and accelerating model convergence. Experimental results demonstrate that YOLOv8-MCDE achieves a Precision of 92.80% and an mAP50 of 91.36%, representing improvements of 2.38% and 1.27%, respectively, compared to the original YOLOv8. Furthermore, the proposed algorithm reduces FLOPs by 37.68% and model size by 36%. These enhancements substantially reduce computational resource demands while significantly improving the real-time detection capabilities and small object recognition performance of inspection robots operating in complex environments.
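The EIOU loss mentioned above extends the IoU loss with separate penalties for centre distance, width difference, and height difference, each normalised by the smallest enclosing box. The sketch below follows the commonly cited formulation for a single pair of axis-aligned boxes in (x1, y1, x2, y2) form; it is an illustration under those assumptions, not the YOLOv8-MCDE training code, and the function name is hypothetical.

```python
def eiou_loss(pred, target, eps: float = 1e-9) -> float:
    """EIoU loss for two axis-aligned boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection over union.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Width and height of the smallest box enclosing both inputs.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # Squared distance between the two box centres.
    center_dist2 = ((px1 + px2) / 2 - (tx1 + tx2) / 2) ** 2 \
        + ((py1 + py2) / 2 - (ty1 + ty2) / 2) ** 2

    return (1.0 - iou
            + center_dist2 / (cw ** 2 + ch ** 2 + eps)
            + (pw - tw) ** 2 / (cw ** 2 + eps)
            + (ph - th) ** 2 / (ch ** 2 + eps))
```

Unlike a plain IoU loss, this still yields a useful signal for non-overlapping boxes, since the centre-distance term stays positive when the IoU is zero.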

Journal Article

Aero-Engine Ablation Defect Detection with Improved CLR-YOLOv11 Algorithm

by Qian, Jide , Xu, Yaxi , Liu, Jiatian in Ablation , Accuracy , aero-engine

2025

Aero-engine ablation detection is a critical task in aircraft health management, yet existing rotation-based object detection methods often face challenges of high computational complexity and insufficient local feature extraction. This paper proposes an improved YOLOv11 algorithm incorporating Context-guided Large-kernel attention and Rotated detection head, called CLR-YOLOv11. The model achieves synergistic improvement in both detection efficiency and accuracy through dual structural optimization, with its innovations primarily embodied in the following three tightly coupled strategies: (1) Targeted Data Preprocessing Pipeline Design: To address challenges such as limited sample size, low overall image brightness, and noise interference, we designed an ordered data augmentation and normalization pipeline. This pipeline is not a mere stacking of techniques but strategically enhances sample diversity through geometric transformations (random flipping, rotation), hybrid augmentations (Mixup, Mosaic), and pixel-value transformations (histogram equalization, Gaussian filtering). All processed images subsequently undergo Z-Score normalization. This order-aware pipeline design effectively improves the quality, diversity, and consistency of the input data. (2) Context-Guided Feature Fusion Mechanism: To overcome the limitations of traditional Convolutional Neural Networks in modeling long-range contextual dependencies between ablation areas and surrounding structures, we replaced the original C3k2 layer with the C3K2CG module. This module adaptively fuses local textural details with global semantic information through a context-guided mechanism, enabling the model to more accurately understand the gradual boundaries and spatial context of ablation regions. 
(3) Efficiency-Oriented Large-Kernel Attention Optimization: To expand the receptive field while strictly controlling the additional computational overhead introduced by rotated detection, we replaced the C2PSA module with the C2PSLA module. By employing large-kernel decomposition and a spatial selective focusing strategy, this module significantly reduces computational load while maintaining multi-scale feature perception capability, ensuring the model meets the demands of high real-time applications. Experiments on a self-built aero-engine ablation dataset demonstrate that the improved model achieves 78.5% mAP@0.5:0.95, representing a 4.2% improvement over the YOLOv11-obb model without the specialized data augmentation. This study provides an effective solution for high-precision real-time aviation inspection tasks.
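To illustrate why large-kernel decomposition reduces cost, the sketch below counts weights for the factorisation popularised by large-kernel attention in the Visual Attention Network: a K×K convolution is approximated by a (2d−1)×(2d−1) depthwise convolution, a ⌈K/d⌉×⌈K/d⌉ depthwise convolution with dilation d, and a 1×1 convolution. Whether C2PSLA uses exactly this factorisation is not stated in the abstract, so treat it as an assumption; biases are ignored and the channel count is a hypothetical example.

```python
import math


def lka_decomposition(kernel_size: int, dilation: int):
    """Kernel sizes of the (local depthwise, dilated depthwise) stages."""
    local_kernel = 2 * dilation - 1
    dilated_kernel = math.ceil(kernel_size / dilation)
    return local_kernel, dilated_kernel


def lka_weight_count(channels: int, kernel_size: int, dilation: int) -> int:
    """Weights in the decomposed form: two depthwise stages plus a 1x1 conv."""
    local_kernel, dilated_kernel = lka_decomposition(kernel_size, dilation)
    depthwise = channels * (local_kernel ** 2 + dilated_kernel ** 2)
    pointwise = channels * channels
    return depthwise + pointwise


def dense_weight_count(channels: int, kernel_size: int) -> int:
    """Weights in a standard (non-depthwise) K x K convolution, C -> C channels."""
    return channels * channels * kernel_size ** 2


# Hypothetical setting: 64 channels, 21x21 target kernel, dilation 3.
decomposed = lka_weight_count(64, 21, 3)  # 8832
dense = dense_weight_count(64, 21)        # 1806336
```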

Journal Article
