Catalogue Search | MBRL

Real-time stereo matching with high accuracy via Spatial Attention-Guided Upsampling

by Wu, Wenhuan , Zhao, Qiang , Shi, Jing in Accuracy , Artificial Intelligence , Computer Science

2023

Deep learning-based stereo matching methods have made remarkable progress in recent years. However, it is still a challenging task to achieve high accuracy in real time. In response to this challenge, we propose a Spatial Attention-Guided Upsampling network (SAGU-Net) for accurate and real-time stereo matching. First, a Spatial Attention-Guided Cost Volume Upsampling (SAG-CVU) module is proposed for upsampling the low-resolution cost volume, which calculates each upsampled matching cost as the sum of neighboring coarse costs under the guidance of spatial attention. Different from the recently popular coarse-to-fine (CTF) strategy that prefers upsampling the coarse disparity map, the low-resolution cost volume is upsampled by the SAG-CVU module which allows more raw information to propagate to subsequent procedures and can alleviate the problem of losing high-frequency information. To ensure fast running speed, a medium-resolution disparity map is directly regressed from the upsampled cost volume and then upsampled to full resolution with a Spatial Attention-Guided Disparity Map Upsampling (SAG-DMU) module. Unlike most CTF-based methods which usually build and aggregate narrow cost volumes iteratively until a full-resolution disparity map is obtained, the SAG-DMU module helps the proposed network avoid the iterative procedure to ensure fast running speed. In addition, we propose a simple yet effective gradient loss function that plays the role of a discontinuity-preserving regularizer, which further improves the overall accuracy, especially at depth discontinuities. These design choices lead to the proposed SAGU-Net which can obtain accurate results in real time. Extensive experimental results demonstrate that SAGU-Net and its variants outperform not only state-of-the-art real-time networks but also many accuracy-oriented models on multiple datasets.
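The cost-volume upsampling idea in this abstract (each upsampled matching cost computed as a sum of neighboring coarse costs under attention guidance) can be illustrated with a minimal sketch. It assumes the attention weights are a softmax over per-neighbor scores; the abstract does not give the actual SAG-CVU formulation, and the function name `attention_guided_cost` and its inputs are hypothetical.

```python
import math

def attention_guided_cost(neighbor_costs, attention_logits):
    """Return one upsampled cost as an attention-weighted sum of coarse costs.

    Assumption: the weights are a softmax over the logits; the abstract
    does not confirm this normalisation.
    """
    if len(neighbor_costs) != len(attention_logits):
        raise ValueError("need one attention logit per neighboring cost")
    # Numerically stable softmax over the neighborhood.
    peak = max(attention_logits)
    exps = [math.exp(a - peak) for a in attention_logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the coarse costs.
    return sum(w * c for w, c in zip(weights, neighbor_costs))
```

With equal logits the result is the plain average of the neighboring costs; a dominant logit makes the output follow a single coarse cost, which is how attention could preserve sharp cost changes near edges.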

Journal Article

Comprehensive Analysis of Random Forest and XGBoost Performance with SMOTE, ADASYN, and GNUS Under Varying Imbalance Levels

by Imani, Mehdi , Arabnia, Hamid Reza , Beikmohammadi, Ali in Accuracy , Algorithms , Classification

2025

This study examines the efficacy of Random Forest and XGBoost classifiers in conjunction with three upsampling techniques—SMOTE, ADASYN, and Gaussian noise upsampling (GNUS)—across datasets with varying class imbalance levels, ranging from moderate to extreme (15% to 1% churn rate). Employing metrics such as F1 score, ROC AUC, PR AUC, Matthews Correlation Coefficient (MCC), and Cohen’s Kappa, this research provides a comprehensive evaluation of classifier performance under different imbalance scenarios, focusing on applications in the telecommunications domain. The findings highlight that tuned XGBoost paired with SMOTE (Tuned_XGB_SMOTE) consistently achieves the highest F1 score and robust performance across all imbalance levels. SMOTE emerged as the most effective upsampling method, particularly when used with XGBoost, whereas Random Forest performed poorly under severe imbalance. ADASYN showed moderate effectiveness with XGBoost but underperformed with Random Forest, and GNUS produced inconsistent results. This study underscores the impact of data imbalance, with MCC, Kappa, and F1 scores fluctuating significantly, whereas ROC AUC and PR AUC remained relatively stable. Moreover, rigorous statistical analyses employing the Friedman test and Nemenyi post hoc comparisons confirmed that the observed improvements in F1 score, PR-AUC, Kappa, and MCC were statistically significant (p < 0.05), with Tuned_XGB_SMOTE significantly outperforming Tuned_RF_GNUS. While differences in ROC-AUC were not significant, the consistency of these results across multiple performance metrics underscores the reliability of our framework, offering a statistically validated and attractive solution for model selection in imbalanced classification scenarios.
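Two of the metrics named in this abstract, F1 score and the Matthews Correlation Coefficient, can be computed directly from confusion-matrix counts. The sketch below uses the standard definitions; the helper names and the example counts are hypothetical and are not taken from the study.

```python
import math

def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical 1% churn rate: 10 churners among 1000 customers,
# scored against a classifier that always predicts "no churn".
always_negative = dict(tp=0, tn=990, fp=0, fn=10)
print(mcc(**always_negative), f1_score(tp=0, fp=0, fn=10))
```

For these hypothetical counts accuracy would be 99% while both F1 and MCC are zero, which is one reason such metrics are preferred over accuracy on heavily imbalanced data.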

Journal Article

Deformable Kernel Networks for Joint Image Filtering

by Ponce, Jean , Kim, Beomjun , Ham, Bumsub in Artificial neural networks , Deformation , Formability

2021

Joint image filters are used to transfer structural details from a guidance picture used as a prior to a target image, in tasks such as enhancing spatial resolution and suppressing noise. Previous methods based on convolutional neural networks (CNNs) combine nonlinear activations of spatially-invariant kernels to estimate structural details and regress the filtering result. In this paper, we instead learn explicitly sparse and spatially-variant kernels. We propose a CNN architecture and its efficient implementation, called the deformable kernel network (DKN), that outputs sets of neighbors and the corresponding weights adaptively for each pixel. The filtering result is then computed as a weighted average. We also propose a fast version of DKN that runs about seventeen times faster for an image of size 640×480. We demonstrate the effectiveness and flexibility of our models on the tasks of depth map upsampling, saliency map upsampling, cross-modality image restoration, texture removal, and semantic segmentation. In particular, we show that the weighted averaging process with sparsely sampled 3×3 kernels outperforms the state of the art by a significant margin in all cases.
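The filtering step in this abstract (per-pixel sets of neighbors and weights, combined as a weighted average) can be sketched as follows. This is an illustration under simplifying assumptions: offsets are integers and border pixels are clamped, whereas a learned network would predict the offsets and weights. The function name `sparse_weighted_filter` is hypothetical.

```python
def sparse_weighted_filter(image, offsets, weights):
    """Filter a 2D list `image` using per-pixel sparse neighbors.

    offsets[y][x] is a list of (dy, dx) integer offsets and weights[y][x]
    holds the matching weights. Weights are normalised per pixel so the
    output is a weighted average. Assumption: integer offsets, clamped borders.
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total_w = sum(weights[y][x])
            acc = 0.0
            for (dy, dx), w in zip(offsets[y][x], weights[y][x]):
                # Clamp sampling positions to the image bounds.
                ny = min(max(y + dy, 0), height - 1)
                nx = min(max(x + dx, 0), width - 1)
                acc += w * image[ny][nx]
            # Fall back to the input value if all weights are zero.
            out[y][x] = acc / total_w if total_w else image[y][x]
    return out
```

Because every pixel carries its own offsets and weights, the kernel is both sparse and spatially variant, in contrast to a single kernel shared across the image.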

Journal Article

An improved YOLO-Pose model for pose estimation on blurred images generated to protect personal privacy

by Haipeng Yu , Patrick Goh in Data augmentation , Deconvolution upsampling , Human pose estimation

2026

Recent years have witnessed significant advancements in multi-person pose estimation within the You Only Look Once (YOLO) framework. However, human body images are frequently blurred and anonymized to address privacy concerns, which significantly undermines the accuracy and reliability of pose estimation. To overcome these limitations, this article proposes an optimization program for YOLO-Pose, enabled by flexible structural configurations and custom training parameters to enhance adaptability. Specifically, a deconvolution-based upsampling module and a specialized blurred data augmentation strategy are introduced to improve the model’s robustness and generalization. Notably, the proposed model, even when trained exclusively on sharp images, demonstrates superior predictive performance on blurred inputs. Furthermore, we design a universal skeleton connection method that enables YOLO-Pose to seamlessly adapt to datasets with varying numbers of key points, significantly increasing its versatility across diverse annotation standards. Experimental results on the CrowdPose dataset demonstrate the superiority of the proposed method. While maintaining a parameter count nearly identical to that of the self-trained YOLO12n-Pose baseline, our model achieves relative improvements of +4.1%, +15.4%, and +6.8% in mAP@50:95 on test sets corrupted by Gaussian blur, motion blur, and defocus blur respectively, under the most severe degradation levels. The optimized model demonstrates robust and accurate pose estimation directly on blurred input images with varying intensities, highlighting its strong generalization capability under privacy-preserving visual conditions.

Journal Article

Image Enhancement-Based Detection with Small Infrared Targets

by Liu, Shuai , Chen, Pengfei , Woźniak, Marcin in Algorithms , autonomous systems , Background noise

2022

Today, target detection has an indispensable application in various fields. Infrared small-target detection, as a branch of target detection, can improve the perception capability of autonomous systems, and it has good application prospects in infrared alarm, automatic driving and other fields. There are many well-established algorithms that perform well in infrared small-target detection. Nevertheless, the current algorithms cannot achieve the expected detection effect in complex environments, such as background clutter, noise inundation or very small targets. We have designed an image enhancement-based detection algorithm to solve both problems through detail enhancement and target expansion. This method first enhances the mutation information, detail and edge information of the image and then improves the contrast between the target edge and the adjacent pixels to make the target more prominent. The enhancement improves the robustness of detection with background clutter or noise-flooded scenes. Moreover, bicubic interpolation is used on the input image, and the target pixels are expanded with upsampling, which enhances the detection effectiveness for tiny targets. From the results of qualitative and quantitative experiments, the algorithm proposed in this paper outperforms the existing work on various evaluation indicators.

Journal Article

Forest Fire Detection Algorithm Based on Improved YOLOv11n

by Jiang, Shuihai , Zhou, Kangqian in Accuracy , Algorithms , attention mechanism

2025

To address issues in traditional forest fire detection models, such as large parameter sizes, slow detection speed, and unsuitability for resource-constrained devices, this paper proposes a forest fire detection method, FEDS-YOLOv11n, based on an improved YOLOv11n model. First, the C3k2 module was redesigned using the FasterBlock module, replacing C3k2 with C3k2-Faster in both the Backbone network and Neck section to achieve a lightweight model design. Second, an EMA attention mechanism was introduced into the C3k2-Faster module in the Backbone, replacing C3k2-Faster with C3k2-Faster-EMA to compensate for the accuracy loss in small-object detection caused by the lightweight design. Third, the original upsampling module in the Neck was replaced with the lightweight dynamic upsampling operator DySample. Finally, the detection head was improved using the SEAM attention module, replacing the original Detect head with SEAMHead, which enables better handling of occluded objects. The experimental results show that compared to YOLOv11n, the proposed FEDS-YOLOv11n achieves improvements of 0.9% in precision (P), 1.9% in recall (R), 2.1% in mean average precision at IoU 0.5 (mAP@0.5), and 2.3% in mean average precision at IoU 0.5–0.95 (mAP@0.5–0.95). Additionally, the number of parameters is reduced by 21.32%, GFLOPs are reduced by 26.98%, and FPS increases from 48.2 to 71.8. The FEDS-YOLOv11n model ensures high accuracy while maintaining lower computational complexity and faster inference speed, making it suitable for real-time forest fire detection applications.

Journal Article

FADE: A Task-Agnostic Upsampling Operator for Encoder–Decoder Architectures

by Liu, Wenze , Fu, Hongtao , Lu, Hao in Artificial Intelligence , Computer Imaging , Computer Science

2025

The goal of this work is to develop a task-agnostic feature upsampling operator for dense prediction where the operator is required to facilitate not only region-sensitive tasks like semantic segmentation but also detail-sensitive tasks such as image matting. Prior upsampling operators can often work well on one type of task, but not both. We argue that task-agnostic upsampling should dynamically trade off between semantic preservation and detail delineation, instead of having a bias between the two properties. In this paper, we present FADE, a novel, plug-and-play, lightweight, and task-agnostic upsampling operator by fusing the assets of decoder and encoder features at three levels: (i) considering both the encoder and decoder feature in upsampling kernel generation; (ii) controlling the per-point contribution of the encoder/decoder feature in upsampling kernels with an efficient semi-shift convolutional operator; and (iii) enabling the selective pass of encoder features with a decoder-dependent gating mechanism for compensating details. To improve the practicality of FADE, we additionally study parameter- and memory-efficient implementations of semi-shift convolution. We analyze the upsampling behavior of FADE on toy data and show through large-scale experiments that FADE is task-agnostic with consistent performance improvement on a number of dense prediction tasks with little extra cost. For the first time, we demonstrate robust feature upsampling on both region- and detail-sensitive tasks successfully. Code is made available at: https://github.com/poppinace/fade

Journal Article

Up-Sampling Method for Low-Resolution LiDAR Point Cloud to Enhance 3D Object Detection in an Autonomous Driving Environment

by You, Jihwan , Kim, Young-Keun in 3D object detection , 3D upsampling , Accuracy

2022

Automobile datasets for 3D object detection are typically obtained using expensive high-resolution rotating LiDAR with 64 or more channels (Chs). However, the research budget may be limited such that only a low-resolution LiDAR of 32-Ch or lower can be used. The lower the resolution of the point cloud, the lower the detection accuracy. This study proposes a simple and effective method to up-sample low-resolution point cloud input that enhances the 3D object detection output by reconstructing objects in the sparse point cloud data to produce more dense data. First, the 3D point cloud dataset is converted into a 2D range image with four channels: x, y, z, and intensity. The interpolation on the empty space is calculated based on both the pixel distance and range values of six neighbor points to conserve the shapes of the original object during the reconstruction process. This method solves the over-smoothing problem faced by the conventional interpolation methods, and improves the operational speed and object detection performance when compared to the recent deep-learning-based super-resolution methods. Furthermore, the effectiveness of the up-sampling method on the 3D detection was validated by applying it to baseline 32-Ch point cloud data, which were then selected as the input to a point-pillar detection model. The 3D object detection result on the KITTI dataset demonstrates that the proposed method could increase the mAP (mean average precision) of pedestrians, cyclists, and cars by 9.2%p, 6.3%p, and 5.9%p, respectively, when compared to the baseline of the low-resolution 32-Ch LiDAR input. In future works, various dataset environments apart from autonomous driving will be analyzed.
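The first step in this abstract, converting a point cloud into a 2D range image with x, y, z, and intensity channels, can be sketched with a common spherical projection. This is only an illustration: the image size, the vertical field of view, and the function name `project_to_range_image` are assumptions, and the abstract does not specify the projection the authors used.

```python
import math

def project_to_range_image(points, height=4, width=8,
                           fov_up_deg=15.0, fov_down_deg=-15.0):
    """Project (x, y, z, intensity) points into an H x W grid of 4-tuples.

    Assumed spherical projection: columns follow azimuth, rows follow
    elevation. Empty pixels stay None; an interpolation step would fill them.
    """
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    image = [[None] * width for _ in range(height)]
    for x, y, z, intensity in points:
        rng = math.sqrt(x * x + y * y + z * z)
        if rng == 0.0:
            continue  # skip degenerate points at the sensor origin
        azimuth = math.atan2(y, x)
        elevation = math.asin(z / rng)
        col = int((azimuth + math.pi) / (2 * math.pi) * width) % width
        row = int((fov_up - elevation) / (fov_up - fov_down) * height)
        # Points outside the assumed vertical field of view are clamped.
        row = min(max(row, 0), height - 1)
        image[row][col] = (x, y, z, intensity)
    return image
```

In such a grid a low-channel LiDAR leaves many rows empty, and up-sampling amounts to estimating values for those empty pixels from their filled neighbors.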

Journal Article

Tomato leaf disease detection method based on improved YOLOv8n

by Chen, Ming , Yuan, Yuan , Zhang, Kaisheng in 631/114/1564 , 639/705/117 , Algorithms

2025

With the increasing demand for precision agriculture, automatic detection of tomato leaf diseases has become a critical technological challenge in smart agriculture. Among various diseases, Tomato Yellow Virus Leaf, due to its unique pathological characteristics, presents a particularly challenging identification target. Traditional image recognition methods often fail to meet the high-precision detection requirements for this disease, leading to delayed responses in disease control by farmers, which severely impacts tomato yield and quality. To address this issue, this paper proposes an optimized YOLOv8n algorithm, incorporating a C2f-DynamicConv optimization module. By dynamically adjusting the weights of convolutional kernels, the model can adapt to the characteristics of different input data, thereby enhancing its ability to represent diverse features. Additionally, we introduce the SimAM attention mechanism, which enhances the model’s focus on key areas by weighting the feature map, significantly improving the accuracy of disease detection while filtering out irrelevant features and enhancing sensitivity. During the upsampling process, we adopt the Dysample upsampling operator, optimizing the quality of feature map reconstruction and improving detection resolution through a refined upsampling strategy. To better address the bounding box regression problem in object detection, we incorporate the GIoU loss function. Compared to traditional loss functions, GIoU performs excellently in handling bounding box overlap and positional accuracy, further improving the model’s detection performance. Experimental results show that the improved model achieves an average precision of 81.8%, precision of 77.1%, and recall of 77.4%. Compared to existing methods, our approach shows significant advantages in detection accuracy, localization precision, and model computational efficiency, achieving improved detection performance on the tomato leaf disease dataset.
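The GIoU measure mentioned in this abstract extends IoU with a penalty based on the smallest box enclosing both inputs. The sketch below follows the standard definition for axis-aligned boxes given as (x1, y1, x2, y2); the function names are illustrative and the code is not taken from the paper.

```python
def giou(box_a, box_b):
    """Generalized IoU for axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    # Overlap is zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = area_a + area_b - inter
    # Smallest axis-aligned box enclosing both inputs.
    hull = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    if union <= 0 or hull <= 0:
        return 0.0  # degenerate boxes with no area
    return inter / union - (hull - union) / hull

def giou_loss(box_a, box_b):
    """Loss form: 0 for identical boxes, approaching 2 for distant boxes."""
    return 1.0 - giou(box_a, box_b)
```

Unlike plain IoU, the value keeps changing as two non-overlapping boxes move apart, so the loss still provides a training signal when a predicted box misses the target entirely.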

Journal Article

A novel convolutional neural network for enhancing the continuity of pavement crack detection

by Sun, Shangyu , Song, Weidong , Teng, Qiaoshuang in 639/166/986 , 639/705/117 , Accuracy

2024

Pavement cracks affect the structural stability and safety of roads, making accurate identification of cracks essential for assessing the extent of damage and evaluating road health. However, traditional convolutional neural networks often struggle with issues such as missed detection and false detection when extracting cracks. This paper introduces a network called CPCDNet, designed to maintain continuous extraction of pavement cracks. The model incorporates a Crack align module (CAM) and a Weighted Edge Cross Entropy Loss Function (WECEL) to enhance the continuity of crack extraction in complex environments. Experimental results show that the proposed model achieves mIoU scores of 77.71%, 80.36%, 91.19%, and 71.16% on the public datasets CFD, Crack500, Deepcrack537, and Gaps384, respectively. Compared to other networks, the proposed method improves the continuity and accuracy of crack extraction.

Journal Article
