481 result(s) for "image and vision processing and display technology"
Enhanced blur‐robust monocular depth estimation via self‐supervised learning
This letter presents a novel self‐supervised learning strategy to improve the robustness of a monocular depth estimation (MDE) network against motion blur. Motion blur, a common problem in real‐world applications like autonomous driving and scene reconstruction, often hinders accurate depth perception. Conventional MDE methods are effective under controlled conditions but struggle to generalise to blurred images. To address this problem, we generate blur‐synthesised data to train a robust MDE model without the need for preprocessing such as deblurring. By incorporating self‐distillation techniques and using blur‐synthesised data, depth estimation accuracy for blurred images is significantly enhanced without additional computational or memory overhead. Extensive experimental results demonstrate the effectiveness of the proposed method, which enables existing MDE models to estimate depth accurately across various blur conditions.
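The abstract gives no implementation detail; the PyTorch sketch below is only a hedged illustration of the general recipe it describes, in which a frozen teacher predicts depth on the sharp image and the student is trained to match it on a blur‐synthesised copy. The network, the Gaussian blur used as a stand‐in for blur synthesis, and all names are assumptions, not the authors' code.

```python
# Hypothetical sketch of self-distillation on blur-synthesised images (not the authors' code).
# An arbitrary monocular depth network is assumed; Gaussian blur stands in for whatever
# blur-synthesis pipeline the paper actually uses.
import torch
import torch.nn.functional as F
from torchvision.transforms import GaussianBlur

def distillation_step(student, teacher, images, optimizer,
                      blur=GaussianBlur(9, sigma=(1.0, 3.0))):
    """One training step: the teacher sees the sharp image, the student a blurred copy."""
    with torch.no_grad():
        target_depth = teacher(images)          # pseudo-label from the sharp input
    blurred = blur(images)                      # blur-synthesised training sample
    pred_depth = student(blurred)
    loss = F.l1_loss(pred_depth, target_depth)  # self-distillation consistency loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage sketch: the teacher is a frozen copy of a pretrained estimator,
# e.g. teacher = copy.deepcopy(student).eval() with requires_grad_(False) on its parameters.
```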
A Fusion of ReLU and upSample Function With Store in VTA for Higher Throughput of Network Inferencing
The TVM–versatile tensor accelerator (VTA) stack combines hardware–software co‐design with operator‐level optimizations but relies on ARM processors for auxiliary functions like ReLU and upSample, causing data‐transfer bottlenecks and inefficiencies. To address this, we propose fusion VTA (FVTA), integrating ReLU and upSample into the RTL‐based Store module with a newly designed instruction set and lightweight C++ runtime. This ensures seamless compatibility with existing VTA modules and eliminates ARM dependence. Evaluated on YOLOv3 with a Xilinx ZCU104 board, FVTA achieves a 195 ms frame processing time for 256 × 256 RGB images—4% faster than EVTA. This work highlights how combining the flexible TVM–VTA stack with optimized circuit‐level design can significantly enhance inference efficiency.
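The contribution itself is an RTL change to the VTA Store module, which code cannot reproduce; purely as a software analogy of what "fusing ReLU and upSample into the store path" means (no round trip to the ARM core between the two operators), here is a minimal NumPy sketch with hypothetical names.

```python
# Software analogy only: the paper's contribution is hardware (an RTL extension of the VTA
# Store module), but the fused operation it performs is roughly "apply ReLU and 2x
# nearest-neighbour upsampling while writing the tile out", instead of storing raw data and
# post-processing on the ARM core. Function and buffer names are hypothetical.
import numpy as np

def store_fused_relu_upsample(tile: np.ndarray, scale: int = 2) -> np.ndarray:
    """Apply ReLU and nearest-neighbour upsampling in one pass over an HxWxC tile."""
    activated = np.maximum(tile, 0)                                    # ReLU
    upsampled = activated.repeat(scale, axis=0).repeat(scale, axis=1)  # nearest-neighbour upsample
    return upsampled                                                   # written straight to the output buffer

tile = np.random.randn(8, 8, 16).astype(np.float32)
out = store_fused_relu_upsample(tile)
assert out.shape == (16, 16, 16)
```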
UAV‐Based Real‐Time Object Detection Network Using Structured Pruning Strategy
UAV‐based real‐time object detection networks have been used in various fields, but several challenges remain: (1) conventional detection algorithms are not well suited to small targets; (2) the computational capacity of the UAV platform is limited; (3) the sample distribution in aerial datasets is long‐tailed, so categories at the tail end are often learned poorly. To address these challenges, we propose AIR‐YOLO‐pruned, a lightweight UAV‐based object detection method built on YOLOv8. We first propose AIR‐YOLO, which is suited to small object detection, and introduce a gradient adaptive allocation loss to enhance the model's learning of tail categories. To eliminate redundant components in AIR‐YOLO, we design a structured pruning strategy. Experimental results indicate that our AIR‐YOLOn‐pruned method, at a competitive computational cost, achieves a 17% improvement in accuracy compared to YOLOv8n.
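The abstract does not specify the pruning criterion; as one common form such a structured pruning strategy could take, the sketch below removes whole convolution output channels ranked by the L1 norm of their filters. This is an assumption for illustration, not the AIR‐YOLO‐pruned code.

```python
# Illustrative structured (channel-level) pruning, not the AIR-YOLO-pruned implementation:
# the abstract does not give the criterion, so L1-norm filter scoring is assumed here.
import torch
import torch.nn as nn

def prune_conv_channels(conv: nn.Conv2d, keep_ratio: float = 0.75) -> nn.Conv2d:
    """Return a new Conv2d keeping only the output channels with the largest L1 filter norms."""
    scores = conv.weight.detach().abs().sum(dim=(1, 2, 3))   # one score per output channel
    n_keep = max(1, int(conv.out_channels * keep_ratio))
    keep = torch.topk(scores, n_keep).indices.sort().values  # channels to keep, in order
    pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    with torch.no_grad():
        pruned.weight.copy_(conv.weight[keep])
        if conv.bias is not None:
            pruned.bias.copy_(conv.bias[keep])
    return pruned

conv = nn.Conv2d(64, 128, 3, padding=1)
print(prune_conv_channels(conv, 0.5))  # Conv2d with 64 output channels after pruning
```

In practice the input channels of downstream layers must be shrunk to match, and the pruned network fine‐tuned to recover accuracy.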
Global feature fusion generative adversarial network for underwater image enhancement
Most CNN‐based networks treat features in different channels and pixels similarly. However, the dispersion of light underwater results in an uneven haze distribution across images. To alleviate these problems, a global feature fusion generative adversarial network is proposed to separate and enhance global features by channels and pixels simultaneously. The key novelty of the method is an attention mechanism in the fusion block that emphasizes features in different channels and pixels, allocating higher weights to more important features. The network's reliability is further improved by incorporating condition information to constrain training. Both qualitative and quantitative evaluations verify that the proposed method achieves greater visual quality than other classic and state‐of‐the‐art methods.
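The fusion block is only described at a high level; the PyTorch sketch below illustrates the stated idea of re‐weighting features per channel and per pixel with learned attention before fusion, under assumed SE‐style design choices that are not taken from the paper.

```python
# Minimal sketch of a channel + pixel attention block (an assumption about the design;
# the abstract only states that features are re-weighted per channel and per pixel).
import torch
import torch.nn as nn

class ChannelPixelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        # channel attention: global average pool -> bottleneck MLP -> per-channel weights
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())
        # pixel (spatial) attention: 1x1 conv -> per-pixel weights
        self.pixel = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x * self.channel(x)   # emphasize informative channels
        x = x * self.pixel(x)     # emphasize informative pixels (e.g. heavily hazed regions)
        return x

block = ChannelPixelAttention(64)
print(block(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```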
Semantically enhanced attention map‐driven occluded person re‐identification
Occluded person re‐identification (Re‐ID) aims to identify a particular person when parts of the body are occluded. Challenges remain in enhancing effective information representation and suppressing background clutter in occlusion scenes. This paper proposes a novel attention map‐driven network (AMD‐Net) for occluded person Re‐ID. In AMD‐Net, human parsing labels are introduced to supervise the generation of partial attention maps and establish more precise feature extraction regions, while a spatial‐frequency interaction module complements higher‐order semantic information from the frequency domain. Furthermore, a Taylor‐inspired feature filter is proposed to mitigate background disturbance and extract fine‐grained features. A part‐soft triplet loss, which is robust to non‐discriminative body part features, is also designed. Experimental results on the Occluded‐Duke, Occluded‐REID, Market‐1501, and Duke‐MTMC datasets show that this method outperforms existing state‐of‐the‐art methods. The code is available at: https://github.com/ISCLab-Bistu/SA-ReID.
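The part‐soft triplet loss is only named in the abstract; one plausible reading, sketched below purely as an assumption, computes a triplet hinge per body part and weights it by a soft visibility score so that occluded or non‐discriminative parts contribute less.

```python
# Hypothetical sketch of a "part-soft" triplet loss: a standard triplet margin loss is
# computed per body part and weighted by a soft visibility score. This is an assumption,
# not the AMD-Net code.
import torch
import torch.nn.functional as F

def part_soft_triplet_loss(anchor, positive, negative, part_weights, margin=0.3):
    """anchor/positive/negative: [B, P, D] part features; part_weights: [B, P] soft scores."""
    d_ap = (anchor - positive).norm(dim=-1)          # [B, P] anchor-positive distances
    d_an = (anchor - negative).norm(dim=-1)          # [B, P] anchor-negative distances
    per_part = F.relu(d_ap - d_an + margin)          # standard triplet hinge per part
    weights = part_weights / part_weights.sum(dim=1, keepdim=True).clamp(min=1e-6)
    return (weights * per_part).sum(dim=1).mean()    # soft-weighted average over parts

a, p, n = (torch.randn(4, 6, 256) for _ in range(3))
w = torch.rand(4, 6)                                 # e.g. attention-map confidence per part
print(part_soft_triplet_loss(a, p, n, w))
```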
Deep learning cigarette defect detection method based on saliency feature guidance
Cigarette defect detection is important in industrial production. Existing methods extract features for defect detection either manually or with deep learning; however, because cigarette defects are small, these methods struggle to extract discriminative features, which limits detection performance. Hence, a deep learning‐based method called significant feature‐guided cigarette defect detection (SFGCD) is proposed, which combines saliency feature extraction with deep learning to enrich feature representation and improve detection. First, edge saliency features are extracted using the proposed target gradient saliency feature extraction (GSFE) strategy. Then, a dense multi‐level feature fusion network combines the original features with these saliency features, fusing them at different levels and scales to enrich feature representation and improve detection. Experimental results on the authors' own labeled cigarette defect dataset show that, compared with existing state‐of‐the‐art methods, the proposed method achieves a 0.02 higher mean average precision (mAP) and a detection speed of 5 frames per second (FPS).
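The GSFE operator is not defined in the abstract; as a stand‐in, the sketch below uses a Sobel gradient magnitude as the edge‐saliency map that is then fed to the network alongside the image. This is an assumption about the design, not the authors' code.

```python
# Illustrative only: a Sobel gradient magnitude stands in for the edge-saliency map that
# gets fused with the CNN features; the paper's GSFE strategy may differ.
import torch
import torch.nn.functional as F

def gradient_saliency(gray: torch.Tensor) -> torch.Tensor:
    """Edge-saliency map for a [B, 1, H, W] grayscale image via Sobel gradient magnitude."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)                    # Sobel kernel for the vertical direction
    gx = F.conv2d(gray, kx, padding=1)
    gy = F.conv2d(gray, ky, padding=1)
    sal = torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)
    return sal / sal.amax(dim=(2, 3), keepdim=True).clamp(min=1e-8)   # normalise to [0, 1]

img = torch.rand(2, 1, 128, 128)
features = torch.cat([img, gradient_saliency(img)], dim=1)  # saliency channel fed alongside the image
print(features.shape)  # torch.Size([2, 2, 128, 128])
```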
Research and design of image style transfer technology based on multi‐scale convolutional neural network feature fusion
To reduce information loss and distortion in image style transfer, a method is proposed based on multi‐scale convolutional neural network (CNN) feature fusion. Initially, the VGG19 model is used to build coarse‐ and fine‐scale networks for multi‐scale CNN feature extraction from the target image. Subsequently, alongside the corresponding feature loss functions, an additional least‐squares penalty parameter is introduced to balance the total loss function. Finally, leveraging stochastic gradient descent iteration, image features are fused and reconstructed to obtain better style‐transferred images. Experimental evaluations use peak signal‐to‐noise ratio (PSNR), structural similarity index (SSIM), information entropy (IE), and mean squared error (MSE) as metrics for the transferred images, comparing the method with three typical image style transfer approaches. Results demonstrate that the proposed method achieves the best performance across all metrics, realizing superior style transfer effects.
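As a rough illustration of the loss structure described here (pretrained VGG19 features at several depths, a content term, Gram‐matrix style terms, and an extra least‐squares penalty), a minimal sketch follows; the layer indices, weights, and the exact form of the penalty are assumptions rather than the authors' design.

```python
# Sketch of the loss structure described in the abstract (layer choices, weights and the
# form of the least-squares penalty are assumptions).
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

vgg = vgg19(weights="DEFAULT").features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def features(x, layers=(3, 8, 17, 26)):
    """Collect VGG19 feature maps at several depths (coarse and fine scales)."""
    out = []
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in layers:
            out.append(x)
    return out

def gram(f):
    b, c, h, w = f.shape
    f = f.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)   # normalised Gram matrix

def total_loss(generated, content, style, alpha=1.0, beta=1e3, lam=1e-4):
    gf, cf, sf = features(generated), features(content), features(style)
    content_loss = F.mse_loss(gf[-1], cf[-1])                         # deep-layer content term
    style_loss = sum(F.mse_loss(gram(g), gram(s)) for g, s in zip(gf, sf))
    penalty = lam * generated.pow(2).mean()                           # assumed least-squares penalty
    return alpha * content_loss + beta * style_loss + penalty
```

The generated image would be optimised by stochastic gradient descent on `total_loss`, as the abstract indicates.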
Deformable channel non‐local network for crowd counting
Both global dependency and local correlation are crucial for handling scale variation in crowds, yet most previous methods fail to take both factors into account simultaneously. To address this issue, a deformable channel non‐local network (DCNLNet) for crowd counting is proposed, which simultaneously learns global context information and adaptive local receptive fields. The proposed DCNLNet consists of two carefully designed modules: a deformable channel non‐local block (DCNL) and a spatial attention feature fusion block (SAFF). The DCNL encodes long‐range dependencies between pixels and adaptive local correlation with channel non‐local attention and deformable convolution, respectively, which improves the spatial discrimination of features. The SAFF aggregates cross‐level information obtained from the encoder and the decoder, interacting features from different depths and learning specific weights for the feature maps with spatial attention. Extensive experiments on three crowd counting benchmark datasets indicate that the proposed DCNLNet achieves compelling performance compared to other representative counting models.
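The DCNL block pairs a deformable‐convolution branch with a channel non‐local branch; the sketch below shows only an assumed minimal form of the channel non‐local half, which computes channel‐to‐channel affinities over the whole spatial extent to capture global dependencies. It is an illustration, not the paper's module.

```python
# Minimal channel non-local block (an assumed form; the deformable-convolution branch
# described in the paper is omitted here for brevity).
import torch
import torch.nn as nn

class ChannelNonLocal(nn.Module):
    """Self-attention over channels: each channel is re-expressed as a weighted mix of
    all channels, capturing long-range (global) dependencies across the whole image."""
    def __init__(self, channels: int):
        super().__init__()
        self.out_proj = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        flat = x.view(b, c, h * w)                                      # [B, C, HW]
        affinity = torch.softmax(flat @ flat.transpose(1, 2), dim=-1)   # [B, C, C] channel affinities
        attended = (affinity @ flat).view(b, c, h, w)                   # re-weighted channel responses
        return x + self.out_proj(attended)                              # residual connection

block = ChannelNonLocal(32)
print(block(torch.randn(2, 32, 24, 24)).shape)  # torch.Size([2, 32, 24, 24])
```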
Robust Text‐Based Person Search via Noisy Pair Identification and Pseudo‐Text Augmentation
Text‐based person search (TBPS) aims to retrieve pedestrian images from a database based on natural language descriptions. However, existing TBPS methods often assume that image‐text pairs in training datasets are perfectly aligned, neglecting noisy annotations characterized by coarse‐grained or mismatched descriptions. This letter presents a robust TBPS framework that addresses these challenges through two key innovations. First, we design a dual‐channel Gaussian mixture model (GMM) to identify noisy image‐text pairs by leveraging both global and local feature‐level alignment losses. Second, for the detected noisy samples, we generate pseudo‐texts using a multimodal large language model (MLLM) and filter them via a dynamic semantic consistency scoring mechanism to ensure high‐quality supervision. Extensive experiments on ICFG‐PEDES and RSTPReid demonstrate that our method consistently improves top‐k retrieval metrics.
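The dual‐channel GMM is not detailed in the abstract; the sketch below shows the standard pattern such a component usually follows, namely fitting a two‐component Gaussian mixture to per‐pair alignment losses (here a global and a local loss per image‐text pair) and treating the low‐loss component as clean. All names and thresholds are assumptions.

```python
# Sketch of noisy-pair identification with a Gaussian mixture over per-sample alignment
# losses (two components: clean vs noisy). This approximates, not reproduces, the paper's
# dual-channel GMM.
import numpy as np
from sklearn.mixture import GaussianMixture

def split_clean_noisy(global_losses: np.ndarray, local_losses: np.ndarray, threshold: float = 0.5):
    """Return a boolean mask that is True for image-text pairs judged clean."""
    losses = np.stack([global_losses, local_losses], axis=1)   # [N, 2] per-pair losses
    gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(losses)
    clean_component = gmm.means_.sum(axis=1).argmin()          # component with the smaller losses
    p_clean = gmm.predict_proba(losses)[:, clean_component]
    return p_clean > threshold

rng = np.random.default_rng(0)
g = np.concatenate([rng.normal(0.2, 0.05, 900), rng.normal(0.8, 0.1, 100)])  # mostly clean pairs
l = np.concatenate([rng.normal(0.3, 0.05, 900), rng.normal(0.9, 0.1, 100)])
mask = split_clean_noisy(g, l)
print(mask.sum(), "pairs kept as clean out of", len(mask))
```

Pairs flagged as noisy would then receive MLLM‐generated pseudo‐texts filtered by the semantic consistency score described in the abstract.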
Global strategy for robust air‐light estimation in dehazing
This letter proposes a novel approach to enhance the robustness of air‐light estimation for dehazing. Unlike most existing methods, it employs a global strategy, considering all pixels instead of specific individual ones, to recover the air‐light. Through an iterative algorithm based on the Gray World (GW) assumption, the authors extract the air‐light orientation from the entire image. Next, a global detail‐preserving algorithm is designed to determine the optimal magnitude of the air‐light. Experimental results on a diverse set of hazy images show that the authors' method outperforms other state‐of‐the‐art alternatives, highlighting the advantage of estimating the air‐light from the entire image.
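The iterative Gray World procedure is not spelled out in the abstract; the NumPy sketch below is one plausible minimal reading, offered only as an assumption: start from the Gray World mean colour and iteratively re‐weight pixels by how well they align with the current air‐light direction estimate.

```python
# Hedged illustration only: the abstract states that the air-light orientation is estimated
# from all pixels via an iterative Gray World procedure, but does not give the algorithm.
# This is one plausible minimal reading, not the authors' method.
import numpy as np

def estimate_airlight_orientation(img: np.ndarray, iters: int = 5) -> np.ndarray:
    """img: [H, W, 3] float array in [0, 1]. Returns a unit RGB direction."""
    pixels = img.reshape(-1, 3)
    direction = pixels.mean(axis=0)                       # Gray World initial estimate
    direction /= np.linalg.norm(direction) + 1e-8
    for _ in range(iters):
        alignment = pixels @ direction                    # projection onto the current direction
        weights = alignment / (alignment.sum() + 1e-8)    # favour pixels along the estimate
        direction = weights @ pixels
        direction /= np.linalg.norm(direction) + 1e-8
    return direction

hazy = np.random.rand(240, 320, 3).astype(np.float32)
print(estimate_airlight_orientation(hazy))               # roughly [0.577, 0.577, 0.577] for neutral haze
```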