481 result(s) for "image and vision processing and display technology"
Enhanced blur‐robust monocular depth estimation via self‐supervised learning
This letter presents a novel self‐supervised learning strategy to improve the robustness of a monocular depth estimation (MDE) network against motion blur. Motion blur, a common problem in real‐world applications like autonomous driving and scene reconstruction, often hinders accurate depth perception. Conventional MDE methods are effective under controlled conditions but struggle to generalise to blurred images. To address this problem, we generate blur‐synthesised data to train a robust MDE model without the need for preprocessing such as deblurring. By incorporating self‐distillation techniques and using blur‐synthesised data, depth estimation accuracy for blurred images is significantly enhanced without additional computational or memory overhead. Extensive experimental results demonstrate the effectiveness of the proposed method, which enables existing MDE models to estimate depth accurately across various blur conditions.
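The abstract gives no implementation detail; the PyTorch sketch below is only a hedged illustration of the general recipe it describes, in which a frozen teacher predicts depth on the sharp image and the student is trained to match it on a blur‐synthesised copy. The network, the Gaussian blur used as a stand‐in for blur synthesis, and all names are assumptions, not the authors' code.

```python
# Hypothetical sketch of self-distillation on blur-synthesised images (not the authors' code).
# An arbitrary monocular depth network is assumed; Gaussian blur stands in for whatever
# blur-synthesis pipeline the paper actually uses.
import torch
import torch.nn.functional as F
from torchvision.transforms import GaussianBlur

def distillation_step(student, teacher, images, optimizer,
                      blur=GaussianBlur(9, sigma=(1.0, 3.0))):
    """One training step: the teacher sees the sharp image, the student a blurred copy."""
    with torch.no_grad():
        target_depth = teacher(images)          # pseudo-label from the sharp input
    blurred = blur(images)                      # blur-synthesised training sample
    pred_depth = student(blurred)
    loss = F.l1_loss(pred_depth, target_depth)  # self-distillation consistency loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage sketch: the teacher is a frozen copy of a pretrained estimator,
# e.g. teacher = copy.deepcopy(student).eval() with requires_grad_(False) on its parameters.
```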
A Fusion of ReLU and upSample Function With Store in VTA for Higher Throughput of Network Inferencing
The TVM–versatile tensor accelerator (VTA) stack combines hardware–software co‐design with operator‐level optimizations but relies on ARM processors for auxiliary functions like ReLU and upSample, causing data‐transfer bottlenecks and inefficiencies. To address this, we propose fusion VTA (FVTA), integrating ReLU and upSample into the RTL‐based Store module with a newly designed instruction set and lightweight C++ runtime. This ensures seamless compatibility with existing VTA modules and eliminates ARM dependence. Evaluated on YOLOv3 with a Xilinx ZCU104 board, FVTA achieves a 195 ms frame processing time for 256 × 256 RGB images—4% faster than EVTA. This work highlights how combining the flexible TVM–VTA stack with optimized circuit‐level design can significantly enhance inference efficiency.
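The contribution itself is an RTL change to the VTA Store module, which code cannot reproduce; purely as a software analogy of what "fusing ReLU and upSample into the store path" means (no round trip to the ARM core between the two operators), here is a minimal NumPy sketch with hypothetical names.

```python
# Software analogy only: the paper's contribution is hardware (an RTL extension of the VTA
# Store module), but the fused operation it performs is roughly "apply ReLU and 2x
# nearest-neighbour upsampling while writing the tile out", instead of storing raw data and
# post-processing on the ARM core. Function and buffer names are hypothetical.
import numpy as np

def store_fused_relu_upsample(tile: np.ndarray, scale: int = 2) -> np.ndarray:
    """Apply ReLU and nearest-neighbour upsampling in one pass over an HxWxC tile."""
    activated = np.maximum(tile, 0)                                    # ReLU
    upsampled = activated.repeat(scale, axis=0).repeat(scale, axis=1)  # nearest-neighbour upsample
    return upsampled                                                   # written straight to the output buffer

tile = np.random.randn(8, 8, 16).astype(np.float32)
out = store_fused_relu_upsample(tile)
assert out.shape == (16, 16, 16)
```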
UAV‐Based Real‐Time Object Detection Network Using Structured Pruning Strategy
UAV‐based real‐time object detection networks have been used in various fields, but several challenges remain: (1) conventional detection algorithms are not well suited to small targets; (2) the computational capacity of the UAV platform is limited; (3) the sample distribution in aerial datasets is long‐tailed, so categories at the tail end are often learned poorly. To address these challenges, we propose AIR‐YOLO‐pruned, a lightweight UAV‐based object detection method built on YOLOv8. We first propose AIR‐YOLO, which is suited to small object detection, and introduce a gradient adaptive allocation loss to enhance the model's learning of tail categories. To eliminate redundant components in AIR‐YOLO, we design a structured pruning strategy. Experimental results indicate that our AIR‐YOLOn‐pruned method, at a competitive computational cost, achieves a 17% improvement in accuracy compared to YOLOv8n.
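The abstract does not specify the pruning criterion; as one common form such a structured pruning strategy could take, the sketch below removes whole convolution output channels ranked by the L1 norm of their filters. This is an assumption for illustration, not the AIR‐YOLO‐pruned code.

```python
# Illustrative structured (channel-level) pruning, not the AIR-YOLO-pruned implementation:
# the abstract does not give the criterion, so L1-norm filter scoring is assumed here.
import torch
import torch.nn as nn

def prune_conv_channels(conv: nn.Conv2d, keep_ratio: float = 0.75) -> nn.Conv2d:
    """Return a new Conv2d keeping only the output channels with the largest L1 filter norms."""
    scores = conv.weight.detach().abs().sum(dim=(1, 2, 3))   # one score per output channel
    n_keep = max(1, int(conv.out_channels * keep_ratio))
    keep = torch.topk(scores, n_keep).indices.sort().values  # channels to keep, in order
    pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    with torch.no_grad():
        pruned.weight.copy_(conv.weight[keep])
        if conv.bias is not None:
            pruned.bias.copy_(conv.bias[keep])
    return pruned

conv = nn.Conv2d(64, 128, 3, padding=1)
print(prune_conv_channels(conv, 0.5))  # Conv2d with 64 output channels after pruning
```

In practice the input channels of downstream layers must be shrunk to match, and the pruned network fine‐tuned to recover accuracy.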
Global feature fusion generative adversarial network for underwater image enhancement
Most CNN‐based networks treat features in different channels and pixels similarly. However, the dispersion of light underwater results in an uneven haze distribution across images. To alleviate these problems, a global feature fusion generative adversarial network is proposed to separate and enhance global features by channels and pixels simultaneously. The key novelty of the method is an attention mechanism in the fusion block that emphasizes features in different channels and pixels, allocating higher weights to more important features. The network's reliability is further improved by incorporating condition information to constrain training. Both qualitative and quantitative evaluations verify that the proposed method achieves greater visual quality than other classic and state‐of‐the‐art methods.
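The fusion block is only described at a high level; the PyTorch sketch below illustrates the stated idea of re‐weighting features per channel and per pixel with learned attention before fusion, under assumed SE‐style design choices that are not taken from the paper.

```python
# Minimal sketch of a channel + pixel attention block (an assumption about the design;
# the abstract only states that features are re-weighted per channel and per pixel).
import torch
import torch.nn as nn

class ChannelPixelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        # channel attention: global average pool -> bottleneck MLP -> per-channel weights
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())
        # pixel (spatial) attention: 1x1 conv -> per-pixel weights
        self.pixel = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x * self.channel(x)   # emphasize informative channels
        x = x * self.pixel(x)     # emphasize informative pixels (e.g. heavily hazed regions)
        return x

block = ChannelPixelAttention(64)
print(block(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```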
Semantically enhanced attention map‐driven occluded person re‐identification
Occluded person re‐identification (Re‐ID) aims to identify a particular person when parts of the body are occluded. Challenges remain in enhancing effective information representation and suppressing background clutter in occlusion scenes. This paper proposes a novel attention map‐driven network (AMD‐Net) for occluded person Re‐ID. In AMD‐Net, human parsing labels are introduced to supervise the generation of partial attention maps and establish more precise feature extraction regions, while a spatial‐frequency interaction module complements higher‐order semantic information from the frequency domain. Furthermore, a Taylor‐inspired feature filter is proposed to mitigate background disturbance and extract fine‐grained features. A part‐soft triplet loss, which is robust to non‐discriminative body part features, is also designed. Experimental results on the Occluded‐Duke, Occluded‐REID, Market‐1501, and Duke‐MTMC datasets show that this method outperforms existing state‐of‐the‐art methods. The code is available at: https://github.com/ISCLab-Bistu/SA-ReID.
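The part‐soft triplet loss is only named in the abstract; one plausible reading, sketched below purely as an assumption, computes a triplet hinge per body part and weights it by a soft visibility score so that occluded or non‐discriminative parts contribute less.

```python
# Hypothetical sketch of a "part-soft" triplet loss: a standard triplet margin loss is
# computed per body part and weighted by a soft visibility score. This is an assumption,
# not the AMD-Net code.
import torch
import torch.nn.functional as F

def part_soft_triplet_loss(anchor, positive, negative, part_weights, margin=0.3):
    """anchor/positive/negative: [B, P, D] part features; part_weights: [B, P] soft scores."""
    d_ap = (anchor - positive).norm(dim=-1)          # [B, P] anchor-positive distances
    d_an = (anchor - negative).norm(dim=-1)          # [B, P] anchor-negative distances
    per_part = F.relu(d_ap - d_an + margin)          # standard triplet hinge per part
    weights = part_weights / part_weights.sum(dim=1, keepdim=True).clamp(min=1e-6)
    return (weights * per_part).sum(dim=1).mean()    # soft-weighted average over parts

a, p, n = (torch.randn(4, 6, 256) for _ in range(3))
w = torch.rand(4, 6)                                 # e.g. attention-map confidence per part
print(part_soft_triplet_loss(a, p, n, w))
```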
Deep learning cigarette defect detection method based on saliency feature guidance
Cigarette defect detection is important in industrial production. Existing methods extract features for defect detection either manually or with deep learning; however, because cigarette defects are small, these methods struggle to extract discriminative features, which limits detection performance. Hence, a deep learning‐based method called significant feature‐guided cigarette defect detection (SFGCD) is proposed, which combines saliency feature extraction with deep learning to enrich feature representation and improve detection. First, edge saliency features are extracted using the proposed target gradient saliency feature extraction (GSFE) strategy. Then, a dense multi‐level feature fusion network combines the original features with these saliency features, fusing them at different levels and scales to enrich feature representation and improve detection. Experimental results on the authors' own labeled cigarette defect dataset show that, compared with existing state‐of‐the‐art methods, the proposed method achieves a 0.02 higher mean average precision (mAP) and a detection speed of 5 frames per second (FPS).
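The GSFE operator is not defined in the abstract; as a stand‐in, the sketch below uses a Sobel gradient magnitude as the edge‐saliency map that is then fed to the network alongside the image. This is an assumption about the design, not the authors' code.

```python
# Illustrative only: a Sobel gradient magnitude stands in for the edge-saliency map that
# gets fused with the CNN features; the paper's GSFE strategy may differ.
import torch
import torch.nn.functional as F

def gradient_saliency(gray: torch.Tensor) -> torch.Tensor:
    """Edge-saliency map for a [B, 1, H, W] grayscale image via Sobel gradient magnitude."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)                    # Sobel kernel for the vertical direction
    gx = F.conv2d(gray, kx, padding=1)
    gy = F.conv2d(gray, ky, padding=1)
    sal = torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)
    return sal / sal.amax(dim=(2, 3), keepdim=True).clamp(min=1e-8)   # normalise to [0, 1]

img = torch.rand(2, 1, 128, 128)
features = torch.cat([img, gradient_saliency(img)], dim=1)  # saliency channel fed alongside the image
print(features.shape)  # torch.Size([2, 2, 128, 128])
```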
Research and design of image style transfer technology based on multi‐scale convolutional neural network feature fusion
To reduce information loss and distortion in image style transfer, a method is proposed based on multi‐scale convolutional neural network (CNN) feature fusion. Initially, the VGG19 model is used to build coarse‐ and fine‐scale networks for multi‐scale CNN feature extraction from the target image. Subsequently, alongside the corresponding feature loss functions, an additional least‐squares penalty parameter is introduced to balance the total loss function. Finally, leveraging stochastic gradient descent iteration, image features are fused and reconstructed to obtain better style‐transferred images. Experimental evaluations use peak signal‐to‐noise ratio (PSNR), structural similarity index (SSIM), information entropy (IE), and mean squared error (MSE) as metrics for the transferred images, comparing the method with three typical image style transfer approaches. Results demonstrate that the proposed method achieves the best performance across all metrics, realizing superior style transfer effects.
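As a rough illustration of the loss structure described here (pretrained VGG19 features at several depths, a content term, Gram‐matrix style terms, and an extra least‐squares penalty), a minimal sketch follows; the layer indices, weights, and the exact form of the penalty are assumptions rather than the authors' design.

```python
# Sketch of the loss structure described in the abstract (layer choices, weights and the
# form of the least-squares penalty are assumptions).
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

vgg = vgg19(weights="DEFAULT").features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def features(x, layers=(3, 8, 17, 26)):
    """Collect VGG19 feature maps at several depths (coarse and fine scales)."""
    out = []
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in layers:
            out.append(x)
    return out

def gram(f):
    b, c, h, w = f.shape
    f = f.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)   # normalised Gram matrix

def total_loss(generated, content, style, alpha=1.0, beta=1e3, lam=1e-4):
    gf, cf, sf = features(generated), features(content), features(style)
    content_loss = F.mse_loss(gf[-1], cf[-1])                         # deep-layer content term
    style_loss = sum(F.mse_loss(gram(g), gram(s)) for g, s in zip(gf, sf))
    penalty = lam * generated.pow(2).mean()                           # assumed least-squares penalty
    return alpha * content_loss + beta * style_loss + penalty
```

The generated image would be optimised by stochastic gradient descent on `total_loss`, as the abstract indicates.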
Deformable channel non‐local network for crowd counting
Both global dependency and local correlation are crucial for handling scale variation in crowds, yet most previous methods fail to take both factors into account simultaneously. To address this issue, a deformable channel non‐local network (DCNLNet) for crowd counting is proposed, which simultaneously learns global context information and adaptive local receptive fields. The proposed DCNLNet consists of two carefully designed modules: a deformable channel non‐local block (DCNL) and a spatial attention feature fusion block (SAFF). The DCNL encodes long‐range dependencies between pixels and adaptive local correlation with channel non‐local attention and deformable convolution, respectively, which improves the spatial discrimination of features. The SAFF aggregates cross‐level information obtained from the encoder and the decoder, interacting features from different depths and learning specific weights for the feature maps with spatial attention. Extensive experiments on three crowd counting benchmark datasets indicate that the proposed DCNLNet achieves compelling performance compared to other representative counting models.
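The DCNL block pairs a deformable‐convolution branch with a channel non‐local branch; the sketch below shows only an assumed minimal form of the channel non‐local half, which computes channel‐to‐channel affinities over the whole spatial extent to capture global dependencies. It is an illustration, not the paper's module.

```python
# Minimal channel non-local block (an assumed form; the deformable-convolution branch
# described in the paper is omitted here for brevity).
import torch
import torch.nn as nn

class ChannelNonLocal(nn.Module):
    """Self-attention over channels: each channel is re-expressed as a weighted mix of
    all channels, capturing long-range (global) dependencies across the whole image."""
    def __init__(self, channels: int):
        super().__init__()
        self.out_proj = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        flat = x.view(b, c, h * w)                                      # [B, C, HW]
        affinity = torch.softmax(flat @ flat.transpose(1, 2), dim=-1)   # [B, C, C] channel affinities
        attended = (affinity @ flat).view(b, c, h, w)                   # re-weighted channel responses
        return x + self.out_proj(attended)                              # residual connection

block = ChannelNonLocal(32)
print(block(torch.randn(2, 32, 24, 24)).shape)  # torch.Size([2, 32, 24, 24])
```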
Robust Text‐Based Person Search via Noisy Pair Identification and Pseudo‐Text Augmentation
Text‐based person search (TBPS) aims to retrieve pedestrian images from a database based on natural language descriptions. However, existing TBPS methods often assume that image‐text pairs in training datasets are perfectly aligned, neglecting noisy annotations characterized by coarse‐grained or mismatched descriptions. This letter presents a robust TBPS framework that addresses these challenges through two key innovations. First, we design a dual‐channel Gaussian mixture model (GMM) to identify noisy image‐text pairs by leveraging both global and local feature‐level alignment losses. Second, for the detected noisy samples, we generate pseudo‐texts using a multimodal large language model (MLLM) and filter them via a dynamic semantic consistency scoring mechanism to ensure high‐quality supervision. Extensive experiments on ICFG‐PEDES and RSTPReid demonstrate that our method consistently improves top‐k retrieval metrics.
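The dual‐channel GMM is not detailed in the abstract; the sketch below shows the standard pattern such a component usually follows, namely fitting a two‐component Gaussian mixture to per‐pair alignment losses (here a global and a local loss per image‐text pair) and treating the low‐loss component as clean. All names and thresholds are assumptions.

```python
# Sketch of noisy-pair identification with a Gaussian mixture over per-sample alignment
# losses (two components: clean vs noisy). This approximates, not reproduces, the paper's
# dual-channel GMM.
import numpy as np
from sklearn.mixture import GaussianMixture

def split_clean_noisy(global_losses: np.ndarray, local_losses: np.ndarray, threshold: float = 0.5):
    """Return a boolean mask that is True for image-text pairs judged clean."""
    losses = np.stack([global_losses, local_losses], axis=1)   # [N, 2] per-pair losses
    gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(losses)
    clean_component = gmm.means_.sum(axis=1).argmin()          # component with the smaller losses
    p_clean = gmm.predict_proba(losses)[:, clean_component]
    return p_clean > threshold

rng = np.random.default_rng(0)
g = np.concatenate([rng.normal(0.2, 0.05, 900), rng.normal(0.8, 0.1, 100)])  # mostly clean pairs
l = np.concatenate([rng.normal(0.3, 0.05, 900), rng.normal(0.9, 0.1, 100)])
mask = split_clean_noisy(g, l)
print(mask.sum(), "pairs kept as clean out of", len(mask))
```

Pairs flagged as noisy would then receive MLLM‐generated pseudo‐texts filtered by the semantic consistency score described in the abstract.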
Global strategy for robust air‐light estimation in dehazing
This letter proposes a novel approach to enhance the robustness of air‐light estimation for dehazing. Unlike most existing methods, it employs a global strategy, considering all pixels instead of specific individual ones, to recover the air‐light. Through an iterative algorithm based on the Gray World (GW) assumption, the authors extract the air‐light orientation from the entire image. Next, a global detail‐preserving algorithm is designed to determine the optimal magnitude of the air‐light. Experimental results on a diverse set of hazy images show that the authors' method outperforms other state‐of‐the‐art alternatives, highlighting the advantage of estimating the air‐light from the entire image.
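The iterative Gray World procedure is not spelled out in the abstract; the NumPy sketch below is one plausible minimal reading, offered only as an assumption: start from the Gray World mean colour and iteratively re‐weight pixels by how well they align with the current air‐light direction estimate.

```python
# Hedged illustration only: the abstract states that the air-light orientation is estimated
# from all pixels via an iterative Gray World procedure, but does not give the algorithm.
# This is one plausible minimal reading, not the authors' method.
import numpy as np

def estimate_airlight_orientation(img: np.ndarray, iters: int = 5) -> np.ndarray:
    """img: [H, W, 3] float array in [0, 1]. Returns a unit RGB direction."""
    pixels = img.reshape(-1, 3)
    direction = pixels.mean(axis=0)                       # Gray World initial estimate
    direction /= np.linalg.norm(direction) + 1e-8
    for _ in range(iters):
        alignment = pixels @ direction                    # projection onto the current direction
        weights = alignment / (alignment.sum() + 1e-8)    # favour pixels along the estimate
        direction = weights @ pixels
        direction /= np.linalg.norm(direction) + 1e-8
    return direction

hazy = np.random.rand(240, 320, 3).astype(np.float32)
print(estimate_airlight_orientation(hazy))               # roughly [0.577, 0.577, 0.577] for neutral haze
```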