Catalogue Search | MBRL

GridFormer: Residual Dense Transformer with Grid Structure for Image Restoration in Adverse Weather Conditions

by Stenger, Bjorn; Lu, Tong; Liu, Wei in Attention, Computer vision, Design

2024

Image restoration in adverse weather conditions is a difficult task in computer vision. In this paper, we propose a novel transformer-based framework called GridFormer, which serves as a backbone for image restoration under adverse weather conditions. GridFormer is designed in a grid structure using a residual dense transformer block, and it introduces two core designs. First, it uses an enhanced attention mechanism in the transformer layer. The mechanism comprises a sampler stage and a compact self-attention stage to improve efficiency, and a local enhancement stage to strengthen local information. Second, we introduce a residual dense transformer block (RDTB) as the final GridFormer layer. This design further improves the network’s ability to learn effective features from both preceding and current local features. The GridFormer framework achieves state-of-the-art results on five diverse image restoration tasks in adverse weather conditions, including image deraining, dehazing, deraining & dehazing, desnowing, and multi-weather restoration. The source code and pre-trained models will be released.

Journal Article

Beyond Monocular Deraining: Parallel Stereo Deraining Network Via Semantic Prior

by Yu, Yanjiang; Li, Changsheng; Liu, Wei in Algorithms, Computer vision, Datasets

2022

Rain is a common natural phenomenon. Taking images in the rain, however, often degrades image quality and thus compromises the performance of many computer vision systems. Most existing deraining algorithms use only a single input image and aim to recover a clean image. Few works have exploited stereo images. Moreover, even for single-image monocular deraining, many current methods fail to complete the task satisfactorily because they mostly rely on per-pixel loss functions and ignore semantic information. In this paper, we present a Paired Rain Removal Network (PRRNet), which exploits both stereo images and semantic information. Specifically, we develop a Semantic-Aware Deraining Module (SADM), which solves both semantic segmentation and deraining of scenes, and a Semantic-Fusion Network (SFNet) and a View-Fusion Network (VFNet), which fuse semantic information and multi-view information, respectively. In addition, we introduce an Enhanced Paired Rain Removal Network (EPRRNet), which exploits a semantic prior to remove rain streaks from stereo images. We first use a coarse deraining network to reduce the rain streaks in the input images, and then adopt a pre-trained semantic segmentation network to extract semantic features from the coarse derained image. Finally, a parallel stereo deraining network fuses semantic and multi-view information to restore finer results. We also propose new stereo-based rainy datasets for benchmarking. Experiments on both monocular and the newly proposed stereo rainy datasets demonstrate that the proposed method achieves state-of-the-art performance. https://github.com/HDCVLab/Stereo-Image-Deraining.

Journal Article

A Comprehensive Benchmark Analysis of Single Image Deraining: Current Challenges and Future Perspectives

by Hirata, Junior Roberto; Tokuda, Eric K; Ren Wenqi in Algorithms, Benchmarks, Datasets

2021

The capability of image deraining is a highly desirable component of intelligent decision-making in autonomous driving and outdoor surveillance systems. Image deraining aims to restore the clean scene from a degraded image captured on a rainy day. Although numerous single image deraining algorithms have recently been proposed, these algorithms are mainly evaluated using certain types of synthetic images, assuming a specific rain model, plus a few real images. It remains unclear how these algorithms would perform on rainy images acquired “in the wild” and how we could gauge the progress in the field. This paper aims to bridge this gap. We present a comprehensive study and evaluation of existing single image deraining algorithms, using a new large-scale benchmark consisting of both synthetic and real-world rainy images of various rain types. This dataset highlights diverse rain models (rain streak, rain drop, rain and mist), as well as a rich variety of evaluation criteria (full- and no-reference objective, subjective, and task-specific). We further provide a comprehensive suite of criteria for deraining algorithm evaluation, including full- and no-reference metrics, subjective evaluation, and a novel task-driven evaluation. The proposed benchmark is accompanied by extensive experimental results that facilitate the assessment of the state of the art on a quantitative basis. Our evaluation and analysis indicate the gap between the achievable performance on synthetic rainy images and the practical demand on real-world images. We show that, despite many advances, image deraining is still a largely open problem. The paper concludes by summarizing our general observations, identifying open research challenges, and pointing out future directions. Our code and dataset are publicly available at http://uee.me/ddQsw.
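
The abstract above mentions full-reference metrics, which score a restored image against a clean ground-truth image. As a minimal sketch, assuming peak signal-to-noise ratio (PSNR) as the metric (the abstract does not name the metrics the benchmark uses) and assuming grayscale images stored as nested lists, the computation could look like this. The function name `psnr` and its parameters are hypothetical.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized grayscale images.

    Both images are lists of rows of pixel intensities (an assumed layout).
    Higher values mean the restored image is closer to the reference.
    """
    same_shape = len(reference) == len(restored) and all(
        len(a) == len(b) for a, b in zip(reference, restored)
    )
    if not same_shape:
        raise ValueError("images must have the same shape")

    pixel_count = sum(len(row) for row in reference)
    if pixel_count == 0:
        raise ValueError("images must not be empty")

    # Mean squared error over all pixels.
    squared_error = sum(
        (a - b) ** 2
        for ref_row, res_row in zip(reference, restored)
        for a, b in zip(ref_row, res_row)
    )
    mse = squared_error / pixel_count

    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A full-reference metric such as this needs a ground-truth image, so it applies to synthetic rainy images; real-world rainy images usually lack ground truth, which is where the no-reference and task-driven criteria in the abstract come in.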

Journal Article

Two-stage Mamba-based diffusion model for image restoration

by Melo, Silas N.; Wang, Shuai; Wang, Jun in 639/166, 639/166/987, Diffusion Mamba

2025

Image restoration, which recovers high-quality images from degraded ones, is fundamental in computer vision. Recently, models such as transformers and diffusion models have shown notable success in addressing this challenge. However, transformer-based methods face high computational costs due to quadratic complexity, while diffusion-based methods often struggle with suboptimal results due to inaccurate noise estimation. This study proposes Diff-Mamba, a two-stage adaptive Mamba-based diffusion model for image restoration. Diff-Mamba integrates the linear-complexity state space model (SSM, also known as Mamba) into image restoration, expanding its applicability to visual data generation. Diff-Mamba mainly consists of two parts: the diffusion state space model (DSSM) and the diffusion feedforward neural network (DFNN). DSSM combines Mamba’s high efficiency with the representative power of diffusion models, enhancing both inference and training. DFNN regulates the information flow, enabling each depthwise convolutional layer to focus on the details of the image, thus learning more effective local structures for image restoration. The study’s findings, verified through extensive experiments, indicate that Diff-Mamba outperforms both diffusion-based and transformer-based methods in image deraining, denoising, and deblurring, demonstrating competitive restoration performance on various commonly used datasets. Code is available at https://github.com/maluan-ml/Diff-Mamba.
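
To illustrate why a state space model scales linearly with sequence length, the following is a minimal sketch of a scalar linear recurrence, h_t = a * h_(t-1) + b * x_t with output y_t = c * h_t. It is not the Diff-Mamba method: the fixed scalar parameters `a`, `b`, `c` and the function name `ssm_scan` are assumptions made for illustration, whereas Mamba-style models use learned, input-dependent parameters over vector-valued states.

```python
def ssm_scan(inputs, a=0.9, b=1.0, c=1.0):
    """Run a scalar linear state space recurrence over a 1-D sequence.

    Each step touches the running state once, so the cost grows linearly
    with the sequence length, unlike self-attention, which compares every
    pair of positions.
    """
    state = 0.0
    outputs = []
    for x in inputs:
        state = a * state + b * x  # update the hidden state
        outputs.append(c * state)  # read out the current state
    return outputs


# An impulse at the first position decays geometrically through the state.
print(ssm_scan([1.0, 0.0, 0.0], a=0.5))  # → [1.0, 0.5, 0.25]
```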

Journal Article

EAFormer: Edge-Aware Guided Adaptive Frequency-Navigator Network for Image Restoration

by Wang, Wenrui; Xie, Wenjie; Zhang, Wenshuai in image deblurring, image denoising, image deraining

2025

Although many deep learning-based image restoration networks have emerged for various image restoration tasks, most perform well only on a specific type of restoration task and still struggle to generalize across tasks. The fundamental reason is that different types of degradation require different frequency features, so the image needs to be adaptively reconstructed according to the characteristics of the input degradation. We also observe that previous image restoration networks ignore the reconstruction of edge contour details, resulting in unclear contours in the restored image. We therefore propose an edge-aware guided adaptive frequency navigation network, EAFormer, which extracts edge information from the image by applying different edge detection operators and reconstructs the edge contour details more accurately during restoration. The adaptive frequency navigation perceives different frequency components in the image and interactively participates in the subsequent restoration process with high- and low-frequency feature information, better retaining the global structural information of the image and making the restored image more visually coherent and realistic. We verify the versatility of EAFormer on five classic image restoration tasks, and extensive experimental results show that our model achieves advanced performance.
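
The abstract refers to edge detection operators without naming them. As a minimal sketch, assuming the classic Sobel operator as one such operator and a grayscale image stored as nested lists, edge extraction could look like the following. The helper names `apply_kernel` and `sobel_magnitude` are hypothetical, and this is not taken from the EAFormer implementation.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical intensity changes.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def apply_kernel(image, kernel, row, col):
    """Weighted sum of the 3x3 neighbourhood centred on (row, col)."""
    return sum(
        kernel[i][j] * image[row + i - 1][col + j - 1]
        for i in range(3)
        for j in range(3)
    )


def sobel_magnitude(image):
    """Return the gradient magnitude at every interior pixel.

    Border pixels are left at 0.0 because their 3x3 neighbourhood is incomplete.
    """
    height, width = len(image), len(image[0])
    edges = [[0.0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = apply_kernel(image, SOBEL_X, r, c)
            gy = apply_kernel(image, SOBEL_Y, r, c)
            edges[r][c] = math.hypot(gx, gy)
    return edges
```

On an image with a vertical step from dark to bright, the interior pixels next to the step get a large magnitude while flat regions stay at zero, which is the kind of contour cue an edge-aware network can use as guidance.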

Journal Article

PVformer: Pedestrian and Vehicle Detection Algorithm Based on Swin Transformer in Rainy Scenes

by Sun, Zaiming; Liu, Chang’an; Xie, Guangda in Accuracy, Algorithms, Deep learning

2022

Pedestrian and vehicle detection plays a key role in the safe driving of autonomous vehicles. Although transformer-based object detection algorithms have made great progress, accurate detection in rainy scenarios remains challenging. Based on the Swin Transformer, this paper proposes an end-to-end pedestrian and vehicle detection algorithm (PVformer) with a deraining module, which improves image quality and detection accuracy in rainy scenes. Based on Transformer blocks, a four-branch feature mapping model is introduced to achieve deraining from a single image, thereby mitigating the influence of rain streak occlusion on detector performance. To address the difficulty of detecting small objects with a vision transformer alone, we design a local enhancement perception block based on CNN and Transformer. In addition, the deraining module and the detection module are combined to train the PVformer model through transfer learning. The experimental results show that the algorithm performs well on rainy days and significantly improves the accuracy of pedestrian and vehicle detection.

Journal Article

Context-Enhanced Representation Learning for Single Image Deraining

by Sun Changming; Wang, Guoqing; Sowmya Arcot in Ablation, Algorithms, Coders

2021

Perception of content and structure in images with rainstreaks or raindrops is challenging, and it often calls for robust deraining algorithms to remove the diversified rainy effects. Much progress has been made on the design of advanced encoder–decoder single image deraining networks. However, most of the existing networks are built in a blind manner and often produce over/under-deraining artefacts. In this paper, we point out, for the first time, that the unsatisfactory results are caused by the highly imbalanced distribution between rainy effects and varied background scenes. Ignoring this phenomenon results in the representation learned by the encoder being biased towards rainy regions, while paying less attention to the valuable contextual regions. To resolve this, a context-enhanced representation learning and deraining network is proposed with a novel two-branch encoder design. Specifically, one branch takes the rainy image directly as input for learning a mixed representation depicting the variation of both rainy regions and contextual regions, and another branch is guided by a carefully learned soft attention mask to learn an embedding only depicting the contextual regions. By combining the embeddings from these two branches with a carefully designed co-occurrence modelling module, and then improving the semantic property of the co-occurrence features via a bi-directional attention layer, the underlying imbalanced learning problem is resolved. Extensive experiments are carried out for removing rainstreaks and raindrops from both synthetic and real rainy images, and the proposed model is demonstrated to produce significantly better results than state-of-the-art models. In addition, comprehensive ablation studies are also performed to analyze the contributions of different designs. Code and pre-trained models will be publicly available at https://github.com/RobinCSIRO/CERLD-Net.git.

Journal Article

Cross-domain attention-guided domain adaptive method for image real rain removal

by Wan, Yecong; Han, Minggui; Shao, Mingwen in Adaptation, Computer Communication Networks, Computer Science

2025

Existing image deraining methods often rely on synthetic data, but the domain gap between synthetic and real data causes significant performance degradation in real-world scenarios. To address this issue, we propose a Cross-Domain Attention-Guided domain adaptive deraining network (CDAG-network) that learns rainfall characteristics from both synthetic and real data to achieve better generalizability. First, we introduce cross-attention as a fine-grained domain adaptation constraint into the CDAG-network to enhance its capability in analyzing features from the real and synthetic domains and aligning their distributions. Second, in light of the complex nature of rain artifacts, we propose the Mixed-Scale Convolutional Transformer (MSCT) block, which effectively captures features from both global and local perspectives and improves the spatial perception of the model. With these two key designs, the CDAG-network demonstrates enhanced efficiency in domain adaptation and degradation modeling. Furthermore, we present a novel model for synthesizing rain images, which more accurately emulates rain effects in real-world scenes. Based on this model, we synthesize 9K rain images that, along with 6K real rain images collected from real scenes, constitute a new domain adaptive deraining dataset. Extensive experimental results demonstrate that our approach outperforms recent state-of-the-art methods on the real-world rain removal task.

Journal Article

A Lightweight Fusion Distillation Network for Image Deblurring and Deraining

by Liu, Yiming; Li, Qiang; Zhang, Yanni in fusion mechanism, image deblurring, image deraining

2021

Recently, deep learning-based image deblurring and deraining have been well developed. However, most of these methods fail to distill the useful features. Moreover, exploiting detailed image features in a deep learning framework typically requires a large number of parameters, which inevitably imposes a high computational burden on the network. We propose a lightweight fusion distillation network (LFDN) for image deblurring and deraining to solve the above problems. The proposed LFDN is designed as an encoder–decoder architecture. In the encoding stage, the image feature is reduced to various small-scale spaces for multi-scale information extraction and fusion without much information loss. Then, a feature distillation normalization block is designed at the beginning of the decoding stage, which enables the network to continuously distill and screen valuable channel information from the feature maps. In addition, an information fusion strategy between distillation modules and feature channels is carried out by the attention mechanism. By fusing different information in the proposed approach, our network achieves state-of-the-art image deblurring and deraining results with a smaller number of parameters and outperforms existing methods in model complexity.

Journal Article

Exploring high-quality image deraining Transformer via effective large kernel attention

by Qi, Xuanyu; Jin, Guiyue; Dong, Haobo in Artificial Intelligence, Attention, Computer Graphics

2025

In recent years, Transformers have demonstrated strong performance in single image deraining tasks. However, the standard self-attention in the Transformer makes it difficult to model local features of images effectively. To alleviate this problem, this paper proposes a high-quality deraining Transformer with effective large kernel attention, named ELKAformer. The network employs the Transformer-Style Effective Large Kernel Conv-Block (ELKB), which contains three key designs to guide the extraction of rich features: the Large Kernel Attention Block (LKAB), the Dynamical Enhancement Feed-forward Network (DEFN), and the Edge Squeeze Recovery Block (ESRB). Specifically, LKAB introduces convolutional modulation to substitute for vanilla self-attention and achieve better local representations. The designed DEFN refines the most valuable attention values in LKAB, allowing the overall design to better preserve pixel-wise information. Additionally, we develop ESRB to obtain long-range dependencies of different positional information. Extensive experimental results demonstrate that this method achieves favorable results while effectively saving computational costs. Our code is available at github

Journal Article
