Catalogue Search | MBRL
Explore the vast range of titles available.
285 result(s) for "image-to-image translation"
InstaFormer++: Multi-Domain Instance-Aware Image-to-Image Translation with Transformer
2024
We present a novel Transformer-based network architecture for instance-aware image-to-image translation, dubbed InstaFormer, to effectively integrate global- and instance-level information. By treating content features extracted from an image as visual tokens, our model discovers a global consensus among content features, taking context into account through the self-attention modules of Transformers. By augmenting these tokens with instance-level features extracted from the content features according to bounding box information, our framework learns the interaction between object instances and the global image, thus boosting instance-awareness. We replace layer normalization (LayerNorm) in standard Transformers with adaptive instance normalization (AdaIN) to enable multi-modal translation with style codes. In addition, to improve instance-awareness and translation quality in object regions, we present an instance-level content contrastive loss defined between the input and translated images. Although InstaFormer attains competitive performance, it faces some limitations, namely limited scalability in handling multiple domains and reliance on domain annotations. To overcome these, we propose InstaFormer++, an extension of InstaFormer that enables multi-domain translation in instance-aware image translation for the first time. We obtain pseudo domain labels by leveraging a list of candidate domain labels in text format together with a pretrained vision-language model. We conduct experiments to demonstrate the effectiveness of our methods over the latest methods and provide extensive ablation studies.
Journal Article
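The AdaIN-for-LayerNorm substitution this abstract describes can be sketched in a few lines. This is a minimal illustration assuming PyTorch; the module name, style-code dimension, and token layout are assumptions, not the paper's implementation:

```python
import torch
import torch.nn as nn

class AdaIN(nn.Module):
    """Adaptive instance normalization over visual tokens (illustrative).

    Stands in for LayerNorm in a Transformer block: tokens are normalized
    per channel, then modulated by a scale/shift predicted from a style code.
    """
    def __init__(self, dim, style_dim=64):  # style_dim is a hypothetical choice
        super().__init__()
        self.norm = nn.InstanceNorm1d(dim, affine=False)
        self.affine = nn.Linear(style_dim, 2 * dim)  # predicts gamma and beta

    def forward(self, tokens, style):
        # tokens: (B, N, C) visual tokens; style: (B, style_dim) style code
        gamma, beta = self.affine(style).chunk(2, dim=-1)
        x = self.norm(tokens.transpose(1, 2)).transpose(1, 2)  # per-channel IN
        return gamma.unsqueeze(1) * x + beta.unsqueeze(1)
```

Because the modulation parameters come from a sampled style code rather than learned constants, resampling the code yields different translations of the same content, which is what enables the multi-modal behavior the abstract mentions.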
Unsupervised MR harmonization by learning disentangled representations using information bottleneck theory
2021
• Unsupervised MR harmonization without traveling subjects.
• Unified latent space for MR contrast synthesis.
• A novel framework for disentangling contrast and anatomy in MR images.
• Downstream segmentation consistency shows significant improvements after harmonization.
In magnetic resonance (MR) imaging, a lack of standardization in acquisition often causes pulse sequence-based contrast variations in MR images from site to site, which impedes consistent measurements in automatic analyses. In this paper, we propose an unsupervised MR image harmonization approach, CALAMITI (Contrast Anatomy Learning and Analysis for MR Intensity Translation and Integration), which aims to alleviate contrast variations in multi-site MR imaging. Designed using information bottleneck theory, CALAMITI learns a globally disentangled latent space containing both anatomical and contrast information, which permits harmonization. In contrast to supervised harmonization methods, our approach does not need a sample population to be imaged across sites. Unlike traditional unsupervised harmonization approaches which often suffer from geometry shifts, CALAMITI better preserves anatomy by design. The proposed method is also able to adapt to a new testing site with a straightforward fine-tuning process. Experiments on MR images acquired from ten sites show that CALAMITI achieves superior performance compared with other harmonization approaches.
Journal Article
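The core harmonization move, decoding one image's anatomy with another site's contrast code, can be illustrated with a toy disentangled autoencoder. This is a sketch under assumed shapes and layer choices, not CALAMITI's architecture or its information-bottleneck training:

```python
import torch
import torch.nn as nn

class TinyHarmonizer(nn.Module):
    """Toy anatomy/contrast disentanglement (illustrative shapes only)."""
    def __init__(self, contrast_dim=8):
        super().__init__()
        self.anatomy_enc = nn.Sequential(            # spatial anatomy map
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 4, 3, padding=1))
        self.contrast_enc = nn.Sequential(           # global contrast code
            nn.Conv2d(1, contrast_dim, 3, padding=1),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.decoder = nn.Sequential(
            nn.Conv2d(4 + contrast_dim, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1))

    def forward(self, img_src, img_ref):
        beta = self.anatomy_enc(img_src)             # anatomy from the source image
        theta = self.contrast_enc(img_ref)           # contrast from the reference site
        theta_map = theta[:, :, None, None].expand(-1, -1, *beta.shape[2:])
        return self.decoder(torch.cat([beta, theta_map], dim=1))
```

Harmonizing a new site then amounts to holding the anatomy branch fixed and swapping in the target site's contrast code.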
Unsupervised Image-to-Image Translation: A Review
by Stricker, Didier; Schockaert, Cédric; Hoyez, Henri
in computer vision, Datasets, deep learning
2022
Supervised image-to-image translation has been proven to generate realistic images with sharp details and to have good quantitative performance. Such methods are trained on a paired dataset, where an image from the source domain already has a corresponding translated image in the target domain. However, this paired-dataset requirement imposes a huge practical constraint, requires domain knowledge, or is even impossible to satisfy in certain cases. Due to these problems, unsupervised image-to-image translation has been proposed, which does not require domain expertise and can take advantage of large unlabeled datasets. Although such models perform well, they are hard to train due to the major constraints induced in their loss functions, which make training unstable. Since CycleGAN was released, numerous methods have been proposed that try to address various problems from different perspectives. In this review, we first describe the general image-to-image translation framework and discuss the datasets and metrics involved in the topic. Furthermore, we review the current state of the art with a classification of existing works. This part is followed by a small quantitative evaluation, for which results were taken from the original papers.
Journal Article
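The CycleGAN-style constraint this review builds on is easy to state in code. A minimal sketch, assuming PyTorch and generator callables G (X to Y) and F_inv (Y to X); the weight follows the common choice in the CycleGAN paper:

```python
import torch.nn.functional as F

def cycle_consistency_loss(x, y, G, F_inv, lam=10.0):
    """Unpaired I2I constraint: translating and translating back
    should reproduce the original image in each domain."""
    loss_x = F.l1_loss(F_inv(G(x)), x)   # forward cycle: x -> Y -> x
    loss_y = F.l1_loss(G(F_inv(y)), y)   # backward cycle: y -> X -> y
    return lam * (loss_x + loss_y)
```

This is the kind of extra loss-function constraint the abstract refers to: it pins down what the adversarial terms alone leave underdetermined, at the cost of making training more delicate.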
Image Generation: A Review
by Al-Maadeed, Somaya; Elasri, Mohamed; Elharrouss, Omar
in Algorithms, Artificial Intelligence, Complex Systems
2022
The creation of an image from another image or from different types of data, including text, scene graphs, and object layouts, is one of the most challenging tasks in computer vision. In addition, capturing images from different views to generate an object or a product can be exhausting and expensive to do manually. Now, using deep learning and artificial intelligence techniques, the generation of new images from different types of data has become possible. Accordingly, significant effort has recently been devoted to developing image generation strategies, with considerable success. To that end, we present in this paper, to the best of the authors’ knowledge, the first comprehensive overview of existing image generation methods. Each image generation technique is described based on the nature of the adopted algorithms, the type of data used, and the main objective. Moreover, each image generation category is discussed by presenting the proposed approaches. In addition, existing image generation datasets are presented. The evaluation metrics suitable for each image generation category are discussed, and a comparison of the performance of existing solutions is provided to characterize the state of the art and identify their limitations and strengths. Lastly, the current challenges facing this subject are presented.
Journal Article
Deep Generative Adversarial Networks for Image-to-Image Translation: A Review
2020
Many image processing, computer graphics, and computer vision problems can be treated as image-to-image translation tasks. Such translation entails learning to map one visual representation of a given input to another representation. Image-to-image translation with generative adversarial networks (GANs) has been intensively studied and applied to various tasks, such as multimodal image-to-image translation, super-resolution translation, object transfiguration-related translation, etc. However, image-to-image translation techniques suffer from some problems, such as mode collapse, instability, and a lack of diversity. This article provides a comprehensive overview of image-to-image translation based on GAN algorithms and its variants. It also discusses and analyzes current state-of-the-art image-to-image translation techniques that are based on multimodal and multidomain representations. Finally, open issues and future research directions utilizing reinforcement learning and three-dimensional (3D) modal translation are summarized and discussed.
Journal Article
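For contrast with the unpaired setting, the paired GAN formulation that several of the surveyed tasks (e.g., super-resolution-style translation) build on combines an adversarial term with a pixel reconstruction term. A pix2pix-like sketch, not taken from this review; the conditional discriminator interface and weighting are assumptions:

```python
import torch
import torch.nn.functional as F

def paired_i2i_generator_loss(G, D, x, y_real, lam=100.0):
    """Adversarial + L1 objective for paired image-to-image translation."""
    y_fake = G(x)
    logits = D(x, y_fake)                 # discriminator conditioned on the input
    adv = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    rec = F.l1_loss(y_fake, y_real)       # tie the translation to the paired target
    return adv + lam * rec
```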
Rain Rendering for Evaluating and Improving Robustness to Bad Weather
by Tremblay, Maxime; de Charette, Raoul; Lalonde, Jean-François
in Algorithms, Atmospheric models, Computer vision
2021
Rain fills the atmosphere with water particles, which breaks the common assumption that light travels unaltered from the scene to the camera. While it is well known that rain affects computer vision algorithms, quantifying its impact is difficult. In this context, we present a rain rendering pipeline that enables the systematic evaluation of common computer vision algorithms under controlled amounts of rain. We present three different ways to add synthetic rain to existing image datasets: completely physics-based, completely data-driven, and a combination of both. The physics-based rain augmentation combines a physical particle simulator and accurate rain photometric modeling. We validate our rendering methods with a user study, demonstrating that our rain is judged as much as 73% more realistic than the state of the art. Using our rain-augmented KITTI, Cityscapes, and nuScenes datasets, we conduct a thorough evaluation of object detection, semantic segmentation, and depth estimation algorithms and show that their performance decreases in degraded weather: on the order of 15% for object detection, 60% for semantic segmentation, and a 6-fold increase in depth estimation error. Finetuning on our augmented synthetic data results in improvements of 21% on object detection, 37% on semantic segmentation, and 8% on depth estimation.
Journal Article
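At its simplest, rendering rain onto an existing image is photometric compositing of a rain layer. A deliberately naive stand-in for the paper's physically based pipeline (which couples a particle simulator with a full photometric model):

```python
import numpy as np

def composite_rain(image, rain_layer, alpha):
    """Alpha-blend a rendered rain layer onto a clean image.

    image, rain_layer: float arrays in [0, 1] with shape (H, W, 3);
    alpha: per-pixel streak opacity in [0, 1], shape (H, W, 1).
    """
    return (1.0 - alpha) * image + alpha * rain_layer
```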
Attn‐DeCGAN: A Diversity‐Enhanced CycleGAN With Attention for High‐Fidelity Medical Image Translation
by Selasi Agbemenu, Andrew; Salifu, Amina; Dede, Albert
in Attn‐DeCGAN, mode collapse, structural fidelity
2025
Unpaired image‐to‐image translation has emerged as a transformative paradigm in medical imaging, enabling translation without the need for aligned datasets. While cycle‐consistent generative adversarial networks (CycleGANs) have shown considerable promise in this domain, they remain inherently constrained by the locality of convolutional operations, resulting in global structural inconsistencies, and by mode collapse, which restricts generative diversity. To overcome these limitations, we propose Attn‐DeCGAN, a novel attention‐augmented, diversity‐aware CycleGAN framework designed to enhance both structural fidelity and perceptual diversity in CT‐MRI translation tasks. Attn‐DeCGAN replaces conventional ResNet‐based generators with Hybrid Perception Blocks (HPBs), which synergise depthwise convolutions for spatially efficient local feature extraction with a Dual‐Pruned Self‐Attention (DPSA) mechanism that enables sparse, content‐adaptive modeling of long‐range dependencies at linear complexity. This architectural innovation facilitates the modeling of anatomically distant relationships while maintaining inference efficiency. The model is trained using a composite loss function incorporating adversarial, cycle‐consistency, identity, and VGG19‐based structural consistency losses to preserve both realism and anatomical detail. Extensive empirical evaluations demonstrate that Attn‐DeCGAN achieves superior performance across key metrics, including the lowest FID scores (60, 58), highest PSNR (27, 33), and statistically significant improvements in perceptual diversity (LPIPS, p < 0.05) compared to state‐of‐the‐art baselines. Ablation studies underscore the critical role of spectral normalization in stabilizing adversarial training and enhancing attention effectiveness. Expert radiologist assessments confirmed the clinical superiority of Attn‐DeCGAN over the next best baseline, DeCGAN, with 100% real classifications and higher confidence scores in CT synthesis, and more anatomically convincing outputs in MRI translation. This has particular utility in low‐resource clinical environments where MRI is scarce, supporting synthetic MRI generation for diagnosis, radiotherapy planning, and medical image dataset augmentation. Despite increased training complexity, Attn‐DeCGAN retains efficient inference, positioning it as a technically robust and clinically deployable solution for high‐fidelity unpaired medical image translation. A hybrid CycleGAN framework combining CNNs and transformer‐based modules captures local and long‐range anatomical features for improved image translation. With a Dual‐Pruned Self‐Attention mechanism for efficient region‐focused attention, the model demonstrates high clinical fidelity on CT‐MRI tasks, validated through both quantitative metrics and expert radiologist assessments.
Journal Article
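The composite objective named in the abstract (adversarial, cycle-consistency, identity, and VGG19-based structural terms) can be sketched as follows. The weights, the least-squares adversarial form, and the placement of the VGG term between input and translation are assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn.functional as F

def composite_translation_loss(G_ab, G_ba, D_b, x_a, x_b, vgg_feats,
                               w_cyc=10.0, w_id=5.0, w_vgg=1.0):
    """One direction (A -> B) of a CycleGAN-style composite generator loss."""
    fake_b = G_ab(x_a)
    logits = D_b(fake_b)
    adv = F.mse_loss(logits, torch.ones_like(logits))      # LSGAN-style adversarial term
    cyc = F.l1_loss(G_ba(fake_b), x_a)                     # cycle consistency
    idt = F.l1_loss(G_ab(x_b), x_b)                        # identity mapping on domain B
    struct = F.l1_loss(vgg_feats(fake_b), vgg_feats(x_a))  # VGG-feature structural term
    return adv + w_cyc * cyc + w_id * idt + w_vgg * struct
```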
Spectral normalization and dual contrastive regularization for image-to-image translation
2025
Existing image-to-image (I2I) translation methods achieve state-of-the-art performance by incorporating patch-wise contrastive learning into generative adversarial networks. However, patch-wise contrastive learning focuses only on local content similarity and neglects the global structure constraint, which affects the quality of the generated images. In this paper, we propose a new unpaired I2I translation framework based on dual contrastive regularization and spectral normalization, namely SN-DCR. To maintain consistency of the global structure and texture, we design the dual contrastive regularization using two different deep feature spaces. To improve the global structure information of the generated images, we formulate a semantic contrastive loss that makes the global semantic structure of the generated images similar to that of the real images from the target domain in the semantic feature space. We use Gram matrices to extract the texture style from images. Similarly, we design a style contrastive loss to improve the global texture information of the generated images. Moreover, to enhance the stability of the model, we employ a spectrally normalized convolutional network in the design of our generator. We conduct comprehensive experiments to evaluate the effectiveness of SN-DCR, and the results show that our method achieves state-of-the-art performance on multiple tasks. The code and pretrained models are available at https://github.com/zhihefang/SN-DCR.
Journal Article
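Two of the abstract's ingredients have standard one-liners. A sketch assuming PyTorch: the Gram matrix used to represent texture style, and the built-in spectral normalization wrapper applied to a generator convolution (layer shapes are placeholders):

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

def gram_matrix(feat):
    """Gram matrix of a (B, C, H, W) feature map; captures texture statistics
    by correlating channels, independent of spatial layout."""
    b, c, h, w = feat.shape
    f = feat.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

# Spectral normalization constrains each layer's Lipschitz constant,
# the stabilizing effect the abstract attributes to the generator design.
conv = spectral_norm(nn.Conv2d(64, 64, kernel_size=3, padding=1))
```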
Continuous and Diverse Image-to-Image Translation via Signed Attribute Vectors
2022
Recent image-to-image (I2I) translation algorithms focus on learning the mapping from a source to a target domain. However, the continuous translation problem, which synthesizes intermediate results between two domains, has not been well studied in the literature. Generating a smooth sequence of intermediate results bridges the gap between two different domains, facilitating a morphing effect across domains. Existing I2I approaches are limited to either intra-domain or deterministic inter-domain continuous translation. In this work, we present an effective signed attribute vector, which enables continuous translation along diverse mapping paths across various domains. In particular, we introduce a unified attribute space shared by all domains that utilizes the sign operation to encode the domain information, thereby allowing interpolation of attribute vectors from different domains. To enhance the visual quality of continuous translation results, we generate a trajectory between two sign-symmetric attribute vectors and leverage the domain information of the interpolated results along the trajectory for adversarial training. We evaluate the proposed method on a wide range of I2I translation tasks. Both qualitative and quantitative results demonstrate that the proposed framework generates higher-quality continuous translation results than the state-of-the-art methods.
Journal Article
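The sign trick the abstract describes makes cross-domain interpolation a one-liner: if the domain is encoded in the sign of the attribute vector, then the negated vector is its counterpart in the opposite domain, and points along the segment between them are the intermediate translations. A minimal sketch (tensor shapes assumed):

```python
import torch

def signed_attribute_trajectory(attr, steps=8):
    """Interpolate from an attribute vector to its sign-flipped counterpart.

    attr: (D,) attribute vector of the source domain; returns (steps, D)
    intermediate vectors a generator could decode into a morphing sequence.
    """
    t = torch.linspace(0.0, 1.0, steps).view(-1, 1)
    return (1.0 - t) * attr + t * (-attr)
```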
Screen-Cam Imitation Module for Improving Data Hiding Robustness
by Evsutin, Oleg; Dzhanashia, Kristina; Fedosov, Aleksandr
in Computer-aided manufacturing, data hiding, Digital watermarks
2025
Using an attack-simulation module is a well-recognized approach to improving the robustness of end-to-end neural-network-based data-hiding schemes. However, most proposed attack simulators are limited in the types of attacks they cover, usually handling only a basic set of digital transformations. Real, in-demand use cases for data-hiding methods may involve modifications that cannot be modeled by basic digital transformations such as filtering, noise, or compression. In the screen-cam scenario, where an image containing hidden data is displayed on a screen and captured by a camera, the distortions are much more complex and typically require manual experiments manipulating physical objects to replicate. This hinders both creating applicable data-hiding schemes for this scenario and evaluating their effectiveness. In this work, we propose a generator neural network that simulates screen-cam distortions and can replace the manual, time-consuming operations of replicating this attack in the real world, and we show how it can be used to improve the robustness of an existing data-hiding scheme. In our example, we increased robustness by 15% in terms of bit error rate.
Journal Article
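The way such a simulator plugs into training is the standard attack-in-the-loop pattern: the distortion network sits between the hiding and recovery networks so the decoder learns on screen-cam-like inputs. A hedged sketch; all names and interfaces here are illustrative, not the paper's:

```python
import torch
import torch.nn.functional as F

def robust_hiding_step(encoder, distortion_sim, decoder, cover, message):
    """One training step with a learned screen-cam distortion in the loop.

    encoder embeds message bits into the cover image; distortion_sim (the
    pretrained generator imitating display-and-recapture artifacts) attacks
    the stego image; decoder is trained to recover the bits afterwards.
    """
    stego = encoder(cover, message)      # image with hidden data
    attacked = distortion_sim(stego)     # simulated screen-cam channel
    logits = decoder(attacked)
    return F.binary_cross_entropy_with_logits(logits, message)
```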