Catalogue Search | MBRL
4,971 result(s) for "image generation"
Image Generation: A Review
by Al-Maadeed, Somaya; Elasri, Mohamed; Elharrouss, Omar
in Algorithms, Artificial Intelligence, Complex Systems
2022
The creation of an image from another image or from other types of data, including text, scene graphs, and object layouts, is one of the most challenging tasks in computer vision. In addition, capturing images from different views to generate an object or a product can be exhausting and expensive to do manually. Now, using deep learning and artificial intelligence techniques, the generation of new images from different types of data has become possible. Consequently, significant effort has recently been devoted to developing image generation strategies, with considerable success. To that end, we present in this paper, to the best of the authors’ knowledge, the first comprehensive overview of existing image generation methods. Each image generation technique is described according to the nature of the adopted algorithms, the type of data used, and the main objective. Moreover, each image generation category is discussed by presenting the proposed approaches. In addition, existing image generation datasets are presented. The evaluation metrics suitable for each image generation category are discussed, and a comparison of the performance of existing solutions is provided to better inform the state of the art and identify their limitations and strengths. Lastly, the current challenges facing this subject are presented.
Journal Article
Image Synthesis Under Limited Data: A Survey and Taxonomy
2025
Deep generative models, which aim to reproduce the data distribution in order to produce novel images, have made unprecedented advancements in recent years. However, one critical prerequisite for their tremendous success is the availability of a sufficient number of training samples, which requires massive computation resources. When trained on limited data, generative models tend to suffer from severe performance deterioration due to overfitting and memorization. Accordingly, researchers have recently devoted considerable attention to developing novel models that are capable of generating plausible and diverse images from limited training data. Despite numerous efforts to enhance training stability and synthesis quality in limited-data scenarios, there is a lack of a systematic survey that provides (1) a clear problem definition, challenges, and taxonomy of various tasks; (2) an in-depth analysis of the pros, cons, and limitations of the existing literature; and (3) a thorough discussion of the potential applications and future directions in this field. To fill this gap and provide an informative introduction for researchers who are new to this topic, this survey offers a comprehensive review and a novel taxonomy of the development of image synthesis under limited data. In particular, it covers the problem definition, requirements, main solutions, popular benchmarks, and remaining challenges in a comprehensive and all-around manner. We hope this survey can provide an informative overview and a valuable resource for researchers and practitioners. Apart from the relevant references, we aim to maintain an up-to-date repository to track the latest advances at awesome-few-shot-generation.
Journal Article
Open-Vocabulary Text-Driven Human Image Generation
2024
Generating human images from open-vocabulary text descriptions is an exciting but challenging task. Previous methods (e.g., Text2Human) face two challenging problems: (1) they cannot handle the open-vocabulary setting with arbitrary text inputs (e.g., unseen clothing appearances) well and rely heavily on limited preset words (e.g., pattern styles of clothing appearances); (2) the generated human images are inaccurate in open-vocabulary settings. To alleviate these drawbacks, we propose a flexible diffusion-based framework, namely HumanDiffusion, for open-vocabulary text-driven human image generation (HIG). The proposed framework mainly consists of two novel modules: the Stylized Memory Retrieval (SMR) module and the Multi-scale Feature Mapping (MFM) module. Using the vision-language pretrained CLIP model as an encoder, we obtain coarse features of the local human appearance. Then, the SMR module utilizes an external database that contains clothing texture details to refine the initial coarse features. Through SMR refreshing, we can achieve the HIG task with arbitrary text inputs, and the range of expression styles is greatly expanded. Later, the MFM module embedded in the diffusion backbone learns fine-grained appearance features, which effectively achieves precise semantic-coherence alignment of different body parts with appearance features and realizes the accurate expression of the desired human appearance. The seamless combination of the proposed novel modules in HumanDiffusion realizes freestyle and highly accurate text-guided HIG and editing tasks. Extensive experiments demonstrate that the proposed method can achieve state-of-the-art (SOTA) performance, especially in the open-vocabulary setting.
Journal Article
FineDiffusion: scaling up diffusion models for fine-grained image generation with 10,000 classes
2025
Class-conditional image generation based on diffusion models is renowned for generating high-quality and diverse images. However, most prior efforts focus on generating images for general categories, e.g., the 1000 classes in ImageNet-1k. A more challenging task, large-scale fine-grained image generation, remains to be explored. In this work, we present a parameter-efficient strategy, called FineDiffusion, to fine-tune large pre-trained diffusion models, scaling to large-scale fine-grained image generation with 10,000 categories. FineDiffusion significantly accelerates training and reduces storage overhead by fine-tuning only the tiered class embedder, bias terms, and normalization layers’ parameters. To further improve the image generation quality of fine-grained categories, we propose a novel sampling method for fine-grained image generation, which utilizes superclass-conditioned guidance, specifically tailored for fine-grained categories, to replace the conventional classifier-free guidance sampling. Compared to full fine-tuning, FineDiffusion achieves a remarkable 1.56× training speed-up and requires storing merely 1.77% of the total model parameters, while achieving a state-of-the-art FID of 9.776 on image generation of 10,000 classes. Extensive qualitative and quantitative experiments demonstrate the superiority of our method compared to other parameter-efficient fine-tuning methods. The code and more generated results are available at our project website: https://finediffusion.github.io/.
Journal Article
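The FineDiffusion abstract above contrasts classifier-free guidance with superclass-conditioned guidance. The following is only a minimal sketch of that contrast, not the authors' implementation. It assumes the common guidance form base + scale * (conditional - base), and it assumes that the superclass-conditioned prediction simply takes the place of the unconditional one. The function name and the toy values are hypothetical.

```python
from typing import List, Sequence


def guided_prediction(base: Sequence[float], cond: Sequence[float], scale: float) -> List[float]:
    """Extrapolate from a base noise prediction towards a conditional one."""
    if len(base) != len(cond):
        raise ValueError("predictions must have the same length")
    return [b + scale * (c - b) for b, c in zip(base, cond)]


# Toy noise predictions for one denoising step (made-up numbers).
eps_uncond = [0.0, 0.0]   # no class information
eps_super = [0.5, 1.0]    # conditioned on the superclass (assumed available)
eps_fine = [1.0, 2.0]     # conditioned on the fine-grained class

# Classifier-free guidance: the base is the unconditional prediction.
cfg = guided_prediction(eps_uncond, eps_fine, scale=2.0)

# Superclass-conditioned guidance (assumed form): the base is the superclass prediction.
scg = guided_prediction(eps_super, eps_fine, scale=2.0)
```

In this sketch the only difference between the two samplers is which prediction serves as the base, so the guidance direction in the second case isolates what distinguishes the fine-grained class from its superclass.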
Positional Component-Guided Hangul Font Image Generation via Deep Semantic Segmentation and Adversarial Style Transfer
by Sami, Abdul; Kumar, Avinash; Memon, Irfanullah
in Ablation, Automation, Equipment and supplies
2025
Automated font generation for complex, compositional scripts like Korean Hangul presents a significant challenge due to the 11,172 characters and their complicated component-based structure. While existing component-based methods for font image generation acknowledge the compositional nature of Hangul, they often fail to explicitly leverage the crucial positional semantics of its basic elements as initial, middle, and final components, known as Jamo. This oversight can lead to structural inconsistencies and artifacts in the generated glyphs. This paper introduces a novel two-stage framework that directly addresses this gap by imposing a strong, linguistically informed structural principle on the font image generation process. In the first stage, we employ a You Only Look Once version 8 for Segmentation (YOLOv8-Seg) model, a state-of-the-art instance segmentation network, to decompose Hangul characters into their basic components. Notably, this process generates a dataset of position-aware semantic components, categorizing each jamo according to its structural role within the syllabic block. In the second stage, a conditional Generative Adversarial Network (cGAN) is explicitly conditioned on these extracted positional components to perform style transfer with high structural information. The generator learns to synthesize a character’s appearance by referencing the style of the target components while preserving the content structure of a source character. Our model achieves state-of-the-art performance, reducing L1 loss to 0.2991 and improving the Structural Similarity Index (SSIM) to 0.9798, quantitatively outperforming existing methods like MX-Font and CKFont. This position-guided approach demonstrates significant quantitative and qualitative improvements over existing methods in structured script generation, offering enhanced control over glyph structure and a promising approach for generating font images for other complex, structured scripts.
Journal Article
SSSGAN: Satellite Style and Structure Generative Adversarial Networks
2021
This work presents the Satellite Style and Structure Generative Adversarial Network (SSSGAN), a generative model of high-resolution satellite imagery to support image segmentation. Based on spatially adaptive denormalization (SPADE) modules that modulate the activations with respect to the segmentation map structure, in addition to global descriptor vectors that capture the semantic information with respect to OpenStreetMap (OSM) classes, this model is able to produce consistent aerial imagery. By decoupling the generation of aerial images into a structure map and a carefully defined style vector, we were able to improve the realism and geodiversity of the synthesis with respect to the state-of-the-art baseline. Therefore, the proposed model allows us to control the generation not only with respect to the desired structure, but also with respect to a geographic area.
Journal Article
Dual-Branch Feature Decoupling GAN with Wavelet Constraint for Azimuth-Controllable SAR Image Simulation
2026
What are the main findings?
* A wavelet-constrained dual-branch framework is proposed to enhance high-frequency scattering detail modeling for SAR images.
* A joint control mechanism is introduced to realize the regulation of semantic categories and continuous azimuth.
What are the implications of the main findings?
* The proposed framework can effectively model the high-frequency components of SAR images.
* This work presents an effective data augmentation solution with controllable parameters for SAR image interpretation tasks.
Synthetic aperture radar (SAR) is of great value in intelligent image interpretation. However, the acquisition of real SAR data is costly, and manual annotation heavily relies on expert experience. These factors severely restrict the development of SAR intelligent interpretation algorithms. Meanwhile, the high-frequency details of SAR images contain rich target information. Traditional generation methods cannot effectively capture these key features. To address the above issues, this paper proposes a dual-branch feature decoupling generative adversarial network (GAN) with wavelet constraint designed to achieve high-quality and parameter-controllable SAR image generation. The framework leverages discrete wavelet transform (DWT) to separate spatial structure from high-frequency details, which are independently modeled by a structure branch and a detail branch, respectively.
A wavelet consistency loss function is introduced to constrain the distribution of generated and real images in high-frequency subbands, thereby enhancing the model’s capability to model scattering details. To fuse features from the two branches, a cross-attention fusion module is adopted to realize the adaptive compensation of structural features with texture details. Furthermore, to achieve joint control over the semantic attributes and azimuth of generated samples, the framework further integrates auxiliary classification and azimuth regression tasks. A multi-task learning mechanism is constructed to realize precise control over the target’s semantic category and azimuth. For the continuous variable of azimuth, an angle-aware hypernetwork transform module is introduced to perform dynamic convolution modulation on the structure branch at the feature map scale, which improves the model’s fine control capability over continuous azimuth variations. Experimental results on the MSTAR dataset demonstrate that the proposed model can significantly improve the semantic consistency and visual fidelity of the generated samples. The generated samples exhibit high statistical alignment with real data distributions, confirming the model’s effectiveness in characterizing the feature space of SAR imagery and enabling controllable SAR data simulation, thereby augmenting datasets for image interpretation tasks.
Journal Article
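The SAR abstract above describes a wavelet consistency loss on high-frequency subbands but does not give its exact form. The following is a minimal sketch under stated assumptions: a single-level 2D Haar transform, and a mean absolute difference over the three detail subbands. The names `haar_dwt2` and `high_freq_l1` are hypothetical, and both images are assumed to have the same even height and width.

```python
from typing import List, Tuple

Image = List[List[float]]


def haar_dwt2(img: Image) -> Tuple[Image, Image, Image, Image]:
    """Single-level 2D Haar transform; returns (LL, LH, HL, HH) subbands."""
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, h, 2):
        row_ll, row_lh, row_hl, row_hh = [], [], [], []
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            row_ll.append((a + b + c + d) / 2)  # local average (structure)
            row_lh.append((a + b - c - d) / 2)  # top minus bottom
            row_hl.append((a - b + c - d) / 2)  # left minus right
            row_hh.append((a - b - c + d) / 2)  # diagonal detail
        ll.append(row_ll)
        lh.append(row_lh)
        hl.append(row_hl)
        hh.append(row_hh)
    return ll, lh, hl, hh


def high_freq_l1(generated: Image, real: Image) -> float:
    """Mean absolute difference over the three detail subbands (assumed loss form)."""
    total, count = 0.0, 0
    # Skip index 0 (LL) so that only high-frequency content is compared.
    for g_band, r_band in zip(haar_dwt2(generated)[1:], haar_dwt2(real)[1:]):
        for g_row, r_row in zip(g_band, r_band):
            for g, r in zip(g_row, r_row):
                total += abs(g - r)
                count += 1
    return total / count


flat = [[1.0, 1.0], [1.0, 1.0]]
brighter = [[3.0, 3.0], [3.0, 3.0]]
edge = [[1.0, 0.0], [1.0, 0.0]]

loss_brightness = high_freq_l1(flat, brighter)  # differs only in the LL subband
loss_edge = high_freq_l1(flat, edge)            # differs in a detail subband
```

Because the LL subband is excluded, a uniform brightness change is not penalized here, while a missing edge is.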
Optimizing and interpreting the latent space of the conditional text-to-image GANs
by Schomaker, Lambert; Zhang, Zhenxing
in Algorithms, Artificial Intelligence, Automatic text generation
2024
Text-to-image generation intends to automatically produce a photo-realistic image, conditioned on a textual description. To facilitate the real-world applications of text-to-image synthesis, we focus on studying the following three issues: (1) How to ensure that generated samples are believable, realistic or natural? (2) How to exploit the latent space of the generator to edit a synthesized image? (3) How to improve the explainability of a text-to-image generation framework? We introduce two new data sets for benchmarking, i.e., the Good & Bad, bird and face, data sets consisting of successful as well as unsuccessful generated samples. This data set can be used to effectively and efficiently acquire high-quality images by increasing the probability of generating Good latent codes with a separate, new classifier. Additionally, we present a novel algorithm which identifies semantically understandable directions in the latent space of a conditional text-to-image GAN architecture by performing independent component analysis on the pre-trained weight values of the generator. Furthermore, we develop a background-flattening loss (BFL), to improve the background appearance in the generated images. Subsequently, we introduce linear-interpolation analysis between pairs of text keywords. This is extended into a similar triangular ‘linguistic’ interpolation. The visual array of interpolation results gives users a deep look into what the text-to-image synthesis model has learned within the linguistic embeddings. Experimental results on the recent DiverGAN generator, pre-trained on three common benchmark data sets demonstrate that our classifier achieves a better than 98% accuracy in predicting Good/Bad classes for synthetic samples and our proposed approach is able to derive various interpretable semantic properties for the text-to-image GAN model.
Journal Article
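The text-to-image abstract above mentions linear interpolation between pairs of keyword embeddings and a triangular extension. The following is a minimal sketch of the idea, not the paper's procedure. It assumes that keywords are represented as plain embedding vectors and that the triangular case is a barycentric mix of three of them. All names and the toy embeddings are hypothetical.

```python
from typing import List, Sequence, Tuple


def lerp(u: Sequence[float], v: Sequence[float], t: float) -> List[float]:
    """Linear interpolation: t=0 gives u, t=1 gives v."""
    return [(1 - t) * a + t * b for a, b in zip(u, v)]


def triangular(u: Sequence[float], v: Sequence[float], w: Sequence[float],
               alpha: float, beta: float, gamma: float) -> List[float]:
    """Barycentric mix of three embeddings; the weights must sum to 1."""
    if abs(alpha + beta + gamma - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return [alpha * a + beta * b + gamma * c for a, b, c in zip(u, v, w)]


def triangle_weights(steps: int) -> List[Tuple[float, float, float]]:
    """Evenly spaced barycentric weights covering the triangle."""
    return [(i / steps, j / steps, (steps - i - j) / steps)
            for i in range(steps + 1)
            for j in range(steps + 1 - i)]


# Toy 2-D "keyword embeddings" (made-up numbers).
red, blue, yellow = [0.0, 2.0], [4.0, 6.0], [0.0, 4.0]

midpoint = lerp(red, blue, 0.5)
mix = triangular([0.0, 0.0], [4.0, 0.0], [0.0, 4.0], 0.5, 0.25, 0.25)
grid = triangle_weights(2)
```

Feeding each interpolated embedding to a generator (not shown) would produce the kind of visual array of interpolation results that the abstract describes.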
AI-Image Generation in Research Interviews: Opportunities and Challenges
by Brüning, Camila; Busato, Manuela; Valgardsson, Sasha
in Artificial intelligence, Conversation, Image generation
2025
Drawing on our experience developing a visual polyvocal narrative of the immigration system in Canada and Brazil, we explore the role of artificial intelligence (AI) image generation as a tool for supporting interview participants in articulating their experiences. We found that the AI image generation process supported participants’ ability to reflect and express their experiences. However, there were several challenges due to technological limitations and inherent biases embedded in the AI, which resulted in unsatisfactory images and repeated image generation attempts. We came to conceptualize the AI image generation tool as a third agent in the interview process, facilitating access to artistic expression yet introducing content into the conversation. We identified five primary roles played by the AI image generation tool in the interview process: Helper (supported the image generation process), Distractor (transferred attention from the topic of study to prompt engineering), Motivator (motivated participants to better articulate their vision), Influencer (introduced content in the conversation), and Facilitator (facilitated reflection and sensemaking). We discuss avenues for maximizing the benefits of AI image generation in interviewing and mitigating its challenges. We contribute to a growing body of research on reflective and arts-based interventions in interviewing by illustrating the role new technologies can play in advancing the potential of interview-based research.
Journal Article
Generating 3D images of material microstructures from a single 2D image: a denoising diffusion approach
by Sarmad, Muhammad; Kiss, Gabriel; Lindseth, Frank
in 3D image generation, 639/301, 639/705/117
2024
Three-dimensional (3D) images provide a comprehensive view of material microstructures, enabling numerical simulations unachievable with two-dimensional (2D) imaging alone. However, obtaining these 3D images can be costly and constrained by resolution limitations. We introduce a novel method capable of generating large-scale 3D images of material microstructures, such as metal or rock, from a single 2D image. Our approach circumvents the need for 3D image data while offering a cost-effective, high-resolution alternative to existing imaging techniques. Our method combines a denoising diffusion probabilistic model with a generative adversarial network framework. To compensate for the lack of 3D training data, we implement chain sampling, a technique that utilizes the 3D intermediate outputs obtained by reversing the diffusion process. During the training phase, these intermediate outputs are guided by a 2D discriminator. This technique facilitates our method’s ability to gradually generate 3D images that accurately capture the geometric properties and statistical characteristics of the original 2D input. This study features a comparative analysis of the 3D images generated by our method, SliceGAN (the current state-of-the-art method), and actual 3D micro-CT images, spanning a diverse set of rock and metal types. The results show an improvement of up to three times in the Fréchet inception distance score, a typical metric for evaluating the performance of image generative models, and enhanced accuracy in derived properties compared to SliceGAN. The potential of our method to produce high-resolution and statistically representative 3D images paves the way for new applications in material characterization and analysis domains.
Journal Article