Catalogue Search | MBRL
195 result(s) for "conditional generation"
Diff-Font: Diffusion Model for Robust One-Shot Font Generation
2024
Font generation presents a significant challenge due to the intricate details needed, especially for languages with complex ideograms and numerous characters, such as Chinese and Korean. Although various few-shot (or even one-shot) font generation methods have been introduced, most of them rely on GAN-based image-to-image translation frameworks that still face (i) unstable training issues, (ii) limited fidelity in replicating font styles, and (iii) imprecise generation of complex characters. To tackle these problems, we propose a unified one-shot font generation framework called Diff-Font, based on the diffusion model. In particular, we approach font generation as a conditional generation task, where the content of characters is managed through predefined embedding tokens and the desired font style is extracted from a one-shot reference image. For glyph-rich characters such as Chinese and Korean, we incorporate additional inputs for strokes or components as fine-grained conditions. Owing to the proposed diffusion training process, these three types of information can be effectively modeled, resulting in stable training. Simultaneously, the integrity of character structures can be learned and preserved. To the best of our knowledge, Diff-Font is the first work to utilize a diffusion model for font generation tasks. Comprehensive experiments demonstrate that Diff-Font outperforms prior font generation methods in both high-fidelity font style replication and the generation of intricate characters. Our method achieves state-of-the-art results in both qualitative and quantitative aspects.
Journal Article
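Diff-Font treats font generation as conditional diffusion. The NumPy sketch below is a rough, non-authoritative illustration of one training step of a conditional diffusion model. It uses the standard DDPM noise-prediction objective and is not Diff-Font's published code. It noises a glyph image at a random timestep and scores a denoiser that also receives content, style, and stroke conditions. The linear beta schedule, the array sizes, and `toy_denoiser` are hypothetical placeholders.

```python
import numpy as np

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (a common DDPM choice, assumed here)."""
    return np.linspace(beta_start, beta_end, num_steps)

def q_sample(x0, t, alpha_bar, noise):
    """Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

def toy_denoiser(x_t, t, content_emb, style_feat, stroke_feat):
    """Hypothetical stand-in for a conditional noise-prediction network.

    A real model would be a neural network conditioned on all three inputs;
    this one returns zeros so the training-step plumbing can be exercised.
    """
    return np.zeros_like(x_t)

def training_loss(x0, content_emb, style_feat, stroke_feat, alpha_bar, rng):
    """One noise-prediction step: sample t and noise, then compute the MSE."""
    t = int(rng.integers(len(alpha_bar)))
    noise = rng.standard_normal(x0.shape)
    x_t = q_sample(x0, t, alpha_bar, noise)
    pred = toy_denoiser(x_t, t, content_emb, style_feat, stroke_feat)
    return float(np.mean((noise - pred) ** 2))

rng = np.random.default_rng(0)
betas = linear_beta_schedule(1000)
alpha_bar = np.cumprod(1.0 - betas)     # cumulative signal-retention factors
glyph = rng.standard_normal((64, 64))   # hypothetical 64x64 glyph image
content = rng.standard_normal(128)      # hypothetical character-content embedding
style = rng.standard_normal(128)        # hypothetical style feature of the reference image
strokes = rng.standard_normal(32)       # hypothetical stroke/component condition
loss = training_loss(glyph, content, style, strokes, alpha_bar, rng)
```

A real model would replace `toy_denoiser` with a trained network and minimize this loss by gradient descent. The zero predictor here only exercises the plumbing, so its loss stays near 1, the variance of the unit Gaussian noise.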
Evaluation Metrics for Conditional Image Generation
by Galanti, Tomer; Yaniv, Benny; Wolf, Lior
in Empirical analysis, Image processing, Upper bounds
2021
We present two new metrics for evaluating generative models in the class-conditional image generation setting. These metrics are obtained by generalizing the two most popular unconditional metrics: the Inception Score (IS) and the Fréchet Inception Distance (FID). A theoretical analysis shows the motivation behind each proposed metric and links the novel metrics to their unconditional counterparts. The link takes the form of a product in the case of IS or an upper bound in the FID case. We provide an extensive empirical evaluation, comparing the metrics to their unconditional variants and to other metrics, and utilize them to analyze existing generative models, thus providing additional insights about their performance, from unlearned classes to mode collapse.
Journal Article
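The paper's metrics generalize IS and FID. For reference, the unconditional FID between two Gaussians fitted to Inception features is ‖μ₁ − μ₂‖² + Tr(Σ₁ + Σ₂ − 2(Σ₁Σ₂)^{1/2}). The sketch below computes that standard quantity with NumPy alone. It is not the paper's class-conditional metric. Computing the trace of the matrix square root from the eigenvalues of Σ₁Σ₂ is an implementation choice made here to avoid a SciPy dependency.

```python
import numpy as np

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between N(mu1, sigma1) and N(mu2, sigma2).

    FID = ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 * sqrtm(sigma1 @ sigma2)).
    Tr(sqrtm(A)) equals the sum of the square roots of A's eigenvalues; for a
    product of two covariance matrices those eigenvalues are real and
    non-negative, so round-off imaginary parts and tiny negatives are dropped.
    """
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    sigma1, sigma2 = np.asarray(sigma1, float), np.asarray(sigma2, float)
    mean_term = float(np.sum((mu1 - mu2) ** 2))
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    tr_sqrt = float(np.sum(np.sqrt(np.clip(eigvals.real, 0.0, None))))
    return mean_term + float(np.trace(sigma1) + np.trace(sigma2)) - 2.0 * tr_sqrt

def fid_from_features(feats_a, feats_b):
    """Fit Gaussians to two (n_samples, dim) feature arrays and compare them."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    # atleast_2d keeps the 1-D feature case a (1, 1) matrix for np.trace.
    cov_a = np.atleast_2d(np.cov(feats_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(feats_b, rowvar=False))
    return frechet_distance(mu_a, cov_a, mu_b, cov_b)
```

Identical Gaussians give a distance of 0. Shifting the mean by (3, 4) while keeping identity covariances gives 25, which makes simple sanity checks easy.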
Multimodal diffusion framework for collaborative text image audio generation and applications
2025
This paper presents a novel framework for collaborative generation across text, image, and audio modalities using an enhanced diffusion model architecture. We introduce a Hierarchical Cross-modal Alignment Network that establishes unified representations while preserving modality-specific characteristics, and a Cross-modal Conditional Diffusion Model that enables flexible generation pathways through innovative conditional embedding and attention-guided mechanisms. Our approach implements cross-modal mutual guidance and consistency optimization to ensure semantic coherence across generated modalities. Experimental evaluations demonstrate significant improvements over state-of-the-art baselines, with an average 11.65% increase in tri-modal semantic alignment. Applications in media content creation, assistive technology, and education show particular promise, with user evaluations confirming enhanced information accessibility and learning experiences. While computational efficiency and domain adaptation remain challenges, this work establishes a foundation for tri-modal collaborative generation that advances multimodal content creation capabilities.
Journal Article
MelodyDiffusion: Chord-Conditioned Melody Generation Using a Transformer-Based Diffusion Model
2023
Artificial intelligence, particularly machine learning, has begun to permeate various real-world applications and is continually being explored for automatic music generation. Approaches to music generation can be broadly divided into two categories: rule-based and data-driven methods. Rule-based approaches rely on substantial prior knowledge and may struggle to handle large datasets, whereas data-driven approaches can solve these problems and have become increasingly popular. However, data-driven approaches still face challenges such as the difficulty of capturing long-distance dependencies in discrete-sequence data and convergence problems during model training. Although the diffusion model has been introduced as a generative model to solve the convergence problem of generative adversarial networks, it has not yet been applied to discrete-sequence data. This paper proposes a transformer-based diffusion model, MelodyDiffusion, to handle discrete musical data and realize chord-conditioned melody generation. MelodyDiffusion replaces the U-Nets used in traditional diffusion models with transformers, which capture long-distance dependencies through attention and parallel mechanisms. Moreover, a transformer-based encoder is designed to extract contextual information from chords as a condition to guide melody generation. In practical applications, MelodyDiffusion can automatically generate diverse melodies from the provided chords. The evaluation experiments, which use Hits@k as the metric for the restored melodies, show that the large-scale version of MelodyDiffusion achieves an accuracy of 72.41% (k = 1).
Journal Article
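MelodyDiffusion reports Hits@k on restored melodies. The abstract does not spell out the definition, so this sketch assumes a common one: the fraction of positions whose ground-truth token appears among the model's k highest-scoring candidates. The score layout (one list of per-token scores per position) is a hypothetical choice made for illustration.

```python
from typing import Sequence

def hits_at_k(scores: Sequence[Sequence[float]], targets: Sequence[int], k: int = 1) -> float:
    """Fraction of positions whose true token index is among the top-k scores.

    scores[i][v] is the model's score for vocabulary item v at position i;
    targets[i] is the ground-truth vocabulary index at position i.
    """
    if len(scores) != len(targets):
        raise ValueError("scores and targets must have the same length")
    if not targets:
        return 0.0
    hits = 0
    for row, target in zip(scores, targets):
        # Indices of the k largest scores (ties broken by lower index first).
        top_k = sorted(range(len(row)), key=lambda v: (-row[v], v))[:k]
        hits += target in top_k
    return hits / len(targets)
```

Consider scores `[[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]` with targets `[1, 1, 0]`. Hits@1 is 1/3 and Hits@2 is 2/3, so larger k can only keep the score the same or raise it.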
RockGPT: reconstructing three-dimensional digital rocks from single two-dimensional slice with deep learning
by Zheng, Qiang; Zhang, Dongxiao
in Deep learning, Earth and Environmental Science, Earth Sciences
2022
Random reconstruction of three-dimensional (3D) digital rocks from two-dimensional (2D) slices is crucial for elucidating the microstructure of rocks and its effects on pore-scale flow in numerical modeling, since massive numbers of samples are usually required to handle intrinsic uncertainties. Despite remarkable advances achieved by traditional process-based methods, statistical approaches, and, more recently, deep learning-based models, few works have focused on producing several kinds of rocks with one trained model while allowing the reconstructed samples to approximately satisfy given properties, such as porosity. To fill this gap, we propose a new deep learning framework, named RockGPT, which is composed of a VQ-VAE and a conditional GPT, to synthesize 3D samples from a single 2D slice from the perspective of video generation. The VQ-VAE is used to compress high-dimensional input video, i.e., the sequence of continuous rock slices, into discrete latent codes and to reconstruct them. To obtain diverse reconstructions, the discrete latent codes are modeled using the conditional GPT in an autoregressive manner, while incorporating conditional information from a given slice, rock type, and porosity. We conduct two experiments on five kinds of rocks, and the results demonstrate that RockGPT can produce different kinds of rocks with a single model, and that the porosities of reconstructed samples are distributed within a narrow range around the specified targets. In a broader sense, by leveraging the proposed conditioning scheme, RockGPT constitutes an effective way to build a general model that produces multiple kinds of rocks simultaneously while satisfying user-defined properties.
Journal Article
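RockGPT compresses slice sequences into discrete latent codes with a VQ-VAE. The central quantization step of a VQ-VAE replaces each encoder output vector with the index of its nearest codebook vector under squared Euclidean distance. An autoregressive model such as the conditional GPT then works on those indices. The NumPy sketch below shows only that lookup. The codebook size, latent dimension, and random inputs are hypothetical, and the encoder, decoder, and training losses are omitted.

```python
import numpy as np

def quantize(latents, codebook):
    """Map each latent vector to its nearest codebook entry.

    latents: (n, d) continuous encoder outputs.
    codebook: (K, d) embedding vectors.
    Returns (indices, quantized), where indices[i] is the discrete code for
    latents[i] and quantized[i] equals codebook[indices[i]].
    """
    # Squared distances ||z - e||^2 = ||z||^2 - 2 z.e + ||e||^2, shape (n, K).
    d2 = (
        np.sum(latents ** 2, axis=1, keepdims=True)
        - 2.0 * latents @ codebook.T
        + np.sum(codebook ** 2, axis=1)
    )
    indices = np.argmin(d2, axis=1)
    return indices, codebook[indices]

rng = np.random.default_rng(0)
codebook = rng.standard_normal((512, 64))  # hypothetical K=512 codes of dimension 64
latents = rng.standard_normal((10, 64))    # hypothetical encoder outputs
codes, z_q = quantize(latents, codebook)
```

In a full VQ-VAE, the decoder consumes `codebook[indices]`. Gradients bypass the non-differentiable `argmin`, commonly via a straight-through estimator.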
Molecular Generation for Desired Transcriptome Changes With Adversarial Autoencoders
by Zhebrak, Alexander; Shayakhmetov, Rim; Aliper, Alexander
in adversarial autoencoders, conditional generation, Datasets
2020
Gene expression profiles are useful for assessing the efficacy and side effects of drugs. In this paper, we propose a new generative model that infers drug molecules that could induce a desired change in gene expression. Our model, the Bidirectional Adversarial Autoencoder, explicitly separates cellular processes captured in gene expression changes into two feature sets: those related and unrelated to the drug incubation. The model uses related features to produce a drug hypothesis. We have validated our model on the LINCS L1000 dataset by generating molecular structures in the SMILES format for the desired transcriptional response. In the experiments, we have shown that the proposed model can generate novel molecular structures that could induce a given gene expression change or predict a gene expression difference after incubation of a given molecular structure. The code for the model is available at https://github.com/insilicomedicine/BiAAE.
Journal Article
Optimal Allocation of Energy Storage Capacity in Microgrids Considering the Uncertainty of Renewable Energy Generation
2023
The high dimensionality and uncertainty of renewable energy generation restrict the ability of a microgrid to accommodate renewable energy. It is therefore necessary to fully account for renewable generation on each day and in each time period of a long dispatch horizon when deploying energy storage in the microgrid. To this end, a typical multi-day scenario set is used as the simulated operating scenario, and an optimal allocation method for microgrid energy storage capacity that considers the uncertainty of renewable energy generation is designed. First, the historical scenarios are clustered into K daily state types using the K-means algorithm, and the corresponding probability distribution is obtained. Second, Latin hypercube sampling is used to obtain the state type of each day in a multi-day scenario set. Then, a daily scenario generation method based on conditional generative adversarial networks, with the day state type as the condition, is used to generate a multi-day scenario set, from which the typical scenario set is obtained through scenario reduction. Furthermore, a bilevel optimization model for allocating microgrid energy storage capacity is constructed, in which the upper level optimizes the allocated storage capacity and the lower level optimizes the microgrid operation plan in each typical scenario. Finally, the model is solved using the particle swarm optimization (PSO) algorithm nested with the CPLEX solver. In the microgrid case study, the proposed method reduces the expected total annual cost by 19.66% compared with a stochastic optimal allocation method that assumes wind and solar power follow a specific distribution, showing that it copes better with the uncertainty of renewable energy generation. It also reduces the expected total annual cost by 6.99% compared with an optimal allocation method that generates typical daily scenarios based on generative adversarial networks, showing that it copes better with the high dimensionality of renewable energy generation.
Journal Article
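In the microgrid study, K-means yields a probability for each daily state type, and Latin hypercube sampling then assigns a state type to every day of the multi-day scenario. This sketch shows one way to do that for a single categorical variable, and the paper's exact procedure may differ. It splits [0, 1) into N equal strata and draws one uniform point per stratum. It then shuffles the points and maps each one through the inverse CDF of the state-type distribution. The function name and the example probabilities are hypothetical.

```python
import random
from itertools import accumulate
from typing import List, Optional, Sequence

def lhs_day_types(probs: Sequence[float], n_days: int,
                  seed: Optional[int] = None) -> List[int]:
    """Assign a state-type index (0..K-1) to each of n_days via 1-D LHS.

    probs[k] is the probability of state type k (for example, cluster
    frequencies from K-means); it is assumed to sum to 1.
    """
    rng = random.Random(seed)
    cdf = list(accumulate(probs))
    # One uniform draw inside each stratum [i/N, (i+1)/N).
    points = [(i + rng.random()) / n_days for i in range(n_days)]
    rng.shuffle(points)  # randomize which day receives which stratum
    types = []
    for u in points:
        # Inverse CDF: first k with u < cdf[k]; fall back to the last type
        # in case rounding leaves cdf[-1] slightly below 1.
        k = next((j for j, c in enumerate(cdf) if u < c), len(cdf) - 1)
        types.append(k)
    return types
```

Each of the N strata contributes exactly one draw, so the number of days of each type stays close to N times its probability. With probabilities [0.25, 0.75] and 8 days, the strata boundaries line up with the CDF, so exactly 2 days get type 0 and 6 get type 1.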
Image generation step by step: animation generation-image translation
2022
Generative adversarial networks play an important role in image generation, but successfully generating high-resolution images from complex data sets remains a challenging goal. In this paper, we propose the LGAN (Link Generative Adversarial Networks) model, which can effectively enhance the quality of synthesized images. The LGAN model consists of two parts, G1 and G2. G1 handles unconditional generation: it generates anime images with highly abstract features, containing few coefficients but continuous image elements that cover the overall image features. G2 handles conditional generation (image translation) and consists of a mapping network and a super-resolution network. The mapping network translates the output of G1, after semantic segmentation or edge detection processing, into a real-world image, and the super-resolution network then super-resolves the mapped image to improve its resolution. In comparison tests with WGAN, SAGAN, WGAN-GP, and PG-GAN, LGAN(SEG) leads by 64.36 and 12.28, respectively, demonstrating the model's superiority.
Journal Article
Noise Reduction Power Stealing Detection Model Based on Self-Balanced Data Set
2020
In recent years, various types of power theft incidents have occurred frequently, and the training of power-theft detection models is susceptible to imbalanced data sets and data noise, which leads to detection errors. Therefore, a power-theft detection model is proposed based on an improved conditional generative adversarial network (CWGAN), a stacked convolutional denoising autoencoder (SCDAE), and the Light Gradient Boosting Machine (LightGBM). The model uses the CWGAN to generate samples that balance the original imbalanced power consumption data, preventing the imbalance from interfering with classifier training. In addition, stacked convolutional denoising autoencoders reduce the dimensionality of the power consumption data, extract features, and reduce the impact of random noise. Finally, LightGBM is used for power theft detection. The experiments show that the CWGAN can effectively balance the distribution of power consumption data, and a comparison of detection metrics with several advanced power-theft detection models on the same data set shows that the proposed model outperforms them in detecting power theft.
Journal Article
Set-conditional set generation for particle physics
by Kakati, Nilotpal; Ganguly, Sanmay; Dreyer, Etienne
in conditional generation, fast simulation, graph networks
2023
The simulation of particle physics data is a fundamental but computationally intensive ingredient of physics analysis at the Large Hadron Collider, where observational set-valued data is generated conditional on a set of incoming particles. To accelerate this task, we present a novel generative model based on a graph neural network and slot-attention components, which outperforms pre-existing baselines.
Journal Article
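The particle-physics generator combines a graph neural network with slot-attention components. The NumPy sketch below shows the core of one slot-attention iteration as it is usually described. A softmax over slots makes the slots compete for inputs, and each slot is then updated with an attention-weighted mean of the inputs. For brevity, it omits the learned GRU and MLP slot updates and layer normalization. The random projection matrices and sizes are hypothetical, and this is not the authors' model.

```python
import numpy as np

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def slot_attention_step(inputs, slots, w_q, w_k, w_v, eps=1e-8):
    """One simplified slot-attention iteration.

    inputs: (n_inputs, d_in) set elements (e.g. incoming-particle features).
    slots:  (n_slots, d) current slot vectors.
    w_q: (d, d); w_k and w_v: (d_in, d) projection matrices.
    Returns (new_slots, attn), where attn has shape (n_inputs, n_slots).
    """
    q = slots @ w_q                           # (n_slots, d)
    k = inputs @ w_k                          # (n_inputs, d)
    v = inputs @ w_v                          # (n_inputs, d)
    logits = k @ q.T / np.sqrt(q.shape[1])    # (n_inputs, n_slots)
    attn = softmax(logits, axis=1)            # normalize over slots: slots compete
    weights = attn / (attn.sum(axis=0, keepdims=True) + eps)  # weighted mean per slot
    updates = weights.T @ v                   # (n_slots, d)
    # Full slot attention passes `updates` through a GRU and an MLP; here the
    # updates simply replace the slots.
    return updates, attn

rng = np.random.default_rng(0)
d_in, d, n_inputs, n_slots = 8, 16, 20, 4
inputs = rng.standard_normal((n_inputs, d_in))
slots = rng.standard_normal((n_slots, d))
w_q = rng.standard_normal((d, d)) / np.sqrt(d)
w_k = rng.standard_normal((d_in, d)) / np.sqrt(d_in)
w_v = rng.standard_normal((d_in, d)) / np.sqrt(d_in)
for _ in range(3):  # a few refinement iterations
    slots, attn = slot_attention_step(inputs, slots, w_q, w_k, w_v)
```

Normalizing over slots rather than over inputs is the design choice that distinguishes slot attention from ordinary cross-attention. Each input's attention mass is split among the slots, which encourages different slots to claim different parts of the set.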