Catalogue Search | MBRL

Diff-Font: Diffusion Model for Robust One-Shot Font Generation

by Liu, Juhua , Tao, Dacheng , Yu, Qiao in Accuracy , Computer science , Computer vision

2024

Font generation presents a significant challenge due to the intricate details needed, especially for languages with complex ideograms and numerous characters, such as Chinese and Korean. Although various few-shot (or even one-shot) font generation methods have been introduced, most of them rely on GAN-based image-to-image translation frameworks that still face (i) unstable training issues, (ii) limited fidelity in replicating font styles, and (iii) imprecise generation of complex characters. To tackle these problems, we propose a unified one-shot font generation framework called Diff-Font, based on the diffusion model. In particular, we approach font generation as a conditional generation task, where the content of characters is managed through predefined embedding tokens and the desired font style is extracted from a one-shot reference image. For glyph-rich characters such as Chinese and Korean, we incorporate additional inputs for strokes or components as fine-grained conditions. Owing to the proposed diffusion training process, these three types of information can be effectively modeled, resulting in stable training. Simultaneously, the integrity of character structures can be learned and preserved. To the best of our knowledge, Diff-Font is the first work to utilize a diffusion model for font generation tasks. Comprehensive experiments demonstrate that Diff-Font outperforms prior font generation methods in both high-fidelity font style replication and the generation of intricate characters. Our method achieves state-of-the-art results in both qualitative and quantitative aspects.

Journal Article

Evaluation Metrics for Conditional Image Generation

by Galanti, Tomer , Yaniv, Benny , Wolf, Lior in Empirical analysis , Image processing , Upper bounds

2021

We present two new metrics for evaluating generative models in the class-conditional image generation setting. These metrics are obtained by generalizing the two most popular unconditional metrics: the Inception Score (IS) and the Fréchet Inception Distance (FID). A theoretical analysis shows the motivation behind each proposed metric and links the novel metrics to their unconditional counterparts. The link takes the form of a product in the case of IS or an upper bound in the FID case. We provide an extensive empirical evaluation, comparing the metrics to their unconditional variants and to other metrics, and utilize them to analyze existing generative models, thus providing additional insights about their performance, from unlearned classes to mode collapse.
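For orientation, the unconditional FID that the abstract generalizes is the Fréchet distance between two Gaussians fitted to feature statistics. The sketch below covers only that standard baseline, not the conditional metrics proposed in the paper. The function name `frechet_distance` is an illustrative assumption, and in practice the means and covariances would come from Inception features of real and generated images.

```python
import numpy as np

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between N(mu1, sigma1) and N(mu2, sigma2).

    Hypothetical helper for illustration; inputs are assumed to be a mean
    vector and a positive semi-definite covariance matrix per distribution.
    """
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    sigma1, sigma2 = np.asarray(sigma1, float), np.asarray(sigma2, float)
    diff = mu1 - mu2
    # Tr((sigma1 sigma2)^{1/2}) equals the sum of the square roots of the
    # eigenvalues of sigma1 @ sigma2, which are real and non-negative for
    # positive semi-definite inputs (up to round-off, hence the clip).
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * trace_sqrt)
```

Two identical Gaussians give a distance of zero, and shifting only the mean adds the squared length of the shift.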

Journal Article

Multimodal diffusion framework for collaborative text image audio generation and applications

by Wang, Junhua , Zhang, Ouya , Jiang, Yuan in 639/705/1042 , 639/705/258 , Assistive technology

2025

This paper presents a novel framework for collaborative generation across text, image, and audio modalities using an enhanced diffusion model architecture. We introduce a Hierarchical Cross-modal Alignment Network that establishes unified representations while preserving modality-specific characteristics, and a Cross-modal Conditional Diffusion Model that enables flexible generation pathways through innovative conditional embedding and attention-guided mechanisms. Our approach implements cross-modal mutual guidance and consistency optimization to ensure semantic coherence across generated modalities. Experimental evaluations demonstrate significant improvements over state-of-the-art baselines, with an average 11.65% increase in tri-modal semantic alignment. Applications in media content creation, assistive technology, and education show particular promise, with user evaluations confirming enhanced information accessibility and learning experiences. While computational efficiency and domain adaptation remain challenges, this work establishes a foundation for tri-modal collaborative generation that advances multimodal content creation capabilities.

Journal Article

MelodyDiffusion: Chord-Conditioned Melody Generation Using a Transformer-Based Diffusion Model

by Sung, Yunsick , Li, Shuyu in Algorithms , Artificial intelligence , Chords (Music)

2023

Artificial intelligence, particularly machine learning, has begun to permeate various real-world applications and is continually being explored in automatic music generation. The approaches to music generation can be broadly divided into two categories: rule-based and data-driven methods. Rule-based approaches rely on substantial prior knowledge and may struggle to handle large datasets, whereas data-driven approaches can solve these problems and have become increasingly popular. However, data-driven approaches still face challenges such as the difficulty of considering long-distance dependencies when handling discrete-sequence data and convergence during model training. Although the diffusion model has been introduced as a generative model to solve the convergence problem in generative adversarial networks, it has not yet been applied to discrete-sequence data. This paper proposes a transformer-based diffusion model known as MelodyDiffusion to handle discrete musical data and realize chord-conditioned melody generation. MelodyDiffusion replaces the U-nets used in traditional diffusion models with transformers to consider the long-distance dependencies using attention and parallel mechanisms. Moreover, a transformer-based encoder is designed to extract contextual information from chords as a condition to guide melody generation. MelodyDiffusion can automatically generate diverse melodies based on the provided chords in practical applications. The evaluation experiments, in which Hits@k was used as a metric to evaluate the restored melodies, demonstrate that the large-scale version of MelodyDiffusion achieves an accuracy of 72.41% (k = 1).
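The abstract does not spell out how Hits@k is computed, so the following is a minimal sketch of the common definition: the fraction of positions whose true token is among the k highest-scoring candidates. The function name `hits_at_k` and the score-matrix layout (one row per position, one column per candidate token) are assumptions made for illustration.

```python
import numpy as np

def hits_at_k(scores, targets, k=1):
    """Fraction of rows whose target index is among the top-k scores.

    Hypothetical helper: `scores` has shape (positions, candidates) and
    `targets` holds the true candidate index for each position.
    """
    scores = np.asarray(scores, float)
    targets = np.asarray(targets)
    # Indices of the k highest-scoring candidates for every position.
    topk = np.argsort(-scores, axis=1)[:, :k]
    hits = (topk == targets[:, None]).any(axis=1)
    return float(hits.mean())
```

With k = 1 this reduces to plain top-1 accuracy, which is the setting the abstract reports.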

Journal Article

RockGPT: reconstructing three-dimensional digital rocks from single two-dimensional slice with deep learning

by Zheng, Qiang , Zhang, Dongxiao in Deep learning , Earth and Environmental Science , Earth Sciences

2022

Random reconstruction of three-dimensional (3D) digital rocks from two-dimensional (2D) slices is crucial for elucidating the microstructure of rocks and its effects on pore-scale flow in terms of numerical modeling, since massive samples are usually required to handle intrinsic uncertainties. Despite remarkable advances achieved by traditional process-based methods, statistical approaches and recently popular deep learning-based models, few works have focused on producing several kinds of rocks with one trained model and allowing the reconstructed samples to approximately satisfy certain given properties, such as porosity. To fill this gap, we propose a new framework with deep learning, named RockGPT, which is composed of VQ-VAE and conditional GPT, to synthesize 3D samples based on a single 2D slice from the perspective of video generation. The VQ-VAE is utilized to compress high-dimensional input video, i.e., the sequence of continuous rock slices, to discrete latent codes and reconstruct them. In order to obtain diverse reconstructions, the discrete latent codes are modeled using conditional GPT in an autoregressive manner, while incorporating conditional information from a given slice, rock type, and porosity. We conduct two experiments on five kinds of rocks, and the results demonstrate that RockGPT can produce different kinds of rocks with a single model, and that the porosities of reconstructed samples are distributed around the specified targets within a narrow range. In a broader sense, through leveraging the proposed conditioning scheme, RockGPT constitutes an effective way to build a general model to produce multiple kinds of rocks simultaneously that also satisfy user-defined properties.

Journal Article

Molecular Generation for Desired Transcriptome Changes With Adversarial Autoencoders

by Zhebrak, Alexander , Shayakhmetov, Rim , Aliper, Alexander in adversarial autoencoders , conditional generation , Datasets

2020

Gene expression profiles are useful for assessing the efficacy and side effects of drugs. In this paper, we propose a new generative model that infers drug molecules that could induce a desired change in gene expression. Our model, the Bidirectional Adversarial Autoencoder, explicitly separates cellular processes captured in gene expression changes into two feature sets: those related and those unrelated to the drug incubation. The model uses the related features to produce a drug hypothesis. We have validated our model on the LINCS L1000 dataset by generating molecular structures in the SMILES format for the desired transcriptional response. In the experiments, we have shown that the proposed model can generate novel molecular structures that could induce a given gene expression change or predict a gene expression difference after incubation of a given molecular structure. The code of the model is available at https://github.com/insilicomedicine/BiAAE.

Journal Article

Optimal Allocation of Energy Storage Capacity in Microgrids Considering the Uncertainty of Renewable Energy Generation

by Wei, Wei , Chen, Xi , Fang, Yi in Adaptability , Algorithms , Alternative energy sources

2023

The high dimensionality and uncertainty of renewable energy generation restrict the ability of the microgrid to consume renewable energy. Therefore, it is necessary to fully consider the renewable energy generation of each day and time period in a long dispatching period during the deployment of energy storage in the microgrid. To this end, a typical multi-day scenario set is used as the simulation operation scenario, and an optimal allocation method of microgrid energy storage capacity considering the uncertainty of renewable energy generation is designed. Firstly, the historical scenarios are clustered into K daily state types using the K-means algorithm, and the corresponding probability distribution is obtained. Secondly, the Latin hypercube sampling method is used to obtain the state type of each day in a multi-day scenario set. Then, the daily scenario generation method based on conditional generative adversarial networks is used to generate a multi-day scenario set, combining the day state type as a condition, and then the typical scenario set is obtained using scenario reduction. Furthermore, a double-layer optimization allocation model for the energy storage capacity of microgrids is constructed, in which the upper layer optimizes the energy storage allocation capacity and the lower layer optimizes the operation plans of microgrids in each typical scenario. Finally, the proposed model is solved using the PSO algorithm nested with the CPLEX solver. In the microgrid example, the proposed method reduces the expected annual total cost by 19.66% compared with the stochastic optimal allocation method that assumes the wind and solar power obeys a specific distribution, proving that it can better cope with the uncertainty of renewable energy generation. At the same time, the expected annual total cost is reduced by 6.99% compared with the optimal allocation method that generates typical daily scenarios based on generative adversarial networks, which proves that it can better cope with the high dimensionality of renewable energy generation.
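The first two steps of the abstract (a probability distribution over daily state types, then Latin hypercube sampling of one state type per day) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the cluster labels are assumed to come from a K-means run that is not shown, and the function names `state_probabilities` and `lhs_sample_states` are hypothetical.

```python
import numpy as np

def state_probabilities(labels, n_states):
    """Empirical probability of each daily state type from cluster labels."""
    counts = np.bincount(np.asarray(labels), minlength=n_states)
    return counts / counts.sum()

def lhs_sample_states(probs, n_days, rng):
    """Draw one state type per day with one-dimensional Latin hypercube sampling.

    The unit interval is split into n_days equal strata, one uniform value is
    drawn inside each stratum, the values are shuffled across days, and each
    value is mapped to a state through the inverse cumulative distribution.
    """
    probs = np.asarray(probs, float)
    u = (np.arange(n_days) + rng.random(n_days)) / n_days
    rng.shuffle(u)
    cdf = np.cumsum(probs)
    states = np.searchsorted(cdf, u, side="right")
    # Guard against round-off leaving the last cdf entry slightly below 1.
    return np.minimum(states, len(probs) - 1)
```

Because every stratum contributes exactly one draw, the sampled state frequencies track the target probabilities much more tightly than independent random draws would for the same number of days.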

Journal Article

Image generation step by step: animation generation-image translation

by Li, Bo , Ding, Hongwei , Jing, Beibei in Animation , Anime , Edge detection

2022

Generative adversarial networks play an important role in image generation, but the successful generation of high-resolution images from complex data sets remains a challenging goal. In this paper, we propose the LGAN (Link Generative Adversarial Networks) model, which can effectively enhance the quality of the synthesized images. The LGAN model consists of two parts, G1 and G2. G1 is responsible for the unconditional generation part, which generates anime images with highly abstract features containing few coefficients but continuous image elements covering the overall image features. Moreover, G2 is responsible for the conditional generation part (image translation), consisting of mapping and Superresolution networks. The mapping network fills the output of G1 into the real-world image after semantic segmentation or edge detection processing; the Superresolution network super-resolves the actual picture after completing mapping to improve the image’s resolution. In the comparison test with WGAN, SAGAN, WGAN-GP and PG-GAN, this paper’s LGAN(SEG) leads 64.36 and 12.28, respectively, fully proving the model’s superiority.

Journal Article

Noise Reduction Power Stealing Detection Model Based on Self-Balanced Data Set

by Liu, Haiqing , Li, Zhiqiao , Li, Yuancheng in Accuracy , Algorithms , Classification

2020

In recent years, various types of power theft incidents have occurred frequently, and the training of the power-stealing detection model is susceptible to the influence of the imbalanced data set and the data noise, which leads to errors in power-stealing detection. Therefore, a power-stealing detection model is proposed, which is based on Improved Conditional Generation Adversarial Network (CWGAN), Stacked Convolution Noise Reduction Autoencoder (SCDAE) and Lightweight Gradient Boosting Decision Machine (LightGBM). The model performs Generation-Adversarial operations on the original unbalanced power consumption data to achieve the balance of electricity data, and avoids the interference of the imbalanced data set on classifier training. In addition, the convolution method is used to stack the noise reduction auto-encoder to achieve dimension reduction of power consumption data, extract data features and reduce the impact of random noise. Finally, LightGBM is used for power theft detection. The experiments show that CWGAN can effectively balance the distribution of power consumption data. A comparison of the detection indicators of the proposed model with those of various advanced power-stealing models on the same data set shows that the proposed model is superior to the other models in the detection of power stealing.

Journal Article

Set-conditional set generation for particle physics

by Kakati, Nilotpal , Ganguly, Sanmay , Dreyer, Etienne in conditional generation , fast simulation , graph networks

2023

The simulation of particle physics data is a fundamental but computationally intensive ingredient for physics analysis at the Large Hadron Collider, where observational set-valued data is generated conditional on a set of incoming particles. To accelerate this task, we present a novel generative model based on a graph neural network and slot-attention components, which exceeds the performance of pre-existing baselines.

Journal Article
