28 result(s) for "gradient normalization"
SUGAN: A Stable U-Net Based Generative Adversarial Network
As one of the representative models in the field of image generation, generative adversarial networks (GANs) face a significant challenge: how to make the best trade-off between the quality of generated images and training stability. The U-Net based GAN (U-Net GAN), a recently developed approach, can generate high-quality synthetic images by using a U-Net architecture for the discriminator. However, this model may suffer from severe mode collapse. In this study, a stable U-Net GAN (SUGAN) is proposed mainly to solve this problem. First, a gradient normalization module is introduced into the discriminator of U-Net GAN. This module effectively reduces gradient magnitudes, thereby greatly alleviating the problems of gradient instability and overfitting. As a result, the training stability of the GAN model is improved. Additionally, in order to solve the problem of blurred edges in the generated images, a modified residual network is used in the generator. This modification enhances its ability to capture image details, leading to higher-definition generated images. Extensive experiments conducted on several datasets show that the proposed SUGAN achieves significant improvements in the Inception Score (IS) and Fréchet Inception Distance (FID) metrics compared with several state-of-the-art and classic GANs. The training process of our SUGAN is stable, and the quality and diversity of the generated samples are higher. This clearly demonstrates the effectiveness of our approach for image generation tasks. The source code and trained model of our SUGAN have been publicly released.
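The gradient-normalization idea this abstract describes can be sketched in miniature. The snippet below assumes the common formulation in which a discriminator's output is divided by the sum of its input-gradient norm and its own magnitude; the function names are hypothetical and SUGAN's actual module may differ:

```python
import numpy as np

def linear_disc(x, w, b):
    """Toy linear discriminator D(x) = w.x + b; its input gradient is w."""
    return float(w @ x + b), w

def grad_normalized_disc(x, w, b, eps=1e-12):
    """Gradient-normalized response D / (||grad_x D|| + |D|).

    Dividing by the input-gradient norm bounds the normalized output
    in (-1, 1), which caps gradient magnitudes -- the stabilizing
    effect the abstract attributes to the module.
    """
    d, g = linear_disc(x, w, b)
    return d / (np.linalg.norm(g) + abs(d) + eps)

rng = np.random.default_rng(0)
x, w = rng.normal(size=8), rng.normal(size=8)
print(abs(grad_normalized_disc(x, w, 0.5)) < 1.0)  # True: output is bounded
```

Because the bound holds for any input scale, the normalized discriminator cannot produce the exploding feedback that destabilizes ordinary GAN training.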
Optimizing Recurrent Neural Networks: A Study on Gradient Normalization of Weights for Enhanced Training Efficiency
Recurrent Neural Networks (RNNs) are classical models for processing sequential data, demonstrating excellent performance in tasks such as natural language processing and time series prediction. However, during the training of RNNs, the issues of vanishing and exploding gradients often arise, significantly impacting the model's performance and efficiency. In this paper, we investigate why RNNs are more prone to gradient problems than other common sequential networks. To address this issue and enhance network performance, we propose a method for gradient normalization of network weights. This method suppresses the occurrence of gradient problems by altering the statistical properties of RNN weights, thereby improving training effectiveness. Additionally, we analyze the impact of weight gradient normalization on the probability-distribution characteristics of model weights and validate the sensitivity of this method to hyperparameters such as the learning rate. The experimental results demonstrate that gradient normalization enhances the stability of model training and reduces the frequency of gradient issues. On the Penn Treebank dataset, this method achieves a perplexity of 110.89, an 11.48% improvement over conventional gradient descent methods. For prediction lengths of 24 and 96 on the ETTm1 dataset, Mean Absolute Error (MAE) values of 0.778 and 0.592 are attained, respectively, improvements of 3.00% and 6.77% over conventional gradient descent methods. Moreover, on selected subsets of the UCR dataset, accuracy increases by 0.4% to 6.0%. The gradient normalization method enhances the ability of RNNs to learn from sequential and causal data, and thus holds significant implications for optimizing the training effectiveness of RNN-based models.
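A minimal sketch of per-layer weight-gradient normalization, assuming the simplest variant in which each weight matrix's gradient is rescaled to unit Frobenius norm before the update (the paper's method, which alters the statistical properties of the weights themselves, may differ; all names here are illustrative):

```python
import numpy as np

def normalize_grads(grads, eps=1e-12):
    """Rescale each layer's weight gradient to unit Frobenius norm,
    so no layer's update can vanish or explode relative to the others."""
    return [g / (np.linalg.norm(g) + eps) for g in grads]

def sgd_step(weights, grads, lr=0.1):
    """Plain SGD applied to the normalized gradients."""
    return [w - lr * g for w, g in zip(weights, normalize_grads(grads))]

rng = np.random.default_rng(1)
# Two layers with wildly different raw gradient scales (1e-6 vs 1e3),
# as happens when gradients vanish in one layer and explode in another.
grads = [rng.normal(size=(4, 4)) * 1e-6, rng.normal(size=(4, 4)) * 1e3]
norms = [np.linalg.norm(g) for g in normalize_grads(grads)]
print([round(n, 5) for n in norms])  # both ~1.0 after normalization
```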
Restoring Spectral Symmetry in Gradients: A Normalization Approach for Efficient Neural Network Training
Neural network training often suffers from spectral asymmetry, where gradient energy is disproportionately allocated to high-frequency components, leading to suboptimal convergence and reduced efficiency. This paper introduces Gradient Spectral Normalization (GSN), a novel optimization technique designed to restore spectral symmetry by dynamically reshaping gradient distributions in the frequency domain. GSN transforms gradients using FFT, applies layer-specific energy redistribution to enforce a symmetric balance between low- and high-frequency components, and reconstructs the gradients for parameter updates. By tailoring normalization schedules for attention and MLP layers, GSN enhances inference performance and improves model accuracy with minimal overhead. Our approach leverages the principle of symmetry to create more stable and efficient neural systems, offering a practical solution for resource-constrained environments. This frequency-domain paradigm, grounded in symmetry restoration, opens new directions for neural network optimization with broad implications for large-scale AI systems.
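The frequency-domain reshaping that GSN performs can be illustrated with a toy version: FFT the gradient, rescale a low- and a high-frequency band so each holds a target share of the total spectral energy, and invert the transform. This is a sketch under simplified assumptions (a single flat band split, equal target shares); the paper's layer-specific schedules are not reproduced:

```python
import numpy as np

def spectral_normalize_grad(g, low_frac=0.5, balance=0.5, eps=1e-12):
    """Toy frequency-domain gradient rebalancing.

    Split the rFFT coefficients of the flattened gradient into a
    low- and a high-frequency band, rescale the bands to hold
    `balance` / (1 - balance) of the total spectral energy, then
    inverse-FFT back to the original shape.
    """
    G = np.fft.rfft(np.ravel(g))
    cut = max(1, int(low_frac * G.size))
    lo, hi = G[:cut], G[cut:]
    total = np.sum(np.abs(G) ** 2) + eps
    lo = lo * np.sqrt(balance * total / (np.sum(np.abs(lo) ** 2) + eps))
    hi = hi * np.sqrt((1.0 - balance) * total / (np.sum(np.abs(hi) ** 2) + eps))
    out = np.fft.irfft(np.concatenate([lo, hi]), n=np.size(g))
    return out.reshape(np.shape(g))

g = np.linspace(-1.0, 1.0, 64)     # a smooth, low-frequency-heavy "gradient"
out = spectral_normalize_grad(g)   # energy now split evenly between bands
```

With `balance=0.5` the output spectrum carries equal energy in both bands, which is the "symmetric balance" the abstract describes.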
Dynamic Simulation Model-Driven Fault Diagnosis Method for Bearing under Missing Fault-Type Samples
Existing generative adversarial networks (GANs) have potential in data augmentation and in the intelligent fault diagnosis of bearings. However, most relevant studies only focus on the fault diagnosis of rotating machines with sufficient fault-type samples, while in practical engineering some rare fault-type samples may be missing from training. To address these deficiencies, this paper presents an intelligent fault diagnosis method based on a dynamic simulation model and a Wasserstein generative adversarial network with gradient normalization (WGAN-GN). The dynamic simulation model of bearing faults is constructed to obtain simulated signals that replace and complement the missing fault samples; these are combined with the measured signals as training data and then input into the proposed WGAN-GN model to expand and enhance the data. To test the effectiveness of the simulated samples, a fault classification model built from stacked autoencoders (SAE) is used to classify the enhanced dataset. According to the results, the proposed model performs well when diagnosing faults under missing samples and is preferable to other methods.
Cycle Generative Adversarial Network Based on Gradient Normalization for Infrared Image Generation
Image generation is currently one of the popular directions in computer vision research; infrared imaging in particular has critical applications in the military field. Existing algorithms for generating infrared images from visible images are usually weak at perceiving the salient regions of an image and cannot effectively reproduce texture details, resulting in generated infrared images with little texture detail and poor quality. In this study, a cycle generative adversarial network method based on gradient normalization was proposed to address the current problems of poor infrared image generation, lack of texture detail and unstable models. First, to address the limited feature extraction capability of the UNet generator network, which makes the generated infrared images blurred and of low quality, a residual network with stronger feature extraction capability was employed in the generator to produce higher-definition infrared images. Second, to remedy the severe lack of detailed information in the generated infrared images, channel attention and spatial attention mechanisms were introduced into the ResNet; the attention mechanisms weight the generated infrared image features to enhance the perception of the salient regions of the image and help generate image details. Finally, because current adversarial generative network training is insufficiently stable and the model collapses easily, a gradient normalization module was introduced into the discriminator network to stabilize the model and render it less prone to collapse during training. Experimental results on several datasets showed that the proposed method achieved satisfactory results in terms of objective evaluation metrics. Compared with the cycle generative adversarial network method, the proposed method exhibited significant improvement in data validity on multiple datasets.
An Adaptive Weight Physics-Informed Neural Network for Vortex-Induced Vibration Problems
Vortex-induced vibration (VIV) is a common fluid–structure interaction phenomenon in practical engineering with significant research value. Traditional methods to solve VIV issues include experimental studies and numerical simulations. However, experimental studies are costly and time-consuming, while numerical simulations are constrained by low Reynolds numbers and simplified models. Deep learning (DL) can successfully capture VIV patterns and generate accurate predictions by using a large amount of training data. The Physics-Informed Neural Network (PINN), a subfield of DL, introduces physics equations into the loss function to reduce the need for large data. Nevertheless, PINN loss functions often include multiple loss terms, which may interact with each other, causing imbalanced training speeds and potentially inferior overall performance. To address this issue, this study proposes an Adaptive Weight Physics-Informed Neural Network (AW-PINN) algorithm built upon a gradient normalization method (GradNorm) from multi-task learning. The AW-PINN regulates the weights of each loss term by computing the gradient norms on the network weights, ensuring the norms of the loss terms match predefined target values. This ensures balanced training speeds for each loss term and improves both the prediction precision and robustness of the network model. In this study, a VIV dataset of a cylindrical body with different degrees of freedom is used to compare the performance of the PINN and three PINN optimization algorithms. The findings suggest that, compared to a standard PINN, the AW-PINN lowers the mean squared error (MSE) on the test set by 50%, significantly improving the prediction accuracy. The AW-PINN also demonstrates an enhanced stability across different datasets, confirming its robustness and reliability for VIV modeling. Compared with existing methods in the literature, the AW-PINN achieves a comparable lift prediction accuracy using merely 1% of the training data, while simultaneously improving the prediction accuracy of the peak lift.
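The GradNorm-style rebalancing that AW-PINN builds on can be sketched in one shot: given the gradient norm of each loss term, choose weights so the weighted norms match a common target. This is a simplified single-step illustration (the full GradNorm algorithm adapts the weights by gradient descent during training); the function name is hypothetical:

```python
import numpy as np

def gradnorm_weights(grad_norms, eps=1e-12):
    """One-shot loss-weight rebalancing in the spirit of GradNorm.

    Each loss term i gets weight w_i proportional to target / norm_i,
    with the mean norm as the common target, then the weights are
    rescaled so they sum to the number of terms.
    """
    norms = np.asarray(grad_norms, dtype=float)
    w = norms.mean() / (norms + eps)   # equalize the weighted gradient norms
    return w * len(w) / w.sum()        # keep total weight = number of terms

norms = np.array([10.0, 1.0, 0.1])    # badly imbalanced loss-term gradients
w = gradnorm_weights(norms)
print(np.round(w * norms, 6))         # all three weighted norms are now equal
```

After reweighting, every loss term drives the network with the same gradient magnitude, which is what balances the per-term training speeds.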
Research on mobile traffic data augmentation methods based on SA-ACGAN-GN
With the rapid development and application of the mobile Internet, it is necessary to analyze and classify mobile traffic to meet the needs of users. Because data for some applications are difficult to collect, mobile traffic data follow a long-tailed distribution, which reduces classification accuracy. In addition, the original GAN is difficult to train and prone to "mode collapse". Therefore, this paper introduces a self-attention mechanism and gradient normalization into the auxiliary classifier generative adversarial network (ACGAN) to form the SA-ACGAN-GN model, addressing the long-tailed distribution and training stability problems of mobile traffic data. The method first converts the traffic into images; second, to improve the quality of the generated images, the self-attention mechanism is introduced into the ACGAN model to capture the global geometric features of the images; finally, a gradient normalization strategy is added to SA-ACGAN to further improve the data augmentation effect and the training stability. Cross-validation experiments show that, using the same classifier, the proposed SA-ACGAN-GN algorithm achieves the best precision, 93.8%, among the compared algorithms. After gradient normalization is added, the classification loss decreases rapidly during training and the loss curve fluctuates less, indicating that the proposed method not only effectively mitigates the long-tail problem of the dataset but also enhances the stability of model training.
Analysis and Computation for Ground State Solutions of Bose–Fermi Mixtures at Zero Temperature
Previous numerical studies on the ground state structure of Bose–Fermi mixtures mostly relied on the Thomas–Fermi (TF) approximation for the Fermi gas. In this paper, we establish the existence and uniqueness of ground state solutions of Bose–Fermi mixtures at zero temperature for both a coupled Gross–Pitaevskii (GP) equations model and a model with the TF approximation for fermions. To prove the uniqueness, the key is to estimate the L∞ bounds of the ground state solution. By implementing an efficient method, gradient flow with discrete normalization with backward Euler finite difference discretization, to compute the coupled GP equations, we report extensive numerical results in one and two dimensions. The numerical experiments show that we can also extract many interesting phenomena without reference to the TF approximation for the fermions. Finally, we numerically compare the ground state solutions for the coupled GP equations model and the model with the TF approximation for fermions, as well as for the model with TF approximations for both bosons and fermions.
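The "gradient flow with discrete normalization" scheme mentioned in the abstract can be illustrated on a simpler linear problem. The sketch below computes the ground state of a 1-D harmonic oscillator (standing in for the coupled GP equations, with no interaction term): each backward Euler step is followed by renormalization to unit discrete L2 norm. All parameter values are illustrative:

```python
import numpy as np

def gfdn_ground_state(n=201, L=8.0, tau=0.1, steps=300):
    """Gradient flow with discrete normalization for the 1-D harmonic
    oscillator H = -0.5 d^2/dx^2 + 0.5 x^2 on [-L, L].

    Backward Euler: solve (I + tau*H) phi_new = phi_old, then rescale
    phi_new so its discrete L2 norm is 1.  The iteration converges to
    the ground state, whose exact energy here is 0.5.
    """
    x = np.linspace(-L, L, n)
    h = x[1] - x[0]
    # Central-difference discretization of H (Dirichlet boundaries).
    main = 1.0 / h**2 + 0.5 * x**2
    off = np.full(n - 1, -0.5 / h**2)
    H = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    Ainv = np.linalg.inv(np.eye(n) + tau * H)  # backward Euler operator
    phi = np.exp(-x**2)                        # any positive initial guess
    phi /= np.sqrt(h * np.sum(phi**2))
    for _ in range(steps):
        phi = Ainv @ phi                       # implicit gradient-flow step
        phi /= np.sqrt(h * np.sum(phi**2))     # discrete normalization
    energy = h * phi @ (H @ phi)
    return x, phi, energy

x, phi, energy = gfdn_ground_state()
print(energy)  # close to the exact value 0.5, up to O(h^2) error
```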
Generalized Gradient Flow Based Saliency for Pruning Deep Convolutional Neural Networks
Model filter pruning has shown efficiency in compressing deep convolutional neural networks by removing unimportant filters without sacrificing performance. However, most existing criteria are empirical and overlook the relationship between channel saliencies and the non-linear activation functions within the networks. To address these problems, we propose a novel channel pruning method coined gradient flow based saliency (GFBS). Instead of relying on the magnitudes of the entire feature maps, GFBS evaluates channel saliencies from the gradient flow perspective and only requires the information in the normalization and activation layers. Concretely, we first integrate the effects of the normalization and ReLU activation layers into the convolutional layers based on a Taylor expansion. Then, through backpropagation, the channel saliency of each layer is indicated by the first-order Taylor polynomial of the scaling parameter and the signed shifting parameter in the normalization layers. To validate the efficiency and generalization ability of GFBS, we conduct extensive experiments on various tasks, including image classification (CIFAR, ImageNet), image denoising, object detection, and 3D object classification. GFBS cooperates readily with the baseline networks and compresses them with only a negligible performance drop. Moreover, we extend our method to pruning networks from scratch, and GFBS is capable of identifying subnetworks whose performance is comparable to the baseline model at an early training stage. Our code has been released at https://github.com/CUHK-AIM-Group/GFBS.
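Scoring channels from normalization-layer parameters can be sketched as follows. This is a simplified stand-in: it ranks each channel by the BN scaling parameter plus the part of the signed shifting parameter that survives ReLU, whereas GFBS derives the exact first-order Taylor form; the names and keep-ratio are illustrative:

```python
import numpy as np

def channel_saliency(gamma, beta):
    """Score each channel from its BN parameters: the scaling parameter
    gamma plus the part of the signed shift beta that survives ReLU."""
    return np.abs(gamma + np.where(beta > 0.0, beta, 0.0))

def prune_mask(gamma, beta, keep_ratio=0.5):
    """Keep the top-scoring fraction of channels, mask out the rest."""
    s = channel_saliency(gamma, beta)
    k = max(1, int(keep_ratio * s.size))
    mask = np.zeros(s.size, dtype=bool)
    mask[np.argsort(s)[::-1][:k]] = True
    return mask

gamma = np.array([1.0, 0.01, 0.5, 0.02])  # BN scales of four channels
beta = np.zeros(4)
print(prune_mask(gamma, beta))            # keeps the two high-scale channels
```

Channels whose BN parameters scale activations toward zero contribute little to the output, so they are the ones the mask removes.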
AI-enabled Barilai–Borwein–Blinder–Oaxaca–Bernoulli Deep Classifier for Enhanced Crop Yield Prediction
This article explores the integration of advanced Artificial Intelligence (AI) enabled deep learning methods with accurate crop yield prediction. The objective of the work is to enhance the accuracy, sensitivity, and specificity of crop yield prediction while minimizing false positive and false negative cases. The AI-enabled Barilai–Borwein–Blinder–Oaxaca–Bernoulli Deep Classifier (BBO-BDC) is proposed, comprising preprocessing, feature selection, and crop yield prediction. First, raw samples are collected from the crop yield prediction dataset, and Barilai–Borwein Gradient Min–max Normalization-based preprocessing is applied to eliminate all missing values. Second, to provide fine-grained feature subsets, the Blinder–Oaxaca Statistical Decomposition-based feature selection method is used. Finally, an AI-enabled Bernoulli Deep Belief Network is designed to predict the crop yield. The empirical results demonstrate that the BBO-BDC technique improves accuracy by up to 12%, specificity by up to 15%, and sensitivity by up to 3% with feature selection. Furthermore, the BBO-BDC technique reduces convergence time by 29% and overhead by 51% compared to conventional methods with feature selection. The study integrates Min–max Normalization, the Barilai–Borwein gradient, the Blinder–Oaxaca decomposition function, a Deep Belief Network, the Xavier initialization function, the Bernoulli distribution function, and principal components to achieve better performance in crop yield prediction.
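Of the pipeline components above, the min-max normalization preprocessing is the easiest to make concrete. The sketch below fills missing values with the column mean and rescales each feature to [0, 1]; the Barilai–Borwein gradient coupling described in the abstract is not reproduced, and the function name is illustrative:

```python
import numpy as np

def minmax_normalize(X):
    """Fill NaNs with the column mean, then min-max scale each column
    to [0, 1] (constant columns are left at 0)."""
    X = np.array(X, dtype=float)
    X = np.where(np.isnan(X), np.nanmean(X, axis=0), X)
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / np.where(hi > lo, hi - lo, 1.0)

X = [[1.0, float("nan")],   # a missing yield measurement
     [3.0, 4.0],
     [5.0, 8.0]]
print(minmax_normalize(X))  # every feature rescaled to [0, 1], NaN filled
```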