3 results for "exploding gradients"
A review on the long short-term memory model
Long short-term memory (LSTM) has transformed both machine learning and neurocomputing. According to several online sources, this model has improved Google's speech recognition, greatly improved machine translation on Google Translate, and sharpened the answers of Amazon's Alexa. The architecture is also employed by Facebook, which reported over 4 billion LSTM-based translations per day as of 2017. Interestingly, recurrent neural networks had shown rather modest performance until LSTM appeared. One reason for this recurrent network's success lies in its ability to handle the exploding/vanishing gradient problem, which remains a difficult obstacle when training recurrent or very deep neural networks. In this paper, we present a comprehensive review that covers LSTM's formulation and training, relevant applications reported in the literature, and code resources implementing the model for a toy example.
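The review's toy example is not reproduced here, but the following minimal PyTorch sketch (my own illustration, not code from the paper) shows the two points the abstract raises: an LSTM processing a toy sequence task, and gradient-norm clipping as the usual guard against exploding gradients. All tensor shapes and hyper-parameters are arbitrary.

```python
# Minimal sketch (not from the reviewed paper): an LSTM on a toy
# classification task, with gradient-norm clipping as a common guard
# against exploding gradients. Assumes PyTorch is installed.
import torch
import torch.nn as nn

class ToyLSTM(nn.Module):
    def __init__(self, input_size=8, hidden_size=32, num_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x: (batch, seq_len, input_size)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])   # classify from the last time step

model = ToyLSTM()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

x = torch.randn(16, 50, 8)            # toy batch: 16 sequences of length 50
y = torch.randint(0, 2, (16,))        # toy labels

for step in range(100):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    # LSTM gating mitigates vanishing gradients; clipping the global
    # gradient norm guards against exploding ones.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
```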
Enhanced copy-move forgery detection using deep convolutional neural network (DCNN) employing the ResNet-101 transfer learning model
The rapid proliferation of high-quality forged images on social media calls for research on reliable image-forgery detection systems. Copy-move forgery (CMF), in which portions of an image are copied and pasted within the same image, is one of the most commonly used image-tampering methods. Because of the exploding and vanishing gradient problem, existing Convolutional Neural Network (CNN) models must be trained for up to 100 epochs to reach their best accuracy. In this work, a deep CNN (DCNN) based on a residual network with 101 layers is used. To address exploding and vanishing gradients, skip connections are built into the residual network. In addition, the cyclical learning rate (CLR) hyper-parameter is used to further tune the proposed ResNet-101 model and maximize its performance. The model was trained and evaluated on a variety of datasets, including MICC-F600, MICC-F2000, MICC-F220, and CoMoFoD v2, and analyzed quantitatively in terms of accuracy, error rate, true positive rate (TPR), false positive rate (FPR), true negative rate (TNR), and false negative rate (FNR). The proposed model achieves its highest accuracy of 97.75% on the CoMoFoD v2 dataset after training for only 5 epochs; on the MICC-F220, MICC-F600, and MICC-F2000 datasets it reaches 96.09%, 97.63%, and 96.87%, respectively, after at most 10 epochs. To demonstrate the efficacy of the proposed approach, a comparative study against various state-of-the-art models from the literature is presented.
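The abstract names two concrete ingredients, a pretrained ResNet-101 backbone (whose residual blocks provide the skip connections) and a cyclical learning rate, without giving the full pipeline. The sketch below is a generic fine-tuning setup under those assumptions, not the authors' implementation; the `forgery_loader` data loader is a hypothetical placeholder.

```python
# Illustrative sketch only: the paper's exact pipeline and
# hyper-parameters are not given in the abstract. This shows a
# pretrained ResNet-101 fine-tuned for binary authentic/forged
# classification with a cyclical learning rate (CLR).
# Assumes PyTorch and torchvision are installed.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet101(weights=models.ResNet101_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 2)   # authentic vs. forged

optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
# Cyclical learning rate: sweep the LR between a lower and upper bound.
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=1e-5, max_lr=1e-3, step_size_up=200)
criterion = nn.CrossEntropyLoss()

def train_one_epoch(forgery_loader):
    # `forgery_loader` is a hypothetical DataLoader yielding
    # (image batch, label batch) pairs.
    model.train()
    for images, labels in forgery_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        scheduler.step()   # CLR is stepped per batch, not per epoch
```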
Optimizing Recurrent Neural Networks: A Study on Gradient Normalization of Weights for Enhanced Training Efficiency
Recurrent Neural Networks (RNNs) are classical models for processing sequential data, with excellent performance in tasks such as natural language processing and time-series prediction. During training, however, vanishing and exploding gradients often arise, significantly impacting the model's performance and efficiency. In this paper, we investigate why RNNs are more prone to gradient problems than other common sequential networks. To address this issue and enhance network performance, we propose a method that normalizes the gradients of the network weights. This method suppresses gradient problems by altering the statistical properties of the RNN weights, thereby improving training effectiveness. We also analyze the impact of weight-gradient normalization on the probability-distribution characteristics of the model weights and validate the method's sensitivity to hyperparameters such as the learning rate. The experimental results demonstrate that gradient normalization stabilizes model training and reduces the frequency of gradient issues. On the Penn Treebank dataset, the method achieves a perplexity of 110.89, an 11.48% improvement over conventional gradient descent. For prediction lengths of 24 and 96 on the ETTm1 dataset, it attains Mean Absolute Error (MAE) values of 0.778 and 0.592, improvements of 3.00% and 6.77% over conventional gradient descent. On selected subsets of the UCR dataset, accuracy increases by 0.4% to 6.0%. Gradient normalization enhances the ability of RNNs to learn from sequential and causal data, and thus has significant implications for optimizing the training of RNN-based models.
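The abstract does not spell out the exact normalization rule, so the sketch below is one plausible reading, rescaling each weight tensor's gradient to a fixed L2 norm before the optimizer step, rather than the authors' verified procedure. Model sizes and the toy regression data are arbitrary.

```python
# Hedged sketch of per-tensor gradient normalization for an RNN.
# This is one plausible interpretation of the abstract, not the
# authors' published method. Assumes PyTorch is installed.
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=64, batch_first=True)
head = nn.Linear(64, 1)
params = list(rnn.parameters()) + list(head.parameters())
optimizer = torch.optim.SGD(params, lr=1e-2)
criterion = nn.MSELoss()

def normalize_gradients(parameters, target_norm=1.0, eps=1e-8):
    """Rescale each parameter's gradient to a fixed L2 norm."""
    for p in parameters:
        if p.grad is not None:
            p.grad.mul_(target_norm / (p.grad.norm() + eps))

x = torch.randn(32, 100, 8)   # toy batch: 32 sequences of length 100
y = torch.randn(32, 1)        # toy regression targets

for step in range(200):
    optimizer.zero_grad()
    out, _ = rnn(x)
    loss = criterion(head(out[:, -1]), y)
    loss.backward()
    normalize_gradients(params)   # per-tensor gradient normalization
    optimizer.step()
```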