Catalogue Search | MBRL

Stochastic Gradient Descent with Polyak’s Learning Rate

by Prazeres, Mariana; Oberman, Adam M. in Algorithms, Artificial neural networks, Asymptotic methods

2021

Stochastic gradient descent (SGD) for strongly convex functions converges at the rate O(1/k). However, achieving good results in practice requires tuning the parameters of the algorithm (for example, the learning rate). In this paper we propose a generalization of the Polyak step size, used for subgradient methods, to stochastic gradient descent. We prove non-asymptotic convergence at the rate O(1/k) with a rate constant which can be better than the corresponding rate constant for optimally scheduled SGD. We demonstrate that the method is effective in practice, both on convex optimization problems and on training deep neural networks, and compare to the theoretical rate.
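As a rough illustration of the step-size rule this abstract builds on, the sketch below applies the classical (deterministic) Polyak step h = (f(x) - f*) / ||g||^2 to a toy quadratic. It is not the paper's stochastic generalization, whose exact form the abstract does not give; the function names, the toy objective, and the assumption that the optimal value f* is known are all illustrative.

```python
def polyak_descent(value_and_grad, x0, f_star, steps):
    """Gradient descent with the classical Polyak step h = (f(x) - f*) / ||g||^2.

    Assumes the optimal value f_star is known, which is the usual
    requirement of the classical rule (an assumption of this sketch).
    """
    x = list(x0)
    history = []
    for _ in range(steps):
        value, grad = value_and_grad(x)
        history.append(value)
        gnorm2 = sum(g * g for g in grad)
        if gnorm2 == 0.0:  # stationary point reached, nothing left to do
            break
        step = (value - f_star) / gnorm2
        x = [xi - step * gi for xi, gi in zip(x, grad)]
    return x, history


def quadratic(x, curvatures=(1.0, 10.0)):
    """Toy strongly convex objective f(x) = 0.5 * sum(a_i * x_i^2), so f* = 0."""
    value = 0.5 * sum(a * xi * xi for a, xi in zip(curvatures, x))
    grad = [a * xi for a, xi in zip(curvatures, x)]
    return value, grad


x_final, losses = polyak_descent(quadratic, [1.0, 1.0], f_star=0.0, steps=500)
print(losses[0], losses[-1])
```

No learning rate is tuned here: the step is computed from the current objective gap and gradient norm alone.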

Journal Article

Aerial Scene Classification through Fine-Tuning with Adaptive Learning Rates and Label Smoothing

by Atanasova-Pacemska, Tatjana; Mignone, Paolo; Corizzo, Roberto in convolutional neural network, cyclical learning rates, fine-tuning

2020

Remote Sensing (RS) image classification has recently attracted great attention for its application in different tasks, including environmental monitoring, battlefield surveillance, and geospatial object detection. The best practices for these tasks often involve transfer learning from pre-trained Convolutional Neural Networks (CNNs). A common approach in the literature is to employ CNNs for feature extraction and subsequently train classifiers that exploit such features. In this paper, we propose the adoption of transfer learning by fine-tuning pre-trained CNNs for end-to-end aerial image classification. Our approach performs feature extraction from the fine-tuned neural networks and remote sensing image classification with a Support Vector Machine (SVM) model with linear and Radial Basis Function (RBF) kernels. To tune the learning rate hyperparameter, we employ a linear decay learning rate scheduler as well as cyclical learning rates. Moreover, in order to mitigate the overfitting problem of pre-trained models, we apply label smoothing regularization. For the fine-tuning and feature extraction process, we adopt the inception-based CNNs Inception-v3 and Xception, as well as the residual-based networks ResNet50 and DenseNet121. We present extensive experiments on two real-world remote sensing image datasets: AID and NWPU-RESISC45. The results show that the proposed method exhibits classification accuracy of up to 98%, outperforming other state-of-the-art methods.
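The two training devices named in this abstract can be sketched in a few lines. The code below shows one common form of each: uniform label smoothing, and the triangular cyclical learning rate schedule. The abstract does not say which variants or hyperparameter values the authors used, so the function names and defaults here are assumptions.

```python
import math


def smooth_labels(class_index, num_classes, epsilon=0.1):
    """Uniform label smoothing: mix the one-hot target with a uniform distribution."""
    uniform = epsilon / num_classes
    return [
        (1.0 - epsilon) + uniform if k == class_index else uniform
        for k in range(num_classes)
    ]


def triangular_clr(iteration, base_lr, max_lr, step_size):
    """Triangular cyclical learning rate.

    The rate climbs linearly from base_lr to max_lr over step_size
    iterations, then descends back over the next step_size iterations.
    """
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)


target = smooth_labels(class_index=2, num_classes=5)
schedule = [triangular_clr(i, 1e-4, 1e-2, step_size=100) for i in range(0, 401, 50)]
print(target)
print(schedule)
```

The smoothed target still sums to one, and the schedule returns to its base value every 2 * step_size iterations.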

Journal Article

Adaptive hierarchical hyper-gradient descent

by Tran, Minh-Ngoc; Vasnev, Andrey; Gao, Junbin in Adaptation, Adaptive learning, Artificial Intelligence

2022

Adaptive learning rate strategies can lead to faster convergence and better performance for deep learning models. There are some widely known human-designed adaptive optimizers such as Adam and RMSProp, gradient based adaptive methods such as hyper-descent and practical loss-based stepsize adaptation (L4), and meta learning approaches including learning to learn. However, the existing studies did not take into account the hierarchical structures of deep neural networks in designing the adaptation strategies. Meanwhile, the issue of balancing adaptiveness and convergence is still an open question to be answered. In this study, we investigate novel adaptive learning rate strategies at different levels based on the hyper-gradient descent framework and propose a method that adaptively learns the optimizer parameters by combining adaptive information at different levels. In addition, we show the relationship between regularizing over-parameterized learning rates and building combinations of adaptive learning rates at different levels. Moreover, two heuristics are introduced to guarantee the convergence of the proposed optimizers. The experiments on several network architectures, including feed-forward networks, LeNet-5 and ResNet-18/34, show that the proposed multi-level adaptive approach can significantly outperform many baseline adaptive methods in a variety of circumstances.
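For orientation, the basic hyper-gradient descent idea that this abstract extends can be sketched as follows: a single global learning rate is itself updated by gradient descent, using the dot product of consecutive gradients. This is only the single-level baseline, not the multi-level combination proposed in the paper; the names and the values of `alpha0` and `beta` are illustrative assumptions.

```python
def hypergradient_sgd(grad, theta0, alpha0=0.01, beta=0.001, steps=200):
    """Plain gradient descent whose learning rate alpha is adapted online.

    alpha is increased when successive gradients point the same way
    (positive dot product) and decreased when they disagree.
    """
    theta = list(theta0)
    alpha = alpha0
    prev_grad = [0.0] * len(theta)
    for _ in range(steps):
        g = grad(theta)
        # Hyper-gradient step on alpha: alpha_t = alpha_{t-1} + beta * <g_t, g_{t-1}>
        alpha += beta * sum(gi * pi for gi, pi in zip(g, prev_grad))
        theta = [ti - alpha * gi for ti, gi in zip(theta, g)]
        prev_grad = g
    return theta, alpha


# Toy objective f(theta) = 0.5 * ||theta||^2, whose gradient is theta itself.
theta_final, alpha_final = hypergradient_sgd(lambda t: list(t), [5.0])
print(theta_final, alpha_final)
```

On this toy problem the learning rate grows from its small initial value while the gradients stay aligned, then levels off as the iterate approaches the minimum.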

Journal Article

Fast Rates for Support Vector Machines Using Gaussian Kernels

by Scovel, Clint; Steinwart, Ingo in 41A46, 41A99, 62G20

2007

For binary classification we establish learning rates up to the order of n⁻¹ for support vector machines (SVMs) with hinge loss and Gaussian RBF kernels. These rates are in terms of two assumptions on the considered distributions: Tsybakov's noise assumption to establish a small estimation error, and a new geometric noise condition which is used to bound the approximation error. Unlike previously proposed concepts for bounding the approximation error, the geometric noise assumption does not employ any smoothness assumption.

Journal Article

Dynamics Learning Rate Bias in Pigeons: Insights from Reinforcement Learning and Neural Correlates

by Li, Mengmeng; Shang, Zhigang; Yang, Lifang in Animals, artificial intelligence, asymmetric learning rate

2024

Research in reinforcement learning indicates that animals respond differently to positive and negative reward prediction errors, which can be modeled by assuming a learning rate bias. Many studies have shown that humans and other animals exhibit learning rate bias during learning, but it is unclear whether and how the bias changes throughout the entire learning process. Here, we recorded behavioral data and local field potentials (LFPs) in the striatum of five pigeons performing a probabilistic learning task. Reinforcement learning models with and without learning rate biases were used to dynamically fit the pigeons’ choice behavior and estimate the option values. Furthermore, the correlation between the striatal LFP power and the model-estimated option values was explored. We found that the pigeons’ learning rate bias shifted from negative to positive during the learning process, and that the striatal Gamma (31 to 80 Hz) power correlated with the option values modulated by dynamic learning rate bias. In conclusion, our results support the hypothesis that pigeons employ a dynamic learning strategy in the learning process from both behavioral and neural aspects, providing valuable insights into reinforcement learning mechanisms of non-human animals.
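A minimal sketch of what a learning rate bias means in a value-update model: the option value is moved toward the reward with one rate when the prediction error is positive and another when it is negative. The abstract does not give the authors' model or fitting procedure, so the function names, the two-option softmax choice rule, and the parameter values are assumptions for illustration.

```python
import math


def update_value(q, reward, alpha_pos, alpha_neg):
    """Update an option value with separate rates for positive and negative errors."""
    delta = reward - q  # reward prediction error
    rate = alpha_pos if delta >= 0 else alpha_neg
    return q + rate * delta


def choice_probability(q_a, q_b, inverse_temperature):
    """Softmax probability of choosing option A over option B."""
    return 1.0 / (1.0 + math.exp(-inverse_temperature * (q_a - q_b)))


# With alpha_pos > alpha_neg, a reward moves the value more than an omission does.
after_reward = update_value(0.5, reward=1.0, alpha_pos=0.4, alpha_neg=0.1)
after_omission = update_value(0.5, reward=0.0, alpha_pos=0.4, alpha_neg=0.1)
print(after_reward, after_omission, choice_probability(after_reward, after_omission, 3.0))
```

Setting `alpha_pos` equal to `alpha_neg` recovers the unbiased model, which is the comparison the abstract describes.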

Journal Article

Tomato Crop Disease Classification Using Semantic Segmentation Algorithm in Deep Learning

by Ramesh Babu Padamata; Atluri, Sri Krishna in Accuracy, Agriculture, Algorithms

2023

Agriculture is critical to human survival. Almost 70% of the population is involved in agriculture, either directly or indirectly. Traditional systems offer no technology for identifying diseases in diverse crops in an agricultural environment, which is why farmers are reluctant to expand their agricultural productivity. Crop diseases affect the growth and production of the afflicted species; hence early detection is essential. There have been many attempts to use Machine Learning (ML) methods for disease detection and classification in agriculture, but recent advances in a subset of ML called Deep Learning (DL) have given this field of study renewed hope for improved accuracy. The spread of diseases in the tomato crop affects both the quality and quantity of the crop. A rapid, dependable, and non-destructive way of diagnosing tomato diseases early on may be useful for farmers. The approach employs two deep-learning-based algorithms, the AlexNet and SegNet models, with input including seven different color images of tomato leaves, six of which are afflicted and one of which is healthy. The approach is also applicable to other plants, such as potato and corn, whose leaves show bacterial or fungal infections. Hyperparameters investigated for their effect on classification accuracy and execution time include mini-batch size, weights, and bias learning rate.

Journal Article

NONPARAMETRIC STOCHASTIC APPROXIMATION WITH LARGE STEP-SIZES

by Dieuleveut, Aymeric; Bach, Francis in Approximation, Covariance, Eigenvalues

2016

We consider the random-design least-squares regression problem within the reproducing kernel Hilbert space (RKHS) framework. Given a stream of independent and identically distributed input/output data, we aim to learn a regression function within an RKHS ℋ, even if the optimal predictor (i.e., the conditional expectation) is not in ℋ. In a stochastic approximation framework where the estimator is updated after each observation, we show that the averaged unregularized least-mean-square algorithm (a form of stochastic gradient descent), given a sufficiently large step-size, attains optimal rates of convergence for a variety of regimes for the smoothnesses of the optimal prediction function and the functions in ℋ. Our results apply as well in the usual finite-dimensional setting of parametric least-squares regression, showing adaptivity of our estimator to the spectral decay of the covariance matrix of the covariates.
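In the finite-dimensional setting mentioned at the end of this abstract, the averaged unregularized least-mean-square algorithm can be sketched as constant-step SGD on the squared loss, with a running average of the iterates returned as the estimator. The synthetic data stream, the step size of 0.1, and all names below are assumptions chosen for illustration; the RKHS version analyzed in the paper is not reproduced here.

```python
import random


def averaged_lms(stream, dim, step_size):
    """Constant-step least-mean-square updates with averaging of the iterates."""
    w = [0.0] * dim      # current iterate
    w_avg = [0.0] * dim  # running average of the iterates, returned as the estimator
    for n, (x, y) in enumerate(stream, start=1):
        residual = sum(wi * xi for wi, xi in zip(w, x)) - y
        # One unregularized SGD step on the squared loss of this single observation.
        w = [wi - step_size * residual * xi for wi, xi in zip(w, x)]
        w_avg = [ai + (wi - ai) / n for ai, wi in zip(w_avg, w)]
    return w_avg


def synthetic_stream(w_true, n_samples, noise_std, seed=0):
    """Hypothetical i.i.d. stream: uniform inputs, linear response plus Gaussian noise."""
    rng = random.Random(seed)
    for _ in range(n_samples):
        x = [rng.uniform(-1.0, 1.0) for _ in w_true]
        y = sum(wi * xi for wi, xi in zip(w_true, x)) + rng.gauss(0.0, noise_std)
        yield x, y


w_hat = averaged_lms(synthetic_stream([2.0, -1.0], 5000, 0.1), dim=2, step_size=0.1)
print(w_hat)
```

Each observation is used once and then discarded, which matches the streaming setting the abstract describes.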

Journal Article

Why Copy Others? Insights from the Social Learning Strategies Tournament

by Laland, K.N; Cownden, D; Ghirlanda, S in Behavioral sciences, Biological and medical sciences, computers

2010

Social learning (learning through observation or interaction with other individuals) is widespread in nature and is central to the remarkable success of humanity, yet it remains unclear why copying is profitable and how to copy most effectively. To address these questions, we organized a computer tournament in which entrants submitted strategies specifying how to use social learning and its asocial alternative (for example, trial-and-error learning) to acquire adaptive behavior in a complex environment. Most current theory predicts the emergence of mixed strategies that rely on some combination of the two types of learning. In the tournament, however, strategies that relied heavily on social learning were found to be remarkably successful, even when asocial information was no more costly than social information. Social learning proved advantageous because individuals frequently demonstrated the highest-payoff behavior in their repertoire, inadvertently filtering information for copiers. The winning strategy (discountmachine) relied nearly exclusively on social learning and weighted information according to the time since acquisition.

Journal Article

ADAPTIVE LEARNING RATES FOR SUPPORT VECTOR MACHINES WORKING ON DATA WITH LOW INTRINSIC DIMENSION

by Steinwart, Ingo; Hamm, Thomas in Adaptive learning, Classification, Learning

2021

We derive improved regression and classification rates for support vector machines using Gaussian kernels under the assumption that the data has some low-dimensional intrinsic structure that is described by the box-counting dimension. Under some standard regularity assumptions for regression and classification, we prove learning rates in which the dimension of the ambient space is replaced by the box-counting dimension of the support of the data-generating distribution. In the regression case, our rates are in some cases minimax optimal up to logarithmic factors, whereas in the classification case our rates are minimax optimal up to logarithmic factors in a certain range of our assumptions and otherwise of the form of the best known rates. Furthermore, we show that a training-validation approach for choosing the hyperparameters of an SVM in a data-dependent way achieves the same rates adaptively, that is, without any knowledge of the data-generating distribution.

Journal Article

Learning-Based Rate Control for High Efficiency Video Coding

by Chen, Sovann; Miyanaga, Yoshikazu; Aramvith, Supavadee in Algorithms, Coding standards, Control algorithms

2023

High efficiency video coding (HEVC) has dramatically enhanced coding efficiency compared to the previous video coding standard, H.264/AVC. However, the existing rate control updates its parameters according to a fixed initialization, which can cause errors in the prediction of bit allocation to each coding tree unit (CTU) in frames. This paper proposes a learning-based mapping method between rate control parameters and video contents to achieve an accurate target bit rate and good video quality. The proposed framework contains two main structural codings, including spatial and temporal coding. We initiate an effective learning-based particle swarm optimization for spatial and temporal coding to determine the optimal parameters at the CTU level. For temporal coding at the picture level, we introduce semantic residual information into the parameter updating process to regulate the bit correctly on the actual picture. Experimental results indicate that the proposed algorithm is effective for HEVC and outperforms the state-of-the-art rate control in the HEVC reference software (HM-16.10) by 0.19 dB on average and up to 0.41 dB for low-delay P coding structure.

Journal Article
