Asset Details

Paper

Tensor Programs IVb: Adaptive Optimization in the Infinite-Width Limit

Yang, Greg

Littwin, Etai

2023

Overview

Going beyond stochastic gradient descent (SGD), what new phenomena emerge in wide neural networks trained by adaptive optimizers like Adam? Here we show: The same dichotomy between feature learning and kernel behaviors (as in SGD) holds for general optimizers as well, including Adam, albeit with a nonlinear notion of "kernel." We derive the corresponding "neural tangent" and "maximal update" limits for any architecture. Two foundational advances underlie the above results: 1) A new Tensor Program language, NEXORT, that can express how adaptive optimizers process gradients into updates. 2) The introduction of bra-ket notation to drastically simplify expressions and calculations in Tensor Programs. This work summarizes and generalizes all previous results in the Tensor Programs series of papers.
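To illustrate why an adaptive optimizer leads to a "nonlinear" kernel, here is a minimal sketch contrasting the SGD update, which is linear in the gradient, with the standard Adam update (bias-corrected, as in Kingma & Ba), which is not. This is not the NEXORT formalism or any code from the paper; the function names `sgd_update` and `adam_update` and the hyperparameter defaults are hypothetical choices for illustration only.

```python
import math


def sgd_update(grad, lr=0.1):
    # SGD: the update is a linear function of the gradient (scaling grad scales the update).
    return [-lr * g for g in grad]


def adam_update(grads, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return the update Adam applies at the last step after seeing `grads`.

    grads: list of gradient vectors (lists of floats), one per optimization step.
    Hypothetical helper; standard bias-corrected Adam, not the paper's formalism.
    """
    n = len(grads[0])
    m = [0.0] * n  # first-moment (mean) estimates
    v = [0.0] * n  # second-moment (uncentered variance) estimates
    update = [0.0] * n
    for t, g in enumerate(grads, start=1):
        for i in range(n):
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1 - beta2) * g[i] ** 2
            m_hat = m[i] / (1 - beta1 ** t)  # bias correction
            v_hat = v[i] / (1 - beta2 ** t)
            # Entrywise nonlinear map from gradient statistics to the update.
            update[i] = -lr * m_hat / (math.sqrt(v_hat) + eps)
    return update


if __name__ == "__main__":
    g = [2.0, -0.5]
    g10 = [20.0, -5.0]  # the same gradient scaled by 10
    print("SGD :", sgd_update(g), sgd_update(g10))
    print("Adam:", adam_update([g]), adam_update([g10]))
```

On the first step the Adam update is approximately `-lr * sign(g)`, so scaling the gradient by 10 leaves it essentially unchanged, while the SGD update scales by 10. This sketch only shows that Adam's gradient-to-update map is nonlinear; it does not reproduce the paper's infinite-width analysis.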

Publisher

Cornell University Library, arXiv.org

Subject

Kernels

/ Mathematical analysis

/ Neural networks

/ Optimization

/ Tensors