134 result(s) for "Sidford, Aaron"
Lower bounds for finding stationary points I
We prove lower bounds on the complexity of finding ϵ-stationary points (points x such that ‖∇f(x)‖ ≤ ϵ) of smooth, high-dimensional, and potentially non-convex functions f. We consider oracle-based complexity measures, where an algorithm is given access to the value and all derivatives of f at a query point x. We show that for any (potentially randomized) algorithm A, there exists a function f with Lipschitz pth-order derivatives such that A requires at least ϵ^{-(p+1)/p} queries to find an ϵ-stationary point. Our lower bounds are sharp to within constants, and they show that gradient descent, cubic-regularized Newton's method, and generalized pth-order regularization are worst-case optimal within their natural function classes.
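The ϵ-stationarity condition in this abstract is easy to illustrate with a minimal sketch: run plain gradient descent until ‖∇f(x)‖ ≤ ϵ. The test function, starting point, step size, and tolerance below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical smooth non-convex test function (sum over coordinates of
# t^2 + 0.5*cos(3t)) and its gradient; the gradient is Lipschitz.
def f(x):
    return np.sum(x**2) + 0.5 * np.sum(np.cos(3 * x))

def grad_f(x):
    return 2 * x - 1.5 * np.sin(3 * x)

def find_stationary_point(x0, eps=1e-4, step=0.1, max_iter=100000):
    """Gradient descent until the epsilon-stationarity condition
    ||grad f(x)|| <= eps holds; returns the point and iteration count."""
    x = x0
    for k in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) <= eps:
            return x, k
        x = x - step * g
    return x, max_iter

x, iters = find_stationary_point(np.array([2.0, -1.0]))
print(np.linalg.norm(grad_f(x)) <= 1e-4)  # True
```

The paper's lower bound concerns exactly how many such oracle queries (here, gradient evaluations) any algorithm must make in the worst case.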
Lower bounds for finding stationary points II: first-order methods
We establish lower bounds on the complexity of finding ϵ-stationary points of smooth, non-convex, high-dimensional functions using first-order methods. We prove that deterministic first-order methods, even applied to arbitrarily smooth functions, cannot achieve convergence rates in ϵ better than ϵ^{-8/5}, which is within ϵ^{-1/15} log(1/ϵ) of the best known rate for such methods. Moreover, for functions with Lipschitz first and second derivatives, we prove that no deterministic first-order method can achieve convergence rates better than ϵ^{-12/7}, while ϵ^{-2} is a lower bound for functions with only Lipschitz gradient. For convex functions with Lipschitz gradient, accelerated gradient descent achieves a better rate, showing that finding stationary points is easier given convexity.
Approaching Optimality for Solving Dense Linear Systems with Low-Rank Structure
We provide new high-accuracy randomized algorithms for solving linear systems and regression problems that are well-conditioned except for \(k\) large singular values. For solving such a \(d \times d\) positive definite system our algorithms succeed with high probability (whp.) and run in time \(\tilde O(d^2 + k^\omega)\). For solving such regression problems in a matrix \(\mathbf{A} \in \mathbb{R}^{n \times d}\) our methods succeed whp. and run in time \(\tilde O(\mathrm{nnz}(\mathbf{A}) + d^2 + k^\omega)\), where \(\omega\) is the matrix multiplication exponent and \(\mathrm{nnz}(\mathbf{A})\) is the number of non-zeros in \(\mathbf{A}\). Our methods nearly match a natural complexity limit for dense inputs to these problems and improve upon a trade-off in prior approaches that obtain running times of either \(\tilde O(d^{2.065}+k^\omega)\) or \(\tilde O(d^2 + dk^{\omega-1})\) for \(d\times d\) systems. Moreover, we show how to obtain these running times even under the weaker assumption that all but \(k\) of the singular values have a suitably bounded generalized mean. Consequently, we give the first nearly-linear time algorithm for computing a multiplicative approximation to the nuclear norm of an arbitrary dense matrix. Our algorithms are built on three general recursive preconditioning frameworks, where matrix sketching and low-rank update formulas are carefully tailored to the problems' structure.
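For context on the low-rank update formulas the abstract mentions, the classical Woodbury identity reduces solving \((M + UU^\top)x = b\), with \(M\) easy to solve and \(U \in \mathbb{R}^{d \times k}\), to \(M\)-solves plus \(k \times k\) dense algebra. The sketch below illustrates only this identity on synthetic data; it is not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 50, 3

# Well-conditioned base matrix M (a scaled identity for simplicity)
# plus a rank-k PSD perturbation U U^T contributing k large singular values.
M = np.eye(d)
U = 10.0 * rng.standard_normal((d, k))
A = M + U @ U.T
b = rng.standard_normal(d)

def solve_low_rank_update(M_solve, U, b):
    """Solve (M + U U^T) x = b via the Woodbury identity:
    x = M^{-1}b - M^{-1}U (I + U^T M^{-1} U)^{-1} U^T M^{-1} b,
    touching only M-solves and k x k dense algebra."""
    Mb = M_solve(b)                       # M^{-1} b
    MU = M_solve(U)                       # M^{-1} U
    S = np.eye(U.shape[1]) + U.T @ MU     # k x k capacitance matrix
    return Mb - MU @ np.linalg.solve(S, U.T @ Mb)

x = solve_low_rank_update(lambda v: v, U, b)  # M = I, so the M-solve is the identity
print(np.allclose(A @ x, b))  # True
```

The paper's harder setting is when the well-conditioned part is not known in advance and the \(k\) large directions must be found, which is where the sketching and recursive preconditioning come in.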
Quantum speedups for stochastic optimization
We consider the problem of minimizing a continuous function given quantum access to a stochastic gradient oracle. We provide two new methods for the special case of minimizing a Lipschitz convex function. Each method obtains a dimension-versus-accuracy trade-off which is provably unachievable classically, and we prove that one method is asymptotically optimal in low-dimensional settings. Additionally, we provide quantum algorithms for computing a critical point of a smooth non-convex function at rates not known to be achievable classically. To obtain these results we build upon the quantum multivariate mean estimation result of Cornelissen et al. 2022 and provide a general quantum variance-reduction technique of independent interest.
Ultrasparse Ultrasparsifiers and Faster Laplacian System Solvers
In this paper we provide an \(O(m (\log \log n)^{O(1)} \log(1/\epsilon))\)-expected time algorithm for solving Laplacian systems on \(n\)-node \(m\)-edge graphs, improving upon the previous best expected runtime of \(O(m \sqrt{\log n} (\log \log n)^{O(1)} \log(1/\epsilon))\) achieved by (Cohen, Kyng, Miller, Pachocki, Peng, Rao, Xu 2014). To obtain this result we provide efficient constructions of \(\ell_p\)-stretch graph approximations with improved stretch and sparsity bounds. Additionally, as motivation for this work, we show that for every set of vectors in \(\mathbb{R}^d\) (not just those induced by graphs) and all \(k > 1\) there exist ultrasparsifiers with \(d-1 + O(d/\sqrt{k})\) re-weighted vectors of relative condition number at most \(k\). For small \(k\), this improves upon the previous best known relative condition number of \(\tilde{O}(\sqrt{k \log d})\), which is only known for the graph case.
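For readers unfamiliar with the problem this abstract studies, here is a minimal sketch (not the paper's solver) of what "solving a Laplacian system" means: build the Laplacian of a small path graph and solve \(Lx = b\) with conjugate gradient, for a right-hand side orthogonal to the all-ones null space. The graph and right-hand side are illustrative choices.

```python
import numpy as np
from scipy.sparse import csgraph
from scipy.sparse.linalg import cg

# Laplacian of a small path graph -- a toy stand-in for the
# n-node m-edge graphs in the abstract.
n = 20
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
L = csgraph.laplacian(A)

# Laplacians are singular (constant vectors span the null space), so pick
# b orthogonal to the all-ones vector: a unit flow from node 0 to node n-1.
b = np.zeros(n)
b[0], b[-1] = 1.0, -1.0

x, info = cg(L, b)  # conjugate gradient; info == 0 signals convergence
residual = np.linalg.norm(L @ x - b)
```

Fast Laplacian solvers like the paper's achieve near-linear total work by preconditioning iterations of exactly this kind with sparse graph approximations.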
Solving Matrix Games with Near-Optimal Matvec Complexity
We study the problem of computing an \(\epsilon\)-approximate Nash equilibrium of a two-player, bilinear game with a bounded payoff matrix \(A \in \mathbb{R}^{m \times n}\), when the players' strategies are constrained to lie in simple sets. We provide algorithms which solve this problem in \(\tilde{O}(\epsilon^{-2/3})\) matrix-vector multiplies (matvecs) in two well-studied cases: \(\ell_1\)-\(\ell_1\) (or zero-sum) games, where the players' strategies are both in the probability simplex, and \(\ell_2\)-\(\ell_1\) games (encompassing hard-margin SVMs), where the players' strategies are in the unit Euclidean ball and the probability simplex, respectively. These results improve upon the previous state-of-the-art complexities of \(\tilde{O}(\epsilon^{-8/9})\) for \(\ell_1\)-\(\ell_1\) and \(\tilde{O}(\epsilon^{-7/9})\) for \(\ell_2\)-\(\ell_1\) due to [KOS '25]. In both settings our results are nearly optimal, as they match lower bounds of [KS '25] up to polylogarithmic factors.
Solving Zero-Sum Games with Fewer Matrix-Vector Products
In this paper we consider the problem of computing an \(\epsilon\)-approximate Nash equilibrium of a zero-sum game in a payoff matrix \(A \in \mathbb{R}^{m \times n}\) with \(O(1)\)-bounded entries, given access to a matrix-vector product oracle for \(A\) and its transpose \(A^\top\). We provide a deterministic algorithm that solves the problem using \(\tilde{O}(\epsilon^{-8/9})\) oracle queries, where \(\tilde{O}(\cdot)\) hides factors polylogarithmic in \(m\), \(n\), and \(\epsilon^{-1}\). Our result improves upon the state-of-the-art query complexity of \(\tilde{O}(\epsilon^{-1})\) established by [Nemirovski, 2004] and [Nesterov, 2005]. We obtain this result through a general framework that yields improved deterministic query complexities for solving a broader class of minimax optimization problems, which includes computing a linear classifier (hard-margin support vector machine) as well as linear regression.
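As a point of reference for the matvec-oracle model in the two abstracts above, the classical baseline is simultaneous multiplicative-weights updates, which needs roughly \(\epsilon^{-2}\) matvecs and touches \(A\) only through products with \(A\) and \(A^\top\). A minimal sketch on a synthetic payoff matrix follows; the step size and iteration count are illustrative, and this is the textbook method, not the papers' algorithms.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 30, 40
A = rng.uniform(-1.0, 1.0, (m, n))   # bounded payoff matrix, as in the abstract

def duality_gap(A, x, y):
    """Nash gap of (x, y) for min_x max_y x^T A y over the two simplices."""
    return float(np.max(A.T @ x) - np.min(A @ y))

def mwu_matrix_game(A, T, eta):
    """Simultaneous multiplicative-weights updates; each round costs one
    matvec with A and one with A^T. Returns the averaged strategies."""
    m, n = A.shape
    x, y = np.ones(m) / m, np.ones(n) / n
    x_sum, y_sum = np.zeros(m), np.zeros(n)
    for _ in range(T):
        x_sum += x
        y_sum += y
        gx = A @ y            # row player's loss vector
        gy = A.T @ x          # column player's gain vector
        x = x * np.exp(-eta * gx); x /= x.sum()
        y = y * np.exp(eta * gy);  y /= y.sum()
    return x_sum / T, y_sum / T

x_bar, y_bar = mwu_matrix_game(A, T=6000, eta=0.025)
gap = duality_gap(A, x_bar, y_bar)
```

The papers above improve on this \(\epsilon^{-2}\)-type baseline (and on the smoothing-based \(\tilde{O}(\epsilon^{-1})\) methods of Nemirovski and Nesterov) in oracle-query count.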
Accelerated Approximate Optimization of Multi-Commodity Flows on Directed Graphs
We provide \(m^{1+o(1)}k\epsilon^{-1}\)-time algorithms for computing multiplicative \((1 - \epsilon)\)-approximate solutions to multi-commodity flow problems with \(k\) commodities on \(m\)-edge directed graphs, including concurrent multi-commodity flow and maximum multi-commodity flow. To obtain our results, we provide new optimization tools of potential independent interest. First, we provide an improved optimization method for solving \(\ell_{q, p}\)-regression problems to high accuracy. This method makes \(\tilde{O}_{q, p}(k)\) queries to a high-accuracy convex minimization oracle for an individual block, where \(\tilde{O}_{q, p}(\cdot)\) hides factors depending only on \(q\), \(p\), or \(\mathrm{poly}(\log m)\), improving upon the \(\tilde{O}_{q, p}(k^2)\) bound of [Chen-Ye, ICALP 2024]. As a result, we obtain the first almost-linear time algorithm that solves \(\ell_{q, p}\) flows on directed graphs to high accuracy. Second, we present optimization tools that reduce approximately solving composite \(\ell_{1, \infty}\)-regression problems to solving \(m^{o(1)}\epsilon^{-1}\) instances of composite \(\ell_{q, p}\)-regression problems. This method builds upon recent advances in solving box-simplex games [Jambulapati-Tian, NeurIPS 2023] and the area-convex regularizer introduced in [Sherman, STOC 2017] to obtain faster rates for constrained versions of the problem. Carefully combining these techniques yields our directed multi-commodity flow algorithm.
Entropy Regularization and Faster Decremental Matching in General Graphs
We provide an algorithm that maintains, against an adaptive adversary, a \((1-\varepsilon)\)-approximate maximum matching in an \(n\)-node \(m\)-edge general (not necessarily bipartite) undirected graph undergoing edge deletions, with high probability, in (amortized) \(O(\mathrm{poly}(\varepsilon^{-1}, \log n))\) time per update. We also obtain the same update time for maintaining a fractional approximate weighted matching (and hence an approximation to the value of the maximum weight matching) and an integral approximate weighted matching in dense graphs. Our unweighted result improves upon the prior state of the art, which includes a \(\mathrm{poly}(\log{n}) \cdot 2^{O(1/\varepsilon^2)}\) update time [Assadi-Bernstein-Dudeja 2022] and an \(O(\sqrt{m} \varepsilon^{-2})\) update time [Gupta-Peng 2013], and our weighted result improves upon the \(O(\sqrt{m}\varepsilon^{-O(1/\varepsilon)}\log{n})\) update time due to [Gupta-Peng 2013]. To obtain our results, we generalize a recent optimization approach to dynamic algorithms from [Jambulapati-Jin-Sidford-Tian 2022]. We show that repeatedly solving entropy-regularized optimization problems yields a lazy updating scheme for fractional decremental problems with a near-optimal number of updates. To apply this framework we develop optimization methods compatible with it and new dynamic rounding algorithms for the matching polytope.
Stability of the Lanczos Method for Matrix Function Approximation
The ubiquitous Lanczos method can approximate \(f(A)x\) for any symmetric \(n \times n\) matrix \(A\), vector \(x\), and function \(f\). In exact arithmetic, the method's error after \(k\) iterations is bounded by the error of the best degree-\(k\) polynomial uniformly approximating \(f(x)\) on the range \([\lambda_{\min}(A), \lambda_{\max}(A)]\). However, despite decades of work, it has been unclear if this powerful guarantee holds in finite precision. We resolve this problem, proving that when \(\max_{x \in [\lambda_{\min}, \lambda_{\max}]}|f(x)| \le C\), Lanczos essentially matches the exact arithmetic guarantee if computations use roughly \(\log(nC\|A\|)\) bits of precision. Our proof extends work of Druskin and Knizhnerman [DK91], leveraging the stability of the classic Chebyshev recurrence to bound the stability of any polynomial approximating \(f(x)\). We also study the special case of \(f(A) = A^{-1}\), where stronger guarantees hold. In exact arithmetic Lanczos performs as well as the best polynomial approximating \(1/x\) at each of \(A\)'s eigenvalues, rather than on the full eigenvalue range. In seminal work, Greenbaum gives an approach to extending this bound to finite precision: she proves that finite precision Lanczos and the related CG method match any polynomial approximating \(1/x\) in a tiny range around each eigenvalue [Gre89]. For \(A^{-1}\), this bound appears stronger than ours. However, we exhibit matrices with condition number \(\kappa\) where exact arithmetic Lanczos converges in \(\mathrm{polylog}(\kappa)\) iterations, but Greenbaum's bound predicts \(\Omega(\kappa^{1/5})\) iterations. It thus cannot offer significant improvement over the \(O(\kappa^{1/2})\) bound achievable via our result.
Our analysis raises the question of whether convergence in fewer than \(\mathrm{poly}(\kappa)\) iterations can be expected in finite precision, even for matrices with clustered, skewed, or otherwise favorable eigenvalue distributions.
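The method this abstract analyzes is easy to reproduce in double-precision numpy. The sketch below is a generic textbook Lanczos iteration with full reorthogonalization, applied to \(f(A)x = A^{-1}x\) on a synthetic well-conditioned matrix; it illustrates the \(Q\,f(T)\,e_1\) construction, not the paper's finite-precision analysis.

```python
import numpy as np

def lanczos_fAx(A, x, k, f):
    """k-step Lanczos approximation to f(A) x for symmetric A: build an
    orthonormal Krylov basis Q and tridiagonal T = Q^T A Q, then return
    ||x|| * Q f(T) e_1."""
    n = len(x)
    Q = np.zeros((n, k))
    alpha = np.zeros(k)
    beta = np.zeros(k - 1)
    Q[:, 0] = x / np.linalg.norm(x)
    for j in range(k):
        w = A @ Q[:, j]
        alpha[j] = Q[:, j] @ w
        w -= alpha[j] * Q[:, j]
        if j > 0:
            w -= beta[j - 1] * Q[:, j - 1]
        # Full reorthogonalization against all previous basis vectors.
        w -= Q[:, : j + 1] @ (Q[:, : j + 1].T @ w)
        if j < k - 1:
            beta[j] = np.linalg.norm(w)
            Q[:, j + 1] = w / beta[j]
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    evals, V = np.linalg.eigh(T)
    fT_e1 = V @ (f(evals) * V[0, :])   # f(T) e_1 via eigendecomposition of T
    return np.linalg.norm(x) * (Q @ fT_e1)

rng = np.random.default_rng(2)
n = 60
B = rng.standard_normal((n, n))
A = B @ B.T / n + np.eye(n)            # symmetric, well-conditioned
x = rng.standard_normal(n)

approx = lanczos_fAx(A, x, k=25, f=lambda t: 1.0 / t)
exact = np.linalg.solve(A, x)
rel_err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
print(rel_err < 1e-6)  # True
```

Because \(A\) here has small condition number, the approximation converges rapidly with \(k \ll n\), consistent with the polynomial-approximation error bound the abstract describes for exact arithmetic.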