134 result(s) for "Sidford, Aaron"
Lower bounds for finding stationary points I
We prove lower bounds on the complexity of finding ϵ-stationary points (points x such that ‖∇f(x)‖ ≤ ϵ) of smooth, high-dimensional, and potentially non-convex functions f. We consider oracle-based complexity measures, where an algorithm is given access to the value and all derivatives of f at a query point x. We show that for any (potentially randomized) algorithm A, there exists a function f with Lipschitz pth-order derivatives such that A requires at least ϵ^{-(p+1)/p} queries to find an ϵ-stationary point. Our lower bounds are sharp to within constants, and they show that gradient descent, cubic-regularized Newton's method, and generalized pth-order regularization are worst-case optimal within their natural function classes.
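The ϵ-stationarity condition in this abstract is easy to illustrate with a minimal sketch: run plain gradient descent until ‖∇f(x)‖ ≤ ϵ. The test function, starting point, step size, and tolerance below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical smooth non-convex test function (sum over coordinates of
# t^2 + 0.5*cos(3t)) and its gradient; the gradient is Lipschitz.
def f(x):
    return np.sum(x**2) + 0.5 * np.sum(np.cos(3 * x))

def grad_f(x):
    return 2 * x - 1.5 * np.sin(3 * x)

def find_stationary_point(x0, eps=1e-4, step=0.1, max_iter=100000):
    """Gradient descent until the epsilon-stationarity condition
    ||grad f(x)|| <= eps holds; returns the point and iteration count."""
    x = x0
    for k in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) <= eps:
            return x, k
        x = x - step * g
    return x, max_iter

x, iters = find_stationary_point(np.array([2.0, -1.0]))
print(np.linalg.norm(grad_f(x)) <= 1e-4)  # True
```

The paper's lower bound concerns exactly how many such oracle queries (here, gradient evaluations) any algorithm must make in the worst case.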
Lower bounds for finding stationary points II: first-order methods
We establish lower bounds on the complexity of finding ϵ-stationary points of smooth, non-convex, high-dimensional functions using first-order methods. We prove that deterministic first-order methods, even applied to arbitrarily smooth functions, cannot achieve convergence rates in ϵ better than ϵ^{-8/5}, which is within ϵ^{-1/15} log(1/ϵ) of the best known rate for such methods. Moreover, for functions with Lipschitz first and second derivatives, we prove that no deterministic first-order method can achieve convergence rates better than ϵ^{-12/7}, while ϵ^{-2} is a lower bound for functions with only Lipschitz gradient. For convex functions with Lipschitz gradient, accelerated gradient descent achieves a better rate, showing that finding stationary points is easier given convexity.
Approaching Optimality for Solving Dense Linear Systems with Low-Rank Structure
We provide new high-accuracy randomized algorithms for solving linear systems and regression problems that are well-conditioned except for \(k\) large singular values. For solving such a \(d \times d\) positive definite system our algorithms succeed with high probability (whp.) and run in time \(\tilde O(d^2 + k^\omega)\). For solving such regression problems in a matrix \(\mathbf{A} \in \mathbb{R}^{n \times d}\) our methods succeed whp. and run in time \(\tilde O(\mathrm{nnz}(\mathbf{A}) + d^2 + k^\omega)\), where \(\omega\) is the matrix multiplication exponent and \(\mathrm{nnz}(\mathbf{A})\) is the number of non-zeros in \(\mathbf{A}\). Our methods nearly match a natural complexity limit for dense inputs to these problems and improve upon a trade-off in prior approaches that obtain running times of either \(\tilde O(d^{2.065}+k^\omega)\) or \(\tilde O(d^2 + dk^{\omega-1})\) for \(d\times d\) systems. Moreover, we show how to obtain these running times even under the weaker assumption that all but \(k\) of the singular values have a suitably bounded generalized mean. Consequently, we give the first nearly-linear time algorithm for computing a multiplicative approximation to the nuclear norm of an arbitrary dense matrix. Our algorithms are built on three general recursive preconditioning frameworks, where matrix sketching and low-rank update formulas are carefully tailored to the problems' structure.
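For context on the low-rank update formulas the abstract mentions, the classical Woodbury identity reduces solving \((M + UU^\top)x = b\), with \(M\) easy to solve and \(U \in \mathbb{R}^{d \times k}\), to \(M\)-solves plus \(k \times k\) dense algebra. The sketch below illustrates only this identity on synthetic data; it is not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 50, 3

# Well-conditioned base matrix M (a scaled identity for simplicity)
# plus a rank-k PSD perturbation U U^T contributing k large singular values.
M = np.eye(d)
U = 10.0 * rng.standard_normal((d, k))
A = M + U @ U.T
b = rng.standard_normal(d)

def solve_low_rank_update(M_solve, U, b):
    """Solve (M + U U^T) x = b via the Woodbury identity:
    x = M^{-1}b - M^{-1}U (I + U^T M^{-1} U)^{-1} U^T M^{-1} b,
    touching only M-solves and k x k dense algebra."""
    Mb = M_solve(b)                       # M^{-1} b
    MU = M_solve(U)                       # M^{-1} U
    S = np.eye(U.shape[1]) + U.T @ MU     # k x k capacitance matrix
    return Mb - MU @ np.linalg.solve(S, U.T @ Mb)

x = solve_low_rank_update(lambda v: v, U, b)  # M = I, so the M-solve is the identity
print(np.allclose(A @ x, b))  # True
```

The paper's harder setting is when the well-conditioned part is not known in advance and the \(k\) large directions must be found, which is where the sketching and recursive preconditioning come in.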
Quantum speedups for stochastic optimization
We consider the problem of minimizing a continuous function given quantum access to a stochastic gradient oracle. We provide two new methods for the special case of minimizing a Lipschitz convex function. Each method obtains a dimension-versus-accuracy trade-off which is provably unachievable classically, and we prove that one method is asymptotically optimal in low-dimensional settings. Additionally, we provide quantum algorithms for computing a critical point of a smooth non-convex function at rates not known to be achievable classically. To obtain these results we build upon the quantum multivariate mean estimation result of Cornelissen et al. 2022 and provide a general quantum variance-reduction technique of independent interest.
Ultrasparse Ultrasparsifiers and Faster Laplacian System Solvers
In this paper we provide an \(O(m (\log \log n)^{O(1)} \log(1/\epsilon))\)-expected time algorithm for solving Laplacian systems on \(n\)-node \(m\)-edge graphs, improving upon the previous best expected runtime of \(O(m \sqrt{\log n} (\log \log n)^{O(1)} \log(1/\epsilon))\) achieved by (Cohen, Kyng, Miller, Pachocki, Peng, Rao, Xu 2014). To obtain this result we provide efficient constructions of \(\ell_p\)-stretch graph approximations with improved stretch and sparsity bounds. Additionally, as motivation for this work, we show that for every set of vectors in \(\mathbb{R}^d\) (not just those induced by graphs) and all \(k > 1\) there exist ultrasparsifiers with \(d-1 + O(d/\sqrt{k})\) re-weighted vectors of relative condition number at most \(k\). For small \(k\), this improves upon the previous best known relative condition number of \(\tilde{O}(\sqrt{k \log d})\), which is only known for the graph case.
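For readers unfamiliar with the problem this abstract studies, here is a minimal sketch (not the paper's solver) of what "solving a Laplacian system" means: build the Laplacian of a small path graph and solve \(Lx = b\) with conjugate gradient, for a right-hand side orthogonal to the all-ones null space. The graph and right-hand side are illustrative choices.

```python
import numpy as np
from scipy.sparse import csgraph
from scipy.sparse.linalg import cg

# Laplacian of a small path graph -- a toy stand-in for the
# n-node m-edge graphs in the abstract.
n = 20
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
L = csgraph.laplacian(A)

# Laplacians are singular (constant vectors span the null space), so pick
# b orthogonal to the all-ones vector: a unit flow from node 0 to node n-1.
b = np.zeros(n)
b[0], b[-1] = 1.0, -1.0

x, info = cg(L, b)  # conjugate gradient; info == 0 signals convergence
residual = np.linalg.norm(L @ x - b)
```

Fast Laplacian solvers like the paper's achieve near-linear total work by preconditioning iterations of exactly this kind with sparse graph approximations.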
Solving Matrix Games with Near-Optimal Matvec Complexity
We study the problem of computing an \(\epsilon\)-approximate Nash equilibrium of a two-player, bilinear game with a bounded payoff matrix \(A \in \mathbb{R}^{m \times n}\), when the players' strategies are constrained to lie in simple sets. We provide algorithms which solve this problem in \(\tilde{O}(\epsilon^{-2/3})\) matrix-vector multiplies (matvecs) in two well-studied cases: \(\ell_1\)-\(\ell_1\) (or zero-sum) games, where the players' strategies are both in the probability simplex, and \(\ell_2\)-\(\ell_1\) games (encompassing hard-margin SVMs), where the players' strategies are in the unit Euclidean ball and the probability simplex, respectively. These results improve upon the previous state-of-the-art complexities of \(\tilde{O}(\epsilon^{-8/9})\) for \(\ell_1\)-\(\ell_1\) and \(\tilde{O}(\epsilon^{-7/9})\) for \(\ell_2\)-\(\ell_1\) due to [KOS '25]. In both settings our results are nearly optimal, as they match lower bounds of [KS '25] up to polylogarithmic factors.
Solving Zero-Sum Games with Fewer Matrix-Vector Products
In this paper we consider the problem of computing an \(\epsilon\)-approximate Nash equilibrium of a zero-sum game in a payoff matrix \(A \in \mathbb{R}^{m \times n}\) with \(O(1)\)-bounded entries, given access to a matrix-vector product oracle for \(A\) and its transpose \(A^\top\). We provide a deterministic algorithm that solves the problem using \(\tilde{O}(\epsilon^{-8/9})\) oracle queries, where \(\tilde{O}(\cdot)\) hides factors polylogarithmic in \(m\), \(n\), and \(\epsilon^{-1}\). Our result improves upon the state-of-the-art query complexity of \(\tilde{O}(\epsilon^{-1})\) established by [Nemirovski, 2004] and [Nesterov, 2005]. We obtain this result through a general framework that yields improved deterministic query complexities for solving a broader class of minimax optimization problems, which includes computing a linear classifier (hard-margin support vector machine) as well as linear regression.
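As a point of reference for the matvec-oracle model in the two abstracts above, the classical baseline is simultaneous multiplicative-weights updates, which needs roughly \(\epsilon^{-2}\) matvecs and touches \(A\) only through products with \(A\) and \(A^\top\). A minimal sketch on a synthetic payoff matrix follows; the step size and iteration count are illustrative, and this is the textbook method, not the papers' algorithms.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 30, 40
A = rng.uniform(-1.0, 1.0, (m, n))   # bounded payoff matrix, as in the abstract

def duality_gap(A, x, y):
    """Nash gap of (x, y) for min_x max_y x^T A y over the two simplices."""
    return float(np.max(A.T @ x) - np.min(A @ y))

def mwu_matrix_game(A, T, eta):
    """Simultaneous multiplicative-weights updates; each round costs one
    matvec with A and one with A^T. Returns the averaged strategies."""
    m, n = A.shape
    x, y = np.ones(m) / m, np.ones(n) / n
    x_sum, y_sum = np.zeros(m), np.zeros(n)
    for _ in range(T):
        x_sum += x
        y_sum += y
        gx = A @ y            # row player's loss vector
        gy = A.T @ x          # column player's gain vector
        x = x * np.exp(-eta * gx); x /= x.sum()
        y = y * np.exp(eta * gy);  y /= y.sum()
    return x_sum / T, y_sum / T

x_bar, y_bar = mwu_matrix_game(A, T=6000, eta=0.025)
gap = duality_gap(A, x_bar, y_bar)
```

The papers above improve on this \(\epsilon^{-2}\)-type baseline (and on the smoothing-based \(\tilde{O}(\epsilon^{-1})\) methods of Nemirovski and Nesterov) in oracle-query count.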
Accelerated Approximate Optimization of Multi-Commodity Flows on Directed Graphs
We provide \(m^{1+o(1)}k\epsilon^{-1}\)-time algorithms for computing multiplicative \((1 - \epsilon)\)-approximate solutions to multi-commodity flow problems with \(k\) commodities on \(m\)-edge directed graphs, including concurrent multi-commodity flow and maximum multi-commodity flow. To obtain our results, we provide new optimization tools of potential independent interest. First, we provide an improved optimization method for solving \(\ell_{q, p}\)-regression problems to high accuracy. This method makes \(\tilde{O}_{q, p}(k)\) queries to a high-accuracy convex minimization oracle for an individual block, where \(\tilde{O}_{q, p}(\cdot)\) hides factors depending only on \(q\), \(p\), or \(\mathrm{poly}(\log m)\), improving upon the \(\tilde{O}_{q, p}(k^2)\) bound of [Chen-Ye, ICALP 2024]. As a result, we obtain the first almost-linear time algorithm that solves \(\ell_{q, p}\) flows on directed graphs to high accuracy. Second, we present optimization tools that reduce approximately solving composite \(\ell_{1, \infty}\)-regression problems to solving \(m^{o(1)}\epsilon^{-1}\) instances of composite \(\ell_{q, p}\)-regression problems. This method builds upon recent advances in solving box-simplex games [Jambulapati-Tian, NeurIPS 2023] and the area-convex regularizer introduced in [Sherman, STOC 2017] to obtain faster rates for constrained versions of the problem. Carefully combining these techniques yields our directed multi-commodity flow algorithm.
Entropy Regularization and Faster Decremental Matching in General Graphs
We provide an algorithm that maintains, against an adaptive adversary, a \((1-\varepsilon)\)-approximate maximum matching in an \(n\)-node \(m\)-edge general (not necessarily bipartite) undirected graph undergoing edge deletions, with high probability, in (amortized) \(O(\mathrm{poly}(\varepsilon^{-1}, \log n))\) time per update. We also obtain the same update time for maintaining a fractional approximate weighted matching (and hence an approximation to the value of the maximum weight matching) and an integral approximate weighted matching in dense graphs. Our unweighted result improves upon the prior state of the art, which includes a \(\mathrm{poly}(\log{n}) \cdot 2^{O(1/\varepsilon^2)}\) update time [Assadi-Bernstein-Dudeja 2022] and an \(O(\sqrt{m} \varepsilon^{-2})\) update time [Gupta-Peng 2013], and our weighted result improves upon the \(O(\sqrt{m}\varepsilon^{-O(1/\varepsilon)}\log{n})\) update time due to [Gupta-Peng 2013]. To obtain our results, we generalize a recent optimization approach to dynamic algorithms from [Jambulapati-Jin-Sidford-Tian 2022]. We show that repeatedly solving entropy-regularized optimization problems yields a lazy updating scheme for fractional decremental problems with a near-optimal number of updates. To apply this framework we develop optimization methods compatible with it and new dynamic rounding algorithms for the matching polytope.
Stability of the Lanczos Method for Matrix Function Approximation
The ubiquitous Lanczos method can approximate \(f(A)x\) for any symmetric \(n \times n\) matrix \(A\), vector \(x\), and function \(f\). In exact arithmetic, the method's error after \(k\) iterations is bounded by the error of the best degree-\(k\) polynomial uniformly approximating \(f(x)\) on the range \([\lambda_{\min}(A), \lambda_{\max}(A)]\). However, despite decades of work, it has been unclear if this powerful guarantee holds in finite precision. We resolve this problem, proving that when \(\max_{x \in [\lambda_{\min}, \lambda_{\max}]}|f(x)| \le C\), Lanczos essentially matches the exact arithmetic guarantee if computations use roughly \(\log(nC\|A\|)\) bits of precision. Our proof extends work of Druskin and Knizhnerman [DK91], leveraging the stability of the classic Chebyshev recurrence to bound the stability of any polynomial approximating \(f(x)\). We also study the special case of \(f(A) = A^{-1}\), where stronger guarantees hold. In exact arithmetic Lanczos performs as well as the best polynomial approximating \(1/x\) at each of \(A\)'s eigenvalues, rather than on the full eigenvalue range. In seminal work, Greenbaum gives an approach to extending this bound to finite precision: she proves that finite precision Lanczos and the related CG method match any polynomial approximating \(1/x\) in a tiny range around each eigenvalue [Gre89]. For \(A^{-1}\), this bound appears stronger than ours. However, we exhibit matrices with condition number \(\kappa\) where exact arithmetic Lanczos converges in \(\mathrm{polylog}(\kappa)\) iterations, but Greenbaum's bound predicts \(\Omega(\kappa^{1/5})\) iterations. It thus cannot offer significant improvement over the \(O(\kappa^{1/2})\) bound achievable via our result.
Our analysis raises the question of whether convergence in fewer than \(\mathrm{poly}(\kappa)\) iterations can be expected in finite precision, even for matrices with clustered, skewed, or otherwise favorable eigenvalue distributions.
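The method this abstract analyzes is easy to reproduce in double-precision numpy. The sketch below is a generic textbook Lanczos iteration with full reorthogonalization, applied to \(f(A)x = A^{-1}x\) on a synthetic well-conditioned matrix; it illustrates the \(Q\,f(T)\,e_1\) construction, not the paper's finite-precision analysis.

```python
import numpy as np

def lanczos_fAx(A, x, k, f):
    """k-step Lanczos approximation to f(A) x for symmetric A: build an
    orthonormal Krylov basis Q and tridiagonal T = Q^T A Q, then return
    ||x|| * Q f(T) e_1."""
    n = len(x)
    Q = np.zeros((n, k))
    alpha = np.zeros(k)
    beta = np.zeros(k - 1)
    Q[:, 0] = x / np.linalg.norm(x)
    for j in range(k):
        w = A @ Q[:, j]
        alpha[j] = Q[:, j] @ w
        w -= alpha[j] * Q[:, j]
        if j > 0:
            w -= beta[j - 1] * Q[:, j - 1]
        # Full reorthogonalization against all previous basis vectors.
        w -= Q[:, : j + 1] @ (Q[:, : j + 1].T @ w)
        if j < k - 1:
            beta[j] = np.linalg.norm(w)
            Q[:, j + 1] = w / beta[j]
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    evals, V = np.linalg.eigh(T)
    fT_e1 = V @ (f(evals) * V[0, :])   # f(T) e_1 via eigendecomposition of T
    return np.linalg.norm(x) * (Q @ fT_e1)

rng = np.random.default_rng(2)
n = 60
B = rng.standard_normal((n, n))
A = B @ B.T / n + np.eye(n)            # symmetric, well-conditioned
x = rng.standard_normal(n)

approx = lanczos_fAx(A, x, k=25, f=lambda t: 1.0 / t)
exact = np.linalg.solve(A, x)
rel_err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
print(rel_err < 1e-6)  # True
```

Because \(A\) here has small condition number, the approximation converges rapidly with \(k \ll n\), consistent with the polynomial-approximation error bound the abstract describes for exact arithmetic.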