Catalogue Search | MBRL
Explore the vast range of titles available.
63 result(s) for "Hinder, Oliver"
Lower bounds for finding stationary points I
2020
We prove lower bounds on the complexity of finding ϵ-stationary points (points x such that ‖∇f(x)‖ ≤ ϵ) of smooth, high-dimensional, and potentially non-convex functions f. We consider oracle-based complexity measures, where an algorithm is given access to the value and all derivatives of f at a query point x. We show that for any (potentially randomized) algorithm A, there exists a function f with Lipschitz pth order derivatives such that A requires at least ϵ^{-(p+1)/p} queries to find an ϵ-stationary point. Our lower bounds are sharp to within constants, and they show that gradient descent, cubic-regularized Newton's method, and generalized pth order regularization are worst-case optimal within their natural function classes.
Journal Article
Lower bounds for finding stationary points II: first-order methods
2021
We establish lower bounds on the complexity of finding ϵ-stationary points of smooth, non-convex high-dimensional functions using first-order methods. We prove that deterministic first-order methods, even applied to arbitrarily smooth functions, cannot achieve convergence rates in ϵ better than ϵ^{-8/5}, which is within ϵ^{-1/15} log(1/ϵ) of the best known rate for such methods. Moreover, for functions with Lipschitz first and second derivatives, we prove that no deterministic first-order method can achieve convergence rates better than ϵ^{-12/7}, while ϵ^{-2} is a lower bound for functions with only Lipschitz gradient. For convex functions with Lipschitz gradient, accelerated gradient descent achieves a better rate, showing that finding stationary points is easier given convexity.
Journal Article
Principled Algorithms for Finding Local Minima
2019
Convex optimization is the cornerstone of continuous optimization, but many real problems arising in machine learning and operations research are nonconvex. This two-part thesis explores my work developing principled algorithms for finding local minima of nonconvex functions.

Part I: The complexity of finding stationary points of nonconvex functions. For a long time, it was known that gradient descent achieved an ε^{-2} rate for finding stationary points of unconstrained nonconvex functions, but better rates for first-order methods were unknown. We show that, using an algorithm that judiciously utilizes Nesterov's accelerated gradient descent, it is possible to improve this rate to ε^{-7/4} for functions with Lipschitz first and second derivatives. Adding Lipschitz third derivatives improves this rate to ε^{-5/3}. Moreover, we provide almost matching lower bounds to prove that (i) finding a stationary point is easier for convex functions and (ii) acceleration in nonconvex optimization requires assumptions beyond smoothness.

Part II: Interior point methods for nonconvex optimization. The second part of this thesis focuses on an interior point method (IPM) for optimization problems with nonconvex constraints. The work of Wächter and Biegler suggests that infeasible-start interior point methods (IPMs) developed for linear programming cannot be adapted to nonlinear optimization without significant modification, i.e., using a two-phase or penalty method. We propose an IPM that, by careful initialization and updates of the slack variables, is guaranteed to find a first-order certificate of local infeasibility, local optimality, or unboundedness of the (shifted) feasible region. Our proposed algorithm differs from other IPMs for nonconvex programming because we reduce primal feasibility at the same rate as the barrier parameter. This gives an algorithm with more robust convergence properties that closely resembles successful algorithms from linear programming.
Comparisons with IPOPT on a subset of CUTEst problems indicate less frequent failures and superior performance at detecting infeasibility. Finally, we develop a highly simplified version of our IPM for nonconvex constraints that obtains an ε-scaled KKT point in ε^{-7/4} iterations. This provides the first proof that IPMs with nonconvex constraints have a polynomial runtime in 1/ε.
Dissertation
Worst-case analysis of restarted primal-dual hybrid gradient on totally unimodular linear programs
2024
We analyze restarted PDHG on totally unimodular linear programs. In particular, we show that restarted PDHG finds an ϵ-optimal solution in O(H m_1^{2.5} √(nnz(A)) log(H m_2/ϵ)) matrix-vector multiplies, where m_1 is the number of constraints, m_2 the number of variables, nnz(A) the number of nonzeros in the constraint matrix, H the largest absolute coefficient in the right-hand side or objective vector, and ϵ the distance to optimality of the output solution.
Worst-case analysis of restarted primal-dual hybrid gradient on totally unimodular linear programs
2023
Recently, there has been increasing interest in using matrix-free methods to solve linear programs due to their ability to attack extreme-scale problems. The dominant cost of each iteration of these methods is a matrix-vector product, so they benefit from a low memory footprint and are easily parallelized. Restarted primal-dual hybrid gradient (PDHG) is a matrix-free method with good practical performance. Prior analysis showed that it converges linearly on linear programs, with a linear convergence constant that depends on the Hoffman constant of the KKT system. We refine this analysis of restarted PDHG for totally unimodular linear programs. In particular, we show that restarted PDHG finds an ϵ-optimal solution in O(H m_1^{2.5} √(nnz(A)) log(H m_2/ϵ)) matrix-vector multiplies, where m_1 is the number of constraints, m_2 the number of variables, nnz(A) the number of nonzeros in the constraint matrix, H the largest absolute coefficient in the right-hand side or objective vector, and ϵ the distance to optimality of the point PDHG outputs.
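To make the matrix-free iteration concrete, the following is a minimal sketch of plain (unrestarted) PDHG applied to a standard-form LP min c·x subject to Ax = b, x ≥ 0. It is not the restarted variant the abstract analyzes, and the tiny problem data are purely illustrative.

```python
import numpy as np

def pdhg_lp(c, A, b, iters=20000):
    """Plain (unrestarted) PDHG for: min c@x  s.t.  A@x = b, x >= 0.

    Each iteration costs one product with A and one with A.T,
    which is the matrix-free property the abstract refers to.
    """
    m, n = A.shape
    eta = 0.9 / np.linalg.norm(A, 2)  # step size; needs eta^2 * ||A||^2 < 1
    x, y = np.zeros(n), np.zeros(m)
    for _ in range(iters):
        # Projected primal (gradient) step on the Lagrangian c@x + y@(b - A@x).
        x_new = np.maximum(x - eta * (c - A.T @ y), 0.0)
        # Dual step with the standard extrapolated point 2*x_new - x.
        y = y + eta * (b - A @ (2 * x_new - x))
        x = x_new
    return x, y

# Tiny illustrative LP: min x1 + 2*x2  s.t.  x1 + x2 = 1, x >= 0; optimum is x = (1, 0).
c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
x, y = pdhg_lp(c, A, b)
```

On this toy instance the iterates spiral into the saddle point x = (1, 0), y = 1; the restart scheme studied in the paper is what upgrades this sublinear baseline to the linear rate quoted above.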
Value Creation in the Pharmaceutical Industry
by Alexander Schuhmacher, Markus Hinder, Oliver Gassmann
in Pharmaceutical industry, Pharmaceutical industry--Technological innovations, SCIENCE
2016
This practical guide for advanced students and decision-makers in the pharma and biotech industry presents key success factors in R&D along with value creators in pharmaceutical innovation.
A team of editors and authors with extensive experience in academia and industry and at some of the most prestigious business schools in Europe discusses in detail the innovation process in pharma as well as common and new research and innovation strategies. In doing so, they cover collaboration and partnerships, open innovation, biopharmaceuticals, translational medicine, good manufacturing practice, regulatory affairs, and portfolio management. Each chapter covers controversial aspects of recent developments in the pharmaceutical industry, with the aim of stimulating productive debates on the most effective and efficient innovation processes.
A must-have for young professionals and MBA students preparing to enter R&D in pharma or biotech as well as for students on a combined BA/biomedical and natural sciences program.
A simple and practical adaptive trust-region method
2025
We present an adaptive trust-region method for unconstrained optimization that allows inexact solutions to the trust-region subproblems. Our method is a simple variant of the classical trust-region method of Sorensen (1982). The method achieves the best possible convergence bound, up to an additive log factor, for finding an ϵ-approximate stationary point: O(Δ_f L^{1/2} ϵ^{-3/2}) + Õ(1) iterations, where L is the Lipschitz constant of the Hessian, Δ_f is the optimality gap, and ϵ is the termination tolerance for the gradient norm. This improves over existing trust-region methods, whose worst-case bound is at least a factor of L worse. We compare our performance with state-of-the-art trust-region (TRU) and cubic regularization (ARC) methods from the GALAHAD library on the CUTEst benchmark set, restricted to problems with more than 100 variables. We use fewer function, gradient, and Hessian evaluations than these methods. For instance, our algorithm's median number of gradient evaluations is 23, compared to 36 for TRU and 29 for ARC. Compared to the conference version of this paper (Hamad and Hinder, 2022), our revised method includes several practical enhancements. These modifications dramatically improved performance, including an order-of-magnitude reduction in the shifted geometric mean of wall-clock times. We also show it suffices for the second derivatives to be locally Lipschitz to guarantee that either the minimum gradient norm converges to zero or the objective value tends towards negative infinity, even when the iterates diverge.
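The adaptive methods above refine the classical trust-region template. As a rough illustration of that template (not the authors' method), here is a minimal trust-region loop using the simple Cauchy-point step; the objective, acceptance thresholds, and radius-update constants are standard textbook choices, not taken from the paper.

```python
import numpy as np

def trust_region_cauchy(f, grad, hess, x, delta=1.0, tol=1e-8, max_iter=500):
    """Classical trust-region loop with a Cauchy-point (steepest-descent) step."""
    for _ in range(max_iter):
        g, H = grad(x), hess(x)
        gnorm = np.linalg.norm(g)
        if gnorm < tol:
            break
        # Cauchy point: minimize the quadratic model along -g within the radius.
        gHg = g @ H @ g
        t = delta / gnorm if gHg <= 0 else min((g @ g) / gHg, delta / gnorm)
        s = -t * g
        pred = -(g @ s + 0.5 * s @ H @ s)   # predicted model decrease (> 0)
        rho = (f(x) - f(x + s)) / pred      # actual vs. predicted decrease
        if rho > 0.25:
            x = x + s                       # accept the step
        if rho < 0.25:
            delta *= 0.5                    # poor agreement: shrink the radius
        elif rho > 0.75 and np.isclose(np.linalg.norm(s), delta):
            delta *= 2.0                    # good agreement at the boundary: expand
    return x

# Toy smooth objective: an ill-conditioned convex quadratic.
f = lambda x: (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2
grad = lambda x: np.array([2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)])
hess = lambda x: np.diag([2.0, 20.0])
x_star = trust_region_cauchy(f, grad, hess, np.array([5.0, 5.0]))
```

The adaptive methods in these papers replace the fixed radius-update constants and exact subproblem solves with rules that recover the optimal O(Δ_f L^{1/2} ϵ^{-3/2}) dependence without knowing L.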
A simple and practical adaptive trust-region method
2024
We present an adaptive trust-region method for unconstrained optimization that allows inexact solutions to the trust-region subproblems. Our method is a simple variant of the classical trust-region method of Sorensen [1]. The method achieves the best possible convergence bound, up to an additive log factor, for finding an ϵ-approximate stationary point: O(Δ_f L^{1/2} ϵ^{-3/2}) + Õ(1) iterations, where L is the Lipschitz constant of the Hessian, Δ_f is the optimality gap, and ϵ is the termination tolerance for the gradient norm. This improves over existing trust-region methods, whose worst-case bound is at least a factor of L worse. We compare our performance with state-of-the-art trust-region (TRU) and cubic regularization (ARC) methods from the GALAHAD library on the CUTEst benchmark set, restricted to problems with more than 100 variables. We use fewer function, gradient, and Hessian evaluations than these methods. For instance, our algorithm's median number of gradient evaluations is 23, compared to 36 for TRU and 29 for ARC. Compared to the conference version of this paper [2], our revised method includes practical enhancements and a refined subproblem termination criterion. These modifications dramatically improved performance, including an order-of-magnitude reduction in the shifted geometric mean of wall-clock times.
A consistently adaptive trust-region method
2024
Adaptive trust-region methods attempt to maintain strong convergence guarantees without depending on conservative estimates of problem properties such as Lipschitz constants. However, on close inspection, one can show that existing adaptive trust-region methods have theoretical guarantees with severely suboptimal dependence on problem properties such as the Lipschitz constant of the Hessian. For example, TRACE, developed by Curtis et al., obtains an O(Δ_f L^{3/2} ϵ^{-3/2}) + Õ(1) iteration bound, where L is the Lipschitz constant of the Hessian. Compared with the optimal O(Δ_f L^{1/2} ϵ^{-3/2}) bound, this is suboptimal with respect to L. We present the first adaptive trust-region method that circumvents this issue and requires at most O(Δ_f L^{1/2} ϵ^{-3/2}) + Õ(1) iterations to find an ϵ-approximate stationary point, matching the optimal iteration bound up to an additive logarithmic term. Our method is a simple variant of a classic trust-region method and in our experiments performs competitively with both ARC and a classical trust-region method.
The Price of Adaptivity in Stochastic Convex Optimization
2024
We prove impossibility results for adaptivity in non-smooth stochastic convex optimization. Given a set of problem parameters we wish to adapt to, we define a "price of adaptivity" (PoA) that, roughly speaking, measures the multiplicative increase in suboptimality due to uncertainty in these parameters. When the initial distance to the optimum is unknown but a gradient norm bound is known, we show that the PoA is at least logarithmic for expected suboptimality, and double-logarithmic for median suboptimality. When there is uncertainty in both distance and gradient norm, we show that the PoA must be polynomial in the level of uncertainty. Our lower bounds nearly match existing upper bounds, and establish that there is no parameter-free lunch. En route, we also establish tight upper and lower bounds for (known-parameter) high-probability stochastic convex optimization with heavy-tailed and bounded noise, respectively.