Catalogue Search | MBRL
Explore the vast range of titles available.
63 result(s) for "Hinder, Oliver"
Lower bounds for finding stationary points I
2020
We prove lower bounds on the complexity of finding ϵ-stationary points (points x such that ‖∇f(x)‖ ≤ ϵ) of smooth, high-dimensional, and potentially non-convex functions f. We consider oracle-based complexity measures, where an algorithm is given access to the value and all derivatives of f at a query point x. We show that for any (potentially randomized) algorithm A, there exists a function f with Lipschitz pth order derivatives such that A requires at least ϵ^{-(p+1)/p} queries to find an ϵ-stationary point. Our lower bounds are sharp to within constants, and they show that gradient descent, cubic-regularized Newton's method, and generalized pth order regularization are worst-case optimal within their natural function classes.
Journal Article
Lower bounds for finding stationary points II: first-order methods
2021
We establish lower bounds on the complexity of finding ϵ-stationary points of smooth, non-convex high-dimensional functions using first-order methods. We prove that deterministic first-order methods, even applied to arbitrarily smooth functions, cannot achieve convergence rates in ϵ better than ϵ^{-8/5}, which is within ϵ^{-1/15} log(1/ϵ) of the best known rate for such methods. Moreover, for functions with Lipschitz first and second derivatives, we prove that no deterministic first-order method can achieve convergence rates better than ϵ^{-12/7}, while ϵ^{-2} is a lower bound for functions with only Lipschitz gradient. For convex functions with Lipschitz gradient, accelerated gradient descent achieves a better rate, showing that finding stationary points is easier given convexity.
Journal Article
Principled Algorithms for Finding Local Minima
2019
Convex optimization is the cornerstone of continuous optimization, but many real problems arising in machine learning and operations research are nonconvex. This two-part thesis explores my work developing principled algorithms for finding local minima of nonconvex functions.

Part I: The complexity of finding stationary points of nonconvex functions. For a long time, it was known that gradient descent achieved an ε^{-2} rate for finding stationary points of unconstrained nonconvex functions, but better rates for first-order methods were unknown. We show that, using an algorithm that judiciously utilizes Nesterov's accelerated gradient descent, it is possible to improve this rate to ε^{-7/4} for functions with Lipschitz first and second derivatives. Adding Lipschitz third derivatives improves this rate to ε^{-5/3}. Moreover, we provide almost matching lower bounds to prove that (i) finding a stationary point is easier for convex functions and (ii) acceleration in nonconvex optimization requires assumptions beyond smoothness.

Part II: Interior point methods for nonconvex optimization. The second part of this thesis focuses on an interior point method (IPM) for optimization problems with nonconvex constraints. The work of Wächter and Biegler suggests that infeasible-start interior point methods (IPMs) developed for linear programming cannot be adapted to nonlinear optimization without significant modification, i.e., using a two-phase or penalty method. We propose an IPM that, by careful initialization and updates of the slack variables, is guaranteed to find a first-order certificate of local infeasibility, local optimality, or unboundedness of the (shifted) feasible region. Our proposed algorithm differs from other IPMs for nonconvex programming because we reduce primal feasibility at the same rate as the barrier parameter. This gives an algorithm with more robust convergence properties that closely resembles successful algorithms from linear programming.
Comparisons with IPOPT on a subset of CUTEst problems indicate less frequent failures and superior performance at detecting infeasibility. Finally, we develop a highly simplified version of our IPM for nonconvex constraints that obtains an ε-scaled KKT point in ε^{-7/4} iterations. This provides the first proof that IPMs with nonconvex constraints have a polynomial runtime in 1/ε.
Dissertation
Worst-case analysis of restarted primal-dual hybrid gradient on totally unimodular linear programs
2024
We analyze restarted PDHG on totally unimodular linear programs. In particular, we show that restarted PDHG finds an ϵ-optimal solution in O(H m_1^{2.5} √(nnz(A)) log(H m_2/ϵ)) matrix-vector multiplies, where m_1 is the number of constraints, m_2 the number of variables, nnz(A) the number of nonzeros in the constraint matrix, H the largest absolute coefficient in the right-hand side or objective vector, and ϵ the distance to optimality of the output solution.
Worst-case analysis of restarted primal-dual hybrid gradient on totally unimodular linear programs
2023
Recently, there has been increasing interest in using matrix-free methods to solve linear programs due to their ability to attack extreme-scale problems. The dominant cost of each iteration of these methods is a matrix-vector product, so they benefit from a low memory footprint and are easily parallelized. Restarted primal-dual hybrid gradient (PDHG) is a matrix-free method with good practical performance. Prior analysis showed that it converges linearly on linear programs, with a linear convergence constant that depends on the Hoffman constant of the KKT system. We refine this analysis of restarted PDHG for totally unimodular linear programs. In particular, we show that restarted PDHG finds an ϵ-optimal solution in O(H m_1^{2.5} √(nnz(A)) log(H m_2/ϵ)) matrix-vector multiplies, where m_1 is the number of constraints, m_2 the number of variables, nnz(A) the number of nonzeros in the constraint matrix, H the largest absolute coefficient in the right-hand side or objective vector, and ϵ the distance to optimality of the point PDHG outputs.
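To make the matrix-free iteration concrete, the following is a minimal sketch of plain (unrestarted) PDHG applied to a standard-form LP min c·x subject to Ax = b, x ≥ 0. It is not the restarted variant the abstract analyzes, and the tiny problem data are purely illustrative.

```python
import numpy as np

def pdhg_lp(c, A, b, iters=20000):
    """Plain (unrestarted) PDHG for: min c@x  s.t.  A@x = b, x >= 0.

    Each iteration costs one product with A and one with A.T,
    which is the matrix-free property the abstract refers to.
    """
    m, n = A.shape
    eta = 0.9 / np.linalg.norm(A, 2)  # step size; needs eta^2 * ||A||^2 < 1
    x, y = np.zeros(n), np.zeros(m)
    for _ in range(iters):
        # Projected primal (gradient) step on the Lagrangian c@x + y@(b - A@x).
        x_new = np.maximum(x - eta * (c - A.T @ y), 0.0)
        # Dual step with the standard extrapolated point 2*x_new - x.
        y = y + eta * (b - A @ (2 * x_new - x))
        x = x_new
    return x, y

# Tiny illustrative LP: min x1 + 2*x2  s.t.  x1 + x2 = 1, x >= 0; optimum is x = (1, 0).
c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
x, y = pdhg_lp(c, A, b)
```

On this toy instance the iterates spiral into the saddle point x = (1, 0), y = 1; the restart scheme studied in the paper is what upgrades this sublinear baseline to the linear rate quoted above.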
Value Creation in the Pharmaceutical Industry
by Alexander Schuhmacher, Markus Hinder, Oliver Gassmann
in Pharmaceutical industry, Pharmaceutical industry--Technological innovations, SCIENCE
2016
This practical guide for advanced students and decision-makers in the pharma and biotech industry presents key success factors in R&D along with value creators in pharmaceutical innovation.
A team of editors and authors with extensive experience in academia and industry and at some of the most prestigious business schools in Europe discusses in detail the innovation process in pharma as well as common and new research and innovation strategies. In doing so, they cover collaboration and partnerships, open innovation, biopharmaceuticals, translational medicine, good manufacturing practice, regulatory affairs, and portfolio management. Each chapter covers controversial aspects of recent developments in the pharmaceutical industry, with the aim of stimulating productive debates on the most effective and efficient innovation processes.
A must-have for young professionals and MBA students preparing to enter R&D in pharma or biotech as well as for students on a combined BA/biomedical and natural sciences program.
A simple and practical adaptive trust-region method
2025
We present an adaptive trust-region method for unconstrained optimization that allows inexact solutions to the trust-region subproblems. Our method is a simple variant of the classical trust-region method of Sorensen (1982). The method achieves the best possible convergence bound, up to an additive log factor, for finding an ϵ-approximate stationary point: O(Δ_f L^{1/2} ϵ^{-3/2}) + Õ(1) iterations, where L is the Lipschitz constant of the Hessian, Δ_f is the optimality gap, and ϵ is the termination tolerance for the gradient norm. This improves over existing trust-region methods, whose worst-case bound is at least a factor of L worse. We compare our performance with state-of-the-art trust-region (TRU) and cubic regularization (ARC) methods from the GALAHAD library on the CUTEst benchmark set, restricted to problems with more than 100 variables. We use fewer function, gradient, and Hessian evaluations than these methods. For instance, our algorithm's median number of gradient evaluations is 23, compared to 36 for TRU and 29 for ARC. Compared to the conference version of this paper (Hamad and Hinder, 2022), our revised method includes several practical enhancements. These modifications dramatically improved performance, including an order-of-magnitude reduction in the shifted geometric mean of wall-clock times. We also show it suffices for the second derivatives to be locally Lipschitz to guarantee that either the minimum gradient norm converges to zero or the objective value tends towards negative infinity, even when the iterates diverge.
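The adaptive methods above refine the classical trust-region template. As a rough illustration of that template (not the authors' method), here is a minimal trust-region loop using the simple Cauchy-point step; the objective, acceptance thresholds, and radius-update constants are standard textbook choices, not taken from the paper.

```python
import numpy as np

def trust_region_cauchy(f, grad, hess, x, delta=1.0, tol=1e-8, max_iter=500):
    """Classical trust-region loop with a Cauchy-point (steepest-descent) step."""
    for _ in range(max_iter):
        g, H = grad(x), hess(x)
        gnorm = np.linalg.norm(g)
        if gnorm < tol:
            break
        # Cauchy point: minimize the quadratic model along -g within the radius.
        gHg = g @ H @ g
        t = delta / gnorm if gHg <= 0 else min((g @ g) / gHg, delta / gnorm)
        s = -t * g
        pred = -(g @ s + 0.5 * s @ H @ s)   # predicted model decrease (> 0)
        rho = (f(x) - f(x + s)) / pred      # actual vs. predicted decrease
        if rho > 0.25:
            x = x + s                       # accept the step
        if rho < 0.25:
            delta *= 0.5                    # poor agreement: shrink the radius
        elif rho > 0.75 and np.isclose(np.linalg.norm(s), delta):
            delta *= 2.0                    # good agreement at the boundary: expand
    return x

# Toy smooth objective: an ill-conditioned convex quadratic.
f = lambda x: (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2
grad = lambda x: np.array([2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)])
hess = lambda x: np.diag([2.0, 20.0])
x_star = trust_region_cauchy(f, grad, hess, np.array([5.0, 5.0]))
```

The adaptive methods in these papers replace the fixed radius-update constants and exact subproblem solves with rules that recover the optimal O(Δ_f L^{1/2} ϵ^{-3/2}) dependence without knowing L.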
A simple and practical adaptive trust-region method
2024
We present an adaptive trust-region method for unconstrained optimization that allows inexact solutions to the trust-region subproblems. Our method is a simple variant of the classical trust-region method of Sorensen [1]. The method achieves the best possible convergence bound, up to an additive log factor, for finding an ϵ-approximate stationary point: O(Δ_f L^{1/2} ϵ^{-3/2}) + Õ(1) iterations, where L is the Lipschitz constant of the Hessian, Δ_f is the optimality gap, and ϵ is the termination tolerance for the gradient norm. This improves over existing trust-region methods, whose worst-case bound is at least a factor of L worse. We compare our performance with state-of-the-art trust-region (TRU) and cubic regularization (ARC) methods from the GALAHAD library on the CUTEst benchmark set, restricted to problems with more than 100 variables. We use fewer function, gradient, and Hessian evaluations than these methods. For instance, our algorithm's median number of gradient evaluations is 23, compared to 36 for TRU and 29 for ARC. Compared to the conference version of this paper [2], our revised method includes practical enhancements and a refined subproblem termination criterion. These modifications dramatically improved performance, including an order-of-magnitude reduction in the shifted geometric mean of wall-clock times.
A consistently adaptive trust-region method
2024
Adaptive trust-region methods attempt to maintain strong convergence guarantees without depending on conservative estimates of problem properties such as Lipschitz constants. However, on close inspection, one can show that existing adaptive trust-region methods have theoretical guarantees with severely suboptimal dependence on problem properties such as the Lipschitz constant of the Hessian. For example, TRACE, developed by Curtis et al., obtains an O(Δ_f L^{3/2} ϵ^{-3/2}) + Õ(1) iteration bound, where L is the Lipschitz constant of the Hessian. Compared with the optimal O(Δ_f L^{1/2} ϵ^{-3/2}) bound, this is suboptimal with respect to L. We present the first adaptive trust-region method that circumvents this issue and requires at most O(Δ_f L^{1/2} ϵ^{-3/2}) + Õ(1) iterations to find an ϵ-approximate stationary point, matching the optimal iteration bound up to an additive logarithmic term. Our method is a simple variant of a classic trust-region method and in our experiments performs competitively with both ARC and a classical trust-region method.
The Price of Adaptivity in Stochastic Convex Optimization
2024
We prove impossibility results for adaptivity in non-smooth stochastic convex optimization. Given a set of problem parameters we wish to adapt to, we define a "price of adaptivity" (PoA) that, roughly speaking, measures the multiplicative increase in suboptimality due to uncertainty in these parameters. When the initial distance to the optimum is unknown but a gradient norm bound is known, we show that the PoA is at least logarithmic for expected suboptimality, and double-logarithmic for median suboptimality. When there is uncertainty in both distance and gradient norm, we show that the PoA must be polynomial in the level of uncertainty. Our lower bounds nearly match existing upper bounds, and establish that there is no parameter-free lunch. En route, we also establish tight upper and lower bounds for (known-parameter) high-probability stochastic convex optimization with heavy-tailed and bounded noise, respectively.