146 results for "90C26"
Best Subset Selection via a Modern Optimization Lens
In the period 1991–2015, algorithmic advances in Mixed Integer Optimization (MIO), coupled with hardware improvements, have resulted in an astonishing 450 billion factor speedup in solving MIO problems. We present an MIO approach for solving the classical best subset selection problem of choosing k out of p features in linear regression, given n observations. We develop a discrete extension of modern first-order continuous optimization methods to find high-quality feasible solutions, which we use as warm starts for an MIO solver that finds provably optimal solutions. The resulting algorithm (a) provides a solution with a guarantee on its suboptimality even if we terminate the algorithm early, (b) can accommodate side constraints on the coefficients of the linear regression, and (c) extends to finding best subset solutions for the least absolute deviation loss function. Using a wide variety of synthetic and real datasets, we demonstrate that our approach solves problems with n in the 1000s and p in the 100s in minutes to provable optimality, and finds near-optimal solutions for n in the 100s and p in the 1000s in minutes. We also establish via numerical experiments that the MIO approach performs better than the Lasso and other widely used sparse learning procedures in terms of achieving sparse solutions with good predictive power.
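The warm-start stage described above is easy to prototype. Below is a minimal numpy sketch of a discrete first-order method in this spirit (iterative hard thresholding with a least-squares polish on the active set); the function name, the polish step, and all parameter values are my own illustrative choices, and the MIO stage that would consume the warm start (e.g., via a solver such as Gurobi) is omitted:

```python
import numpy as np

def discrete_first_order(X, y, k, beta0=None, n_iter=500, tol=1e-8):
    """Sketch of a discrete first-order method for best subset selection:
    gradient steps on the least-squares loss followed by hard thresholding
    to the k largest coefficients (a warm start for an MIO solver)."""
    n, p = X.shape
    beta = np.zeros(p) if beta0 is None else beta0.copy()
    L = np.linalg.norm(X, 2) ** 2            # Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)
        eta = beta - grad / L                # gradient step
        keep = np.argsort(np.abs(eta))[-k:]  # indices of the k largest entries
        beta_new = np.zeros(p)
        # polish by least squares restricted to the selected support
        beta_new[keep] = np.linalg.lstsq(X[:, keep], y, rcond=None)[0]
        if np.linalg.norm(beta_new - beta) < tol:
            beta = beta_new
            break
        beta = beta_new
    return beta
```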
Trust-region problems with linear inequality constraints: exact SDP relaxation, global optimality and robust optimization
The trust-region problem, which minimizes a nonconvex quadratic function over a ball, is a key subproblem in trust-region methods for solving nonlinear optimization problems. It enjoys many attractive properties, such as an exact semi-definite linear programming relaxation (SDP-relaxation) and strong duality. Unfortunately, such properties do not, in general, hold for an extended trust-region problem having extra linear constraints. This paper shows that two useful and powerful features of the classical trust-region problem continue to hold for an extended trust-region problem with linear inequality constraints under a new dimension condition. First, we establish that this class of extended trust-region problems has an exact SDP-relaxation, which holds without the Slater constraint qualification. This is achieved by proving that a system of quadratic and affine functions involved in the model satisfies a range-convexity property whenever the dimension condition is fulfilled. Second, we show that the dimension condition together with the Slater condition ensures that a set of combined first- and second-order Lagrange multiplier conditions is necessary and sufficient for global optimality of the extended trust-region problem, and consequently for strong duality. Through simple examples we also provide an insightful account of our development from SDP-relaxation to strong duality. Finally, we show that the dimension condition is easily satisfied for the extended trust-region model that arises from the reformulation of a robust least squares problem (LSP) as well as a robust second-order cone programming model problem (SOCP) as an equivalent semi-definite linear programming problem. This leads us to conclude that, under mild assumptions, solving a robust LSP or SOCP under matrix-norm or polyhedral uncertainty is equivalent to solving a semi-definite linear programming problem, and so their solutions can be validated in polynomial time.
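To make the lifting concrete, here is a minimal cvxpy sketch of the standard SDP relaxation for a trust-region problem with extra linear inequality constraints; the function name and problem encoding are illustrative choices of mine, not the paper's notation:

```python
import numpy as np
import cvxpy as cp

def extended_tr_sdp(A, b, C, d):
    """SDP relaxation of  min x'Ax + 2 b'x  s.t. ||x||^2 <= 1, C x <= d.
    Lift the rank-one matrix [x; 1][x; 1]' to a PSD variable Z."""
    n = A.shape[0]
    Z = cp.Variable((n + 1, n + 1), PSD=True)  # Z ~ [[xx', x], [x', 1]]
    X, x = Z[:n, :n], Z[:n, n]
    constraints = [Z[n, n] == 1,
                   cp.trace(X) <= 1,           # relaxes ||x||^2 <= 1
                   C @ x <= d]
    prob = cp.Problem(cp.Minimize(cp.trace(A @ X) + 2 * b @ x), constraints)
    prob.solve()
    return prob.value, x.value
```

When the relaxation is exact, as under the paper's dimension condition, the optimal Z is rank one and x.value solves the original problem; in general the returned value is only a lower bound.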
Geometric approaches to matrix normalization and graph balancing
Normal matrices, or matrices which commute with their adjoints, are of fundamental importance in pure and applied mathematics. In this paper, we study a natural functional on the space of square complex matrices whose global minimizers are normal matrices. We show that this functional, which we refer to as the non-normal energy, has incredibly well-behaved gradient descent dynamics: despite it being nonconvex, we show that the only critical points of the non-normal energy are the normal matrices, and that its gradient descent trajectories fix matrix spectra and preserve the subset of real matrices. We also show that, even when restricted to the subset of unit Frobenius norm matrices, the gradient flow of the non-normal energy retains many of these useful properties. This is applied to prove that low-dimensional homotopy groups of spaces of unit norm normal matrices vanish; for example, we show that the space of $d \times d$ complex unit norm normal matrices is simply connected for all $d \geq 2$. Finally, we consider the related problem of balancing a weighted directed graph, that is, readjusting its edge weights so that the weighted in-degree and out-degree are the same at each node. We adapt the non-normal energy to define another natural functional whose global minima are balanced graphs, and show that gradient descent of this functional always converges to a balanced graph while preserving graph spectra and realness of the weights. Our results were inspired by concepts from symplectic geometry and Geometric Invariant Theory, but we mostly avoid invoking this machinery and our proofs are generally self-contained.
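A minimal numerical sketch of the descent dynamics, assuming the non-normal energy is the squared Frobenius norm of the self-commutator, E(A) = ||A Aᵀ − Aᵀ A||²_F (real case; this definition and the closed-form gradient below are my assumptions, derived from it):

```python
import numpy as np

def non_normal_energy(A):
    """E(A) = ||A A^T - A^T A||_F^2 (assumed form of the non-normal energy)."""
    C = A @ A.T - A.T @ A
    return np.sum(C * C)

def normalize_by_gradient_descent(A, step=1e-3, n_iter=100_000, tol=1e-12):
    """Gradient descent on E; per the abstract, trajectories preserve the
    spectrum and converge to a normal matrix.  With C = A A^T - A^T A,
    the gradient of E works out to 4 (C A - A C)."""
    A = A.astype(float).copy()
    for _ in range(n_iter):
        C = A @ A.T - A.T @ A
        G = 4.0 * (C @ A - A @ C)
        A -= step * G
        if np.linalg.norm(G) < tol:
            break
    return A

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 4))
    B = normalize_by_gradient_descent(A)
    # the spectrum is preserved along the flow; B is numerically normal
    print(np.sort(np.linalg.eigvals(A)), np.sort(np.linalg.eigvals(B)))
```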
Douglas–Rachford splitting for nonconvex optimization with application to nonconvex feasibility problems
We adapt the Douglas–Rachford (DR) splitting method to solve nonconvex feasibility problems by studying this method for a class of nonconvex optimization problems. While the convergence properties of the method for convex problems have been well studied, far less is known in the nonconvex setting. In this paper, for the direct adaptation of the method to minimize the sum of a proper closed function g and a smooth function f with a Lipschitz continuous gradient, we show that if the step-size parameter is smaller than a computable threshold and the sequence generated has a cluster point, then that cluster point is a stationary point of the optimization problem. Convergence of the whole sequence and a local convergence rate are also established under the additional assumption that f and g are semi-algebraic. We also give simple sufficient conditions guaranteeing the boundedness of the sequence generated. We then apply our nonconvex DR splitting method to finding a point in the intersection of a closed convex set C and a general closed set D by minimizing the squared distance to C subject to D. We show that if either set is bounded and the step-size parameter is smaller than a computable threshold, then the sequence generated from the DR splitting method is bounded. Consequently, the sequence generated has cluster points that are stationary for an optimization problem, and the whole sequence is convergent under the additional assumption that C and D are semi-algebraic. We achieve these results via a new merit function constructed particularly for the DR splitting method. Our preliminary numerical results indicate that our DR splitting method usually outperforms the alternating projection method in finding a sparse solution of a linear system, in terms of both the solution quality and the number of iterations taken.
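The final application can be sketched directly: a numpy prototype of the DR iteration for finding a sparse solution of a linear system, with C the affine solution set and D the sparsity set. The step-size value and initialization here are illustrative; the paper requires the step-size parameter to lie below a computable threshold, which this sketch does not verify:

```python
import numpy as np

def dr_sparse_solve(A, b, s, gamma=0.1, n_iter=2000, seed=0):
    """Douglas-Rachford splitting for  min (1/2) dist^2_C(x)  s.t. x in D,
    with C = {x : Ax = b} and D = {x : ||x||_0 <= s}, i.e. sparse solutions
    of a linear system as in the paper's experiments.  Sketch only."""
    m, n = A.shape
    pinvA = np.linalg.pinv(A)

    def proj_C(x):                      # projection onto the affine set Ax = b
        return x - pinvA @ (A @ x - b)

    def proj_D(x):                      # keep the s largest-magnitude entries
        z = np.zeros_like(x)
        keep = np.argsort(np.abs(x))[-s:]
        z[keep] = x[keep]
        return z

    def prox_f(x):                      # prox of gamma * (1/2) dist^2_C
        return (x + gamma * proj_C(x)) / (1.0 + gamma)

    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    for _ in range(n_iter):
        y = prox_f(x)
        z = proj_D(2 * y - x)           # prox of the indicator of D
        x = x + (z - y)
    return z
```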
Optimal Computational and Statistical Rates of Convergence for Sparse Nonconvex Learning Problems
We provide a theoretical analysis of the statistical and computational properties of penalized M-estimators that can be formulated as the solution to a possibly nonconvex optimization problem. Many important estimators fall into this category, including least squares regression with nonconvex regularization, generalized linear models with nonconvex regularization, and sparse elliptical random design regression. For these problems, it is intractable to calculate the global solution due to the nonconvex formulation. In this paper, we propose an approximate regularization path-following method for solving a variety of learning problems with nonconvex objective functions. Under a unified analytic framework, we simultaneously provide explicit statistical and computational rates of convergence for any local solution attained by the algorithm. Computationally, our algorithm attains a global geometric rate of convergence for calculating the full regularization path, which is optimal among all first-order algorithms. Unlike most existing methods that only attain geometric rates of convergence for a single regularization parameter, our algorithm calculates the full regularization path with the same iteration complexity. In particular, we provide a refined iteration complexity bound to sharply characterize the performance of each stage along the regularization path. Statistically, we provide sharp sample complexity analysis for all the approximate local solutions along the regularization path. In particular, our analysis improves upon existing results by providing a more refined sample complexity bound as well as an exact support recovery result for the final estimator. These results show that the final estimator attains an oracle statistical property due to the use of the nonconvex penalty.
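A compressed illustration of the path-following idea: warm-started proximal gradient across a decreasing grid of regularization parameters. The MCP penalty and all parameter values below are assumed choices of mine, and the paper's stage-wise accuracy control is omitted:

```python
import numpy as np

def mcp_prox(v, lam, gamma_mcp, eta):
    """Proximal map of the MCP penalty with step eta (requires eta < gamma_mcp)."""
    out = v.copy()
    small = np.abs(v) <= eta * lam
    mid = (~small) & (np.abs(v) <= gamma_mcp * lam)
    out[small] = 0.0
    out[mid] = np.sign(v[mid]) * (np.abs(v[mid]) - eta * lam) / (1 - eta / gamma_mcp)
    return out

def path_following(X, y, lambdas, gamma_mcp=3.0, n_iter=200):
    """Approximate path following: proximal gradient on least squares + MCP,
    warm-starting each stage from the previous lambda's solution."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n             # Lipschitz constant of the gradient
    eta = min(1.0 / L, 0.99 * gamma_mcp)          # mcp_prox needs eta < gamma_mcp
    beta = np.zeros(p)
    path = []
    for lam in sorted(lambdas, reverse=True):     # decreasing regularization
        for _ in range(n_iter):
            grad = X.T @ (X @ beta - y) / n       # loss (1/2n) ||y - X beta||^2
            beta = mcp_prox(beta - eta * grad, lam, gamma_mcp, eta)
        path.append((lam, beta.copy()))
    return path
```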
An inexact regularized proximal Newton method for nonconvex and nonsmooth optimization
This paper focuses on the minimization of the sum of a twice continuously differentiable function f and a nonsmooth convex function. An inexact regularized proximal Newton method is proposed, based on an approximation to the Hessian of f involving the ϱth power of the KKT residual. For ϱ = 0, we justify the global convergence of the iterate sequence for KL objective functions and its R-linear convergence rate for KL objective functions of exponent 1/2. For ϱ ∈ (0, 1), by assuming that cluster points satisfy a local Hölderian error bound of order q on the second-order stationary point set and a local error bound of order q > 1 + ϱ on the common stationary point set, respectively, we establish the global convergence of the iterate sequence and its superlinear convergence rate, with order depending on q and ϱ. A dual semismooth Newton augmented Lagrangian method is also developed for computing inexact minimizers of the subproblems. Numerical comparisons with two state-of-the-art methods on ℓ1-regularized Student's t-regressions, group-penalized Student's t-regressions, and nonconvex image restoration confirm the efficiency of the proposed method.
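The Hessian regularization the abstract describes (a multiple of the ϱth power of the KKT residual) can be sketched for the ℓ1-regularized case as follows; the constant c, the inner proximal-gradient subproblem solver, and the absence of line search are simplifications of mine, not the paper's algorithm:

```python
import numpy as np

def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def kkt_residual(x, grad, lam):
    """Residual of x = prox_{lam||.||_1}(x - grad); zero exactly at stationary points."""
    return np.linalg.norm(x - soft(x - grad, lam))

def reg_prox_newton(f_grad, f_hess, lam, x0, varrho=0.5, c=1.0,
                    outer=50, inner=200, tol=1e-8):
    """Sketch of a regularized proximal Newton method for f + lam*||.||_1:
    the Hessian is regularized by c * r^varrho * I, where r is the current
    KKT residual, and each subproblem is solved inexactly by proximal
    gradient.  Globalization (line search) is omitted."""
    x = x0.copy()
    for _ in range(outer):
        g, H = f_grad(x), f_hess(x)
        r = kkt_residual(x, g, lam)
        if r < tol:
            break
        Hk = H + c * r**varrho * np.eye(len(x))  # KKT-residual-based regularization
        L = np.linalg.norm(Hk, 2)                # Lipschitz const. of the quadratic model
        d = np.zeros_like(x)
        for _ in range(inner):                   # inexact subproblem solve
            q_grad = g + Hk @ d                  # gradient of the quadratic model at x + d
            d = soft(x + d - q_grad / L, lam / L) - x
        x = x + d
    return x
```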
Non-convex scenario optimization
Scenario optimization is an approach to data-driven decision-making that was introduced some fifteen years ago and has grown fast ever since. Its most remarkable feature is that it blends the heuristic nature of data-driven methods with a rigorous theory that allows one to gain factual, reliable insight into the solution. The usability of the scenario theory, however, has so far been restrained by the obstacle that most results rest on the assumption of convexity. With this paper, we aim to free the theory from this limitation. Specifically, we focus on the body of results known under the name of "wait-and-judge" and show that its fundamental achievements remain valid in a non-convex setup. While optimization is a major center of attention, this paper travels beyond it and into data-driven decision-making. Adopting such a broad framework opens the door to building a new theory of truly vast applicability.
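For readers new to the scenario approach, the sketch below sets up a textbook scenario program, the smallest ball containing N sampled points (a convex instance chosen only for brevity; the paper's contribution is precisely to extend wait-and-judge guarantees beyond convexity), and counts the support scenarios on which the a-posteriori risk certificate depends. Function names and the tolerance are my own:

```python
import numpy as np
import cvxpy as cp

def scenario_ball(points, tol=1e-6):
    """Scenario program: smallest ball containing all sampled scenarios.
    In the wait-and-judge framework, the number of support scenarios
    observed a posteriori drives the risk certificate."""
    N, d = points.shape
    c = cp.Variable(d)
    r = cp.Variable()
    prob = cp.Problem(cp.Minimize(r), [cp.norm(p - c) <= r for p in points])
    prob.solve()
    # support scenarios: those active (on the boundary) at the optimum
    dists = np.linalg.norm(points - c.value, axis=1)
    n_support = int(np.sum(dists >= r.value - tol))
    return c.value, r.value, n_support
```

The wait-and-judge theory then maps the observed number of support scenarios to a bound on the probability that a new, unseen scenario violates the computed solution.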
Finding global minima via kernel approximations
We consider the global minimization of smooth functions based solely on function evaluations. Algorithms that achieve the optimal number of function evaluations for a given precision level typically rely on explicitly constructing an approximation of the function, which is then minimized with algorithms that have exponential running-time complexity. In this paper, we consider an approach that jointly models the function to approximate and finds a global minimum. This is done by using infinite sums of square smooth functions and has strong links with polynomial sum-of-squares hierarchies. Leveraging recent representation properties of reproducing kernel Hilbert spaces, the infinite-dimensional optimization problem can be solved by subsampling in time polynomial in the number of function evaluations, and with theoretical guarantees on the obtained minimum. Given n samples, the computational cost is O(n^3.5) in time and O(n^2) in space, and we achieve a convergence rate to the global optimum of O(n^(-m/d + 1/2 + 3/d)), where m is the degree of differentiability of the function and d the number of dimensions. The rate is nearly optimal in the case of Sobolev functions and, more generally, makes the proposed method particularly suitable for functions with many derivatives. Indeed, when m is on the order of d, the convergence rate to the global optimum does not suffer from the curse of dimensionality, which affects only the worst-case constants (which we track explicitly throughout the paper).
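The core convex formulation can be prototyped directly. The following cvxpy sketch searches for the largest constant c such that f − c is a sampled sum of squares in the RKHS, so that c approximates the global minimum; the Gaussian kernel, the trace penalty weight, and all names are assumed implementation choices of mine:

```python
import numpy as np
import cvxpy as cp

def kernel_lower_bound(xs, fs, sigma=1.0, lam=1e-3, eps=1e-8):
    """Sketch of the sampled SDP: find the largest c with
    f(x_i) - c = phi_i' B phi_i  for a PSD matrix B, where phi_i are
    kernel features of the samples.  c then approximates the global
    minimum (up to the error terms analyzed in the paper)."""
    n = len(xs)
    sq = np.sum((xs[:, None, :] - xs[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / (2 * sigma**2)) + eps * np.eye(n)  # Gaussian kernel matrix
    L = np.linalg.cholesky(K)          # rows of L play the role of features
    B = cp.Variable((n, n), PSD=True)
    c = cp.Variable()
    cons = [fs[i] - c == L[i] @ B @ L[i] for i in range(n)]
    prob = cp.Problem(cp.Maximize(c - lam * cp.trace(B)), cons)
    prob.solve()
    return c.value
```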
An Outer Space Approach to Tackle Generalized Affine Fractional Program Problems
This paper aims to globally solve a generalized affine fractional program problem (GAFPP). Firstly, by introducing outer space variables and performing equivalent transformations, we derive the equivalence problem (EP) of the GAFPP. Secondly, by constructing a novel linear relaxation method, we deduce the affine relaxation problem (ARP) of the EP. Next, by solving the ARP to compute lower bounds, we propose a new outer space branch-and-bound algorithm for tackling the GAFPP. Then, the global convergence of the algorithm is proved and its worst-case computational complexity is analyzed. Finally, numerical experimental results are reported to illustrate the effectiveness of the algorithm.
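The overall branch-and-bound skeleton can be illustrated as follows. Note that the paper's affine relaxation is deliberately replaced here by crude interval-arithmetic bounds (a much weaker but shorter substitute), and the problem encoding and names are mine:

```python
import heapq
import numpy as np

def affine_range(a, c, lo, hi):
    """Range of a'x + c over the box [lo, hi] via interval arithmetic."""
    mn = c + np.sum(np.where(a > 0, a * lo, a * hi))
    mx = c + np.sum(np.where(a > 0, a * hi, a * lo))
    return mn, mx

def bb_fractional(num, den, lo, hi, tol=1e-5, max_nodes=200_000):
    """Best-first branch-and-bound for
        min_x  sum_i (a_i'x + alpha_i) / (b_i'x + beta_i)
    over the box [lo, hi]; num/den are lists of (vector, scalar) pairs and
    all denominators are assumed positive on the box."""
    def lower_bound(l, h):
        s = 0.0
        for (a, al), (b, be) in zip(num, den):
            nl, nh = affine_range(a, al, l, h)
            dl, dh = affine_range(b, be, l, h)
            s += min(nl / dl, nl / dh, nh / dl, nh / dh)
        return s

    def value(x):
        return sum((a @ x + al) / (b @ x + be)
                   for (a, al), (b, be) in zip(num, den))

    best_x = (lo + hi) / 2
    best = value(best_x)
    heap, tie = [(lower_bound(lo, hi), 0, lo, hi)], 1
    for _ in range(max_nodes):
        if not heap:
            break
        lb, _, l, h = heapq.heappop(heap)
        if lb >= best - tol:           # global lower bound meets the incumbent
            break
        m = (l + h) / 2
        v = value(m)
        if v < best:                   # update incumbent at the box center
            best, best_x = v, m
        j = int(np.argmax(h - l))      # bisect the widest coordinate
        left_h, right_l = h.copy(), l.copy()
        left_h[j] = right_l[j] = m[j]
        for bl, bh in ((l, left_h), (right_l, h)):
            heapq.heappush(heap, (lower_bound(bl, bh), tie, bl, bh))
            tie += 1
    return best, best_x
```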