Catalogue Search | MBRL
Explore the vast range of titles available.
71 result(s) for "Convergence guarantees"
Structured Overcomplete Sparsifying Transform Learning with Convergence Guarantees and Applications
by Wen, Bihan; Ravishankar, Saiprasad; Bresler, Yoram
in Algorithms; Analysis; Artificial Intelligence
2015
In recent years, sparse signal modeling, especially using the synthesis model, has been popular. Sparse coding in the synthesis model is, however, NP-hard. Recently, interest has turned to the sparsifying transform model, for which sparse coding is cheap. However, natural images typically contain diverse textures that cannot be sparsified well by a single transform. Hence, in this work, we propose a union of sparsifying transforms model. Sparse coding in this model reduces to a form of clustering. The proposed model is also equivalent to a structured overcomplete sparsifying transform model with block cosparsity, dubbed OCTOBOS. The alternating algorithm introduced for learning such transforms involves simple closed-form solutions. A theoretical analysis provides a convergence guarantee for this algorithm: it is shown to be globally convergent to the set of partial minimizers of the non-convex learning problem. We also show that, under certain conditions, the algorithm converges to the set of stationary points of the overall objective. When applied to images, the algorithm learns a collection of well-conditioned square transforms and a good clustering of patches or textures. The resulting sparse representations for the images are much better than those obtained with a single learned transform or with analytical transforms. We show the promising performance of the proposed approach in image denoising, which compares quite favorably with approaches involving a single learned square transform, an overcomplete synthesis dictionary, or Gaussian mixture models. The proposed denoising method is also faster than the synthesis-dictionary-based approach.
Journal Article
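Sparse coding under the union-of-transforms model described in this abstract reduces to clustering: each patch is assigned to the transform that sparsifies it best. Below is a minimal Python sketch of that step, assuming square transforms and measuring fit by the transform-domain residual alone (the paper's full clustering measure also includes regularizer terms); the function name and interface are ours.

```python
import numpy as np

def octobos_sparse_code(patches, transforms, s):
    """Assign each patch to the transform that sparsifies it best,
    and return the resulting hard-thresholded sparse codes.

    patches:    (n, d) array of vectorized image patches
    transforms: list of K square (d, d) sparsifying transforms W_k
    s:          number of transform coefficients kept per patch
    """
    n, d = patches.shape
    labels = np.zeros(n, dtype=int)
    codes = np.zeros((n, d))
    for i, x in enumerate(patches):
        best_err = np.inf
        for k, W in enumerate(transforms):
            z = W @ x
            zs = np.zeros_like(z)
            keep = np.argsort(np.abs(z))[-s:]   # s largest magnitudes
            zs[keep] = z[keep]                  # hard thresholding
            err = np.sum((z - zs) ** 2)         # sparsification error
            if err < best_err:
                best_err, labels[i], codes[i] = err, k, zs
    return labels, codes
```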
Singularity, Misspecification and the Convergence Rate of EM
2020
A line of recent work has analyzed the behavior of the Expectation-Maximization (EM) algorithm in the well-specified setting, in which the population likelihood is locally strongly concave around its maximizing argument. Examples include suitably separated Gaussian mixture models and mixtures of linear regressions. We consider over-specified settings in which the number of fitted components is larger than the number of components in the true distribution. Such mis-specified settings can lead to singularity in the Fisher information matrix, and moreover, the maximum likelihood estimator based on n i.i.d. samples in d dimensions can have a nonstandard O((d/n)^(1/4)) rate of convergence. Focusing on the simple setting of two-component mixtures fit to a d-dimensional Gaussian distribution, we study the behavior of the EM algorithm both when the mixture weights are different (unbalanced case) and when they are equal (balanced case). Our analysis reveals a sharp distinction between these two cases: in the former, the EM algorithm converges geometrically to a point at Euclidean distance O((d/n)^(1/2)) from the true parameter, whereas in the latter case the convergence rate is exponentially slower, and the fixed point has a much lower O((d/n)^(1/4)) accuracy. Analysis of this singular case requires the introduction of some novel techniques: in particular, we make use of a careful form of localization in the associated empirical process, and develop a recursive argument to progressively sharpen the statistical rate.
Journal Article
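For the balanced symmetric two-component case discussed above, the sample-EM update has a well-known closed form, theta_new = (1/n) * sum_i tanh(x_i' theta) x_i, which makes the slow convergence easy to observe numerically. A minimal sketch (the function name and defaults are ours):

```python
import numpy as np

def em_balanced_symmetric(X, theta0, iters=500):
    """Sample EM for the balanced mixture 0.5*N(theta, I) + 0.5*N(-theta, I).

    When X (shape (n, d)) is actually drawn from N(0, I) -- the
    over-specified case studied above -- the iterates drift toward 0
    only slowly, illustrating the degraded (d/n)^(1/4) accuracy.
    """
    theta = theta0.copy()
    for _ in range(iters):
        w = np.tanh(X @ theta)                 # E-step: 2*P(component +1 | x) - 1
        theta = (w[:, None] * X).mean(axis=0)  # M-step: weighted mean
    return theta
```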
Tensor graphical lasso (TeraLasso)
by Greenewald, Kristjan; Hero, Alfred; Zhou, Shuheng
in Algorithms; Cartesian coordinates; Computer simulation
2019
The paper introduces a multiway tensor generalization of the bigraphical lasso, which uses a two-way sparse Kronecker sum multivariate normal model for the precision matrix to model parsimoniously conditional dependence relationships of matrix variate data based on the Cartesian product of graphs. We call this tensor graphical lasso generalization TeraLasso. We demonstrate by using theory and examples that the TeraLasso model can be accurately and scalably estimated from very limited data samples of high dimensional variables with multiway co-ordinates such as space, time and replicates. Statistical consistency and statistical rates of convergence are established for both the bigraphical lasso and TeraLasso estimators of the precision matrix, and for estimators of its support (non-sparsity) set, respectively. We propose a scalable composite gradient descent algorithm and analyse the computational convergence rate, showing that the composite gradient descent algorithm is guaranteed to converge at a geometric rate to the global minimizer of the TeraLasso objective function. Finally, we illustrate TeraLasso by using both simulation and experimental data from a meteorological data set, showing that we can accurately estimate precision matrices and recover meaningful conditional dependence graphs from high dimensional complex data sets.
Journal Article
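The Kronecker-sum precision structure underlying TeraLasso is simple to write down; here is a minimal two-way sketch (the estimator itself, i.e. the penalized composite gradient descent, is not shown):

```python
import numpy as np

def kron_sum(A, B):
    """Kronecker sum A ⊕ B = A ⊗ I_q + I_p ⊗ B.

    Its sparsity pattern is the Cartesian product of the graphs
    encoded by A and B, which is exactly the parsimonious
    conditional-dependence structure the model posits.
    """
    p, q = A.shape[0], B.shape[0]
    return np.kron(A, np.eye(q)) + np.kron(np.eye(p), B)
```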
A convex pseudolikelihood framework for high dimensional partial correlation estimation with convergence guarantees
by Khare, Kshitij; Rajaratnam, Bala; Oh, Sang-Yun
in Analysis; Analysis of covariance; Breast cancer
2015
Sparse high dimensional graphical model selection is a topic of much interest in modern day statistics. A popular approach is to apply l₁-penalties to either parametric likelihoods, or regularized regression/pseudolikelihoods, with the latter having the distinct advantage that they do not explicitly assume Gaussianity. As none of the popular methods proposed for solving pseudolikelihood-based objective functions have provable convergence guarantees, it is not clear whether corresponding estimators exist or are even computable, or if they actually yield correct partial correlation graphs. We propose a new pseudolikelihood-based graphical model selection method that aims to overcome some of the shortcomings of current methods, but at the same time retain all their respective strengths. In particular, we introduce a novel framework that leads to a convex formulation of the partial covariance regression graph problem, resulting in an objective function composed of quadratic forms. The objective is then optimized via a coordinatewise approach. The specific functional form of the objective function facilitates rigorous convergence analysis leading to convergence guarantees; an important property that cannot be established by using standard results when the dimension is larger than the sample size, as is often the case in high dimensional applications. These convergence guarantees ensure that estimators are well defined under very general conditions and are always computable. In addition, the approach yields estimators that have good large sample properties and also respect symmetry. Furthermore, applications to simulated and real data, timing comparisons, and numerical convergence are demonstrated. We also present a novel unifying framework that places all graphical pseudolikelihood methods as special cases of a more general formulation, leading to important insights.
Journal Article
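The coordinatewise optimization pattern this abstract relies on, closed-form soft-thresholding updates on an objective built from quadratic forms plus an l1 penalty, can be illustrated on the simpler lasso problem. This is a generic sketch of the pattern, not the paper's own estimator:

```python
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def coordinatewise_lasso(X, y, lam, iters=100):
    """Coordinate descent for 0.5*||y - X b||^2 + lam*||b||_1.

    Each coordinate minimization is a closed-form soft-thresholding
    step, the property that enables the kind of rigorous convergence
    analysis the paper emphasizes for its own objective.
    """
    n, p = X.shape
    b = np.zeros(p)
    r = y.astype(float).copy()          # current residual y - X b
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(iters):
        for j in range(p):
            r += X[:, j] * b[j]         # remove coordinate j's contribution
            b[j] = soft_threshold(X[:, j] @ r, lam) / col_sq[j]
            r -= X[:, j] * b[j]         # restore it with the new value
    return b
```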
Generic linear convergence through metric subregularity in a variable-metric extension of the proximal point algorithm
2023
The proximal point algorithm finds a zero of a maximal monotone mapping by iterations in which the mapping is made strongly monotone by the addition of a proximal term. Here it is articulated with the norm behind the proximal term possibly shifting from one iteration to the next, but under conditions that eventually make the metric settle down. Despite the varying geometry, the sequence generated by the algorithm is shown to converge to a particular solution. Although this is not the first variable-metric extension of the proximal point algorithm, it is the first to retain the flexibility needed for applications to augmented Lagrangian methodology and progressive decoupling. Moreover, in a generic sense, the convergence it generates is Q-linear at a rate that depends in a simple way on the modulus of metric subregularity of the mapping at that solution. This is a tighter rate than previously identified and reveals for the first time the definitive role of metric subregularity in how the proximal point algorithm performs, even in fixed-metric mode.
Journal Article
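A minimal numerical sketch of the variable-metric proximal point iteration described above, using a quadratic objective so that each proximal step is a linear solve (the setup and names are ours, not the paper's):

```python
import numpy as np

def variable_metric_ppa(Q, b, x0, metrics, c=1.0, iters=50):
    """Proximal point iterations for min 0.5*x'Qx - b'x (Q >= 0):

        x_{k+1} = argmin_x 0.5*x'Qx - b'x + (1/(2c))*||x - x_k||_{M_k}^2,

    where M_k is the positive-definite metric used at step k.  If the
    metrics eventually settle down, the Q-linear convergence discussed
    above applies.  For quadratic f the step is a linear system.
    """
    x = x0.copy()
    for k in range(iters):
        M = metrics[min(k, len(metrics) - 1)]   # metric may shift, then settle
        x = np.linalg.solve(Q + M / c, b + M @ x / c)
    return x
```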
Generalizations of the proximal method of multipliers in convex optimization
2024
The proximal method of multipliers, originally introduced as a way of solving convex programming problems with inequality constraints, is a proximally stabilized alternative to the augmented Lagrangian method that is sometimes called the proximal augmented Lagrangian method. It has gained attention as a vehicle for deriving decomposition algorithms for wider formulations of problems in convex optimization than just convex programming. Here those themes are developed further. The basic algorithm is articulated in several seemingly different formats that are equivalent under exact computations, but diverge when minimization steps are executed only approximately. Stopping criteria are demonstrated to maintain convergence to a particular solution despite such approximations. Q-linear convergence is obtained from a metric regularity property of the Lagrangian mapping at the solution that acts as a mildly enhanced condition for local optimality on top of convexity and is generically available, in a sense. Moreover, all this is brought about with the proximal terms allowed to vary in their underlying metric from one iteration to the next. That generalization enables the results to be translated to the theory of the progressive decoupling algorithm, significantly adding to its versatility and providing linear convergence guarantees in its broad applicability to techniques for problem decomposition.
Journal Article
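A minimal sketch of the proximal method of multipliers for an equality-constrained quadratic program, where both steps are in closed form; the fixed proximal parameter and the concrete solve are illustrative choices, not the paper's general formulation (which also lets the proximal metric vary between iterations):

```python
import numpy as np

def proximal_method_of_multipliers(Q, q, A, b, c=1.0, iters=100):
    """Solve min 0.5*x'Qx + q'x  s.t.  Ax = b  by proximally
    stabilized augmented Lagrangian steps:

        x_{k+1} = argmin_x L_c(x, y_k) + (1/(2c))*||x - x_k||^2
        y_{k+1} = y_k + c*(A x_{k+1} - b)
    """
    n, m = Q.shape[0], A.shape[0]
    x, y = np.zeros(n), np.zeros(m)
    for _ in range(iters):
        H = Q + c * A.T @ A + np.eye(n) / c        # penalty + proximal terms
        g = -q - A.T @ y + c * A.T @ b + x / c
        x = np.linalg.solve(H, g)                  # stabilized primal step
        y = y + c * (A @ x - b)                    # multiplier update
    return x, y
```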
Sampled Gromov Wasserstein
2021
Optimal Transport (OT) has proven to be a powerful tool to compare probability distributions in machine learning, but dealing with probability measures lying in different spaces remains an open problem. To address this issue, the Gromov Wasserstein distance (GW) only considers intra-distribution pairwise (dis)similarities. However, for two (discrete) distributions with N points, the state-of-the-art solvers have an iterative O(N^4) complexity when using an arbitrary loss function, making most real-world problems intractable. In this paper, we introduce a new iterative way to approximate GW, called Sampled Gromov Wasserstein, which uses the current estimate of the transport plan to guide the sampling of cost matrices. This simple idea, supported by theoretical convergence guarantees, comes with an O(N^2) solver. A special case of Sampled Gromov Wasserstein, which can be seen as the natural extension of the well-known Sliced Wasserstein to distributions lying in different spaces, reduces the complexity even further, to O(N log N). Our contributions are supported by experiments on synthetic and real datasets.
Journal Article
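The sampling idea is easy to illustrate: instead of the full O(N^4) sum defining the GW objective, draw matched pairs from the current transport plan. Below is a sketch of such a Monte-Carlo estimate of the objective (the actual solver also uses these samples to update the plan; the names are ours):

```python
import numpy as np

def sampled_gw_objective(T, Dx, Dy, n_samples=10_000, seed=0):
    """Monte-Carlo estimate of
        sum_{i,j,k,l} (Dx[i,k] - Dy[j,l])**2 * T[i,j] * T[k,l]
    by sampling pairs ((i,j), (k,l)) from the coupling T,
    avoiding the O(N^4) exhaustive sum.
    """
    rng = np.random.default_rng(seed)
    n, m = T.shape
    p = T.ravel() / T.sum()                       # coupling as a distribution
    draws = rng.choice(n * m, size=(2, n_samples), p=p)
    i, j = np.unravel_index(draws[0], (n, m))
    k, l = np.unravel_index(draws[1], (n, m))
    return np.mean((Dx[i, k] - Dy[j, l]) ** 2)
```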
Entropy-Regularized Federated Optimization for Non-IID Data
2025
Federated learning (FL) struggles under non-IID client data when local models drift toward conflicting optima, impairing global convergence and performance. We introduce entropy-regularized federated optimization (ERFO), a lightweight client-side modification that augments each local objective with a Shannon entropy penalty on the per-parameter update distribution. ERFO requires no additional communication, adds a single scalar hyperparameter λ, and integrates seamlessly into any FedAvg-style training loop. We derive a closed-form gradient for the entropy regularizer and provide convergence guarantees: under μ-strong convexity and L-smoothness, ERFO achieves the same O(1/T) (or linear) rates as FedAvg (with only O(λ) bias for fixed λ, and exact convergence when λ_t → 0); in the non-convex case, we prove stationary-point convergence at O(1/T). Empirically, on five-client non-IID splits of the UNSW-NB15 intrusion-detection dataset, ERFO yields a +1.6 pp gain in accuracy and +0.008 in macro-F1 over FedAvg, with markedly smoother dynamics. On a three-of-five split of PneumoniaMNIST, a fixed λ matches or exceeds FedAvg, FedProx, and SCAFFOLD, achieving 90.3% accuracy and 0.878 macro-F1, while preserving rapid, stable learning. ERFO's gradient-only design is model-agnostic, making it broadly applicable across tasks.
Journal Article
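One plausible reading of the regularizer described above: normalize the absolute per-parameter updates into a distribution and penalize its Shannon entropy. Under that reading the gradient is available in closed form, sketched below; the paper's exact normalization and sign convention may differ, and the eps smoothing is ours.

```python
import numpy as np

def entropy_penalty_and_grad(u, eps=1e-12):
    """Shannon entropy H(p) of p_j = |u_j| / sum_i |u_i| and its
    gradient with respect to the update vector u.

    Differentiating gives dH/du_j = -sign(u_j) * (log p_j + H) / S,
    with S = sum_i |u_i| (eps keeps the log finite at zeros).
    """
    a = np.abs(u) + eps
    S = a.sum()
    p = a / S
    H = -np.sum(p * np.log(p))
    grad = -np.sign(u) * (np.log(p) + H) / S
    return H, grad
```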
A distributed approach to the OPF problem
by Erseghe, Tomaso
in Access control; Adhesion; Advanced signal processing techniques and telecommunications network infrastructures for Smart Grid analysis
2015
This paper presents a distributed approach to optimal power flow (OPF) in an electrical network, suitable for application in a future smart grid scenario where access to resources and control is decentralized. The non-convex OPF problem is solved by an augmented Lagrangian method, similar to the widely known ADMM algorithm, with the key distinction that penalty parameters are constantly increased. A (weak) assumption on local solver reliability is required to always ensure convergence, and a certificate of convergence to a local optimum is available in the case of bounded penalty parameters. For moderately sized networks (up to 300 nodes, and even in the presence of a severe partition of the network), the approach guarantees performance very close to the optimum, with an appreciably fast convergence speed. The generality of the approach makes it applicable to any (convex or non-convex) distributed optimization problem in networked form. In comparison with the literature, which is mostly focused on convex SDP approximations, the chosen approach guarantees adherence to the reference problem while requiring less local computational effort.
Journal Article
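The core mechanism, augmented-Lagrangian consensus iterations with a steadily increased penalty parameter, can be sketched generically; this is a plain consensus-ADMM skeleton with growing rho, not the paper's network-structured OPF formulation:

```python
import numpy as np

def consensus_admm_growing_penalty(proxes, d, rho=1.0, gamma=1.05, iters=200):
    """Consensus ADMM where the penalty parameter rho is increased
    every iteration (the paper's key distinction from standard ADMM).

    proxes[i](v, rho) must return argmin_x f_i(x) + (rho/2)*||x - v||^2,
    i.e. a reliable local solver, mirroring the paper's (weak)
    assumption on local solvers.
    """
    n = len(proxes)
    x = np.zeros((n, d))
    y = np.zeros((n, d))                            # unscaled multipliers
    z = np.zeros(d)
    for _ in range(iters):
        for i in range(n):
            x[i] = proxes[i](z - y[i] / rho, rho)   # local proximal solves
        z = (x + y / rho).mean(axis=0)              # consensus update
        y += rho * (x - z)                          # dual ascent
        rho *= gamma                                # grow the penalty
    return z
```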
A convex pseudolikelihood framework for high dimensional partial correlation estimation with convergence guarantees
by Oh, Sang-Yun; Khare, Kshitij; Rajaratnam, Bala
in Convergence guarantee; Gene regulatory network; Generalized pseudo-likelihood
2014
Sparse high dimensional graphical model selection is a topic of much interest in modern day statistics. A popular approach is to apply l1-penalties to either parametric likelihoods, or regularized regression/pseudolikelihoods, with the latter having the distinct advantage that they do not explicitly assume Gaussianity. As none of the popular methods proposed for solving pseudolikelihood-based objective functions have provable convergence guarantees, it is not clear whether corresponding estimators exist or are even computable, or if they actually yield correct partial correlation graphs. We propose a new pseudolikelihood-based graphical model selection method that aims to overcome some of the shortcomings of current methods, but at the same time retain all their respective strengths. In particular, we introduce a novel framework that leads to a convex formulation of the partial covariance regression graph problem, resulting in an objective function composed of quadratic forms. The objective is then optimized via a coordinatewise approach. The specific functional form of the objective function facilitates rigorous convergence analysis leading to convergence guarantees; an important property that cannot be established by using standard results when the dimension is larger than the sample size, as is often the case in high dimensional applications. These convergence guarantees ensure that estimators are well defined under very general conditions and are always computable. In addition, the approach yields estimators that have good large sample properties and also respect symmetry. Furthermore, applications to simulated and real data, timing comparisons, and numerical convergence are demonstrated. We also present a novel unifying framework that places all graphical pseudolikelihood methods as special cases of a more general formulation, leading to important insights.
Journal Article