Catalogue Search | MBRL
Explore the vast range of titles available.
370 result(s) for "optimal statistical rate"
OPTIMAL COMPUTATIONAL AND STATISTICAL RATES OF CONVERGENCE FOR SPARSE NONCONVEX LEARNING PROBLEMS
2014
We provide theoretical analysis of the statistical and computational properties of penalized M-estimators that can be formulated as the solution to a possibly nonconvex optimization problem. Many important estimators fall in this category, including least squares regression with nonconvex regularization, generalized linear models with nonconvex regularization and sparse elliptical random design regression. For these problems, it is intractable to calculate the global solution due to the nonconvex formulation. In this paper, we propose an approximate regularization path-following method for solving a variety of learning problems with nonconvex objective functions. Under a unified analytic framework, we simultaneously provide explicit statistical and computational rates of convergence for any local solution attained by the algorithm. Computationally, our algorithm attains a global geometric rate of convergence for calculating the full regularization path, which is optimal among all first-order algorithms. Unlike most existing methods that only attain geometric rates of convergence for one single regularization parameter, our algorithm calculates the full regularization path with the same iteration complexity. In particular, we provide a refined iteration complexity bound to sharply characterize the performance of each stage along the regularization path. Statistically, we provide sharp sample complexity analysis for all the approximate local solutions along the regularization path. In particular, our analysis improves upon existing results by providing a more refined sample complexity bound as well as an exact support recovery result for the final estimator. These results show that the final estimator attains an oracle statistical property due to the usage of nonconvex penalty.
Journal Article
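As a point of reference for the path-following idea in the abstract above, here is a minimal numpy sketch of a warm-started regularization path driven by a first-order (proximal gradient) solver. It uses a convex ℓ₁ penalty as a stand-in for the nonconvex penalties the paper treats; the function names, step size, and tolerances are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of t * ||.||_1.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def prox_grad(X, y, lam, beta0, step, tol=1e-6, max_iter=500):
    # First-order (proximal gradient) solver for (1/2n)||y - X beta||^2 + lam*||beta||_1.
    # step should be at most 1 / L, where L is the largest eigenvalue of X.T @ X / n.
    n = len(y)
    beta = beta0.copy()
    for _ in range(max_iter):
        grad = X.T @ (X @ beta - y) / n
        beta_new = soft_threshold(beta - step * grad, step * lam)
        moved = np.linalg.norm(beta_new - beta)
        beta = beta_new
        if moved <= tol:
            break
    return beta

def regularization_path(X, y, lambdas, step):
    # Warm-started path: the solution at each lambda initializes the next, smaller
    # lambda, so the full path costs roughly as much as a single fit.
    beta = np.zeros(X.shape[1])
    path = []
    for lam in sorted(lambdas, reverse=True):  # from heaviest to lightest penalty
        beta = prox_grad(X, y, lam, beta, step)
        path.append((lam, beta.copy()))
    return path
```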
Residual Weighted Learning for Estimating Individualized Treatment Rules
by Kosorok, Michael R.; Khan, Umer; Zhou, Xin
in Algorithms; artificial intelligence; Clinical outcomes
2017
Personalized medicine has received increasing attention among statisticians, computer scientists, and clinical practitioners. A major component of personalized medicine is the estimation of individualized treatment rules (ITRs). Recently, Zhao et al. proposed outcome weighted learning (OWL) to construct ITRs that directly optimize the clinical outcome. Although OWL opens the door to introducing machine learning techniques to optimal treatment regimes, it still has some problems in performance. (1) The estimated ITR of OWL is affected by a simple shift of the outcome. (2) The rule from OWL tries to keep treatment assignments that subjects actually received. (3) There is no variable selection mechanism with OWL. All of them weaken the finite sample performance of OWL. In this article, we propose a general framework, called residual weighted learning (RWL), to alleviate these problems, and hence to improve finite sample performance. Unlike OWL which weights misclassification errors by clinical outcomes, RWL weights these errors by residuals of the outcome from a regression fit on clinical covariates excluding treatment assignment. We use the smoothed ramp loss function in RWL and provide a difference of convex (d.c.) algorithm to solve the corresponding nonconvex optimization problem. By estimating residuals with linear models or generalized linear models, RWL can effectively deal with different types of outcomes, such as continuous, binary, and count outcomes. We also propose variable selection methods for linear and nonlinear rules, respectively, to further improve the performance. We show that the resulting estimator of the treatment rule is consistent. We further obtain a rate of convergence for the difference between the expected outcome using the estimated ITR and that of the optimal treatment rule. The performance of the proposed RWL methods is illustrated in simulation studies and in an analysis of cystic fibrosis clinical trial data. Supplementary materials for this article are available online.
Journal Article
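A minimal sketch of the residual-weighting step described in the abstract above, assuming a randomized trial with a known, constant propensity score and substituting an off-the-shelf weighted linear SVM for the paper's smoothed-ramp-loss d.c. solver; all names and defaults are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVC

def residual_weighted_rule(X, A, R, propensity=0.5):
    # X: covariates, A: treatment coded in {-1, +1}, R: observed clinical outcome.
    # propensity=0.5 assumes 1:1 randomization (an assumption of this sketch).
    # Step 1: residuals of the outcome from a regression on covariates only,
    # with treatment assignment excluded, as described in the abstract.
    residual = R - LinearRegression().fit(X, R).predict(X)
    # Step 2: weighted classification. A negative residual flips the target label;
    # the weight is |residual| / propensity. A weighted hinge-loss SVM stands in
    # for the smoothed-ramp-loss solver used in the paper.
    labels = np.where(residual >= 0, A, -A)
    weights = np.abs(residual) / propensity
    clf = SVC(kernel="linear").fit(X, labels, sample_weight=weights)
    return clf  # clf.predict(x_new) gives the estimated treatment rule
```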
Estimation of high dimensional mean regression in the absence of symmetry and light tail assumptions
2017
Data subject to heavy-tailed errors are commonly encountered in various scientific fields. To address this problem, procedures based on quantile regression and least absolute deviation regression have been developed in recent years. These methods essentially estimate the conditional median (or quantile) function, which can be very different from the conditional mean function, especially when distributions are asymmetric and heteroscedastic. How can we efficiently estimate the mean regression function in ultrahigh dimensional settings when only the second moment exists? To solve this problem, we propose a penalized Huber loss with a diverging parameter to reduce the bias created by the traditional Huber loss. Such a penalized robust approximate (RA) quadratic loss will be called the RA lasso. In the ultrahigh dimensional setting, where the dimensionality can grow exponentially with the sample size, our results reveal that the RA lasso estimator is consistent and converges at the optimal rate attainable in the light-tailed situation. We further study the computational convergence of the RA lasso and show that the composite gradient descent algorithm indeed produces a solution that attains the same optimal rate after sufficient iterations. As a by-product, we also establish a concentration inequality for estimating the population mean when only the second moment exists. We compare the RA lasso with other regularized robust estimators based on quantile regression and least absolute deviation regression. Extensive simulation studies demonstrate the satisfactory finite sample performance of the RA lasso.
Journal Article
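A hedged sketch of the composite gradient descent iteration for the ℓ₁-penalized Huber objective described above. The robustification parameter alpha, the step size, and the iteration count are placeholders; the paper's theory dictates how alpha should diverge with the sample size.

```python
import numpy as np

def huber_grad(r, alpha):
    # Derivative of the Huber loss; alpha is the robustification parameter,
    # which the paper lets diverge with the sample size to reduce bias.
    return np.clip(r, -alpha, alpha)

def ra_lasso(X, y, lam, alpha, step, n_iter=1000):
    # Composite gradient descent for the objective
    #   (1/n) * sum_i huber_alpha(y_i - x_i' beta) + lam * ||beta||_1.
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = -X.T @ huber_grad(y - X @ beta, alpha) / n
        z = beta - step * grad
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return beta
```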
I-LAMM FOR SPARSE LEARNING
2018
We propose a computational framework named iterative local adaptive majorize-minimization (I-LAMM) to simultaneously control algorithmic complexity and statistical error when fitting high-dimensional models. I-LAMM is a two-stage algorithmic implementation of the local linear approximation to a family of folded concave penalized quasi-likelihood. The first stage solves a convex program with a crude precision tolerance to obtain a coarse initial estimator, which is further refined in the second stage by iteratively solving a sequence of convex programs with smaller precision tolerances. Theoretically, we establish a phase transition: the first stage has a sublinear iteration complexity, while the second stage achieves an improved linear rate of convergence. Though this framework is completely algorithmic, it provides solutions with optimal statistical performance and controlled algorithmic complexity for a large family of nonconvex optimization problems. The iteration effects on statistical errors are clearly demonstrated via a contraction property. Our theory relies on a localized version of the sparse/restricted eigenvalue condition, which allows us to analyze a large family of loss and penalty functions and provide optimality guarantees under very weak assumptions (e.g., I-LAMM requires much weaker minimal signal strength than other procedures). Thorough numerical results are provided to support the obtained theory.
Journal Article
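A rough sketch of the two-stage structure described above, with a SCAD-type folded concave penalty and a proximal-gradient subproblem solver standing in for the paper's implementation; the tolerances, step size, and number of tightening rounds are illustrative assumptions.

```python
import numpy as np

def scad_deriv(beta, lam, a=3.7):
    # Derivative of the SCAD folded concave penalty (one standard choice;
    # a = 3.7 is the conventional default, not necessarily the paper's).
    t = np.abs(beta)
    return np.where(t <= lam, lam, np.maximum(a * lam - t, 0.0) / (a - 1.0))

def weighted_lasso(X, y, w, step, tol, beta0=None, max_iter=5000):
    # Convex subproblem (1/2n)||y - X beta||^2 + sum_j w_j |beta_j|,
    # solved by proximal gradient until the iterates move less than tol.
    n, p = X.shape
    beta = np.zeros(p) if beta0 is None else beta0.copy()
    for _ in range(max_iter):
        z = beta - step * (X.T @ (X @ beta - y) / n)
        beta_new = np.sign(z) * np.maximum(np.abs(z) - step * w, 0.0)
        moved = np.linalg.norm(beta_new - beta)
        beta = beta_new
        if moved <= tol:
            break
    return beta

def i_lamm(X, y, lam, step, eps_coarse=1e-2, eps_fine=1e-5, n_tighten=5):
    # Stage 1 ("contraction"): crude tolerance, plain l1 weights, coarse initializer.
    beta = weighted_lasso(X, y, np.full(X.shape[1], lam), step, eps_coarse)
    # Stage 2 ("tightening"): reweight with the folded concave penalty derivative and
    # re-solve with a tighter tolerance, warm-starting from the previous solution.
    for _ in range(n_tighten):
        beta = weighted_lasso(X, y, scad_deriv(beta, lam), step, eps_fine, beta0=beta)
    return beta
```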
A multi-objective optimization algorithm for feature selection problems
by Gharehchopogh, Farhad Soleimanian; Abdollahzadeh, Benyamin
in Algorithms; Control methods; Data mining
2022
Feature selection (FS) is a critical step in data mining, and it plays a crucial role in the performance of machine learning algorithms: it reduces processing time and can improve classification accuracy. In this paper, three different solutions are proposed for FS. In the first solution, the Harris Hawks Optimization (HHO) algorithm is extended to a multi-objective form; in the second, the Fruitfly Optimization Algorithm (FOA) is extended in the same way; and in the third, the two are hybridized into an algorithm named MOHHOFOA. The results were compared with the MOPSO, NSGA-II, BGWOPSOFS and B-MOABC algorithms for FS on 15 standard data sets using mean, best, worst, and standard deviation (STD) criteria. The Wilcoxon statistical test was also applied at a 5% significance level, with the Bonferroni–Holm method used to control the family-wise error rate. The results, shown as Pareto front charts, indicate that the proposed solutions' performance on these data sets is promising.
Journal Article
OPTIMAL RATES FOR COMMUNITY ESTIMATION IN THE WEIGHTED STOCHASTIC BLOCK MODEL
2020
Community identification in a network is an important problem in fields such as social science, neuroscience and genetics. Over the past decade, stochastic block models (SBMs) have emerged as a popular statistical framework for this problem. However, SBMs have an important limitation in that they are suited only for networks with unweighted edges; in various scientific applications, disregarding the edge weights may result in a loss of valuable information. We study a weighted generalization of the SBM, in which observations are collected in the form of a weighted adjacency matrix and the weight of each edge is generated independently from an unknown probability density determined by the community membership of its endpoints. We characterize the optimal rate of misclustering error of the weighted SBM in terms of the Rényi divergence of order 1/2 between the weight distributions of within-community and between-community edges, substantially generalizing existing results for unweighted SBMs. Furthermore, we present a computationally tractable algorithm based on discretization that achieves the optimal error rate. Our method is adaptive in the sense that the algorithm, without assuming knowledge of the weight densities, performs as well as the best algorithm that knows the weight densities.
Journal Article
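For reference, the Rényi divergence of order 1/2 between the within-community and between-community weight densities p and q, the quantity in which the abstract's optimal misclustering rate is expressed, is the standard expression below (the article's normalization may differ):

```latex
% Renyi divergence of order 1/2 between weight densities p and q
D_{1/2}(p \,\|\, q) = -2 \log \int \sqrt{p(x)\, q(x)} \, dx .
```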
NUCLEAR-NORM PENALIZATION AND OPTIMAL RATES FOR NOISY LOW-RANK MATRIX COMPLETION
2011
This paper deals with the trace regression model where n entries or linear combinations of entries of an unknown m₁ × m₂ matrix A₀ corrupted by noise are observed. We propose a new nuclear-norm penalized estimator of A₀ and establish a general sharp oracle inequality for this estimator for arbitrary values of n, m₁, m₂ under the condition of isometry in expectation. Then this method is applied to the matrix completion problem. In this case, the estimator admits a simple explicit form, and we prove that it satisfies oracle inequalities with faster rates of convergence than in the previous works. They are valid, in particular, in the high-dimensional setting m₁ m₂ ≫ n. We show that the obtained rates are optimal up to logarithmic factors in a minimax sense and also derive, for any fixed matrix A₀, a nonminimax lower bound on the rate of convergence of our estimator, which coincides with the upper bound up to a constant factor. Finally, we show that our procedure provides an exact recovery of the rank of A₀ with probability close to 1. We also discuss the statistical learning setting where there is no underlying model determined by A₀, and the aim is to find the best trace regression model approximating the data. As a by-product, we show that, under the restricted eigenvalue condition, the usual vector Lasso estimator satisfies a sharp oracle inequality (i.e., an oracle inequality with leading constant 1).
Journal Article
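The "simple explicit form" mentioned above for matrix completion amounts, up to the paper's rescaling of the observed matrix, to soft-thresholding of singular values; a short numpy sketch of that operation follows, with the input matrix and the threshold left unspecified since their exact choice follows the paper.

```python
import numpy as np

def singular_value_soft_threshold(Y, tau):
    # Soft-thresholding of singular values: the proximal operator of
    # tau * (nuclear norm). Y is the (suitably rescaled) observed matrix and
    # tau the penalization level; both choices are left to the paper.
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt
```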
Adaptive Thresholding for Sparse Covariance Matrix Estimation
by Cai, Tony; Liu, Weidong
in Acceleration of convergence; Analysis of covariance; Analytical estimating
2011
In this article we consider estimation of sparse covariance matrices and propose a thresholding procedure that is adaptive to the variability of individual entries. The estimators are fully data-driven and demonstrate excellent performance both theoretically and numerically. It is shown that the estimators adaptively achieve the optimal rate of convergence over a large class of sparse covariance matrices under the spectral norm. In contrast, the commonly used universal thresholding estimators are shown to be suboptimal over the same parameter spaces. Support recovery is discussed as well. The adaptive thresholding estimators are easy to implement. The numerical performance of the estimators is studied using both simulated and real data. Simulation results demonstrate that the adaptive thresholding estimators uniformly outperform the universal thresholding estimators. The method is also illustrated in an analysis on a dataset from a small round blue-cell tumor microarray experiment. A supplement to this article presenting additional technical proofs is available online.
Journal Article
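A hedged numpy sketch of the entry-adaptive thresholding idea described above: each covariance entry is compared against its own data-driven threshold rather than a universal one. The tuning constant delta and the use of hard thresholding are illustrative choices; the article also covers other thresholding rules.

```python
import numpy as np

def adaptive_threshold_cov(X, delta=2.0):
    # Entry-wise adaptive thresholding of the sample covariance matrix.
    # Each entry (i, j) gets a threshold scaled by an estimate of that entry's
    # own variability; delta = 2.0 is a conventional tuning value used here for
    # illustration only.
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n                               # sample covariance
    prod = Xc[:, :, None] * Xc[:, None, :]          # (n, p, p) entry-wise products
    theta = ((prod - S) ** 2).mean(axis=0)          # variability of each entry
    lam = delta * np.sqrt(theta * np.log(p) / n)    # entry-specific thresholds
    Sigma = np.where(np.abs(S) >= lam, S, 0.0)      # hard thresholding
    np.fill_diagonal(Sigma, np.diag(S))             # keep the diagonal untouched
    return Sigma
```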
Prediction in Functional Linear Regression
2006
There has been substantial recent work on methods for estimating the slope function in linear regression for functional data analysis. However, as in the case of more conventional finite-dimensional regression, much of the practical interest in the slope centers on its application for the purpose of prediction, rather than on its significance in its own right. We show that the problems of slope-function estimation, and of prediction from an estimator of the slope function, have very different characteristics. While the former is intrinsically nonparametric, the latter can be either nonparametric or semi-parametric. In particular, the optimal mean-square convergence rate of predictors is n⁻¹, where n denotes sample size, if the predictand is a sufficiently smooth function. In other cases, convergence occurs at a polynomial rate that is strictly slower than n⁻¹. At the boundary between these two regimes, the mean-square convergence rate is less than n⁻¹ by only a logarithmic factor. More generally, the rate of convergence of the predicted value of the mean response in the regression model, given a particular value of the explanatory variable, is determined by a subtle interaction among the smoothness of the predictand, of the slope function in the model, and of the autocovariance function for the distribution of explanatory variables.
Journal Article
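For context, a standard way to write the functional linear model and the prediction target ("predictand") discussed above is given below; the notation is not taken from the article.

```latex
% Functional linear model; g(x) is the predictand whose mean-square prediction
% rate the abstract discusses (n^{-1} when the predictand is sufficiently smooth).
Y = a + \int_{\mathcal{I}} b(t)\, X(t)\, dt + \varepsilon ,
\qquad
g(x) = a + \int_{\mathcal{I}} b(t)\, x(t)\, dt .
```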
ERROR ESTIMATES FOR A SEMIDISCRETE FINITE ELEMENT METHOD FOR FRACTIONAL ORDER PARABOLIC EQUATIONS
by Zhou, Zhi; Lazarov, Raytcho; Jin, Bangti
in Analytical estimating; Applied mathematics; Approximation
2013
We consider the initial boundary value problem for a homogeneous time-fractional diffusion equation with an initial condition v(x) and a homogeneous Dirichlet boundary condition in a bounded convex polygonal domain Ω. We study two semidiscrete approximation schemes, namely the Galerkin finite element method (FEM) and the lumped mass Galerkin FEM, using piecewise linear functions. We establish error estimates that are almost optimal with respect to the regularity of the data, covering the cases of smooth and nonsmooth initial data, i.e., v ∈ H²(Ω) ∩ H₀¹(Ω) and v ∈ L₂(Ω). For the lumped mass method, the optimal L₂-norm error estimate is valid only under an additional assumption on the mesh, which in two dimensions is known to be satisfied for symmetric meshes. Finally, we present some numerical results that give insight into the reliability of the theoretical study.
Journal Article
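For orientation, the initial-boundary value problem described in the abstract above can be written in the following standard form, where ∂ₜᵅ denotes the (typically Caputo) time-fractional derivative of order α ∈ (0,1); the article's exact notation may differ.

```latex
\begin{aligned}
\partial_t^{\alpha} u - \Delta u &= 0      && \text{in } \Omega \times (0,T],\\
u &= 0                                      && \text{on } \partial\Omega \times (0,T],\\
u(\cdot,0) &= v                             && \text{in } \Omega .
\end{aligned}
```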