Catalogue Search | MBRL
Explore the vast range of titles available.
5,347 result(s) for "Support recovery"
STATISTICAL CONSISTENCY AND ASYMPTOTIC NORMALITY FOR HIGH-DIMENSIONAL ROBUST M-ESTIMATORS
2017
We study theoretical properties of regularized robust M-estimators, applicable when data are drawn from a sparse high-dimensional linear model and contaminated by heavy-tailed distributions and/or outliers in the additive errors and covariates. We first establish a form of local statistical consistency for the penalized regression estimators under fairly mild conditions on the error distribution: When the derivative of the loss function is bounded and satisfies a local restricted curvature condition, all stationary points within a constant radius of the true regression vector converge at the minimax rate enjoyed by the Lasso with sub-Gaussian errors. When an appropriate nonconvex regularizer is used in place of an ℓ₁-penalty, we show that such stationary points are in fact unique and equal to the local oracle solution with the correct support; hence, results on asymptotic normality in the low-dimensional case carry over immediately to the high-dimensional setting. This has important implications for the efficiency of regularized nonconvex M-estimators when the errors are heavy-tailed. Our analysis of the local curvature of the loss function also has useful consequences for optimization when the robust regression function and/or regularizer is nonconvex and the objective function possesses stationary points outside the local region. We show that as long as a composite gradient descent algorithm is initialized within a constant radius of the true regression vector, successive iterates will converge at a linear rate to a stationary point within the local region. Furthermore, the global optimum of a convex regularized robust regression function may be used to obtain a suitable initialization. The result is a novel two-step procedure that uses a convex M-estimator to achieve consistency and a nonconvex M-estimator to increase efficiency. We conclude with simulation results that corroborate our theoretical findings.
Journal Article
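As a minimal sketch of the convex first step described in the abstract above (not the paper's full two-step nonconvex procedure), the snippet below runs composite (proximal) gradient descent on a Huber loss with an ℓ₁ penalty; the loss derivative is bounded, as the abstract requires. The tuning values lam and delta and the synthetic problem sizes are illustrative assumptions.

```python
import numpy as np

def huber_grad(r, delta=1.345):
    """Derivative of the Huber loss: bounded, as the local-consistency theory requires."""
    return np.clip(r, -delta, delta)

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def robust_lasso(X, y, lam, n_iter=500, delta=1.345):
    """Composite gradient descent: gradient step on the smooth robust loss,
    then the l1 proximal (soft-thresholding) step."""
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2      # 1/L for the smooth part
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ huber_grad(X @ beta - y, delta) / n
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

rng = np.random.default_rng(0)
n, p, s = 200, 500, 5
X = rng.standard_normal((n, p))
beta_true = np.zeros(p); beta_true[:s] = 2.0
y = X @ beta_true + rng.standard_t(df=2, size=n)  # heavy-tailed errors
print("estimated support:", np.flatnonzero(np.abs(robust_lasso(X, y, lam=0.5)) > 1e-3))
```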
Simultaneous Inference for High-Dimensional Linear Models
2017
This article proposes a bootstrap-assisted procedure to conduct simultaneous inference for high-dimensional sparse linear models based on the recent desparsifying Lasso estimator. Our procedure allows the dimension of the parameter vector of interest to be exponentially larger than the sample size, and it automatically accounts for the dependence within the desparsifying Lasso estimator. Moreover, our simultaneous testing method can be naturally coupled with margin screening to enhance its power in sparse testing at a reduced computational cost, or with the step-down method to provide strong control of the family-wise error rate. In theory, we prove that our simultaneous testing procedure asymptotically achieves the prespecified significance level and enjoys certain optimality in terms of its power even when the model errors are non-Gaussian. Our general theory is also useful in studying the support recovery problem. To broaden its applicability, we further extend our main results to generalized linear models with convex loss functions. The effectiveness of our methods is demonstrated via simulation studies. Supplementary materials for this article are available online.
Journal Article
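The desparsified (debiased) Lasso that this procedure builds on can be sketched in a few lines; the bootstrap calibration and step-down refinements from the abstract are omitted, and the penalty levels lam_main and lam_node are illustrative assumptions rather than the paper's tuned choices.

```python
import numpy as np
from sklearn.linear_model import Lasso

def debiased_lasso(X, y, lam_main=0.1, lam_node=0.1):
    """One-step debiasing of a Lasso fit via node-wise Lasso residuals."""
    n, p = X.shape
    beta = Lasso(alpha=lam_main, fit_intercept=False).fit(X, y).coef_
    resid = y - X @ beta
    b = np.empty(p)
    for j in range(p):
        Xj, X_j = X[:, j], np.delete(X, j, axis=1)
        gamma = Lasso(alpha=lam_node, fit_intercept=False).fit(X_j, Xj).coef_
        zj = Xj - X_j @ gamma                     # node-wise Lasso residual
        b[j] = beta[j] + zj @ resid / (zj @ Xj)   # debiased coordinate
    return b

rng = np.random.default_rng(1)
n, p = 150, 80
X = rng.standard_normal((n, p))
beta_true = np.zeros(p); beta_true[:3] = 1.0
y = X @ beta_true + rng.standard_normal(n)
print(np.round(debiased_lasso(X, y)[:5], 2))  # first three near 1, rest near 0
```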
ADAPTIVE ESTIMATION IN STRUCTURED FACTOR MODELS WITH APPLICATIONS TO OVERLAPPING CLUSTERING
2020
This work introduces a novel estimation method, called LOVE, of the entries and structure of a loading matrix A in a latent factor model X = AZ + E, for an observable random vector X ∈ ℝ^p, with correlated unobservable factors Z ∈ ℝ^K, with K unknown, and uncorrelated noise E. Each row of A is scaled, and allowed to be sparse. In order to identify the loading matrix A, we require the existence of pure variables, which are components of X that are associated, via A, with one and only one latent factor. Despite the fact that the number of factors K, the number of the pure variables and their location are all unknown, we only require a mild condition on the covariance matrix of Z, and a minimum of only two pure variables per latent factor, to show that A is uniquely defined, up to signed permutations. Our proofs for model identifiability are constructive, and lead to our novel estimation method of the number of factors and of the set of pure variables, from a sample of size n of observations on X. This is the first step of our LOVE algorithm, which is optimization-free, and has low computational complexity of order p². The second step of LOVE is an easily implementable linear program that estimates A. We prove that the resulting estimator is near minimax rate optimal for A, with respect to the ‖·‖∞,q loss, for q ≥ 1, up to logarithmic factors in p, and that it can be minimax-rate optimal in many cases of interest.
The model structure is motivated by the problem of overlapping variable clustering, ubiquitous in data science. We define the population-level clusters as groups of those components of X that are associated, via the matrix A, with the same unobservable latent factor; multifactor association is allowed. Clusters are anchored by the pure variables and form overlapping subgroups of the p-dimensional random vector X. The Latent model approach to OVErlapping clustering is reflected in the name of our algorithm, LOVE.
The third step of LOVE estimates the clusters from the support of the columns of the estimated A. We guarantee cluster recovery with zero false positive proportion, and with false negative proportion control. The practical relevance of LOVE is illustrated through the analysis of a RNA-seq data set, devoted to determining the functional annotation of genes with unknown function.
Journal Article
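The final step described above, reading overlapping clusters off the supports of the columns of the estimated loading matrix, is simple enough to illustrate directly; the small A_hat below is a hypothetical example, with variable 4 loading on both factors to show overlap.

```python
import numpy as np

A_hat = np.array([[1.0, 0.0],
                  [0.9, 0.0],
                  [0.0, 1.0],
                  [0.0, 0.8],
                  [0.5, 0.6]])   # variable 4 loads on both factors: overlap

# Cluster k = set of variables with a nonzero loading on factor k.
clusters = [set(np.flatnonzero(np.abs(A_hat[:, k]) > 1e-8))
            for k in range(A_hat.shape[1])]
print(clusters)  # [{0, 1, 4}, {2, 3, 4}]
```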
Peer-delivered harm reduction and recovery support services: initial evaluation from a hybrid recovery community drop-in center and syringe exchange program
by Curtis, Brenda; Ashford, Robert D.; Brown, Austin M.
in Addictions, Adult, Care and treatment
2018
Background
Recovery from substance use disorder (SUD) is often considered at odds with harm reduction strategies. More recently, harm reduction has been categorized as both a pathway to recovery and a series of services to reduce the harmful consequences of substance use. Peer recovery support services (PRSS) are effective in improving SUD outcomes, as well as improving the engagement and effectiveness of harm reduction programs.
Methods
This study provides an initial evaluation of a hybrid recovery community organization providing PRSS as well as peer-based harm reduction services via a syringe exchange program. Administrative data collected during normal operations of the Missouri Network for Opiate Reform and Recovery were analyzed using Pearson chi-square tests and Monte Carlo chi-square tests.
Results
Intravenous substance-using participants (N = 417) had an average of 2.14 engagements (SD = 2.59) with the program. Over the evaluation period, a range of 5,345–8,995 sterile syringes were provided, with a range of 600–1,530 used syringes collected. Participant housing status, criminal justice status, and previous health diagnosis were all significantly related to whether they had multiple engagements.
Conclusions
Results suggest that recovery community organizations are well situated and staffed to also provide harm reduction services, such as syringe exchange programs. Given the relationship between engagement and participant housing, criminal justice status, and previous health diagnosis, recommendations for service delivery include additional education and outreach for homeless, justice-involved, LatinX, and LGBTQ+ identifying individuals.
Journal Article
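For readers unfamiliar with the analysis named in the Methods above, a Pearson chi-square test of independence between a participant characteristic and engagement count runs as follows; the contingency table here is synthetic, not the study's data.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: housing status (housed / unhoused); columns: single vs. multiple engagements.
table = np.array([[120, 80],
                  [150, 67]])
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.2f}, p={p:.3f}, dof={dof}")
```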
Sparse classification: a scalable discrete optimization perspective
by Bertsimas, Dimitris; Pauphilet, Jean; Van Parys, Bart
in Algorithms, Approximation, Classification
2021
We formulate the sparse classification problem of n samples with p features as a binary convex optimization problem and propose an outer-approximation algorithm to solve it exactly. For sparse logistic regression and sparse SVM, our algorithm finds optimal solutions for n and p in the 10,000s within minutes. On synthetic data, our algorithm achieves perfect support recovery in the large-sample regime: there exists an n₀ such that for n < n₀ the algorithm takes a long time to find an optimal solution and does not recover the correct support, while for n ≥ n₀ it finds an optimal solution quickly and recovers the support exactly.
Journal Article
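The paper's exact outer-approximation algorithm requires a mixed-integer solver, so the sketch below substitutes an ℓ₁-penalized logistic heuristic purely to illustrate how support recovery is measured on synthetic data as n grows past a threshold n₀; the sizes, signal strength, and C are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
p, s = 50, 5
beta = np.zeros(p); beta[:s] = 3.0
for n in (50, 200, 1000):
    X = rng.standard_normal((n, p))
    y = (X @ beta + rng.logistic(size=n) > 0).astype(int)  # logistic model
    coef = LogisticRegression(penalty="l1", solver="liblinear",
                              C=0.5).fit(X, y).coef_.ravel()
    support = set(np.flatnonzero(np.abs(coef) > 1e-6))
    print(n, "exact support recovery:", support == set(range(s)))
```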
Adaptive Thresholding for Sparse Covariance Matrix Estimation
by Cai, Tony; Liu, Weidong
in Acceleration of convergence, Analysis of covariance, Analytical estimating
2011
In this article we consider estimation of sparse covariance matrices and propose a thresholding procedure that is adaptive to the variability of individual entries. The estimators are fully data-driven and demonstrate excellent performance both theoretically and numerically. It is shown that the estimators adaptively achieve the optimal rate of convergence over a large class of sparse covariance matrices under the spectral norm. In contrast, the commonly used universal thresholding estimators are shown to be suboptimal over the same parameter spaces. Support recovery is discussed as well. The adaptive thresholding estimators are easy to implement. The numerical performance of the estimators is studied using both simulated and real data. Simulation results demonstrate that the adaptive thresholding estimators uniformly outperform the universal thresholding estimators. The method is also illustrated in an analysis of a dataset from a small round blue-cell tumor microarray experiment. A supplement to this article presenting additional technical proofs is available online.
Journal Article
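A minimal sketch of entry-wise adaptive thresholding in the spirit of the abstract above: each entry of the sample covariance gets its own threshold scaled by an estimate of that entry's variability. The constant delta = 2 and the soft-thresholding rule are common choices, assumed here rather than taken verbatim from the paper.

```python
import numpy as np

def adaptive_threshold_cov(X, delta=2.0):
    """Entry-wise adaptively thresholded sample covariance (sketch)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n                                    # sample covariance
    # theta[i, j] estimates the variance of the (i, j) entry of S
    theta = np.einsum("ki,kj->ij", Xc**2, Xc**2) / n - S**2
    lam = delta * np.sqrt(theta * np.log(p) / n)         # entry-wise thresholds
    out = np.sign(S) * np.maximum(np.abs(S) - lam, 0.0)  # soft thresholding
    np.fill_diagonal(out, np.diag(S))                    # diagonal left alone
    return out

rng = np.random.default_rng(6)
S_hat = adaptive_threshold_cov(rng.standard_normal((200, 50)))
print("surviving off-diagonal entries:", np.count_nonzero(S_hat) - 50)
```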
PENALIZED INTERACTION ESTIMATION FOR ULTRAHIGH DIMENSIONAL QUADRATIC REGRESSION
2021
Quadratic regressions extend linear models by simultaneously including the main effects and the interactions between the covariates. As such, estimating interactions in high-dimensional quadratic regressions has received extensive attention. Here, we introduce a novel method that allows us to estimate the main effects and the interactions separately. Unlike existing methods for ultrahigh-dimensional quadratic regressions, our proposal does not require the widely used heredity assumption. In addition, our proposed estimates have explicit formulae and obey the invariance principle at the population level. We estimate the interactions in matrix form under a penalized convex loss function. The resulting estimates are shown to be consistent even when the covariate dimension grows exponentially with the sample size. We develop an efficient alternating direction method of multipliers (ADMM) algorithm to implement the penalized estimation. This algorithm fully exploits the low computational cost of matrix multiplication and is much more efficient than existing penalized methods, such as the all-pairs LASSO. We demonstrate the promising performance of the proposed method using extensive numerical studies.
Journal Article
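The all-pairs LASSO that the abstract above uses as its computational benchmark can be sketched with standard tooling: expand main effects, squares, and pairwise interactions, then fit a single ℓ₁-penalized regression. The sizes and alpha are illustrative; the paper's own matrix-form ADMM estimator is not reproduced here.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(3)
n, p = 200, 20
X = rng.standard_normal((n, p))
y = 2 * X[:, 0] + 1.5 * X[:, 1] * X[:, 2] + rng.standard_normal(n)

# Degree-2 expansion: p main effects plus all squares and pairwise interactions.
Z = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
coef = Lasso(alpha=0.1).fit(Z, y).coef_
print("nonzero terms:", np.flatnonzero(np.abs(coef) > 1e-3))
```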
Byzantine-robust distributed sparse learning for M-estimation
by Tu, Jiyuan; Liu, Weidong; Mao, Xiaojun
in Algorithms, Artificial Intelligence, Computer networks
2023
In a distributed computing environment, there is usually a small fraction of machines that are corrupted and send arbitrary erroneous information to the master machine. This phenomenon is modeled as a Byzantine failure. Byzantine-robust distributed learning has recently become an important topic in machine learning research. In this paper, we develop a Byzantine-resilient method for the distributed sparse M-estimation problem. When the loss function is non-smooth, it is computationally costly to solve the penalized non-smooth optimization problem in a direct manner. To alleviate the computational burden, we construct a pseudo-response variable and transform the original problem into an ℓ₁-penalized least-squares problem, which is much more computationally feasible. Based on this idea, we develop a communication-efficient distributed algorithm. Theoretically, we show that the proposed estimator obtains a fast convergence rate with only a constant number of iterations. Furthermore, we establish a support recovery result, which, to the best of our knowledge, is the first such result in the literature of Byzantine-robust distributed learning. We demonstrate the effectiveness of our approach in simulation.
Journal Article
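The Byzantine failure model is easy to make concrete. The snippet below is a generic illustration (coordinate-wise median aggregation, a standard defense, not the paper's pseudo-response construction): a corrupted minority of machines sends arbitrary values, and the median of the per-machine gradients stays near the honest ones.

```python
import numpy as np

def robust_aggregate(gradients):
    """Coordinate-wise median of per-machine gradients (m x p array)."""
    return np.median(gradients, axis=0)

rng = np.random.default_rng(4)
m, p = 20, 5
grads = np.tile(np.arange(p, dtype=float), (m, 1))  # honest gradient: (0,1,2,3,4)
grads += 0.1 * rng.standard_normal((m, p))
grads[:3] = 1e6      # 3 Byzantine machines send arbitrary garbage
print(robust_aggregate(grads))   # still close to (0, 1, 2, 3, 4)
```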
DISTRIBUTED SPARSE COMPOSITE QUANTILE REGRESSION IN ULTRAHIGH DIMENSIONS
2023
We examine distributed estimation and support recovery for ultrahigh-dimensional linear regression models under a potentially arbitrary noise distribution. The composite quantile regression is an efficient alternative to the least squares method, and provides robustness against heavy-tailed noise while maintaining reasonable efficiency in the case of light-tailed noise. The highly nonsmooth nature of the composite quantile regression loss poses challenges to both the theoretical and the computational development in an ultrahigh-dimensional distributed estimation setting. Thus, we cast the composite quantile regression into the least squares framework, and propose a distributed algorithm based on an approximate Newton method. This algorithm is efficient in terms of both computation and communication, and requires only gradient information to be communicated between the machines. We show that the resultant distributed estimator attains a near-oracle rate after a constant number of communications, and provide theoretical guarantees for its estimation and support recovery accuracy. Extensive experiments demonstrate the competitive empirical performance of our algorithm.
Journal Article
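The composite quantile regression loss at the heart of the paper above combines check losses at several quantile levels that share one slope vector; a direct transcription follows, with the usual equally spaced levels τ_k = k/10 as an assumed choice.

```python
import numpy as np

def check_loss(u, tau):
    """Quantile (check) loss: u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0))

def cqr_loss(beta, b, X, y, taus):
    """Composite loss: sum over levels k of mean_i rho_tau_k(y_i - x_i'beta - b_k),
    with one slope vector beta and a separate intercept b_k per level."""
    r = y - X @ beta
    return sum(check_loss(r - b_k, tau).mean() for b_k, tau in zip(b, taus))

taus = np.arange(1, 10) / 10          # K = 9 levels: 0.1, ..., 0.9
rng = np.random.default_rng(5)
X = rng.standard_normal((100, 3))
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta + rng.standard_t(df=2, size=100)   # heavy-tailed noise
print(cqr_loss(beta, np.zeros(len(taus)), X, y, taus))
```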
Asymptotically Normal and Efficient Estimation of Covariate-Adjusted Gaussian Graphical Model
by Zhao, Hongyu; Ren, Zhao; Zhou, Harrison
in Asymptotic methods, Computation, Confidence interval
2016
We propose an asymptotically normal and efficient procedure to estimate every finite subgraph of a covariate-adjusted Gaussian graphical model. As a consequence, a confidence interval, as well as a p-value, can be obtained for each edge. The procedure is tuning-free and enjoys easy implementation and efficient computation through parallel estimation on subgraphs or edges. We apply the asymptotic normality result to perform support recovery through edge-wise adaptive thresholding. This support recovery procedure is called ANTAC, standing for asymptotically normal estimation with thresholding after adjusting covariates. ANTAC outperforms other methodologies in the literature in a range of simulation studies. We apply ANTAC to identify gene-gene interactions using an eQTL dataset. Our result achieves better interpretability and accuracy in comparison with a state-of-the-art method. Supplementary materials for the article are available online.
Journal Article
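The edge-wise thresholding step behind ANTAC can be sketched once one has an asymptotically normal edge estimate and its standard error (both assumed supplied here, not computed by this snippet): keep the edges whose z-statistics clear a normal quantile.

```python
import numpy as np
from scipy.stats import norm

def threshold_edges(omega_hat, se, alpha=0.05):
    """Return index pairs of edges whose |z| exceeds the normal quantile."""
    z = omega_hat / se
    keep = np.abs(z) > norm.ppf(1 - alpha / 2)
    np.fill_diagonal(keep, False)        # no self-edges
    return np.argwhere(np.triu(keep))    # each edge reported once

omega = np.array([[1.0, 0.40, 0.00],
                  [0.40, 1.0, 0.05],
                  [0.00, 0.05, 1.0]])    # hypothetical edge estimates
se = np.full((3, 3), 0.1)                # hypothetical standard errors
print(threshold_edges(omega, se))        # [[0 1]]: only the strong edge survives
```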