5,347 result(s) for "Support recovery"
STATISTICAL CONSISTENCY AND ASYMPTOTIC NORMALITY FOR HIGH-DIMENSIONAL ROBUST M-ESTIMATORS
We study theoretical properties of regularized robust M-estimators, applicable when data are drawn from a sparse high-dimensional linear model and contaminated by heavy-tailed distributions and/or outliers in the additive errors and covariates. We first establish a form of local statistical consistency for the penalized regression estimators under fairly mild conditions on the error distribution: When the derivative of the loss function is bounded and satisfies a local restricted curvature condition, all stationary points within a constant radius of the true regression vector converge at the minimax rate enjoyed by the Lasso with sub-Gaussian errors. When an appropriate nonconvex regularizer is used in place of an ℓ₁-penalty, we show that such stationary points are in fact unique and equal to the local oracle solution with the correct support; hence, results on asymptotic normality in the low-dimensional case carry over immediately to the high-dimensional setting. This has important implications for the efficiency of regularized nonconvex M-estimators when the errors are heavy-tailed. Our analysis of the local curvature of the loss function also has useful consequences for optimization when the robust regression function and/or regularizer is nonconvex and the objective function possesses stationary points outside the local region. We show that as long as a composite gradient descent algorithm is initialized within a constant radius of the true regression vector, successive iterates will converge at a linear rate to a stationary point within the local region. Furthermore, the global optimum of a convex regularized robust regression function may be used to obtain a suitable initialization. The result is a novel two-step procedure that uses a convex M-estimator to achieve consistency and a nonconvex M-estimator to increase efficiency. We conclude with simulation results that corroborate our theoretical findings.
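The two-step recipe in this abstract (a convex M-estimator for initialization, then a nonconvex refinement via composite gradient descent) is easy to sketch. The minimal numpy sketch below assumes a Huber loss with an ℓ₁ penalty for step one and an MCP penalty for step two; the loss, penalties, step size, and tuning constants are illustrative choices, not the paper's exact specification.

```python
import numpy as np

def huber_grad(X, y, beta, delta=1.345):
    """Gradient of the Huber loss; its derivative is bounded, as the abstract requires."""
    r = y - X @ beta
    return -X.T @ np.clip(r, -delta, delta) / len(y)

def soft(v, t):
    """Prox of t * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def mcp_prox(v, t, lam, gamma=3.0):
    """Prox of t * MCP(lam, gamma); closed form, valid for gamma > t."""
    return np.where(np.abs(v) <= gamma * lam,
                    soft(v, t * lam) / (1.0 - t / gamma), v)

def composite_gd(X, y, beta0, prox, step=0.05, iters=500):
    """Composite gradient descent: a gradient step on the loss followed by
    the penalty's proximal map."""
    beta = beta0.copy()
    for _ in range(iters):
        beta = prox(beta - step * huber_grad(X, y, beta), step)
    return beta

# Two-step procedure: the convex (Huber + l1) solution initializes the
# nonconvex (Huber + MCP) refinement, mirroring the abstract's recipe.
# beta1 = composite_gd(X, y, np.zeros(p), lambda v, t: soft(v, t * lam))
# beta2 = composite_gd(X, y, beta1, lambda v, t: mcp_prox(v, t, lam))
```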
Simultaneous Inference for High-Dimensional Linear Models
This article proposes a bootstrap-assisted procedure to conduct simultaneous inference for high-dimensional sparse linear models based on the recent desparsifying Lasso estimator. Our procedure allows the dimension of the parameter vector of interest to be exponentially larger than sample size, and it automatically accounts for the dependence within the desparsifying Lasso estimator. Moreover, our simultaneous testing method can be naturally coupled with the margin screening to enhance its power in sparse testing with a reduced computational cost, or with the step-down method to provide a strong control for the family-wise error rate. In theory, we prove that our simultaneous testing procedure asymptotically achieves the prespecified significance level, and enjoys certain optimality in terms of its power even when the model errors are non-Gaussian. Our general theory is also useful in studying the support recovery problem. To broaden the applicability, we further extend our main results to generalized linear models with convex loss functions. The effectiveness of our methods is demonstrated via simulation studies. Supplementary materials for this article are available online.
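For readers who want the mechanics, here is a condensed sketch of a de-sparsified-lasso-plus-multiplier-bootstrap procedure of the kind the abstract describes. The penalty levels, the noise-level estimate, and the plain nodewise construction are simplifying assumptions, and the data are assumed centered and standardized.

```python
import numpy as np
from sklearn.linear_model import Lasso

def desparsified_lasso(X, y, lam=0.1, lam_node=0.1):
    """One-step bias correction of the lasso via nodewise-regression residuals.
    Returns corrected estimates, standard errors, and bootstrap ingredients."""
    n, p = X.shape
    beta = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_
    resid = y - X @ beta
    sigma = np.sqrt(resid @ resid / max(n - np.count_nonzero(beta), 1))
    b, se, Z = np.zeros(p), np.zeros(p), np.zeros((n, p))
    for j in range(p):
        Xmj = np.delete(X, j, axis=1)
        gam = Lasso(alpha=lam_node, fit_intercept=False).fit(Xmj, X[:, j]).coef_
        z = X[:, j] - Xmj @ gam              # nodewise residual
        Z[:, j] = z
        denom = z @ X[:, j]
        b[j] = beta[j] + z @ resid / denom   # bias-corrected estimate
        se[j] = sigma * np.linalg.norm(z) / abs(denom)
    return b, se, Z, resid, sigma

def bootstrap_max_quantile(Z, resid, sigma, B=500, alpha=0.05, seed=0):
    """Gaussian multiplier bootstrap of the maximum studentized statistic,
    which calibrates the simultaneous test of H0: beta_j = 0 for all j."""
    rng = np.random.default_rng(seed)
    scale = sigma * np.linalg.norm(Z, axis=0)
    stats = [np.max(np.abs(Z.T @ (rng.standard_normal(len(resid)) * resid)) / scale)
             for _ in range(B)]
    return np.quantile(stats, 1 - alpha)

# Reject the global null when max_j |b[j]| / se[j] exceeds the bootstrap quantile.
```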
ADAPTIVE ESTIMATION IN STRUCTURED FACTOR MODELS WITH APPLICATIONS TO OVERLAPPING CLUSTERING
This work introduces a novel estimation method, called LOVE, of the entries and structure of a loading matrix A in a latent factor model X = AZ + E, for an observable random vector X ∈ ℝᵖ, with correlated unobservable factors Z ∈ ℝᴷ, with K unknown, and uncorrelated noise E. Each row of A is scaled and allowed to be sparse. In order to identify the loading matrix A, we require the existence of pure variables, which are components of X that are associated, via A, with one and only one latent factor. Although the number of factors K, the number of pure variables, and their locations are all unknown, we require only a mild condition on the covariance matrix of Z and a minimum of two pure variables per latent factor to show that A is uniquely defined, up to signed permutations. Our proofs of model identifiability are constructive and lead to our novel estimation method of the number of factors and of the set of pure variables, from a sample of size n of observations on X. This is the first step of our LOVE algorithm, which is optimization-free and has low computational complexity of order p². The second step of LOVE is an easily implementable linear program that estimates A. We prove that the resulting estimator is near minimax-rate optimal for A with respect to the ‖·‖∞,q loss, for q ≥ 1, up to logarithmic factors in p, and that it can be minimax-rate optimal in many cases of interest. The model structure is motivated by the problem of overlapping variable clustering, ubiquitous in data science. We define the population-level clusters as groups of those components of X that are associated, via the matrix A, with the same unobservable latent factor, and multifactor association is allowed. Clusters are anchored by the pure variables and form overlapping subgroups of the p-dimensional random vector X. The Latent model approach to OVErlapping clustering is reflected in the name of our algorithm, LOVE. The third step of LOVE estimates the clusters from the support of the columns of the estimated A. We guarantee cluster recovery with zero false positive proportion and with false negative proportion control. The practical relevance of LOVE is illustrated through the analysis of an RNA-seq data set, devoted to determining the functional annotation of genes with unknown function.
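Step 3 of LOVE, reading overlapping clusters off the support of the columns of the estimated loading matrix, takes only a few lines; in this sketch A_hat stands for a hypothetical estimated p × K loading matrix.

```python
import numpy as np

def clusters_from_loadings(A_hat, tol=1e-8):
    """One overlapping cluster per latent factor: variable j joins cluster k
    whenever A_hat[j, k] is nonzero, so a variable loading on several factors
    belongs to several clusters."""
    return [np.flatnonzero(np.abs(A_hat[:, k]) > tol).tolist()
            for k in range(A_hat.shape[1])]
```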
Peer-delivered harm reduction and recovery support services: initial evaluation from a hybrid recovery community drop-in center and syringe exchange program
Background: Recovery from substance use disorder (SUD) is often considered at odds with harm reduction strategies. More recently, harm reduction has been categorized as both a pathway to recovery and a series of services to reduce the harmful consequences of substance use. Peer recovery support services (PRSS) are effective in improving SUD outcomes, as well as improving the engagement and effectiveness of harm reduction programs. Methods: This study provides an initial evaluation of a hybrid recovery community organization providing PRSS as well as peer-based harm reduction services via a syringe exchange program. Administrative data collected during normal operations of the Missouri Network for Opiate Reform and Recovery were analyzed using Pearson chi-square tests and Monte Carlo chi-square tests. Results: Intravenous substance-using participants (N = 417) had an average of 2.14 engagements (SD = 2.59) with the program. Over the evaluation period, a range of 5345–8995 sterile syringes were provided, with a range of 600–1530 used syringes collected. Participant housing status, criminal justice status, and previous health diagnosis were all significantly related to whether they had multiple engagements. Conclusions: Results suggest that recovery community organizations are well situated and staffed to also provide harm reduction services, such as syringe exchange programs. Given the relationship between engagement and participant housing, criminal justice status, and previous health diagnosis, recommendations for service delivery include additional education and outreach for homeless, justice-involved, LatinX, and LGBTQ+ identifying individuals.
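The evaluation's main statistical tool, the Pearson chi-square test of independence, is a one-liner in scipy; the 2×2 table below is purely hypothetical and only illustrates the shape of the engagement-by-housing-status analysis, not the study's data.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts: rows = housed / unhoused, columns = single / multiple engagements.
table = np.array([[120, 85],
                  [60, 152]])
chi2, pval, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {pval:.4f}, dof = {dof}")
```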
Sparse classification: a scalable discrete optimization perspective
We formulate the sparse classification problem of n samples with p features as a binary convex optimization problem and propose an outer-approximation algorithm to solve it exactly. For sparse logistic regression and sparse SVM, our algorithm finds optimal solutions for n and p in the 10,000s within minutes. On synthetic data, our algorithm exhibits a phase transition in the sample size: there is a critical n₀ below which the algorithm takes a long time to find an optimal solution and fails to recover the correct support, and above which it terminates quickly and achieves perfect support recovery.
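The phase-transition experiment is easy to reproduce in miniature. The brute-force enumeration below is a stand-in for the paper's outer-approximation algorithm (it is exact, but only viable for small p); the data-generating process and tuning values are illustrative.

```python
import itertools
import numpy as np
from sklearn.linear_model import LogisticRegression

def best_subset_logreg(X, y, k):
    """Exact L0-constrained logistic regression by enumerating all size-k supports."""
    n, best_S, best_ll = len(y), None, -np.inf
    for S in itertools.combinations(range(X.shape[1]), k):
        m = LogisticRegression(C=1e6, max_iter=1000).fit(X[:, S], y)  # ~unpenalized fit
        ll = m.predict_log_proba(X[:, S])[np.arange(n), y].sum()      # log-likelihood
        if ll > best_ll:
            best_S, best_ll = set(S), ll
    return best_S

# Support recovery improves sharply with the sample size n.
rng = np.random.default_rng(1)
p, true_S = 10, {0, 1, 2}
for n in (40, 400):
    X = rng.standard_normal((n, p))
    y = rng.binomial(1, 1 / (1 + np.exp(-2 * X[:, list(true_S)].sum(axis=1))))
    print(n, best_subset_logreg(X, y, 3) == true_S)
```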
Adaptive Thresholding for Sparse Covariance Matrix Estimation
In this article we consider estimation of sparse covariance matrices and propose a thresholding procedure that is adaptive to the variability of individual entries. The estimators are fully data-driven and demonstrate excellent performance both theoretically and numerically. It is shown that the estimators adaptively achieve the optimal rate of convergence over a large class of sparse covariance matrices under the spectral norm. In contrast, the commonly used universal thresholding estimators are shown to be suboptimal over the same parameter spaces. Support recovery is discussed as well. The adaptive thresholding estimators are easy to implement. The numerical performance of the estimators is studied using both simulated and real data. Simulation results demonstrate that the adaptive thresholding estimators uniformly outperform the universal thresholding estimators. The method is also illustrated in an analysis of a dataset from a small round blue-cell tumor microarray experiment. A supplement to this article presenting additional technical proofs is available online.
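The estimator's core idea fits in a few lines of numpy: give every entry of the sample covariance its own threshold, scaled by an estimate of that entry's variability. This sketch follows the construction described in the abstract; the constant delta is a tuning choice.

```python
import numpy as np

def adaptive_threshold_cov(X, delta=2.0):
    """Entry-adaptive hard thresholding of the sample covariance."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n                                # sample covariance
    # theta[i, j]: variance estimate of the (i, j) sample-covariance entry
    theta = np.einsum('ki,kj->ij', Xc**2, Xc**2) / n - S**2
    lam = delta * np.sqrt(theta * np.log(p) / n)     # entrywise thresholds
    T = np.where(np.abs(S) > lam, S, 0.0)
    np.fill_diagonal(T, np.diag(S))                  # never threshold the diagonal
    return T

# Support recovery: the estimated support is simply {(i, j): T[i, j] != 0}.
```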
PENALIZED INTERACTION ESTIMATION FOR ULTRAHIGH DIMENSIONAL QUADRATIC REGRESSION
Quadratic regressions extend linear models by simultaneously including the main effects and the interactions between the covariates. As such, estimating interactions in high-dimensional quadratic regressions has received extensive attention. Here, we introduce a novel method that allows us to estimate the main effects and the interactions separately. Unlike existing methods for ultrahigh-dimensional quadratic regressions, our proposal does not require the widely used heredity assumption. In addition, our proposed estimates have explicit formulae and obey the invariance principle at the population level. We estimate the interactions in matrix form under a penalized convex loss function. The resulting estimates are shown to be consistent, even when the covariate dimension is an exponential order of the sample size. We develop an efficient alternating direction method of multipliers algorithm to implement the penalized estimation. This algorithm fully exploits the cheap computational cost of the matrix multiplication and is much more efficient than existing penalized methods, such as the all-pairs LASSO. We demonstrate the promising performance of the proposed method using extensive numerical studies.
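As a concrete stand-in, the penalized matrix-form problem can be attacked with a generic proximal-gradient loop; the paper's actual algorithm is an ADMM with explicit formulae, and this sketch also omits the main effects for brevity.

```python
import numpy as np

def soft(Z, t):
    return np.sign(Z) * np.maximum(np.abs(Z) - t, 0.0)

def interaction_lasso(X, y, lam=0.1, iters=500):
    """Proximal gradient for min over symmetric Omega of
    (1/2n) sum_i (y_i - x_i' Omega x_i)^2 + lam * ||Omega||_1."""
    n, p = X.shape
    Omega = np.zeros((p, p))
    step = 1.0 / (np.sum(np.sum(X**2, axis=1) ** 2) / n)  # 1 / (Lipschitz bound)
    for _ in range(iters):
        fitted = np.einsum('ij,jk,ik->i', X, Omega, X)    # x_i' Omega x_i
        G = -(X * (y - fitted)[:, None]).T @ X / n        # gradient of the loss
        Omega = soft(Omega - step * G, step * lam)
        Omega = (Omega + Omega.T) / 2                     # keep the iterate symmetric
    return Omega
```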
Byzantine-robust distributed sparse learning for M-estimation
In a distributed computing environment, there is usually a small fraction of machines that are corrupted and send arbitrary erroneous information to the master machine. This phenomenon is modeled as a Byzantine failure. Byzantine-robust distributed learning has recently become an important topic in machine learning research. In this paper, we develop a Byzantine-resilient method for the distributed sparse M-estimation problem. When the loss function is non-smooth, it is computationally costly to solve the penalized non-smooth optimization problem in a direct manner. To alleviate the computational burden, we construct a pseudo-response variable and transform the original problem into an ℓ₁-penalized least-squares problem, which is much more computationally feasible. Based on this idea, we develop a communication-efficient distributed algorithm. Theoretically, we show that the proposed estimator obtains a fast convergence rate with only a constant number of iterations. Furthermore, we establish a support recovery result, which, to the best of our knowledge, is the first such result in the literature of Byzantine-robust distributed learning. We demonstrate the effectiveness of our approach in simulations.
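The robust-aggregation ingredient is the easiest part to show in code: replace the average of the workers' gradients with a coordinate-wise median, which tolerates a minority of arbitrarily corrupted machines. This is a generic sketch of one proximal step for ℓ₁-penalized least squares, not the paper's exact pseudo-response construction.

```python
import numpy as np

def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def byzantine_robust_step(Xs, ys, beta, lam=0.05, step=0.1):
    """One distributed update: every machine sends its local least-squares
    gradient; the master aggregates by coordinate-wise median and takes a
    proximal-gradient (soft-thresholded) step."""
    grads = [-Xm.T @ (ym - Xm @ beta) / len(ym) for Xm, ym in zip(Xs, ys)]
    g = np.median(np.vstack(grads), axis=0)   # robust to corrupted gradients
    return soft(beta - step * g, step * lam)
```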
DISTRIBUTED SPARSE COMPOSITE QUANTILE REGRESSION IN ULTRAHIGH DIMENSIONS
We examine distributed estimation and support recovery for ultrahigh-dimensional linear regression models under a potentially arbitrary noise distribution. The composite quantile regression is an efficient alternative to the least squares method, and provides robustness against heavy-tailed noise while maintaining reasonable efficiency in the case of light-tailed noise. The highly nonsmooth nature of the composite quantile regression loss poses challenges to both the theoretical and the computational development in an ultrahigh-dimensional distributed estimation setting. Thus, we cast the composite quantile regression into the least squares framework, and propose a distributed algorithm based on an approximate Newton method. This algorithm is efficient in terms of both computation and communication, and requires only gradient information to be communicated between the machines. We show that the resultant distributed estimator attains a near-oracle rate after a constant number of communications, and provide theoretical guarantees for its estimation and support recovery accuracy. Extensive experiments demonstrate the competitive empirical performance of our algorithm.
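The least-squares recast at the heart of this line of work replaces the response with a pseudo-response built from the current fit, so that a standard ℓ₁-penalized least-squares solver can be reused. The sketch below shows a single quantile level and, for simplicity, centralizes the pseudo-responses; the composite version averages several levels with per-level intercepts, and the paper communicates only gradients. The boxcar density estimate and bandwidth are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

def qr_pseudo_response(X, y, beta, tau, h=0.5):
    """Pseudo-response for one quantile level tau: a Newton-type correction
    of the current fit, using a kernel estimate f0 of the residual density at 0."""
    r = y - X @ beta
    f0 = np.mean(np.abs(r) <= h) / (2 * h)            # boxcar density estimate
    return X @ beta - ((r <= 0).astype(float) - tau) / max(f0, 1e-6)

def distributed_qr_round(Xs, ys, beta, tau, lam=0.05):
    """One communication round: machines form pseudo-responses locally,
    then a single l1-penalized least-squares problem is solved."""
    Zs = [qr_pseudo_response(Xm, ym, beta, tau) for Xm, ym in zip(Xs, ys)]
    return Lasso(alpha=lam).fit(np.vstack(Xs), np.concatenate(Zs)).coef_
```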
Asymptotically Normal and Efficient Estimation of Covariate-Adjusted Gaussian Graphical Model
We propose an asymptotically normal and efficient procedure to estimate every finite subgraph of the covariate-adjusted Gaussian graphical model. As a consequence, a confidence interval, as well as a p-value, can be obtained for each edge. The procedure is tuning-free and enjoys easy implementation and efficient computation through parallel estimation on subgraphs or edges. We apply the asymptotic normality result to perform support recovery through edge-wise adaptive thresholding. This support recovery procedure is called ANTAC, standing for asymptotically normal estimation with thresholding after adjusting covariates. ANTAC outperforms other methodologies in the literature in a range of simulation studies. We apply ANTAC to identify gene-gene interactions using an eQTL dataset. Our result achieves better interpretability and accuracy in comparison with a state-of-the-art method. Supplementary materials for the article are available online.
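Once each edge estimate comes with an asymptotically normal distribution, support recovery reduces to thresholding studentized statistics. The sketch below uses a Bonferroni-style normal quantile over all edges; ANTAC's tuning-free threshold differs in detail, so this is a generic illustration.

```python
import numpy as np
from scipy.stats import norm

def edgewise_support(est, se, p_dim, alpha=0.05):
    """Keep edge e when |est[e] / se[e]| exceeds the normal quantile that
    controls the family-wise error over all p*(p-1)/2 candidate edges."""
    n_edges = p_dim * (p_dim - 1) / 2
    crit = norm.ppf(1 - alpha / (2 * n_edges))
    return np.abs(np.asarray(est) / np.asarray(se)) > crit
```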