Search Results
732 results for "High dimensional spaces"
CHAOS AND UNPREDICTABILITY IN EVOLUTION
The possibility of complicated dynamic behavior driven by nonlinear feedbacks in dynamical systems has revolutionized science in the latter part of the last century. Yet despite examples of complicated frequency dynamics, the possibility of long-term evolutionary chaos is rarely considered. The concept of "survival of the fittest" is central to much evolutionary thinking and embodies a perspective of evolution as a directional optimization process exhibiting simple, predictable dynamics. This perspective is adequate for simple scenarios, when frequency-independent selection acts on scalar phenotypes. However, in most organisms many phenotypic properties combine in complicated ways to determine ecological interactions, and hence frequency-dependent selection. Therefore, it is natural to consider models for evolutionary dynamics generated by frequency-dependent selection acting simultaneously on many different phenotypes. Here we show that complicated, chaotic dynamics of long-term evolutionary trajectories in phenotype space are very common in a large class of such models when the dimension of phenotype space is large, and when there are selective interactions between the phenotypic components. Our results suggest that the perspective of evolution as a process with simple, predictable dynamics covers only a small fragment of long-term evolution.
ROBUST SUBSPACE CLUSTERING
Subspace clustering refers to the task of finding a multi-subspace representation that best fits a collection of points taken from a high-dimensional space. This paper introduces an algorithm inspired by sparse subspace clustering (SSC) [In IEEE Conference on Computer Vision and Pattern Recognition, CVPR (2009) 2790-2797] to cluster noisy data, and develops some novel theory demonstrating its correctness. In particular, the theory uses ideas from geometric functional analysis to show that the algorithm can accurately recover the underlying subspaces under minimal requirements on their orientation, and on the number of samples per subspace. Synthetic as well as real data experiments complement our theoretical study, illustrating our approach and demonstrating its effectiveness.
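The core of sparse subspace clustering is a self-representation step: each data point is written as a sparse combination of the other points, so that the nonzero coefficients tend to select points from the same subspace. The following numpy sketch illustrates that step with a plain lasso solved by coordinate descent and a symmetrized affinity matrix; it omits the noise handling and the spectral clustering stage of the actual algorithm, and all parameter values are illustrative.

```python
import numpy as np

def ssc_affinity(Y, lam=0.1, n_iter=50):
    """Sparse self-representation affinity in the spirit of SSC:
    each column of Y is expressed as a sparse combination of the
    OTHER columns via lasso coordinate descent (minimal sketch)."""
    d, N = Y.shape
    C = np.zeros((N, N))
    col_sq = (Y ** 2).sum(axis=0)
    for i in range(N):
        c = np.zeros(N)
        for _ in range(n_iter):
            for j in range(N):
                if j == i:
                    continue  # a point may not represent itself
                r = Y[:, i] - Y @ c + Y[:, j] * c[j]  # partial residual
                rho = Y[:, j] @ r
                c[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
        C[:, i] = c
    return np.abs(C) + np.abs(C).T  # symmetrized affinity

# toy data: 15 points from each of two random 2-D subspaces of R^20
rng = np.random.default_rng(6)
U1, U2 = rng.normal(size=(20, 2)), rng.normal(size=(20, 2))
Y = np.hstack([U1 @ rng.normal(size=(2, 15)), U2 @ rng.normal(size=(2, 15))])
Y /= np.linalg.norm(Y, axis=0)  # unit-norm columns
W = ssc_affinity(Y)
```

On such data the affinity mass concentrates in the two diagonal blocks, which is what makes the subsequent graph clustering step work.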
HIGH-DIMENSIONAL GENERALIZATIONS OF ASYMMETRIC LEAST SQUARES REGRESSION AND THEIR APPLICATIONS
Asymmetric least squares regression is an important method that has wide applications in statistics, econometrics and finance. The existing work on asymmetric least squares only considers the traditional low dimension and large sample setting. In this paper, we systematically study the Sparse Asymmetric LEast Squares (SALES) regression under high dimensions where the penalty functions include the Lasso and nonconvex penalties. We develop a unified efficient algorithm for fitting SALES and establish its theoretical properties. As an important application, SALES is used to detect heteroscedasticity in high-dimensional data. Another method for detecting heteroscedasticity is the sparse quantile regression. However, both SALES and the sparse quantile regression may fail to tell which variables are important for the conditional mean and which variables are important for the conditional scale/variance, especially when there are variables that are important for both the mean and the scale. To that end, we further propose a COupled Sparse Asymmetric LEast Squares (COSALES) regression which can be efficiently solved by an algorithm similar to that for solving SALES. We establish theoretical properties of COSALES. In particular, COSALES using the SCAD penalty or MCP is shown to consistently identify the two important subsets for the mean and scale simultaneously, even when the two subsets overlap. We demonstrate the empirical performance of SALES and COSALES by simulated and real data.
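The asymmetric least squares (expectile) loss underlying this line of work weights positive and negative residuals differently, and the unpenalized version can be fit by iteratively reweighted least squares. The sketch below is a generic low-dimensional illustration of that loss, not the paper's penalized SALES estimator; the asymmetry level tau and the iteration count are illustrative.

```python
import numpy as np

def expectile_regression(X, y, tau=0.5, n_iter=50):
    """Asymmetric least squares (expectile regression) via iteratively
    reweighted least squares; tau = 0.5 recovers ordinary least squares."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        r = y - X @ beta
        # asymmetric weights: tau for positive residuals, 1 - tau otherwise
        w = np.where(r > 0, tau, 1.0 - tau)
        WX = X * w[:, None]
        beta = np.linalg.solve(X.T @ WX, WX.T @ y)
    return beta

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=200)
b50 = expectile_regression(X, y, tau=0.5)  # mean regression
b90 = expectile_regression(X, y, tau=0.9)  # upper-tail expectile
```

Comparing fits across tau values is what makes expectiles useful for detecting heteroscedasticity: under homoscedastic noise only the intercept shifts, while scale effects show up in the slopes.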
Lasso-Type Recovery of Sparse Representations for High-Dimensional Data
The Lasso is an attractive technique for regularization and variable selection for high-dimensional data, where the number of predictor variables $p_{n}$ is potentially much larger than the number of samples n. However, it was recently discovered that the sparsity pattern of the Lasso estimator can only be asymptotically identical to the true sparsity pattern if the design matrix satisfies the so-called irrepresentable condition. The latter condition can easily be violated in the presence of highly correlated variables. Here we examine the behavior of the Lasso estimators if the irrepresentable condition is relaxed. Even though the Lasso cannot recover the correct sparsity pattern, we show that the estimator is still consistent in the $\ell_{2}$-norm sense for fixed designs under conditions on (a) the number $s_{n}$ of nonzero components of the vector $\beta_{n}$ and (b) the minimal singular values of design matrices that are induced by selecting small subsets of variables. Furthermore, a rate of convergence result is obtained on the $\ell_{2}$ error with an appropriate choice of the smoothing parameter. The rate is shown to be optimal under the condition of bounded maximal and minimal sparse eigenvalues. Our results imply that, with high probability, all important variables are selected. The set of selected variables is a meaningful reduction on the original set of variables. Finally, our results are illustrated with the detection of closely adjacent frequencies, a problem encountered in astrophysics.
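In the p > n regime discussed above, the Lasso is typically computed by cyclic coordinate descent with soft-thresholding. The following numpy sketch shows a generic solver on a toy sparse-recovery problem (it is an illustration of the standard algorithm, not the paper's analysis; the penalty level lam is an arbitrary choice).

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Lasso via cyclic coordinate descent:
    minimizes (1/2n)||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]  # partial residual
            rho = X[:, j] @ r / n
            # soft-thresholding update for coordinate j
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return beta

rng = np.random.default_rng(1)
n, p = 100, 200                      # p > n: high-dimensional design
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]     # sparse truth: 3 active variables
y = X @ beta_true + 0.1 * rng.normal(size=n)
beta_hat = lasso_cd(X, y, lam=0.2)
```

Note the shrinkage bias visible in the fitted coefficients: the estimate is $\ell_2$-consistent, as the abstract discusses, even though exact support recovery can fail under correlated designs.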
COAUTHORSHIP AND CITATION NETWORKS FOR STATISTICIANS
We have collected and cleaned two network data sets: Coauthorship and Citation networks for statisticians. The data sets are based on all research papers published in four of the top journals in statistics from 2003 to the first half of 2012. We analyze the data sets from many different perspectives, focusing on (a) productivity, patterns and trends, (b) centrality and (c) community structures. For (a), we find that over the 10-year period, both the average number of papers per author and the fraction of self citations have been decreasing, but the proportion of distant citations has been increasing. These findings are consistent with the belief that the statistics community has become increasingly more collaborative, competitive and globalized. For (b), we have identified the most prolific/collaborative/highly cited authors. We have also identified a handful of "hot" papers, suggesting "Variable Selection" as one of the "hot" areas. For (c), we have identified about 15 meaningful communities or research groups, including large-size ones such as "Spatial Statistics," "Large-Scale Multiple Testing" and "Variable Selection" as well as small-size ones such as "Dimensional Reduction," "Bayes," "Quantile Regression" and "Theoretical Machine Learning." Our findings shed light on research habits, trends and topological patterns of statisticians. The data sets provide a fertile ground for future research on social networks.
On the Adaptive Elastic-Net with a Diverging Number of Parameters
We consider the problem of model selection and estimation in situations where the number of parameters diverges with the sample size. When the dimension is high, an ideal method should have the oracle property [J. Amer. Statist. Assoc. 96 (2001) 1348-1360] and [Ann. Statist. 32 (2004) 928-961] which ensures the optimal large sample performance. Furthermore, the high dimensionality often induces the collinearity problem, which should be properly handled by the ideal method. Many existing variable selection methods fail to achieve both goals simultaneously. In this paper, we propose the adaptive elastic-net that combines the strengths of the quadratic regularization and the adaptively weighted lasso shrinkage. Under weak regularity conditions, we establish the oracle property of the adaptive elastic-net. We show by simulations that the adaptive elastic-net deals with the collinearity problem better than the other oracle-like methods, thus enjoying much improved finite sample performance.
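The two-stage construction described above — a pilot estimate supplying data-driven weights for the $\ell_1$ penalty, with an $\ell_2$ term to stabilize collinear designs — can be sketched in numpy. This is a generic illustration, not the paper's exact estimator or tuning: the ridge pilot, lam1, lam2 and gamma are all illustrative choices.

```python
import numpy as np

def adaptive_enet(X, y, lam1, lam2, gamma=1.0, n_iter=200):
    """Adaptive elastic-net sketch: a ridge pilot fit supplies adaptive
    l1 weights; the l2 term handles collinearity. Solved by cyclic
    coordinate descent with weighted soft-thresholding."""
    n, p = X.shape
    # stage 1: ridge pilot estimate
    b_ridge = np.linalg.solve(X.T @ X + lam2 * np.eye(p), X.T @ y)
    w = 1.0 / (np.abs(b_ridge) + 1e-8) ** gamma  # adaptive weights
    # stage 2: weighted elastic-net by coordinate descent
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]  # partial residual
            rho = X[:, j] @ r / n
            beta[j] = (np.sign(rho) * max(abs(rho) - lam1 * w[j], 0.0)
                       / (col_sq[j] + lam2))
    return beta

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 10))
beta_true = np.zeros(10)
beta_true[0] = 2.0                    # single active variable
y = X @ beta_true + 0.5 * rng.normal(size=100)
beta_hat = adaptive_enet(X, y, lam1=0.1, lam2=0.1)
```

The adaptive weights are what drive the oracle property: strong pilot coefficients receive light penalties while near-zero ones are penalized heavily, so noise variables are thresholded out.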
A TWO-SAMPLE TEST FOR HIGH-DIMENSIONAL DATA WITH APPLICATIONS TO GENE-SET TESTING
We propose a two-sample test for the means of high-dimensional data when the data dimension is much larger than the sample size. Hotelling's classical T² test does not work for this "large p, small n" situation. The proposed test does not require explicit conditions on the relationship between the data dimension and sample size. This offers much flexibility in analyzing high-dimensional data. An application of the proposed test is in testing significance for sets of genes which we demonstrate in an empirical study on a leukemia data set.
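To see why the "large p, small n" setting breaks Hotelling's T² (the pooled covariance is singular when p exceeds the sample size), one can base a test on the squared Euclidean norm of the mean difference instead. The sketch below calibrates that statistic by permutation; the actual paper derives an analytic null distribution with bias corrections, so this is only an illustration of the setting.

```python
import numpy as np

def high_dim_mean_test(X, Y, n_perm=500, seed=0):
    """Permutation sketch of a two-sample mean test for p >> n:
    statistic = ||mean(X) - mean(Y)||^2, requiring no covariance inverse."""
    rng = np.random.default_rng(seed)
    stat = np.sum((X.mean(axis=0) - Y.mean(axis=0)) ** 2)
    Z = np.vstack([X, Y])
    n1 = len(X)
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(Z))  # shuffle group labels
        s = np.sum((Z[idx[:n1]].mean(axis=0) - Z[idx[n1:]].mean(axis=0)) ** 2)
        count += s >= stat
    return (count + 1) / (n_perm + 1)

rng = np.random.default_rng(2)
p = 500                                # dimension far above sample sizes
X = rng.normal(size=(20, p))
Y = rng.normal(size=(25, p))
Y[:, :50] += 0.8                       # mean shift in 50 coordinates
pval = high_dim_mean_test(X, Y)
```

A gene-set test in the spirit of the application would run this on the columns corresponding to one gene set at a time.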
p-Values for High-Dimensional Regression
Assigning significance in high-dimensional regression is challenging. Most computationally efficient selection algorithms cannot guard against inclusion of noise variables. Asymptotically valid p-values are not available. An exception is a recent proposal by Wasserman and Roeder that splits the data into two parts. The number of variables is then reduced to a manageable size using the first split, while classical variable selection techniques can be applied to the remaining variables, using the data from the second split. This yields asymptotic error control under minimal conditions. This involves a one-time random split of the data, however. Results are sensitive to this arbitrary choice, which amounts to a "p-value lottery" and makes it difficult to reproduce results. Here we show that inference across multiple random splits can be aggregated while maintaining asymptotic control over the inclusion of noise variables. We show that the resulting p-values can be used for control of both family-wise error and false discovery rate. In addition, the proposed aggregation is shown to improve power while reducing the number of falsely selected variables substantially.
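The aggregation step can be made concrete: given a matrix of per-split p-values for each variable, one combines them through a scaled empirical quantile. The sketch below shows the fixed-quantile rule on simulated per-split p-values (the regression-and-selection pipeline that produces them, and the refinement of optimizing over the quantile level, are omitted; the gamma value is illustrative).

```python
import numpy as np

def aggregate_pvalues(P, gamma=0.5):
    """Multi-split p-value aggregation at a fixed quantile level gamma:
    P has shape (n_splits, n_vars); the aggregated p-value for variable j
    is min(1, quantile_gamma of {p_j^(b) / gamma} over splits b)."""
    q = np.quantile(P / gamma, gamma, axis=0)
    return np.minimum(1.0, q)

# toy illustration over 100 random splits:
# variable 0 is a true signal (consistently small p-values),
# variable 1 is noise (the "p-value lottery": uniform across splits)
rng = np.random.default_rng(3)
P = np.column_stack([
    rng.uniform(0.0, 0.01, size=100),  # signal
    rng.uniform(0.0, 1.0, size=100),   # noise
])
agg = aggregate_pvalues(P)
```

The division by gamma is the price of looking at a quantile of dependent p-values; it is what preserves error control across the aggregated splits.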
Sequential sufficient dimension reduction for large p, small n problems
We propose a new and simple framework for dimension reduction in the large p, small n setting. The framework decomposes the data into pieces, thereby enabling existing approaches for n>p to be adapted to n<p problems.
Tests for High-Dimensional Covariance Matrices
We propose tests for sphericity and identity of high-dimensional covariance matrices. The tests are nonparametric without assuming a specific parametric distribution for the data. They can accommodate situations where the data dimension is much larger than the sample size, namely the "large p, small n" situations. We demonstrate by both theoretical and empirical studies that the tests have good properties for a wide range of dimensions and sample sizes. We applied the proposed tests to a microarray dataset on Yorkshire Gilts and tested the covariance structure of the expression levels for sets of genes.
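A classical statistic in the spirit of these sphericity tests is John's U, built from the first two spectral moments of the sample covariance; values near zero (in n >> p asymptotics) are consistent with Sigma proportional to the identity. The sketch below computes the raw statistic; the paper's test statistics use bias-corrected estimators of the trace functionals to remain valid when p greatly exceeds n, which this illustration does not attempt.

```python
import numpy as np

def john_sphericity_stat(X):
    """John's sphericity statistic
    U = (tr(S^2)/p) / (tr(S)/p)^2 - 1, with S the sample covariance.
    U is invariant to the overall scale of Sigma."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / (n - 1)
    t1 = np.trace(S) / p        # first spectral moment
    t2 = np.trace(S @ S) / p    # second spectral moment
    return t2 / t1 ** 2 - 1.0

rng = np.random.default_rng(4)
u_sphere = john_sphericity_stat(rng.normal(size=(100, 20)))  # Sigma = I
scales = np.ones(20)
scales[:2] = 5.0                                             # two spiked variances
u_nonsphere = john_sphericity_stat(rng.normal(size=(100, 20)) * scales)
```

Even the raw statistic separates the spherical case from a spiked-variance alternative; the bias corrections matter because the sampling noise in tr(S^2) grows with p/n.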