Catalogue Search | MBRL

Large covariance estimation by thresholding principal orthogonal complements

by Fan, Jianqing; Mincheva, Martina; Liao, Yuan in Analysis of covariance, Approximate factor model, Approximation

2013

The paper deals with the estimation of a high dimensional covariance with a conditional sparsity structure and fast diverging eigenvalues. By assuming a sparse error covariance matrix in an approximate factor model, we allow for the presence of some cross-sectional correlation even after taking out common but unobservable factors. We introduce the principal orthogonal complement thresholding method 'POET' to explore such an approximate factor structure with sparsity. The POET-estimator includes the sample covariance matrix, the factor-based covariance matrix, the thresholding estimator and the adaptive thresholding estimator as specific examples. We provide mathematical insights when the factor analysis is approximately the same as the principal component analysis for high dimensional data. The rates of convergence of the sparse residual covariance matrix and the conditional sparse covariance matrix are studied under various norms. It is shown that the effect of estimating the unknown factors vanishes as the dimensionality increases. The uniform rates of convergence for the unobserved factors and their factor loadings are derived. The asymptotic results are also verified by extensive simulation studies. Finally, a real data application on portfolio allocation is presented.
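The POET idea described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `poet_sketch`, the fixed number of factors `K`, and the single constant hard threshold `tau` are all assumptions made here for brevity (the abstract mentions adaptive thresholding, which this sketch does not implement).

```python
import numpy as np

def poet_sketch(X, K, tau):
    """Hypothetical POET-style estimator.

    X   : (n, p) data matrix, rows are observations
    K   : number of principal components kept as the factor part (assumed known)
    tau : constant hard threshold for the residual covariance (a simplification)
    """
    S = np.cov(X, rowvar=False)                 # sample covariance, p x p
    vals, vecs = np.linalg.eigh(S)              # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:K]            # indices of the K largest
    low_rank = (vecs[:, top] * vals[top]) @ vecs[:, top].T
    R = S - low_rank                            # principal orthogonal complement
    R_thr = np.where(np.abs(R) >= tau, R, 0.0)  # hard-threshold the entries
    np.fill_diagonal(R_thr, np.diag(R))         # never threshold the variances
    return low_rank + R_thr
```

Under these assumptions, `tau = 0` returns the sample covariance matrix and a very large `tau` leaves a low-rank part plus a diagonal residual, which matches two of the special cases the abstract lists.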

Journal Article

Large Covariance Estimation for Compositional Data Via Composition-Adjusted Thresholding

by Li, Hongzhe; Cao, Yuanpei; Lin, Wei in Adaptive thresholding, Analysis of covariance, Bacteria

2019

High-dimensional compositional data arise naturally in many applications such as metagenomic data analysis. The observed data lie in a high-dimensional simplex, and conventional statistical methods often fail to produce sensible results due to the unit-sum constraint. In this article, we address the problem of covariance estimation for high-dimensional compositional data and introduce a composition-adjusted thresholding (COAT) method under the assumption that the basis covariance matrix is sparse. Our method is based on a decomposition relating the compositional covariance to the basis covariance, which is approximately identifiable as the dimensionality tends to infinity. The resulting procedure can be viewed as thresholding the sample centered log-ratio covariance matrix and hence is scalable for large covariance matrices. We rigorously characterize the identifiability of the covariance parameters, derive rates of convergence under the spectral norm, and provide theoretical guarantees on support recovery. Simulation studies demonstrate that the COAT estimator outperforms some existing optimization-based estimators. We apply the proposed method to the analysis of a microbiome dataset to understand the dependence structure among bacterial taxa in the human gut.
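The abstract describes the procedure as thresholding the sample centered log-ratio covariance matrix. A rough sketch of that step is given below. The function name `coat_sketch` and the single constant soft threshold `lam` are assumptions for illustration; the article's actual thresholding rule may differ (the subject tags suggest an adaptive rule).

```python
import numpy as np

def coat_sketch(W, lam):
    """Hypothetical composition-adjusted thresholding sketch.

    W   : (n, p) matrix of strictly positive compositions (or counts)
    lam : constant soft threshold (a simplification, not an adaptive rule)
    """
    logW = np.log(W)
    # Centered log-ratio transform: subtract each row's mean log value.
    Z = logW - logW.mean(axis=1, keepdims=True)
    G = np.cov(Z, rowvar=False)                 # sample clr covariance, p x p
    T = np.sign(G) * np.maximum(np.abs(G) - lam, 0.0)  # soft thresholding
    np.fill_diagonal(T, np.diag(G))             # keep the variances untouched
    return T
```

Because the centered log-ratio transform removes any per-row scaling, the result does not depend on whether the rows of `W` are normalized to sum to one.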

Journal Article

Condition-number-regularized covariance estimation

by Won, Joong-Ho; Rajaratnam, Bala; Kim, Seung-Jean in Analysis of covariance, Bayesian analysis, Condition number

2013

Estimation of high dimensional covariance matrices is known to be a difficult problem, has many applications and is of current interest to the larger statistics community. In many applications including the so-called 'large p, small n' setting, the estimate of the covariance matrix is required to be not only invertible but also well conditioned. Although many regularization schemes attempt to do this, none of them address the ill conditioning problem directly. We propose a maximum likelihood approach, with the direct goal of obtaining a well-conditioned estimator. No sparsity assumptions on either the covariance matrix or its inverse are imposed, thus making our procedure more widely applicable. We demonstrate that the proposed regularization scheme is computationally efficient, yields a type of Steinian shrinkage estimator and has a natural Bayesian interpretation. We investigate the theoretical properties of the regularized covariance estimator comprehensively, including its regularization path, and proceed to develop an approach that adaptively determines the level of regularization that is required. Finally, we demonstrate the performance of the regularized estimator in decision theoretic comparisons and in the financial portfolio optimization setting. The approach proposed has desirable properties and can serve as a competitive procedure, especially when the sample size is small and when a well-conditioned estimator is required.

Journal Article

Information-Based Optimal Subdata Selection for Big Data Linear Regression

by Yang, Min; Stufken, John; Wang, HaiYing in Analysis of covariance, Big Data, Computer simulation

2019

Extraordinary amounts of data are being produced in many branches of science. Proven statistical methods are no longer applicable with extraordinarily large datasets due to computational limitations. A critical step in big data analysis is data reduction. Existing investigations in the context of linear regression focus on subsampling-based methods. However, not only is this approach prone to sampling errors, it also leads to a covariance matrix of the estimators that is typically bounded from below by a term that is of the order of the inverse of the subdata size. We propose a novel approach, termed information-based optimal subdata selection (IBOSS). Compared to leading existing subdata methods, the IBOSS approach has the following advantages: (i) it is significantly faster; (ii) it is suitable for distributed parallel computing; (iii) the variances of the slope parameter estimators converge to 0 as the full data size increases even if the subdata size is fixed, that is, the convergence rate depends on the full data size; (iv) data analysis for IBOSS subdata is straightforward and the sampling distribution of an IBOSS estimator is easy to assess. Theoretical results and extensive simulations demonstrate that the IBOSS approach is superior to subsampling-based methods, sometimes by orders of magnitude. The advantages of the new approach are also illustrated through analysis of real data. Supplementary materials for this article are available online.

Journal Article

Chesson's coexistence theory

by Barabás, György; Stump, Simon Maccracken; D'Andrea, Rafael in average fitness differences, Coexistence, community ecology

2018

We give a comprehensive review of Chesson's coexistence theory, summarizing, for the first time, all its fundamental details in one single document. Our goal is for both theoretical and empirical ecologists to be able to use the theory to interpret their findings, and to get a precise sense of the limits of its applicability. To this end, we introduce an explicit handling of limiting factors, and a new way of defining the scaling factors that partition invasion growth rates into the different mechanisms contributing to coexistence. We explain terminology such as relative nonlinearity, storage effect, and growth-density covariance, both in a formal setting and through their biological interpretation. We review the theory's applications and contributions to our current understanding of species coexistence. While the theory is very general, it is not well suited to all problems, so we carefully point out its limitations. Finally, we critique the paradigm of decomposing invasion growth rates into stabilizing and equalizing components: we argue that these concepts are useful when used judiciously, but have often been employed in an overly simplified way to justify false claims.

Journal Article

Covariance Tapering for Likelihood-Based Estimation in Large Spatial Data Sets

by Nychka, Douglas W.; Schervish, Mark J.; Kaufman, Cari G. in Algorithms, Analysis of covariance, Anomalies

2008

Maximum likelihood is an attractive method of estimating covariance parameters in spatial models based on Gaussian processes. But calculating the likelihood can be computationally infeasible for large data sets, requiring O(n^3) calculations for a data set with n observations. This article proposes the method of covariance tapering to approximate the likelihood in this setting. In this approach, covariance matrixes are "tapered," or multiplied element wise by a sparse correlation matrix. The resulting matrixes can then be manipulated using efficient sparse matrix algorithms. We propose two approximations to the Gaussian likelihood using tapering. One of these approximations simply replaces the model covariance with a tapered version, whereas the other is motivated by the theory of unbiased estimating equations. Focusing on the particular case of the Matérn class of covariance functions, we give conditions under which estimators maximizing the tapering approximations are, like the maximum likelihood estimator, strongly consistent. Moreover, we show in a simulation study that the tapering estimators can have sampling densities quite similar to that of the maximum likelihood estimator, even when the degree of tapering is severe. We illustrate the accuracy and computational gains of the tapering methods in an analysis of yearly total precipitation anomalies at weather stations in the United States.
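The element-wise tapering step described in this abstract can be illustrated with a small sketch. The choices below are assumptions made for illustration only: an exponential covariance (the Matérn class with smoothness 1/2), a Wendland-type compactly supported taper, and the names `wendland1`, `tapered_cov`, `sigma2`, `rho`, and `gamma`. The article's own taper and parameterization may differ.

```python
import numpy as np

def wendland1(h, gamma):
    """Compactly supported taper: (1 - h/gamma)^4 (1 + 4 h/gamma) for h < gamma, else 0."""
    r = h / gamma
    return np.where(r < 1.0, (1.0 - r) ** 4 * (1.0 + 4.0 * r), 0.0)

def tapered_cov(coords, sigma2, rho, gamma):
    """Exponential covariance multiplied element-wise by the taper.

    coords : (n, d) array of locations, d <= 3
    sigma2 : marginal variance
    rho    : range parameter of the exponential covariance
    gamma  : taper range; entries at distance >= gamma become exactly zero
    """
    diff = coords[:, None, :] - coords[None, :, :]
    h = np.sqrt((diff ** 2).sum(axis=-1))   # pairwise Euclidean distances
    K = sigma2 * np.exp(-h / rho)           # dense model covariance
    return K * wendland1(h, gamma)          # Schur (element-wise) product
```

The sketch builds a dense array for clarity. The computational gain the abstract refers to comes from storing the tapered matrix in a sparse format and using sparse factorization routines, which this example does not attempt.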

Journal Article

An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach

by Lindström, Johan; Lindgren, Finn; Rue, Håvard in Algorithms, Analysis of covariance, Approximate Bayesian inference

2011

Continuously indexed Gaussian fields (GFs) are the most important ingredient in spatial statistical modelling and geostatistics. The specification through the covariance function gives an intuitive interpretation of the field properties. On the computational side, GFs are hampered with the big n problem, since the cost of factorizing dense matrices is cubic in the dimension. Although computational power today is at an all time high, this fact seems still to be a computational bottleneck in many applications. Along with GFs, there is the class of Gaussian Markov random fields (GMRFs) which are discretely indexed. The Markov property makes the precision matrix involved sparse, which enables the use of numerical algorithms for sparse matrices, that for fields in ℝ² only use the square root of the time required by general algorithms. The specification of a GMRF is through its full conditional distributions but its marginal properties are not transparent in such a parameterization. We show that, using an approximate stochastic weak solution to (linear) stochastic partial differential equations, we can, for some GFs in the Matérn class, provide an explicit link, for any triangulation of ℝ^d, between GFs and GMRFs, formulated as a basis function representation. The consequence is that we can take the best from the two worlds and do the modelling by using GFs but do the computations by using GMRFs. Perhaps more importantly, our approach generalizes to other covariance functions generated by SPDEs, including oscillating and non-stationary GFs, as well as GFs on manifolds. We illustrate our approach by analysing global temperature data with a non-stationary model defined on a sphere.

Journal Article

Optimal Forecast Reconciliation for Hierarchical and Grouped Time Series Through Trace Minimization

by Athanasopoulos, George; Wickramasuriya, Shanika L.; Hyndman, Rob J. in Aggregation, Algorithms, Analysis of covariance

2019

Large collections of time series often have aggregation constraints due to product or geographical groupings. The forecasts for the most disaggregated series are usually required to add up exactly to the forecasts of the aggregated series, a constraint we refer to as "coherence." Forecast reconciliation is the process of adjusting forecasts to make them coherent. The reconciliation algorithm proposed by Hyndman et al. (2011) is based on a generalized least squares estimator that requires an estimate of the covariance matrix of the coherency errors (i.e., the errors that arise due to incoherence). We show that this matrix is impossible to estimate in practice due to identifiability conditions. We propose a new forecast reconciliation approach that incorporates the information from a full covariance matrix of forecast errors in obtaining a set of coherent forecasts. Our approach minimizes the mean squared error of the coherent forecasts across the entire collection of time series under the assumption of unbiasedness. The minimization problem has a closed-form solution. We make this solution scalable by providing a computationally efficient representation. We evaluate the performance of the proposed method compared to alternative methods using a series of simulation designs which take into account various features of the collected time series. This is followed by an empirical application using Australian domestic tourism data. The results indicate that the proposed method works well with artificial and real data. Supplementary materials for this article are available online.
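The abstract mentions a closed-form solution without stating it. A common way to write a trace-minimizing reconciliation is y_tilde = S (S' W^{-1} S)^{-1} S' W^{-1} y_hat, where S is the summing matrix and W is a covariance matrix of base forecast errors. The sketch below assumes that form. The function name `mint_reconcile` and the toy two-level hierarchy in the usage note are illustrative assumptions, and the code does not include the computationally efficient representation the abstract refers to.

```python
import numpy as np

def mint_reconcile(y_hat, S, W):
    """Hypothetical trace-minimization reconciliation sketch.

    y_hat : (m,) base forecasts for all series, aggregated ones included
    S     : (m, b) summing matrix mapping the b bottom-level series to all m series
    W     : (m, m) symmetric positive definite covariance of base forecast errors
    """
    Winv_S = np.linalg.solve(W, S)                    # W^{-1} S
    # G = (S' W^{-1} S)^{-1} S' W^{-1}; the transpose step relies on W being symmetric.
    G = np.linalg.solve(S.T @ Winv_S, Winv_S.T)
    return S @ (G @ y_hat)                            # coherent forecasts
```

For a hierarchy with one total and two bottom-level series, `S` would be `[[1, 1], [1, 0], [0, 1]]`, and the first entry of the returned vector equals the sum of the other two. Base forecasts that are already coherent are returned unchanged.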

Journal Article

A convex pseudolikelihood framework for high dimensional partial correlation estimation with convergence guarantees

by Khare, Kshitij; Rajaratnam, Bala; Oh, Sang‐Yun in Analysis, Analysis of covariance, Breast cancer

2015

Sparse high dimensional graphical model selection is a topic of much interest in modern day statistics. A popular approach is to apply l₁‐penalties to either parametric likelihoods, or regularized regression/pseudolikelihoods, with the latter having the distinct advantage that they do not explicitly assume Gaussianity. As none of the popular methods proposed for solving pseudolikelihood‐based objective functions have provable convergence guarantees, it is not clear whether corresponding estimators exist or are even computable, or if they actually yield correct partial correlation graphs. We propose a new pseudolikelihood‐based graphical model selection method that aims to overcome some of the shortcomings of current methods, but at the same time retain all their respective strengths. In particular, we introduce a novel framework that leads to a convex formulation of the partial covariance regression graph problem, resulting in an objective function comprised of quadratic forms. The objective is then optimized via a co‐ordinatewise approach. The specific functional form of the objective function facilitates rigorous convergence analysis leading to convergence guarantees; an important property that cannot be established by using standard results, when the dimension is larger than the sample size, as is often the case in high dimensional applications. These convergence guarantees ensure that estimators are well defined under very general conditions and are always computable. In addition, the approach yields estimators that have good large sample properties and also respect symmetry. Furthermore, application to simulated and real data, timing comparisons and numerical convergence is demonstrated. We also present a novel unifying framework that places all graphical pseudolikelihood methods as special cases of a more general formulation, leading to important insights.

Journal Article

Covariance Regression Analysis

by Lan, Wei; Tsai, Chih-Ling; Wang, Hansheng in Analysis of covariance, Computer simulation, covariance

2017

This article introduces covariance regression analysis for a p-dimensional response vector. The proposed method explores the regression relationship between the p-dimensional covariance matrix and auxiliary information. We study three types of estimators: maximum likelihood, ordinary least squares, and feasible generalized least squares estimators. Then, we demonstrate that these regression estimators are consistent and asymptotically normal. Furthermore, we obtain the high dimensional and large sample properties of the corresponding covariance matrix estimators. Simulation experiments are presented to demonstrate the performance of both regression and covariance matrix estimates. An example is analyzed from the Chinese stock market to illustrate the usefulness of the proposed covariance regression model. Supplementary materials for this article are available online.

Journal Article
