Catalogue Search | MBRL
Explore the vast range of titles available.
45,401 result(s) for "Covariance"
Large covariance estimation by thresholding principal orthogonal complements
by Fan, Jianqing; Mincheva, Martina; Liao, Yuan
in Analysis of covariance, Approximate factor model, Approximation
2013
The paper deals with the estimation of a high dimensional covariance with a conditional sparsity structure and fast diverging eigenvalues. By assuming a sparse error covariance matrix in an approximate factor model, we allow for the presence of some cross-sectional correlation even after taking out common but unobservable factors. We introduce the principal orthogonal complement thresholding method 'POET' to explore such an approximate factor structure with sparsity. The POET-estimator includes the sample covariance matrix, the factor-based covariance matrix, the thresholding estimator and the adaptive thresholding estimator as specific examples. We provide mathematical insights when the factor analysis is approximately the same as the principal component analysis for high dimensional data. The rates of convergence of the sparse residual covariance matrix and the conditional sparse covariance matrix are studied under various norms. It is shown that the effect of estimating the unknown factors vanishes as the dimensionality increases. The uniform rates of convergence for the unobserved factors and their factor loadings are derived. The asymptotic results are also verified by extensive simulation studies. Finally, a real data application on portfolio allocation is presented.
Journal Article
Large Covariance Estimation for Compositional Data Via Composition-Adjusted Thresholding
by Li, Hongzhe; Cao, Yuanpei; Lin, Wei
in Adaptive thresholding, Analysis of covariance, Bacteria
2019
High-dimensional compositional data arise naturally in many applications such as metagenomic data analysis. The observed data lie in a high-dimensional simplex, and conventional statistical methods often fail to produce sensible results due to the unit-sum constraint. In this article, we address the problem of covariance estimation for high-dimensional compositional data and introduce a composition-adjusted thresholding (COAT) method under the assumption that the basis covariance matrix is sparse. Our method is based on a decomposition relating the compositional covariance to the basis covariance, which is approximately identifiable as the dimensionality tends to infinity. The resulting procedure can be viewed as thresholding the sample centered log-ratio covariance matrix and hence is scalable for large covariance matrices. We rigorously characterize the identifiability of the covariance parameters, derive rates of convergence under the spectral norm, and provide theoretical guarantees on support recovery. Simulation studies demonstrate that the COAT estimator outperforms some existing optimization-based estimators. We apply the proposed method to the analysis of a microbiome dataset to understand the dependence structure among bacterial taxa in the human gut.
Journal Article
Condition-number-regularized covariance estimation
by Won, Joong-Ho; Rajaratnam, Bala; Kim, Seung-Jean
in Analysis of covariance, Bayesian analysis, Condition number
2013
Estimation of high dimensional covariance matrices is known to be a difficult problem, has many applications and is of current interest to the larger statistics community. In many applications including the so-called 'large p, small n' setting, the estimate of the covariance matrix is required to be not only invertible but also well conditioned. Although many regularization schemes attempt to do this, none of them address the ill conditioning problem directly. We propose a maximum likelihood approach, with the direct goal of obtaining a well-conditioned estimator. No sparsity assumptions on either the covariance matrix or its inverse are imposed, thus making our procedure more widely applicable. We demonstrate that the proposed regularization scheme is computationally efficient, yields a type of Steinian shrinkage estimator and has a natural Bayesian interpretation. We investigate the theoretical properties of the regularized covariance estimator comprehensively, including its regularization path, and proceed to develop an approach that adaptively determines the level of regularization that is required. Finally, we demonstrate the performance of the regularized estimator in decision theoretic comparisons and in the financial portfolio optimization setting. The approach proposed has desirable properties and can serve as a competitive procedure, especially when the sample size is small and when a well-conditioned estimator is required.
Journal Article
Information-Based Optimal Subdata Selection for Big Data Linear Regression
by Yang, Min; Stufken, John; Wang, HaiYing
in Analysis of covariance, Big Data, Computer simulation
2019
Extraordinary amounts of data are being produced in many branches of science. Proven statistical methods are no longer applicable with extraordinary large datasets due to computational limitations. A critical step in big data analysis is data reduction. Existing investigations in the context of linear regression focus on subsampling-based methods. However, not only is this approach prone to sampling errors, it also leads to a covariance matrix of the estimators that is typically bounded from below by a term that is of the order of the inverse of the subdata size. We propose a novel approach, termed information-based optimal subdata selection (IBOSS). Compared to leading existing subdata methods, the IBOSS approach has the following advantages: (i) it is significantly faster; (ii) it is suitable for distributed parallel computing; (iii) the variances of the slope parameter estimators converge to 0 as the full data size increases even if the subdata size is fixed, that is, the convergence rate depends on the full data size; (iv) data analysis for IBOSS subdata is straightforward and the sampling distribution of an IBOSS estimator is easy to assess. Theoretical results and extensive simulations demonstrate that the IBOSS approach is superior to subsampling-based methods, sometimes by orders of magnitude. The advantages of the new approach are also illustrated through analysis of real data. Supplementary materials for this article are available online.
Journal Article
Chesson's coexistence theory
by Barabás, György; Stump, Simon Maccracken; D'Andrea, Rafael
in average fitness differences, Coexistence, community ecology
2018
We give a comprehensive review of Chesson's coexistence theory, summarizing, for the first time, all its fundamental details in one single document. Our goal is for both theoretical and empirical ecologists to be able to use the theory to interpret their findings, and to get a precise sense of the limits of its applicability. To this end, we introduce an explicit handling of limiting factors, and a new way of defining the scaling factors that partition invasion growth rates into the different mechanisms contributing to coexistence. We explain terminology such as relative nonlinearity, storage effect, and growth-density covariance, both in a formal setting and through their biological interpretation. We review the theory's applications and contributions to our current understanding of species coexistence. While the theory is very general, it is not well suited to all problems, so we carefully point out its limitations. Finally, we critique the paradigm of decomposing invasion growth rates into stabilizing and equalizing components: we argue that these concepts are useful when used judiciously, but have often been employed in an overly simplified way to justify false claims.
Journal Article
Covariance Tapering for Likelihood-Based Estimation in Large Spatial Data Sets
by Nychka, Douglas W.; Schervish, Mark J.; Kaufman, Cari G.
in Algorithms, Analysis of covariance, Anomalies
2008
Maximum likelihood is an attractive method of estimating covariance parameters in spatial models based on Gaussian processes. But calculating the likelihood can be computationally infeasible for large data sets, requiring O(n³) calculations for a data set with n observations. This article proposes the method of covariance tapering to approximate the likelihood in this setting. In this approach, covariance matrixes are "tapered," or multiplied element wise by a sparse correlation matrix. The resulting matrixes can then be manipulated using efficient sparse matrix algorithms. We propose two approximations to the Gaussian likelihood using tapering. One of these approximations simply replaces the model covariance with a tapered version, whereas the other is motivated by the theory of unbiased estimating equations. Focusing on the particular case of the Matérn class of covariance functions, we give conditions under which estimators maximizing the tapering approximations are, like the maximum likelihood estimator, strongly consistent. Moreover, we show in a simulation study that the tapering estimators can have sampling densities quite similar to that of the maximum likelihood estimator, even when the degree of tapering is severe. We illustrate the accuracy and computational gains of the tapering methods in an analysis of yearly total precipitation anomalies at weather stations in the United States.
Journal Article
An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach
by Lindström, Johan; Lindgren, Finn; Rue, Håvard
in Algorithms, Analysis of covariance, Approximate Bayesian inference
2011
Continuously indexed Gaussian fields (GFs) are the most important ingredient in spatial statistical modelling and geostatistics. The specification through the covariance function gives an intuitive interpretation of the field properties. On the computational side, GFs are hampered with the big n problem, since the cost of factorizing dense matrices is cubic in the dimension. Although computational power today is at an all time high, this fact seems still to be a computational bottleneck in many applications. Along with GFs, there is the class of Gaussian Markov random fields (GMRFs) which are discretely indexed. The Markov property makes the precision matrix involved sparse, which enables the use of numerical algorithms for sparse matrices, that for fields in ℝ² only use the square root of the time required by general algorithms. The specification of a GMRF is through its full conditional distributions but its marginal properties are not transparent in such a parameterization. We show that, using an approximate stochastic weak solution to (linear) stochastic partial differential equations, we can, for some GFs in the Matérn class, provide an explicit link, for any triangulation of ℝ^d, between GFs and GMRFs, formulated as a basis function representation. The consequence is that we can take the best from the two worlds and do the modelling by using GFs but do the computations by using GMRFs. Perhaps more importantly, our approach generalizes to other covariance functions generated by SPDEs, including oscillating and non-stationary GFs, as well as GFs on manifolds. We illustrate our approach by analysing global temperature data with a non-stationary model defined on a sphere.
Journal Article
Optimal Forecast Reconciliation for Hierarchical and Grouped Time Series Through Trace Minimization
by Athanasopoulos, George; Wickramasuriya, Shanika L.; Hyndman, Rob J.
in Aggregation, Algorithms, Analysis of covariance
2019
Large collections of time series often have aggregation constraints due to product or geographical groupings. The forecasts for the most disaggregated series are usually required to add up exactly to the forecasts of the aggregated series, a constraint we refer to as "coherence." Forecast reconciliation is the process of adjusting forecasts to make them coherent.
The reconciliation algorithm proposed by Hyndman et al. (2011) is based on a generalized least squares estimator that requires an estimate of the covariance matrix of the coherency errors (i.e., the errors that arise due to incoherence). We show that this matrix is impossible to estimate in practice due to identifiability conditions.
We propose a new forecast reconciliation approach that incorporates the information from a full covariance matrix of forecast errors in obtaining a set of coherent forecasts. Our approach minimizes the mean squared error of the coherent forecasts across the entire collection of time series under the assumption of unbiasedness. The minimization problem has a closed-form solution. We make this solution scalable by providing a computationally efficient representation.
We evaluate the performance of the proposed method compared to alternative methods using a series of simulation designs which take into account various features of the collected time series. This is followed by an empirical application using Australian domestic tourism data. The results indicate that the proposed method works well with artificial and real data. Supplementary materials for this article are available online.
Journal Article
A convex pseudolikelihood framework for high dimensional partial correlation estimation with convergence guarantees
by Khare, Kshitij; Rajaratnam, Bala; Oh, Sang‐Yun
in Analysis, Analysis of covariance, Breast cancer
2015
Sparse high dimensional graphical model selection is a topic of much interest in modern day statistics. A popular approach is to apply l₁‐penalties to either parametric likelihoods, or regularized regression/pseudolikelihoods, with the latter having the distinct advantage that they do not explicitly assume Gaussianity. As none of the popular methods proposed for solving pseudolikelihood‐based objective functions have provable convergence guarantees, it is not clear whether corresponding estimators exist or are even computable, or if they actually yield correct partial correlation graphs. We propose a new pseudolikelihood‐based graphical model selection method that aims to overcome some of the shortcomings of current methods, but at the same time retain all their respective strengths. In particular, we introduce a novel framework that leads to a convex formulation of the partial covariance regression graph problem, resulting in an objective function comprised of quadratic forms. The objective is then optimized via a co‐ordinatewise approach. The specific functional form of the objective function facilitates rigorous convergence analysis leading to convergence guarantees; an important property that cannot be established by using standard results, when the dimension is larger than the sample size, as is often the case in high dimensional applications. These convergence guarantees ensure that estimators are well defined under very general conditions and are always computable. In addition, the approach yields estimators that have good large sample properties and also respect symmetry. Furthermore, application to simulated and real data, timing comparisons and numerical convergence is demonstrated. We also present a novel unifying framework that places all graphical pseudolikelihood methods as special cases of a more general formulation, leading to important insights.
Journal Article
Covariance Regression Analysis
by Lan, Wei; Tsai, Chih-Ling; Wang, Hansheng
in Analysis of covariance, Computer simulation, covariance
2017
This article introduces covariance regression analysis for a p-dimensional response vector. The proposed method explores the regression relationship between the p-dimensional covariance matrix and auxiliary information. We study three types of estimators: maximum likelihood, ordinary least squares, and feasible generalized least squares estimators. Then, we demonstrate that these regression estimators are consistent and asymptotically normal. Furthermore, we obtain the high dimensional and large sample properties of the corresponding covariance matrix estimators. Simulation experiments are presented to demonstrate the performance of both regression and covariance matrix estimates. An example is analyzed from the Chinese stock market to illustrate the usefulness of the proposed covariance regression model. Supplementary materials for this article are available online.
Journal Article