Catalogue Search | MBRL
18,347 result(s) for "Covariance matrix"
LIMITING LAWS FOR DIVERGENT SPIKED EIGENVALUES AND LARGEST NONSPIKED EIGENVALUE OF SAMPLE COVARIANCE MATRICES
by Han, Xiao; Pan, Guangming; Cai, T. Tony
in Asymptotic methods, Asymptotic properties, Constraining
2020
We study the asymptotic distributions of the spiked eigenvalues and the largest nonspiked eigenvalue of the sample covariance matrix under a general covariance model with divergent spiked eigenvalues, while the other eigenvalues are bounded but otherwise arbitrary. The limiting normal distribution for the spiked sample eigenvalues is established. It has the distinct features that the asymptotic mean relies not only on the population spikes but also on the nonspikes, and that the asymptotic variance in general depends on the population eigenvectors. In addition, the limiting Tracy–Widom law for the largest nonspiked sample eigenvalue is obtained.
Estimation of the number of spikes and the convergence of the leading eigenvectors are also considered. The results hold even when the number of the spikes diverges. As a key technical tool, we develop a central limit theorem for a type of random quadratic forms where the random vectors and random matrices involved are dependent. This result can be of independent interest.
Journal Article
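To make the spiked-covariance setting above concrete, here is a minimal simulation sketch in Python (numpy only). The dimensions, spike sizes, and Gaussian data are illustrative choices, not the paper's general model; the sketch simply shows sample spikes separating from the Marchenko–Pastur bulk edge that governs the largest nonspiked eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 200                      # sample size and dimension
spikes = np.array([80.0, 40.0])      # divergent population spikes (illustrative)

# Population covariance: identity bulk plus two spikes.
pop_eigs = np.ones(p)
pop_eigs[:2] = spikes
X = rng.standard_normal((n, p)) * np.sqrt(pop_eigs)  # rows ~ N(0, diag(pop_eigs))

S = X.T @ X / n                      # sample covariance matrix
samp_eigs = np.linalg.eigvalsh(S)[::-1]

print("sample spiked eigenvalues:", samp_eigs[:2])
print("largest nonspiked sample eigenvalue:", samp_eigs[2])
# Marchenko-Pastur right edge (1 + sqrt(p/n))^2 for the unit bulk:
print("MP bulk edge:", (1 + np.sqrt(p / n)) ** 2)
```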
Information-Based Optimal Subdata Selection for Big Data Linear Regression
by Yang, Min; Stufken, John; Wang, HaiYing
in Analysis of covariance, Big Data, Computer simulation
2019
Extraordinary amounts of data are being produced in many branches of science. Proven statistical methods are no longer applicable with extraordinarily large datasets due to computational limitations. A critical step in big data analysis is data reduction. Existing investigations in the context of linear regression focus on subsampling-based methods. However, not only is this approach prone to sampling errors, it also leads to a covariance matrix of the estimators that is typically bounded from below by a term that is of the order of the inverse of the subdata size. We propose a novel approach, termed information-based optimal subdata selection (IBOSS). Compared to leading existing subdata methods, the IBOSS approach has the following advantages: (i) it is significantly faster; (ii) it is suitable for distributed parallel computing; (iii) the variances of the slope parameter estimators converge to 0 as the full data size increases even if the subdata size is fixed, that is, the convergence rate depends on the full data size; (iv) data analysis for IBOSS subdata is straightforward and the sampling distribution of an IBOSS estimator is easy to assess. Theoretical results and extensive simulations demonstrate that the IBOSS approach is superior to subsampling-based methods, sometimes by orders of magnitude. The advantages of the new approach are also illustrated through analysis of real data. Supplementary materials for this article are available online.
Journal Article
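A rough sketch of the extreme-value selection idea behind IBOSS, assuming the D-optimality-motivated rule of taking, covariate by covariate, the rows with the smallest and largest values among those not yet selected. The budget allocation and tie handling here are simplified relative to the paper.

```python
import numpy as np

def iboss_select(X, k):
    """Select k rows of X by an IBOSS-style extreme-value rule: for each
    covariate in turn, take the r rows with the smallest and the r rows
    with the largest values among the rows not selected yet."""
    n, p = X.shape
    r = k // (2 * p)                      # per-covariate, per-tail budget
    available = np.ones(n, dtype=bool)
    chosen = []
    for j in range(p):
        idx = np.flatnonzero(available)
        order = np.argsort(X[idx, j])
        pick = np.concatenate([idx[order[:r]], idx[order[-r:]]])
        available[pick] = False
        chosen.append(pick)
    return np.concatenate(chosen)

rng = np.random.default_rng(1)
n, p = 100_000, 5
X = rng.standard_normal((n, p))
beta = np.arange(1, p + 1, dtype=float)   # true slopes 1..5
y = X @ beta + rng.standard_normal(n)

sel = iboss_select(X, k=1000)             # keep 1% of the full data
Xs, ys = X[sel], y[sel]
design = np.column_stack([np.ones(len(ys)), Xs])
beta_hat = np.linalg.lstsq(design, ys, rcond=None)[0]
print("slope estimates from 1% subdata:", beta_hat[1:].round(3))
```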
Optimal Forecast Reconciliation for Hierarchical and Grouped Time Series Through Trace Minimization
by Athanasopoulos, George; Wickramasuriya, Shanika L.; Hyndman, Rob J.
in Aggregation, Algorithms, Analysis of covariance
2019
Large collections of time series often have aggregation constraints due to product or geographical groupings. The forecasts for the most disaggregated series are usually required to add up exactly to the forecasts of the aggregated series, a constraint we refer to as "coherence." Forecast reconciliation is the process of adjusting forecasts to make them coherent.
The reconciliation algorithm proposed by Hyndman et al. (2011) is based on a generalized least squares estimator that requires an estimate of the covariance matrix of the coherency errors (i.e., the errors that arise due to incoherence). We show that this matrix is impossible to estimate in practice due to identifiability conditions.
We propose a new forecast reconciliation approach that incorporates the information from a full covariance matrix of forecast errors in obtaining a set of coherent forecasts. Our approach minimizes the mean squared error of the coherent forecasts across the entire collection of time series under the assumption of unbiasedness. The minimization problem has a closed-form solution. We make this solution scalable by providing a computationally efficient representation.
We evaluate the performance of the proposed method compared to alternative methods using a series of simulation designs which take into account various features of the collected time series. This is followed by an empirical application using Australian domestic tourism data. The results indicate that the proposed method works well with artificial and real data. Supplementary materials for this article are available online.
Journal Article
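The closed-form solution mentioned in the abstract has the generalized-least-squares shape y_tilde = S (S' W^-1 S)^-1 S' W^-1 y_hat, where S maps bottom-level series to all series and W is the covariance matrix of base-forecast errors. A toy sketch on a two-series hierarchy follows; the identity W below is the OLS special case, not the estimated full covariance matrix the paper advocates.

```python
import numpy as np

# Toy hierarchy: total = A + B, so the "summing" matrix S maps the
# bottom-level series (A, B) to all series (total, A, B).
S = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])

# Incoherent base forecasts for (total, A, B): 100 != 55 + 40.
y_hat = np.array([100.0, 55.0, 40.0])

# W: covariance of base-forecast errors. The identity is the OLS special
# case; the trace-minimization method plugs in an estimate of the full W.
W = np.eye(3)

Winv = np.linalg.inv(W)
G = np.linalg.solve(S.T @ Winv @ S, S.T @ Winv)   # (S'W^-1 S)^-1 S'W^-1
y_tilde = S @ (G @ y_hat)                          # reconciled forecasts

print("reconciled:", y_tilde)
print("coherence check:", np.isclose(y_tilde[0], y_tilde[1] + y_tilde[2]))
```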
Multivariate output analysis for Markov chain Monte Carlo
by Vats, Dootika; Flegal, James M.; Jones, Galin L.
in Central limit theorem, Computer simulation, Covariance matrix
2019
Markov chain Monte Carlo produces a correlated sample which may be used for estimating expectations with respect to a target distribution. A fundamental question is: when should sampling stop so that we have good estimates of the desired quantities? The key to answering this question lies in assessing the Monte Carlo error through a multivariate Markov chain central limit theorem. The multivariate nature of this Monte Carlo error has been largely ignored in the literature. We present a multivariate framework for terminating a simulation in Markov chain Monte Carlo. We define a multivariate effective sample size, the estimation of which requires strongly consistent estimators of the covariance matrix in the Markov chain central limit theorem, a property we show for the multivariate batch means estimator. We then provide a lower bound on the minimum number of effective samples required for a desired level of precision. This lower bound does not depend on the underlying stochastic process and can be calculated a priori. This result is obtained by drawing a connection between terminating simulation via effective sample size and terminating simulation using a relative standard deviation fixed-volume sequential stopping rule, which we demonstrate is an asymptotically valid procedure. The finite-sample properties of the proposed method are demonstrated in a variety of examples.
Journal Article
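A compact sketch of the quantities the abstract describes: a multivariate batch means estimate of the CLT covariance matrix and the resulting multivariate effective sample size ESS = n * (det(Lambda) / det(Sigma))^(1/p). The batch size and the AR(1) test chain are illustrative choices, not tuned recommendations.

```python
import numpy as np

def multi_ess(chain):
    """Multivariate effective sample size using the multivariate batch
    means estimator: Lambda is the sample covariance of the chain, Sigma
    the batch means estimate of the CLT covariance, and
    ESS = n * (det(Lambda)/det(Sigma))^(1/p)."""
    n, p = chain.shape
    b = int(np.floor(np.sqrt(n)))            # batch size (common default)
    a = n // b                               # number of batches
    batch_means = chain[: a * b].reshape(a, b, p).mean(axis=1)
    mu = chain.mean(axis=0)
    Sigma = b * (batch_means - mu).T @ (batch_means - mu) / (a - 1)
    Lambda = np.cov(chain, rowvar=False)
    return n * (np.linalg.det(Lambda) / np.linalg.det(Sigma)) ** (1.0 / p)

# AR(1) chains: positive autocorrelation should shrink the ESS below n.
rng = np.random.default_rng(2)
n, p, rho = 20_000, 3, 0.7
chain = np.zeros((n, p))
for t in range(1, n):
    chain[t] = rho * chain[t - 1] + rng.standard_normal(p)
print("n =", n, " multivariate ESS ~", round(multi_ess(chain)))
```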
GAUSSIAN APPROXIMATION FOR HIGH DIMENSIONAL TIME SERIES
2017
We consider the problem of approximating sums of high dimensional stationary time series by Gaussian vectors, using the framework of functional dependence measure. The validity of the Gaussian approximation depends on the sample size n, the dimension p, the moment condition and the dependence of the underlying processes. We also consider an estimator for long-run covariance matrices and study its convergence properties. Our results allow constructing simultaneous confidence intervals for mean vectors of high-dimensional time series with asymptotically correct coverage probabilities. As an application, we propose a Kolmogorov–Smirnov-type statistic for testing distributions of high-dimensional time series.
Journal Article
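The long-run covariance estimation step mentioned in the abstract can be illustrated with a standard Bartlett lag-window estimator; the paper's estimator and bandwidth choice may differ, so treat this as a generic sketch.

```python
import numpy as np

def long_run_cov(X, bandwidth):
    """Bartlett lag-window estimate of the long-run covariance matrix
    sum over all lags k of Cov(X_0, X_k) for an (n, p) stationary
    series. One standard construction; window and bandwidth choices
    vary across the literature."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    Sigma = Xc.T @ Xc / n                         # lag-0 autocovariance
    for k in range(1, bandwidth + 1):
        Gk = Xc[k:].T @ Xc[:-k] / n               # lag-k autocovariance
        w = 1.0 - k / (bandwidth + 1.0)           # Bartlett weight
        Sigma += w * (Gk + Gk.T)
    return Sigma

rng = np.random.default_rng(3)
n, p, rho = 5000, 4, 0.5
X = np.zeros((n, p))
for t in range(1, n):                             # AR(1) dependence in time
    X[t] = rho * X[t - 1] + rng.standard_normal(p)
# True long-run variance of a unit-innovation AR(1) is 1/(1-rho)^2 = 4.
print(np.diag(long_run_cov(X, bandwidth=30)).round(2))
```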
Large covariance estimation by thresholding principal orthogonal complements
by Fan, Jianqing; Mincheva, Martina; Liao, Yuan
in Analysis of covariance, Approximate factor model, Approximation
2013
The paper deals with the estimation of a high-dimensional covariance matrix with a conditional sparsity structure and fast diverging eigenvalues. By assuming a sparse error covariance matrix in an approximate factor model, we allow for the presence of some cross-sectional correlation even after taking out common but unobservable factors. We introduce the principal orthogonal complement thresholding method 'POET' to explore such an approximate factor structure with sparsity. The POET estimator includes the sample covariance matrix, the factor-based covariance matrix, the thresholding estimator and the adaptive thresholding estimator as specific examples. We provide mathematical insights when the factor analysis is approximately the same as the principal component analysis for high dimensional data. The rates of convergence of the sparse residual covariance matrix and the conditional sparse covariance matrix are studied under various norms. It is shown that the effect of estimating the unknown factors vanishes as the dimensionality increases. The uniform rates of convergence for the unobserved factors and their factor loadings are derived. The asymptotic results are also verified by extensive simulation studies. Finally, a real data application on portfolio allocation is presented.
Journal Article
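A bare-bones version of the POET recipe as the abstract describes it: remove the K leading principal components of the sample covariance matrix and threshold the orthogonal complement. A single hard threshold tau stands in here for the paper's adaptive thresholding rules.

```python
import numpy as np

def poet(S, K, tau):
    """Principal orthogonal complement thresholding, in its simplest
    form: keep the K leading principal components of the sample
    covariance S and hard-threshold the off-diagonal entries of the
    remainder at a fixed level tau."""
    vals, vecs = np.linalg.eigh(S)
    vals, vecs = vals[::-1], vecs[:, ::-1]        # descending order
    low_rank = (vecs[:, :K] * vals[:K]) @ vecs[:, :K].T
    resid = S - low_rank
    off = resid - np.diag(np.diag(resid))
    off[np.abs(off) < tau] = 0.0                  # hard thresholding
    return low_rank + np.diag(np.diag(resid)) + off

rng = np.random.default_rng(4)
n, p, K = 400, 100, 2
B = rng.standard_normal((p, K))                   # factor loadings
F = rng.standard_normal((n, K))                   # common factors
X = F @ B.T + rng.standard_normal((n, p))         # approximate factor model
S = np.cov(X, rowvar=False)
Sigma_hat = poet(S, K=K, tau=0.2)
print("top 3 eigenvalues of POET estimate:",
      np.linalg.eigvalsh(Sigma_hat)[-3:].round(1))
```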
SUBSTITUTION PRINCIPLE FOR CLT OF LINEAR SPECTRAL STATISTICS OF HIGH-DIMENSIONAL SAMPLE COVARIANCE MATRICES WITH APPLICATIONS TO HYPOTHESIS TESTING
2015
Sample covariance matrices are widely used in multivariate statistical analysis. The central limit theorems (CLTs) for linear spectral statistics of high-dimensional noncentralized sample covariance matrices have received considerable attention in random matrix theory and have been applied to many high-dimensional statistical problems. However, noncentralized sample covariance matrices assume known population mean vectors, and some existing CLTs even require Gaussian-like moment conditions. In practice, two other sample covariance matrices are most frequently used, neither of which depends on the unknown population mean vector: the ME (moment estimator, constructed by subtracting the sample mean vector from each sample vector) and the unbiased sample covariance matrix (obtained by changing the denominator of the ME from n to N = n − 1). In this paper, we not only establish new CLTs for noncentralized sample covariance matrices when the Gaussian-like moment conditions do not hold but also characterize the nonnegligible differences among the CLTs for the three classes of high-dimensional sample covariance matrices by establishing a substitution principle: substituting the adjusted sample size N = n − 1 for the actual sample size n in the centering term of the new CLTs yields the CLT of the unbiased sample covariance matrices. Moreover, the difference between the CLTs for the ME and the unbiased sample covariance matrix is nonnegligible in the centering term, although the only difference between the two estimators is a normalization by n and n − 1, respectively. The new results are applied to two testing problems for high-dimensional covariance matrices.
Journal Article
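The three estimators the abstract contrasts are easy to write down side by side. The sketch below builds the noncentralized, ME, and unbiased sample covariance matrices and evaluates one linear spectral statistic (the log-determinant) on each; the data and the choice of statistic are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 200, 50
X = rng.standard_normal((n, p))          # population mean known to be 0

# Noncentralized sample covariance (uses the known mean):
S0 = X.T @ X / n

# Moment estimator (ME): subtract the sample mean, divide by n.
Xc = X - X.mean(axis=0)
S_me = Xc.T @ Xc / n

# Unbiased sample covariance: same numerator, denominator N = n - 1.
S_unb = Xc.T @ Xc / (n - 1)

# A linear spectral statistic, e.g. the log-determinant, reacts to the
# n vs. n-1 normalization; the substitution principle recenters the CLT
# with N = n - 1 rather than n.
for name, S in [("noncentralized", S0), ("ME", S_me), ("unbiased", S_unb)]:
    print(f"{name:15s} log det = {np.linalg.slogdet(S)[1]:.3f}")
```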
Spatial autoregressive models for statistical inference from ecological data
by Hooten, Mevin B.; Peterson, Erin E.; Ver Hoef, Jay M.
in Alaska, Autocorrelation, Autoregressive models
2018
Ecological data often exhibit spatial pattern, which can be modeled as autocorrelation. Conditional autoregressive (CAR) and simultaneous autoregressive (SAR) models are network-based models (also known as graphical models) specifically designed to model spatially autocorrelated data based on neighborhood relationships. We identify and discuss six different types of practical ecological inference using CAR and SAR models, including: (1) model selection, (2) spatial regression, (3) estimation of autocorrelation, (4) estimation of other connectivity parameters, (5) spatial prediction, and (6) spatial smoothing. We compare CAR and SAR models, showing their development and connection to partial correlations. Special cases, such as the intrinsic autoregressive model (IAR), are described. Conditional autoregressive and SAR models depend on weight matrices, whose practical development uses neighborhood definition and row standardization. Weight matrices can also include ecological covariates and connectivity structures, a possibility we emphasize but one that has rarely been used. Trends in harbor seals (Phoca vitulina) in southeastern Alaska from 463 polygons, some with missing data, are used to illustrate the six inference types. We develop a variety of weight matrices, and we fit CAR and SAR spatial regression models using maximum likelihood and Bayesian methods. Profile likelihood graphs illustrate inference for covariance parameters. The same dataset is used for both prediction and smoothing, and the relative merits of each are discussed. We show the nonstationary variances and correlations of a CAR model and demonstrate the effect of row standardization. We include several take-home messages for CAR and SAR models, including (1) choosing between CAR and IAR models, (2) modeling ecological effects in the covariance matrix, (3) the appeal of spatial smoothing, and (4) how to handle isolated neighbors. We highlight several reasons why ecologists will want to make use of autoregressive models, both directly and in hierarchical models, and not only in explicit spatial settings but also for more general connectivity models.
Journal Article
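One common proper-CAR parameterization (precision matrix D − ρA for a binary adjacency matrix A and neighbor counts D; not the specific weight matrices developed in the paper) is enough to reproduce the abstract's point that CAR variances and correlations are nonstationary even on a perfectly regular lattice:

```python
import numpy as np

# Binary adjacency for a 1-D chain of 10 sites (neighbors = adjacent sites).
m = 10
A = np.zeros((m, m))
for i in range(m - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

D = np.diag(A.sum(axis=1))          # neighbor counts on the diagonal
rho, sigma2 = 0.9, 1.0              # |rho| < 1 keeps D - rho*A positive definite

# Proper CAR covariance: Sigma = sigma^2 * (D - rho * A)^(-1).
Sigma = sigma2 * np.linalg.inv(D - rho * A)

# Nonstationarity in action: marginal variances differ by site
# (edge vs. interior) even though the chain is completely regular.
print("marginal variances:", np.diag(Sigma).round(3))
corr = Sigma / np.sqrt(np.outer(np.diag(Sigma), np.diag(Sigma)))
print("corr(site 0, site 1) vs corr(site 4, site 5):",
      round(corr[0, 1], 3), round(corr[4, 5], 3))
```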
Sparse Sliced Inverse Regression via Lasso
2019
For multiple index models, it has recently been shown that sliced inverse regression (SIR) is consistent for estimating the sufficient dimension reduction (SDR) space if and only if p/n → 0, where p is the dimension and n is the sample size. Thus, when p is of the same or a higher order than n, additional assumptions such as sparsity must be imposed to ensure consistency for SIR. By constructing artificial response variables made up from the top eigenvectors of the estimated conditional covariance matrix, we introduce a simple Lasso regression method to obtain an estimate of the SDR space. The resulting algorithm, Lasso-SIR, is shown to be consistent and to achieve the optimal convergence rate under certain sparsity conditions when p is of order o(n²λ²), where λ is the generalized signal-to-noise ratio. We also demonstrate the superior performance of Lasso-SIR compared with existing approaches via extensive numerical studies and several real data examples. Supplementary materials for this article are available online.
Journal Article
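A loose sketch of the Lasso-SIR idea as stated in the abstract: slice the response, estimate the conditional covariance from slice means, build an artificial response from its top eigenvector, and run a Lasso. The exact construction and scaling of the artificial response in the paper differ, and the number of slices and penalty level below are arbitrary choices.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(6)
n, p, H = 1000, 200, 20               # samples, dimension, number of slices
beta = np.zeros(p); beta[:5] = 1.0    # sparse true index direction
X = rng.standard_normal((n, p))
y = (X @ beta) ** 3 + rng.standard_normal(n)   # single-index model

# 1. Slice observations by the order of y.
order = np.argsort(y)
slices = np.array_split(order, H)

# 2. Conditional covariance of E[X|y], estimated from slice means.
means = np.stack([X[s].mean(axis=0) for s in slices])
Lam = means.T @ means / H

# 3. Top eigenvector of Lam; artificial response = projection of each
#    observation's slice mean onto that eigenvector.
eta = np.linalg.eigh(Lam)[1][:, -1]
y_tilde = np.empty(n)
for s in slices:
    y_tilde[s] = X[s].mean(axis=0) @ eta

# 4. Lasso of the artificial response on X gives a sparse direction
#    estimate spanning (approximately) the SDR space.
fit = Lasso(alpha=0.01).fit(X, y_tilde)
print("support recovered:", np.flatnonzero(np.abs(fit.coef_) > 1e-3))
```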
Large Covariance Estimation for Compositional Data Via Composition-Adjusted Thresholding
by Li, Hongzhe; Cao, Yuanpei; Lin, Wei
in Adaptive thresholding, Analysis of covariance, Bacteria
2019
High-dimensional compositional data arise naturally in many applications such as metagenomic data analysis. The observed data lie in a high-dimensional simplex, and conventional statistical methods often fail to produce sensible results due to the unit-sum constraint. In this article, we address the problem of covariance estimation for high-dimensional compositional data and introduce a composition-adjusted thresholding (COAT) method under the assumption that the basis covariance matrix is sparse. Our method is based on a decomposition relating the compositional covariance to the basis covariance, which is approximately identifiable as the dimensionality tends to infinity. The resulting procedure can be viewed as thresholding the sample centered log-ratio covariance matrix and hence is scalable for large covariance matrices. We rigorously characterize the identifiability of the covariance parameters, derive rates of convergence under the spectral norm, and provide theoretical guarantees on support recovery. Simulation studies demonstrate that the COAT estimator outperforms some existing optimization-based estimators. We apply the proposed method to the analysis of a microbiome dataset to understand the dependence structure among bacterial taxa in the human gut.
Journal Article
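An outline of the COAT construction described above: transform the compositions with the centered log-ratio, take the resulting covariance matrix, and threshold its off-diagonal entries. The single fixed threshold here stands in for the paper's adaptive, entry-specific thresholds.

```python
import numpy as np

def coat(comps, tau):
    """Composition-adjusted thresholding, in outline: form the centered
    log-ratio (clr) covariance of the compositions and hard-threshold
    its off-diagonal entries at a fixed level tau."""
    Z = np.log(comps)
    Z = Z - Z.mean(axis=1, keepdims=True)        # clr transform per sample
    S = np.cov(Z, rowvar=False)                  # centered log-ratio covariance
    T = S.copy()
    off = ~np.eye(S.shape[0], dtype=bool)
    T[off & (np.abs(T) < tau)] = 0.0             # hard thresholding
    return T

# Toy compositions: log-normal basis with a sparse covariance, normalized
# to the simplex (the unit-sum constraint the abstract refers to).
rng = np.random.default_rng(7)
n, p = 200, 30
Omega = np.eye(p); Omega[0, 1] = Omega[1, 0] = 0.5   # one nonzero off-diagonal
Y = rng.multivariate_normal(np.zeros(p), Omega, size=n)  # log-basis
X = np.exp(Y); X = X / X.sum(axis=1, keepdims=True)      # compositions
Sigma_hat = coat(X, tau=0.2)
print("estimated cov(1,2):", round(Sigma_hat[0, 1], 3))  # true basis value 0.5
```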