Catalogue Search | MBRL
566 result(s) for "U-statistics"
Testing mutual independence in high dimension via distance covariance
by Zhang, Xianyang; Yao, Shun; Shao, Xiaofeng
in Analysis of covariance, Banded dependence, Correlation
2018
We introduce an 𝓛₂-type test for testing mutual independence and banded dependence structure for high dimensional data. The test is constructed on the basis of the pairwise distance covariance and it accounts for the non-linear and non-monotone dependences among the data, which cannot be fully captured by the existing tests based on either Pearson correlation or rank correlation. Our test can be conveniently implemented in practice as the limiting null distribution of the test statistic is shown to be standard normal. It exhibits excellent finite sample performance in our simulation studies even when the sample size is small albeit the dimension is high and is shown to identify non-linear dependence in empirical data analysis successfully. On the theory side, asymptotic normality of our test statistic is shown under quite mild moment assumptions and with little restriction on the growth rate of the dimension as a function of sample size. As a demonstration of good power properties for our distance-covariance-based test, we further show that an infeasible version of our test statistic has the rate optimality in the class of Gaussian distributions with equal correlation.
Journal Article
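For readers unfamiliar with the distance covariance that the abstract above builds on, the following is a minimal sketch of the standard sample (V-statistic) distance covariance for two univariate samples, written in pure Python. It is not the test statistic from the paper, which aggregates pairwise terms across many coordinates; the function names `dcov_sq` and `dcor` are illustrative choices, not taken from any library.

```python
from math import sqrt


def _centered_distances(x):
    """Double-center the matrix of pairwise absolute differences."""
    n = len(x)
    d = [[abs(x[i] - x[j]) for j in range(n)] for i in range(n)]
    row_means = [sum(r) / n for r in d]  # equals column means (d is symmetric)
    grand_mean = sum(row_means) / n
    return [
        [d[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def dcov_sq(x, y):
    """Squared sample distance covariance (V-statistic form)."""
    n = len(x)
    a = _centered_distances(x)
    b = _centered_distances(y)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2


def dcor(x, y):
    """Sample distance correlation; defined as 0 when either sample is constant."""
    vx, vy = dcov_sq(x, x), dcov_sq(y, y)
    if vx * vy == 0:
        return 0.0
    return sqrt(max(dcov_sq(x, y), 0.0) / sqrt(vx * vy))


x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [v * v for v in x]  # nonlinear, non-monotone dependence on x
print(dcor(x, y))
```

In this toy sample the Pearson covariance of `x` and `y` is exactly zero, yet the distance correlation is clearly positive, which is the kind of dependence the abstract says correlation-based tests miss.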
BOOTSTRAP WITH CLUSTER-DEPENDENCE IN TWO OR MORE DIMENSIONS
2021
We propose a bootstrap procedure for data that may exhibit cluster-dependence in two or more dimensions. The asymptotic distribution of the sample mean or other statistics may be non-Gaussian if observations are dependent but uncorrelated within clusters. We show that there exists no procedure for estimating the limiting distribution of the sample mean under two-way clustering that achieves uniform consistency. However, we propose bootstrap procedures that achieve adaptivity with respect to different uniformity criteria. Important cases and extensions discussed in the paper include regression inference, U- and V-statistics, subgraph counts for network data, and non-exhaustive samples of matched data.
Journal Article
HIGH-DIMENSIONAL CONSISTENT INDEPENDENCE TESTING WITH MAXIMA OF RANK CORRELATIONS
2020
Testing mutual independence for high-dimensional observations is a fundamental statistical challenge. Popular tests based on linear and simple rank correlations are known to be incapable of detecting nonlinear, nonmonotone relationships, calling for methods that can account for such dependences. To address this challenge, we propose a family of tests that are constructed using maxima of pairwise rank correlations that permit consistent assessment of pairwise independence. Built upon a newly developed Cramér-type moderate deviation theorem for degenerate U-statistics, our results cover a variety of rank correlations including Hoeffding’s D, Blum–Kiefer–Rosenblatt’s R and Bergsma–Dassios–Yanagimoto’s τ∗. The proposed tests are distribution-free in the class of multivariate distributions with continuous margins, implementable without the need for permutation, and are shown to be rate-optimal against sparse alternatives under the Gaussian copula model. As a by-product of the study, we reveal an identity between the aforementioned three rank correlation statistics, and hence make a step towards proving a conjecture of Bergsma and Dassios.
Journal Article
Tests for high dimensional generalized linear models
2016
We consider testing regression coefficients in high dimensional generalized linear models. By modifying the test statistic of Goeman and his colleagues for large but fixed dimensional settings, we propose a new test, based on an asymptotic analysis, that is applicable for diverging dimensions and is robust to accommodate a wide range of link functions. The power properties of the tests are evaluated asymptotically under two families of alternative hypotheses. In addition, a test in the presence of nuisance parameters is also proposed. The tests can provide p-values for testing significance of multiple gene sets, whose application is demonstrated in a case-study on lung cancer.
Journal Article
DISTANCE-BASED AND RKHS-BASED DEPENDENCE METRICS IN HIGH DIMENSION
2020
In this paper, we study distance covariance, Hilbert–Schmidt covariance (aka Hilbert–Schmidt independence criterion [In Advances in Neural Information Processing Systems (2008) 585–592]) and related independence tests under the high dimensional scenario. We show that the sample distance/Hilbert–Schmidt covariance between two random vectors can be approximated by the sum of squared componentwise sample cross-covariances up to an asymptotically constant factor, which indicates that the standard distance/Hilbert–Schmidt covariance based test can only capture linear dependence in high dimension. Under the assumption that the components within each high dimensional vector are weakly dependent, the distance correlation based t test developed by Székely and Rizzo (J. Multivariate Anal. 117 (2013) 193–213) for independence is shown to have trivial limiting power when the two random vectors are nonlinearly dependent but component-wisely uncorrelated. This new and surprising phenomenon, which seems to be discovered and carefully studied for the first time, is further confirmed in our simulation study. As a remedy, we propose tests based on an aggregation of marginal sample distance/Hilbert–Schmidt covariances and show their superior power behavior against their joint counterparts in simulations. We further extend the distance correlation based t test to those based on Hilbert–Schmidt covariance and marginal distance/Hilbert–Schmidt covariance. A novel unified approach is developed to analyze the studentized sample distance/Hilbert–Schmidt covariance as well as the studentized sample marginal distance covariance under both null and alternative hypothesis. Our theoretical and simulation results shed light on the limitation of distance/Hilbert–Schmidt covariance when used jointly in the high dimensional setting and suggest the aggregation of marginal distance/Hilbert–Schmidt covariance as a useful alternative.
Journal Article
Pearson's Chi-square Test and Rank Correlation Inferences for Clustered Data
2017
Pearson's chi-square test has been widely used in testing for association between two categorical responses. Spearman rank correlation and Kendall's tau are often used for measuring and testing association between two continuous or ordered categorical responses. However, the established statistical properties of these tests are only valid when each pair of responses are independent, where each sampling unit has only one pair of responses. When each sampling unit consists of a cluster of paired responses, the assumption of independent pairs is violated. In this article, we apply the within-cluster resampling technique to U-statistics to form new tests and rank-based correlation estimators for possibly tied clustered data. We develop large sample properties of the new proposed tests and estimators and evaluate their performance by simulations. The proposed methods are applied to a data set collected from a PET/CT imaging study for illustration.
Journal Article
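The idea of applying within-cluster resampling to a U-statistic, as described in the abstract above, can be sketched roughly as follows. This is an illustrative simplification and not the authors' estimator: it assumes the usual within-cluster resampling scheme (draw one pair per cluster, compute the statistic, repeat, average), uses Kendall's tau-a with no special treatment of ties, and the names `kendall_tau` and `wcr_kendall_tau` are hypothetical.

```python
import random
from itertools import combinations


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_tau(pairs):
    """Kendall's tau-a as an order-2 U-statistic.

    The kernel sign((x1 - x2) * (y1 - y2)) is averaged over all
    unordered pairs of observations.
    """
    kernel_values = [
        sign((x1 - x2) * (y1 - y2))
        for (x1, y1), (x2, y2) in combinations(pairs, 2)
    ]
    return sum(kernel_values) / len(kernel_values)


def wcr_kendall_tau(clusters, n_resamples=200, seed=0):
    """Average tau over resamples that keep one (x, y) pair per cluster.

    Assumes at least two clusters, each a non-empty list of (x, y) pairs.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_resamples):
        one_per_cluster = [rng.choice(cluster) for cluster in clusters]
        total += kendall_tau(one_per_cluster)
    return total / n_resamples


clusters = [[(1, 2), (2, 1)], [(3, 4)], [(5, 5), (6, 9), (7, 8)]]
print(wcr_kendall_tau(clusters))
```

Because each resample contains exactly one pair from every cluster, the pairs within a resample are independent across clusters, which is the condition the abstract notes is violated when all clustered pairs are pooled.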
OPTIMAL RATES FOR INDEPENDENCE TESTING VIA U-STATISTIC PERMUTATION TESTS
by Kontoyiannis, Ioannis; Berrett, Thomas B.; Samworth, Richard J.
in Fourier transforms, Independence, Minimax technique
2021
We study the problem of independence testing given independent and identically distributed pairs taking values in a σ-finite, separable measure space. Defining a natural measure of dependence D(f) as the squared L²-distance between a joint density f and the product of its marginals, we first show that there is no valid test of independence that is uniformly consistent against alternatives of the form {f : D(f) ≥ ρ²}. We therefore restrict attention to alternatives that impose additional Sobolev-type smoothness constraints, and define a permutation test based on a basis expansion and a U-statistic estimator of D(f) that we prove is minimax optimal in terms of its separation rates in many instances. Finally, for the case of a Fourier basis on [0, 1]², we provide an approximation to the power function that offers several additional insights. Our methodology is implemented in the R package USP.
Journal Article
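The general shape of a permutation test of independence, as referred to in the abstract above, can be sketched as below. This is a toy illustration under stated assumptions, not the paper's procedure or the USP package: it handles discrete data only and swaps in a simple plug-in estimate of the squared L² distance between the joint mass function and the product of its marginals in place of the paper's basis-expansion U-statistic. The names `dependence_plugin` and `permutation_pvalue` are hypothetical.

```python
import random
from collections import Counter


def dependence_plugin(xs, ys):
    """Plug-in estimate of sum over (a, b) of (p_ab - p_a * p_b)^2."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (joint[(a, b)] / n - (px[a] / n) * (py[b] / n)) ** 2
        for a in px
        for b in py
    )


def permutation_pvalue(xs, ys, n_perm=199, seed=0):
    """Permutation p-value: shuffle ys to break any link with xs."""
    rng = random.Random(seed)
    observed = dependence_plugin(xs, ys)
    ys_perm = list(ys)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(ys_perm)
        if dependence_plugin(xs, ys_perm) >= observed:
            exceed += 1
    # The +1 terms count the observed arrangement itself, which keeps the test valid.
    return (1 + exceed) / (1 + n_perm)


xs = [0, 1] * 20
print(permutation_pvalue(xs, xs))  # perfectly dependent toy data
```

Shuffling one coordinate leaves both marginals unchanged while destroying the pairing, so the permuted statistics approximate the null distribution without any distributional assumptions.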
Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data
2014
Change point analysis has applications in a wide variety of fields. The general problem concerns the inference of a change in distribution for a set of time-ordered observations. Sequential detection is an online version in which new data are continually arriving and are analyzed adaptively. We are concerned with the related, but distinct, offline version, in which retrospective analysis of an entire sequence is performed. For a set of multivariate observations of arbitrary dimension, we consider nonparametric estimation of both the number of change points and the positions at which they occur. We do not make any assumptions regarding the nature of the change in distribution or any distribution assumptions beyond the existence of the αth absolute moment, for some α ∈ (0, 2). Estimation is based on hierarchical clustering and we propose both divisive and agglomerative algorithms. The divisive method is shown to provide consistent estimates of both the number and the location of change points under standard regularity assumptions. We compare the proposed approach with competing methods in a simulation study. Methods from cluster analysis are applied to assess performance and to allow simple comparisons of location estimates, even when the estimated number differs. We conclude with applications in genetics, finance, and spatio-temporal analysis. Supplementary materials for this article are available online.
Journal Article
ASYMPTOTICALLY INDEPENDENT U-STATISTICS IN HIGH-DIMENSIONAL TESTING
by Xu, Gongjun; Wu, Chong; Pan, Wei
in Asymptotic methods, Asymptotic properties, Bootstrap method
2021
Many high-dimensional hypothesis tests aim to globally examine marginal or low-dimensional features of a high-dimensional joint distribution, such as testing of mean vectors, covariance matrices and regression coefficients. This paper constructs a family of U-statistics as unbiased estimators of the ℓₚ-norms of those features. We show that under the null hypothesis, the U-statistics of different finite orders are asymptotically independent and normally distributed. Moreover, they are also asymptotically independent with the maximum-type test statistic, whose limiting distribution is an extreme value distribution. Based on the asymptotic independence property, we propose an adaptive testing procedure which combines p-values computed from the U-statistics of different orders. We further establish power analysis results and show that the proposed adaptive procedure maintains high power against various alternatives.
Journal Article
Applications of distance correlation to time series
2018
The use of empirical characteristic functions for inference problems, including estimation in some special parametric settings and testing for goodness of fit, has a long history dating back to the 70s. More recently, there has been renewed interest in using empirical characteristic functions in other inference settings. The distance covariance and correlation, developed by Székely et al. (Ann. Statist. 35 (2007) 2769–2794) and Székely and Rizzo (Ann. Appl. Stat. 3 (2009) 1236–1265) for measuring dependence and testing independence between two random vectors, are perhaps the best known illustrations of this. We apply these ideas to stationary univariate and multivariate time series to measure lagged auto- and cross-dependence in a time series. Assuming strong mixing, we establish the relevant asymptotic theory for the sample auto- and cross-distance correlation functions. We also apply the auto-distance correlation function (ADCF) to the residuals of an autoregressive process as a test of goodness of fit. Under the null that an autoregressive model is true, the limit distribution of the empirical ADCF can differ markedly from the corresponding one based on an i.i.d. sequence. We illustrate the use of the empirical auto- and cross-distance correlation functions for testing dependence and cross-dependence of time series in a variety of contexts.
Journal Article