Catalogue Search | MBRL
2,528 result(s) for "Contingency tables"
Testing for independence in arbitrary distributions
by NEŠLEHOVÁ, J. G.; GENEST, C.; RÉMILLARD, B.
in Asymptotic methods, Asymptotic properties, Computer simulation
2019
Statistics are proposed for testing the hypothesis that arbitrary random variables are mutually independent. The tests are consistent and well behaved for any marginal distributions; they can be used, for example, for contingency tables which are sparse or whose dimension depends on the sample size, as well as for mixed data. No regularity conditions, data jittering, or binning mechanisms are required. The statistics are rank-based functionals of Cramér–von Mises type whose asymptotic behaviour derives from the empirical multilinear copula process. Approximate p-values are computed using a wild bootstrap. The procedures are simple to implement and computationally efficient, and maintain their level well in moderate to large samples. Simulations suggest that the tests are robust with respect to the number of ties in the data, can easily detect a broad range of alternatives, and outperform existing procedures in many settings. Additional insight into their performance is provided through asymptotic local power calculations under contiguous alternatives. The procedures are illustrated on traumatic brain injury data.
Journal Article
The Tale of Cochran's Rule: My Contingency Table has so Many Expected Values Smaller than 5, What Am I to Do?
by Verbeek, Albert; Kroonenberg, P. M.
in 2 × 2 contingency table, Chi-square test, Cochran table
2018
In an informal way, some dilemmas in connection with hypothesis testing in contingency tables are discussed. The body of the article concerns the numerical evaluation of Cochran's Rule about the minimum expected value in r × c contingency tables with fixed margins when testing independence with Pearson's $X^2$ statistic using the $\chi^2$ distribution.
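As a rough illustration of the rule this abstract refers to, the sketch below computes expected counts under independence and checks one commonly quoted form of Cochran's Rule (no expected count below 1, and at most 20% of expected counts below 5). The thresholds, function names, and example tables are assumptions made for illustration; they are not taken from the article, which evaluates the rule numerically rather than endorsing a particular form.

```python
def expected_counts(table):
    """Expected cell counts under independence: row_total * col_total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def satisfies_cochran(table, min_expected=1.0, small=5.0, max_small_share=0.20):
    """Check a commonly quoted form of Cochran's Rule (thresholds are assumptions)."""
    cells = [e for row in expected_counts(table) for e in row]
    share_small = sum(e < small for e in cells) / len(cells)
    return min(cells) >= min_expected and share_small <= max_small_share


def pearson_x2(table):
    """Pearson's X^2 statistic: sum over cells of (observed - expected)^2 / expected."""
    exp = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, exp)
               for o, e in zip(obs_row, exp_row))


# Hypothetical example tables (not from the article).
dense = [[20, 30], [30, 20]]           # every expected count is 25
sparse = [[1, 2, 1], [2, 10, 12]]      # smallest expected count is below 1

print(satisfies_cochran(dense), pearson_x2(dense))
print(satisfies_cochran(sparse))
```

When the check fails, the chi-square approximation to the distribution of the statistic is usually regarded as doubtful, which is the situation the article's title describes.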
Journal Article
Classical tests, linear models and their extensions for the analysis of 2 × 2 contingency tables
by Morrissey, Michael B.; Nagel, Rebecca; Ruxton, Graeme D.
in 2 × 2 contingency table, Binary data, chi‐squared test
2024
Ecologists and evolutionary biologists are regularly tasked with the comparison of binary data across groups. There is, however, some discussion in the biostatistics literature about the best methodology for the analysis of data comprising binary explanatory and response variables forming a 2 × 2 contingency table.
We assess several methodologies for the analysis of 2 × 2 contingency tables using a simulation scheme of different sample sizes with outcomes evenly or unevenly distributed between groups. Specifically, we assess the commonly recommended logistic (generalised linear model [GLM]) regression analysis, the classical Pearson chi‐squared test and four conventional alternatives (Yates' correction, Fisher's exact, exact unconditional and mid‐p), as well as the widely discouraged linear model (LM) regression.
We found that both LM and GLM analyses provided unbiased estimates of the difference in proportions between groups. LM and GLM analyses also provided accurate standard errors and confidence intervals when the experimental design was balanced. When the experimental design was unbalanced, sample size was small, and one of the two groups had a probability close to 1 or 0, LM analysis could substantially over‐ or under‐represent statistical uncertainty. For null hypothesis significance testing, the performance of the chi‐squared test and LM analysis were almost identical. Across all scenarios, both had high power to detect non‐null effects and reject false positives. By contrast, the GLM analysis was underpowered when using z‐based p‐values, in particular when one of the two groups had a probability near 1 or 0. The GLM using the LRT had better power to detect non‐null results.
Our simulation results suggest that, wherever a chi‐squared test would be recommended, a linear regression is a suitable alternative for the analysis of 2 × 2 contingency table data. When researchers opt for more sophisticated procedures, we provide R functions to calculate the standard error of a difference between two probabilities from a Bernoulli GLM output using the delta method. We also explore approaches to complement GLM analysis of 2 × 2 contingency tables with credible intervals on the probability scale. These additional operations should support researchers to make valid assessments of both statistical and practical significance.
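The delta-method calculation mentioned in this abstract can be sketched for the simplest case, a logit model with a single binary group indicator. This is not the authors' R code: the function names and counts below are hypothetical, and the sketch assumes a saturated 2 × 2 design with no empty or full cells, in which case the maximum-likelihood coefficients and their covariance matrix have the closed forms used here.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def logit_glm_2x2(successes0, n0, successes1, n1):
    """Closed-form MLE and covariance for logit(p) = b0 + b1 * group.

    Assumes 0 < successes < n in both groups (saturated model).
    """
    p0, p1 = successes0 / n0, successes1 / n1
    b0 = math.log(p0 / (1 - p0))
    b1 = math.log(p1 / (1 - p1)) - b0
    v0 = 1.0 / (n0 * p0 * (1 - p0))   # variance of logit(p0) estimate
    v1 = 1.0 / (n1 * p1 * (1 - p1))   # variance of logit(p1) estimate
    cov = [[v0, -v0], [-v0, v0 + v1]]
    return (b0, b1), cov


def delta_method_diff(coefs, cov):
    """Difference p1 - p0 and its delta-method standard error."""
    b0, b1 = coefs
    p0, p1 = expit(b0), expit(b0 + b1)
    # Gradient of expit(b0 + b1) - expit(b0) with respect to (b0, b1).
    grad = (p1 * (1 - p1) - p0 * (1 - p0), p1 * (1 - p1))
    var = sum(grad[i] * cov[i][j] * grad[j] for i in range(2) for j in range(2))
    return p1 - p0, math.sqrt(var)


# Hypothetical counts: 12/40 successes in group 0, 27/45 in group 1.
coefs, cov = logit_glm_2x2(12, 40, 27, 45)
diff, se = delta_method_diff(coefs, cov)
print(diff, se)
```

In this saturated case the delta-method variance simplifies algebraically to p0(1 - p0)/n0 + p1(1 - p1)/n1, the usual Wald variance for a difference in proportions, which gives a quick check on the calculation.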
Journal Article
On testing the equality between interquartile ranges
2024
The interquartile range is a statistical measure well suited to describe the variability of the data at hand, both at the population level and for sample data. The interquartile range is particularly useful when the distribution of the data is asymmetric or irregularly shaped. Here, the use of the interquartile range is investigated when the main aim is to compare the variability of two distributions using two independent random samples, without the need to make any distributional assumptions. Several techniques are compared through numerical studies and real data examples, with particular attention given to the use of sample quantiles based on the Harrell-Davis estimator or on quantile regression.
Journal Article
Weighted cumulative correspondence analysis based on a particular cumulative power divergence family
by Della Ragione, Livia; D’Ambra, Antonello; Meccariello, Giovanni
in Chi-square test, Contingency, Contingency tables
2024
Pearson’s $X^2$ statistic and the likelihood ratio statistic $G^2$ are the statistics most frequently used for testing independence or homogeneity in two-way contingency tables. These indexes are members of a continuous family of Power Divergence (PD) statistics, but they perform badly in studying the association between ordinal categorical variables. Taguchi’s and Nair’s statistics have been introduced in the literature as simple alternatives to Pearson’s index for contingency tables with ordered categorical variables. Taguchi’s and Nair’s statistics can be linked through a single parameter, yielding a new class called Weighted Cumulative Chi-Squared (WCCS)-type tests. The main aim of this paper is therefore to introduce a new divergence family based on cumulative frequencies, called Weighted Cumulative Power Divergence. Moreover, an extension of Cumulative Correspondence Analysis based on WCCS and further properties are shown.
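To make the Power Divergence family mentioned in this abstract concrete, the sketch below uses the standard Cressie-Read form, PD(λ) = 2 / (λ(λ + 1)) · Σ O · [(O/E)^λ − 1], where O and E are observed and expected counts. Setting λ = 1 recovers Pearson's statistic and the limit λ → 0 gives the likelihood ratio statistic. This covers only the classical (non-cumulative) family, not the weighted cumulative version the paper introduces; the function names and the example table are assumptions.

```python
import math


def expected_counts(table):
    """Expected cell counts under independence: row_total * col_total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def power_divergence(table, lam):
    """Cressie-Read power divergence statistic for a two-way table.

    lam = 1 gives Pearson's X^2; lam = 0 is handled as the limit G^2.
    The lam = -1 limit is not implemented, and negative lam requires
    strictly positive observed counts.
    """
    exp = expected_counts(table)
    pairs = [(o, e)
             for obs_row, exp_row in zip(table, exp)
             for o, e in zip(obs_row, exp_row)]
    if abs(lam) < 1e-12:
        # Limit lam -> 0: likelihood ratio statistic (empty cells contribute 0).
        return 2.0 * sum(o * math.log(o / e) for o, e in pairs if o > 0)
    if abs(lam + 1.0) < 1e-12:
        raise ValueError("lam = -1 is a limiting case not handled in this sketch")
    return 2.0 / (lam * (lam + 1.0)) * sum(o * ((o / e) ** lam - 1.0) for o, e in pairs)


# Hypothetical 2 x 3 table (not from the paper).
table = [[10, 20, 30], [20, 20, 10]]
print(power_divergence(table, 1.0))   # Pearson's X^2
print(power_divergence(table, 0.0))   # likelihood ratio G^2
```

Both special cases ignore any ordering of the categories, which is the limitation the abstract raises for ordinal variables.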
Journal Article
Identification of Driver Epistatic Gene Pairs Combining Germline and Somatic Mutations in Cancer
by Heine-Suñer, Damià; Hernandez-Rodriguez, Jessica; Capriotti, Emidio
in Algorithms, Cancer, Colon cancer
2023
Cancer arises from the complex interplay of various factors. Traditionally, the identification of driver genes focuses primarily on the analysis of somatic mutations. We describe a new method for the detection of driver gene pairs based on an epistasis analysis that considers both germline and somatic variations. Specifically, the identification of significantly mutated gene pairs entails the calculation of a contingency table, wherein one of the co-mutated genes can exhibit a germline variant. By adopting this approach, it is possible to select gene pairs in which the individual genes do not exhibit significant associations with cancer. Finally, a survival analysis is used to select clinically relevant gene pairs. To test the efficacy of the new algorithm, we analyzed the colon adenocarcinoma (COAD) and lung adenocarcinoma (LUAD) samples available from The Cancer Genome Atlas (TCGA). In the analysis of the COAD and LUAD samples, we identify epistatic gene pairs significantly mutated in tumor tissue with respect to normal tissue. We believe that further analysis of the gene pairs detected by our method will unveil new biological insights, enabling a better description of the cancer mechanism.
Journal Article
Asymptotic Properties of Random Contingency Tables with Uniform Margin
2023
Let $C \ge 2$ be a positive integer. Consider the set of $n \times n$ non-negative integer matrices whose row sums and column sums are all equal to $Cn$ and let $X = (X_{ij})_{1 \le i, j \le n}$ be uniformly distributed on this set. This $X$ is called the random contingency table with uniform margin. In this paper, we study various asymptotic properties of $X = (X_{ij})_{1 \le i, j \le n}$ as $n \to \infty$.
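For very small n, the set described in this abstract can be enumerated directly, which may help make the definition concrete. The sketch below is a brute-force illustration only: the function name is hypothetical, the approach is infeasible beyond tiny n, and it says nothing about the asymptotic results of the paper.

```python
from itertools import product


def tables_with_uniform_margin(n, C):
    """List all n x n non-negative integer matrices whose row and column sums equal C * n."""
    target = C * n

    def compositions(total, parts):
        # All ways to write `total` as an ordered sum of `parts` non-negative integers.
        if parts == 1:
            yield (total,)
            return
        for first in range(total + 1):
            for rest in compositions(total - first, parts - 1):
                yield (first,) + rest

    rows = list(compositions(target, n))   # every candidate row already sums to C * n
    return [candidate
            for candidate in product(rows, repeat=n)
            if all(sum(col) == target for col in zip(*candidate))]


# Smallest case allowed by the definition: n = 2, C = 2, so every margin equals 4.
tables = tables_with_uniform_margin(2, 2)
mean_top_left = sum(t[0][0] for t in tables) / len(tables)
print(len(tables), mean_top_left)
```

Drawing one element of `tables` uniformly at random (for example with `random.choice`) gives a realisation of X for this n and C. The average of the top-left entry over the set equals C here, as symmetry among the entries of a row suggests.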
Journal Article
BET on Independence
2019
We study the problem of nonparametric dependence detection. Many existing methods may suffer severe power loss due to nonuniform consistency, which we illustrate with a paradox. To avoid such power loss, we approach the nonparametric test of independence through the new framework of binary expansion statistics (BEStat) and binary expansion testing (BET), which examine dependence through a novel binary expansion filtration approximation of the copula. Through a Hadamard transform, we find that the symmetry statistics in the filtration are complete sufficient statistics for dependence. These statistics are also uncorrelated under the null. By using symmetry statistics, the BET avoids the problem of nonuniform consistency and improves upon a wide class of commonly used methods (a) by achieving the minimax rate in sample size requirement for reliable power and (b) by providing clear interpretations of global relationships upon rejection of independence. The binary expansion approach also connects the symmetry statistics with the current computing system to facilitate efficient bitwise implementation. We illustrate the BET with a study of the distribution of stars in the night sky and with an exploratory data analysis of the TCGA breast cancer data.
Supplementary materials for this article are available online.
Journal Article
An extension of correspondence analysis based on the multiple Taguchi’s index to evaluate the relationships between three categorical variables graphically: an application to the Italian football championship
2023
The aim of this paper is to evaluate the relationships between three categorical variables, of which at least one is ordinal, from a graphical point of view, also using inferential tools. Three-way Correspondence Analysis is a useful data science visualisation technique to find and display these relationships. This analysis, like the classical two-way analysis, cannot be applied efficiently in the presence of ordinal categorical variables because this characteristic is not taken directly into account by Pearson’s chi-square contingency coefficient. Taguchi (Statistical analysis, Maruzen, Tokyo, 1966, Igaku 29:806–813, 1974) introduced a statistic that considers the ordinal nature of a categorical variable using the cumulative frequency of the cells of the contingency table across this variable. He introduced it as a simple alternative to Pearson’s statistic for ordered contingency tables. This index is also the basis of several Correspondence Analysis extensions that have been proposed in the literature. We have developed a multiple extension of Taguchi’s index. An enhancement of Correspondence Analysis has also been developed based on decomposition of this index. An orthogonal decomposition of this new index has been introduced to test the statistical significance of each aggregated column category. Moreover, a confidence region for each row and aggregated column category of the table has been developed. An application has been developed to highlight the easy applicability and graphical reading of the results of our approach. In this study, we evaluate the relationships between the ranking of the Italian football “Serie A” championship of the last 10 seasons and a set of two factors defined by average percentage of ball possession and number of tags for each team. This new approach may represent a useful guide for researchers who graphically analyse ranking data.
Journal Article