Catalogue Search | MBRL
Explore the vast range of titles available.
6,974 result(s) for "T-tests"
Why Psychologists Should by Default Use Welch’s t-test Instead of Student’s t-test
2017
When comparing two independent groups, psychology researchers commonly use Student's t-test. Assumptions of normality and homogeneity of variance underlie this test. When these conditions are not met, Student's t-test can be severely biased and lead to invalid statistical inferences. Moreover, we argue that the assumption of equal variances will seldom hold in psychological research, and choosing between Student's t-test and Welch's t-test based on the outcome of a test of the equality of variances often fails to provide an appropriate answer. We show that Welch's t-test provides better control of Type I error rates when the assumption of homogeneity of variance is not met, and that it loses little robustness compared to Student's t-test when the assumptions are met. We argue that Welch's t-test should be used as a default strategy.
Publisher's Note: A correction article relating to this paper has been published and can be found at https://www.rips-irsp.com/articles/10.5334/irsp.661/.
Journal Article
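The two statistics compared in the abstract above can be sketched in a few lines of NumPy (an illustrative sketch, not the authors' code; the function names are hypothetical):

```python
import numpy as np

def student_t(x, y):
    # Pooled-variance Student's t-statistic and its degrees of freedom.
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    t = (x.mean() - y.mean()) / np.sqrt(sp2 * (1 / nx + 1 / ny))
    return t, nx + ny - 2

def welch_t(x, y):
    # Welch's t-statistic with Welch-Satterthwaite degrees of freedom;
    # no equal-variance assumption.
    nx, ny = len(x), len(y)
    vx, vy = x.var(ddof=1) / nx, y.var(ddof=1) / ny
    t = (x.mean() - y.mean()) / np.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df
```

With equal group sizes the two statistics coincide, but Welch's smaller degrees of freedom widen the reference distribution when variances differ; `scipy.stats.ttest_ind(x, y, equal_var=False)` computes the same Welch test.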
Application of student's t-test, analysis of variance, and covariance
by Pandey, Gaurav; Mishra, Priyadarshni; Mishra, Prabhaker
in Analysis of covariance; Analysis of variance; Body mass index
2019
Student's t-test (t-test), analysis of variance (ANOVA), and analysis of covariance (ANCOVA) are statistical methods used in hypothesis testing for the comparison of means between groups. Student's t-test is used to compare the means of two groups, whereas ANOVA is used to compare the means of three or more groups. ANOVA first yields a common P value; a significant P value indicates that for at least one pair of groups the mean difference is statistically significant. To identify the significant pair(s), multiple comparison procedures are used. An ANOVA with one categorical independent variable is called a one-way ANOVA, and one with two categorical independent variables a two-way ANOVA. When at least one covariate is used to adjust the dependent variable, ANOVA becomes ANCOVA. When the sample size is small, the mean is strongly affected by outliers, so a sufficient sample size is necessary when using these methods.
Journal Article
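The one-way ANOVA described above reduces to comparing between-group and within-group variability; a minimal NumPy sketch (illustrative, with a hypothetical function name):

```python
import numpy as np

def one_way_anova(groups):
    # One-way ANOVA F statistic: ratio of between-group to
    # within-group mean squares.
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = np.concatenate(groups).mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within
```

For two groups the F statistic equals the square of Student's t, which is why ANOVA is described as the generalization of the t-test to three or more groups.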
Inference Under Covariate-Adaptive Randomization
2018
This article studies inference for the average treatment effect in randomized controlled trials with covariate-adaptive randomization. Here, by covariate-adaptive randomization, we mean randomization schemes that first stratify according to baseline covariates and then assign treatment status so as to achieve "balance" within each stratum. Our main requirement is that the randomization scheme assigns treatment status within each stratum so that the fraction of units being assigned to treatment within each stratum has a well-behaved distribution centered around a proportion π as the sample size tends to infinity. Such schemes include, for example, Efron's biased-coin design and stratified block randomization. When testing the null hypothesis that the average treatment effect equals a prespecified value in such settings, we first show the usual two-sample t-test is conservative in the sense that it has limiting rejection probability under the null hypothesis no greater than, and typically strictly less than, the nominal level. We show, however, that a simple adjustment to the usual standard error of the two-sample t-test leads to a test that is exact in the sense that its limiting rejection probability under the null hypothesis equals the nominal level. Next, we consider the usual t-test (on the coefficient on treatment assignment) in a linear regression of outcomes on treatment assignment and indicators for each of the strata. We show that this test is exact for the important special case of randomization schemes with π = 1/2, but is otherwise conservative. We again provide a simple adjustment to the standard errors that yields an exact test more generally. Finally, we study the behavior of a modified version of a permutation test, which we refer to as the covariate-adaptive permutation test, that only permutes treatment status for units within the same stratum. When applied to the usual two-sample t-statistic, we show that this test is exact for randomization schemes with π = 1/2 that additionally achieve what we refer to as "strong balance." For randomization schemes with π ≠ 1/2, this test may have limiting rejection probability under the null hypothesis strictly greater than the nominal level. When applied to a suitably adjusted version of the two-sample t-statistic, however, we show that this test is exact for all randomization schemes that achieve "strong balance," including those with π ≠ 1/2. A simulation study confirms the practical relevance of our theoretical results. We conclude with recommendations for empirical practice and an empirical illustration. Supplementary materials for this article are available online.
Journal Article
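The within-stratum permutation idea in the abstract above can be illustrated with a small NumPy sketch. This is a simplified illustration using the raw difference in means as the test statistic; the function name and details are hypothetical, not the authors' procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

def within_stratum_permutation_p(outcome, treat, strata, n_perm=2000):
    # Monte Carlo p-value for the difference in means, permuting
    # treatment labels only within each stratum so the stratified
    # design is respected.
    def diff_in_means(t):
        return outcome[t == 1].mean() - outcome[t == 0].mean()

    observed = diff_in_means(treat)
    extreme = 0
    for _ in range(n_perm):
        perm = treat.copy()
        for s in np.unique(strata):
            idx = np.where(strata == s)[0]
            perm[idx] = rng.permutation(perm[idx])
        if abs(diff_in_means(perm)) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    return (extreme + 1) / (n_perm + 1)
```

Because labels are shuffled only inside each stratum, the within-stratum treatment fractions are preserved under every permutation.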
Statistical Evidence in Experimental Psychology: An Empirical Comparison Using 855 t Tests
by Lee, Michael D.; Iverson, Geoffrey J.; Matzke, Dora
in Anecdotal research; Bayesian analysis; Bayesian Statistics
2011
Statistical inference in psychology has traditionally relied heavily on p-value significance testing. This approach to drawing conclusions from data, however, has been widely criticized, and two types of remedies have been advocated. The first proposal is to supplement p values with complementary measures of evidence, such as effect sizes. The second is to replace inference with Bayesian measures of evidence, such as the Bayes factor. The authors provide a practical comparison of p values, effect sizes, and default Bayes factors as measures of statistical evidence, using 855 recently published t tests in psychology. The comparison yields two main results. First, although p values and default Bayes factors almost always agree about what hypothesis is better supported by the data, the measures often disagree about the strength of this support; for 70% of the data sets for which the p value falls between .01 and .05, the default Bayes factor indicates that the evidence is only anecdotal. Second, effect sizes can provide additional evidence to p values and default Bayes factors. The authors conclude that the Bayesian approach is comparatively prudent, preventing researchers from overestimating the evidence in favor of an effect.
Journal Article
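The "default Bayes factor" referenced in the abstract above is commonly the JZS Bayes factor of Rouder et al. (2009). A minimal sketch for the two-sample case, assuming SciPy is available (illustrative, not the authors' code):

```python
import numpy as np
from scipy.integrate import quad

def jzs_bf10(t, n1, n2):
    # Two-sample JZS (default) Bayes factor BF10 following Rouder et
    # al. (2009): Cauchy prior on effect size, written as a scale
    # mixture over g with an inverse-chi-square(1) prior.
    n_eff = n1 * n2 / (n1 + n2)
    nu = n1 + n2 - 2

    def integrand(g):
        if g < 1e-10:  # integrand vanishes at the origin
            return 0.0
        return ((1 + n_eff * g) ** -0.5
                * (1 + t ** 2 / ((1 + n_eff * g) * nu)) ** (-(nu + 1) / 2)
                * (2 * np.pi) ** -0.5 * g ** -1.5 * np.exp(-1 / (2 * g)))

    marginal_alt, _ = quad(integrand, 0, np.inf)
    marginal_null = (1 + t ** 2 / nu) ** (-(nu + 1) / 2)
    return marginal_alt / marginal_null
```

BF10 below 1 favours the null; this is how a p value just under .05 can correspond to merely anecdotal evidence for the alternative.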
Analysis of Bayesian posterior significance and effect size indices for the two-sample t-test to support reproducible medical research
2020
Background
The replication crisis hit the medical sciences about a decade ago, but most of the flaws inherent in null hypothesis significance testing (NHST) have still not been solved. While the drawbacks of p-values have been detailed in endless venues, for clinical research only a few attractive alternatives have been proposed to replace p-values and NHST. Bayesian methods are one of them, and they are gaining increasing attention in medical research, as their advantages include the description of model parameters in terms of probability, as well as the incorporation of prior information, in contrast to the frequentist framework. While Bayesian methods are not the only remedy to the situation, there is increasing agreement that they are an essential way to avoid common misconceptions and false interpretations of study results. The requirements for applying Bayesian statistics have transitioned from detailed programming knowledge to simple point-and-click programs like JASP. Still, the multitude of Bayesian significance and effect measures that contrast with the gold standard of significance in medical research, the p-value, causes a lack of agreement on which measure to report.
Methods
Therefore, in this paper, we conduct an extensive simulation study to compare common Bayesian significance and effect measures which can be obtained from a posterior distribution. In it, we analyse the behaviour of these measures for one of the most important statistical procedures in medical research and in particular clinical trials, the two-sample Student’s (and Welch’s) t-test.
Results
The results show that some measures cannot state evidence for both the null and the alternative. While the different indices behave similarly with respect to increasing sample size and noise, the prior modelling influences the obtained results, and extreme priors allow for cherry-picking similar to p-hacking in the frequentist paradigm. The indices behave quite differently in their ability to control the type I error rate and to detect an existing effect.
Conclusion
Based on the results, two of the commonly used indices can be recommended for more widespread use in clinical and biomedical research, as they improve the type I error control compared to the classic two-sample t-test and enjoy multiple other desirable properties.
Journal Article
Simulation data for the analysis of Bayesian posterior significance and effect size indices for the two-sample t-test to support reproducible medical research
2020
Objectives
The data presented herein represents the simulated datasets of a recently conducted larger study which investigated the behaviour of Bayesian indices of significance and effect size as alternatives to traditional p-values. The study considered the setting of Student’s and Welch’s two-sample t-test often used in medical research. It investigated the influence of the sample size, noise, the selected prior hyperparameters and the sensitivity to type I errors. The posterior indices used included the Bayes factor, the region of practical equivalence, the probability of direction, the MAP-based p-value and the e-value in the Full Bayesian Significance Test. The simulation study was conducted in the statistical programming language R.
Data description
The R script files for simulation of the datasets used in the study are presented in this article. These script files can both simulate the raw datasets and run the analyses. As researchers may face different effect sizes, noise levels or priors in their domain than the ones studied in the original paper, the scripts extend the original results by allowing researchers to recreate all analyses of interest in different contexts. Therefore, they should be relevant to other researchers.
Journal Article
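Two of the posterior indices named above, the probability of direction and the region of practical equivalence (ROPE), can be sketched directly from posterior draws. The function names are hypothetical, and the ±0.1 ROPE bounds are a common default for standardized effects, not necessarily the study's choice:

```python
import numpy as np

def probability_of_direction(samples):
    # Probability of direction: share of posterior mass whose sign
    # matches the dominant sign (ranges from 0.5 to 1).
    p_positive = (samples > 0).mean()
    return max(p_positive, 1 - p_positive)

def rope_fraction(samples, low=-0.1, high=0.1):
    # Fraction of posterior mass inside a region of practical
    # equivalence around zero.
    return ((samples >= low) & (samples <= high)).mean()
```

Both quantities are simple functionals of the posterior, which is why they can be computed from any set of posterior samples regardless of how the model was fitted.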
Long-range and local circuits for top-down modulation of visual cortex processing
2014
Top-down modulation of sensory processing allows the animal to select inputs most relevant to current tasks. We found that the cingulate (Cg) region of the mouse frontal cortex powerfully influences sensory processing in the primary visual cortex (V1) through long-range projections that activate local γ-aminobutyric acid–ergic (GABAergic) circuits. Optogenetic activation of Cg neurons enhanced V1 neuron responses and improved visual discrimination. Focal activation of Cg axons in V1 caused a response increase at the activation site but a decrease at nearby locations (center-surround modulation). Whereas somatostatin-positive GABAergic interneurons contributed preferentially to surround suppression, vasoactive intestinal peptide-positive interneurons were crucial for center facilitation. Long-range corticocortical projections thus act through local microcircuits to exert spatially specific top-down modulation of sensory processing.
Journal Article
PHASE TRANSITION AND REGULARIZED BOOTSTRAP IN LARGE-SCALE t-TESTS WITH FALSE DISCOVERY RATE CONTROL
2014
Applying the Benjamini and Hochberg (B-H) method to multiple Student's t-tests is a popular technique for gene selection in microarray data analysis. Given the nonnormality of the population, the true p-values of the hypothesis tests are typically unknown. Hence it is common to use the standard normal distribution N(0, 1), Student's t distribution t_{n-1}, or the bootstrap method to estimate the p-values. In this paper, we prove that when the population has a finite 4th moment and the dimension m and the sample size n satisfy log m = o(n^{1/3}), the B-H method controls the false discovery rate (FDR) and the false discovery proportion (FDP) at a given level α asymptotically with p-values estimated from the N(0, 1) or t_{n-1} distribution. However, a phase transition phenomenon occurs when log m ≥ c₀n^{1/3}. In this case, the FDR and the FDP of the B-H method may be larger than α or even converge to one. In contrast, the bootstrap calibration is accurate for log m = o(n^{1/2}) as long as the underlying distribution has sub-Gaussian tails. However, such a light-tailed condition cannot generally be weakened. The simulation study shows that the bootstrap calibration is very conservative for heavy-tailed distributions. To solve this problem, a regularized bootstrap correction is proposed and is shown to be robust to the tails of the distributions. The simulation study shows that the regularized bootstrap method performs better than its usual counterpart.
Journal Article
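The B-H step-up procedure that the abstract analyzes can be sketched as follows (a plain NumPy illustration with given, rather than estimated, p-values):

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    # B-H step-up procedure: find the largest k such that the k-th
    # smallest p-value satisfies p_(k) <= k * alpha / m, then reject
    # the hypotheses with the k smallest p-values.
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    k = below.nonzero()[0].max() + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True  # mask in the original input order
    return reject
```

The paper's point is that the p-values fed into this procedure are themselves estimates, and the choice of estimation method (normal, t, or bootstrap calibration) determines whether FDR control survives as m grows.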
Phonetic Feature Encoding in Human Superior Temporal Gyrus
by Chang, Edward F.; Mesgarani, Nima; Cheung, Connie
in Acoustic spectra; Acoustics; Auditory Cortex - anatomy & histology
2014
During speech perception, linguistic elements such as consonants and vowels are extracted from a complex acoustic speech signal. The superior temporal gyrus (STG) participates in high-order auditory processing of speech, but how it encodes phonetic information is poorly understood. We used high-density direct cortical surface recordings in humans while they listened to natural, continuous speech to reveal the STG representation of the entire English phonetic inventory. At single electrodes, we found response selectivity to distinct phonetic features. Encoding of acoustic properties was mediated by a distributed population response. Phonetic features could be directly related to tuning for spectrotemporal acoustic cues, some of which were encoded in a nonlinear fashion or by integration of multiple cues. These findings demonstrate the acoustic-phonetic representation of speech in human STG.
Journal Article
Bayesian Hodges-Lehmann tests for statistical equivalence in the two-sample setting: Power analysis, type I error rates and equivalence boundary selection in biomedical research
2021
Background
Null hypothesis significance testing (NHST) is among the most frequently employed methods in the biomedical sciences. However, the problems of NHST and p-values have been discussed widely and various Bayesian alternatives have been proposed. Some proposals focus on equivalence testing, which aims at testing an interval hypothesis instead of a precise hypothesis. An interval hypothesis includes a small range of parameter values instead of a single null value, and the idea goes back to Hodges and Lehmann. As researchers can always expect to observe some (although often negligibly small) effect size, interval hypotheses are more realistic for biomedical research. However, the selection of an equivalence region (the interval boundaries) often seems arbitrary and several Bayesian approaches to equivalence testing coexist.
Methods
A new proposal is made for determining the equivalence region of Bayesian equivalence tests based on objective criteria such as the type I error rate and power. Existing approaches to Bayesian equivalence testing in the two-sample setting are discussed, with a focus on the Bayes factor and the region of practical equivalence (ROPE). A simulation study derives the results necessary to apply the new method in the two-sample setting, which is among the most frequently carried out procedures in biomedical research.
Results
Bayesian Hodges-Lehmann tests for statistical equivalence differ in their sensitivity to the prior modeling, their power, and the associated type I error rates. The relationship between type I error rates, power and sample sizes for existing Bayesian equivalence tests is identified in the two-sample setting. The results allow the equivalence region to be determined with the new method by incorporating such objective criteria. Importantly, the results show that not only can prior selection influence the type I error rate and power, but the relationship is even reversed for the Bayes factor and ROPE-based equivalence tests.
Conclusion
Based on the results, researchers can select between the existing Bayesian Hodges-Lehmann tests for statistical equivalence and determine the equivalence region based on objective criteria, thus improving the reproducibility of biomedical research.
Journal Article