6,979 result(s) for "t-test"
Why Psychologists Should by Default Use Welch’s t-test Instead of Student’s t-test
When comparing two independent groups, psychology researchers commonly use Student's t-test, which assumes normality and homogeneity of variance. When these assumptions are not met, Student's t-test can be severely biased and lead to invalid statistical inferences. Moreover, we argue that the assumption of equal variances will seldom hold in psychological research, and that choosing between Student's t-test and Welch's t-test based on the outcome of a test of the equality of variances often fails to provide an appropriate answer. We show that Welch's t-test provides better control of Type 1 error rates when the assumption of homogeneity of variance is not met, and that it loses little robustness compared to Student's t-test when the assumptions are met. We argue that Welch's t-test should be used as a default strategy. Publisher's Note: A correction article relating to this paper has been published and can be found at https://www.rips-irsp.com/articles/10.5334/irsp.661/.
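A minimal sketch of the paper's recommendation in Python; the paper itself ships no code, and the group sizes and variances below are illustrative assumptions. scipy.stats.ttest_ind runs Welch's test when equal_var=False:

```python
# Minimal sketch: Welch's t-test as the default two-sample comparison.
# scipy.stats.ttest_ind performs Student's test with equal_var=True
# and Welch's test with equal_var=False.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=1.0, size=30)   # smaller variance
group_b = rng.normal(loc=0.5, scale=2.5, size=50)   # larger variance

student = stats.ttest_ind(group_a, group_b, equal_var=True)
welch = stats.ttest_ind(group_a, group_b, equal_var=False)

print(f"Student's t = {student.statistic:.3f}, p = {student.pvalue:.4f}")
print(f"Welch's   t = {welch.statistic:.3f}, p = {welch.pvalue:.4f}")
```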
Application of student's t-test, analysis of variance, and covariance
Student's t test (t test), analysis of variance (ANOVA), and analysis of covariance (ANCOVA) are statistical methods used in hypothesis testing for the comparison of means between groups. The Student's t test is used to compare the means of two groups, whereas ANOVA is used to compare the means of three or more groups. ANOVA first yields a common P value; a significant P value indicates that the mean difference is statistically significant for at least one pair of groups. To identify the significant pair(s), multiple comparison procedures are used. An ANOVA with one categorical independent variable is called a one-way ANOVA, whereas one with two categorical independent variables is called a two-way ANOVA. When at least one covariate is used to adjust the dependent variable, ANOVA becomes ANCOVA. When the sample size is small, the mean is strongly affected by outliers, so a sufficient sample size should be maintained when using these methods.
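The workflow the abstract describes (a common ANOVA P value, then multiple comparisons) can be sketched as follows; the data are simulated for illustration, and scipy.stats.tukey_hsd requires SciPy 1.11 or later:

```python
# Sketch: one-way ANOVA across three groups, followed by a Tukey HSD
# post hoc test to find which pair(s) differ.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
g1 = rng.normal(10.0, 2.0, size=25)
g2 = rng.normal(10.5, 2.0, size=25)
g3 = rng.normal(13.0, 2.0, size=25)

f_stat, p_common = stats.f_oneway(g1, g2, g3)
print(f"ANOVA: F = {f_stat:.2f}, common P value = {p_common:.4f}")

if p_common < 0.05:
    # Multiple comparisons: which pair(s) drive the significant ANOVA?
    posthoc = stats.tukey_hsd(g1, g2, g3)
    print(posthoc)
```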
Inference Under Covariate-Adaptive Randomization
This article studies inference for the average treatment effect in randomized controlled trials with covariate-adaptive randomization. Here, by covariate-adaptive randomization, we mean randomization schemes that first stratify according to baseline covariates and then assign treatment status so as to achieve "balance" within each stratum. Our main requirement is that the randomization scheme assigns treatment status within each stratum so that the fraction of units being assigned to treatment within each stratum has a well-behaved distribution centered around a proportion π as the sample size tends to infinity. Such schemes include, for example, Efron's biased-coin design and stratified block randomization. When testing the null hypothesis that the average treatment effect equals a prespecified value in such settings, we first show the usual two-sample t-test is conservative in the sense that it has limiting rejection probability under the null hypothesis no greater than, and typically strictly less than, the nominal level. We show, however, that a simple adjustment to the usual standard error of the two-sample t-test leads to a test that is exact in the sense that its limiting rejection probability under the null hypothesis equals the nominal level. Next, we consider the usual t-test (on the coefficient on treatment assignment) in a linear regression of outcomes on treatment assignment and indicators for each of the strata. We show that this test is exact for the important special case of randomization schemes with π = 1/2, but is otherwise conservative. We again provide a simple adjustment to the standard errors that yields an exact test more generally. Finally, we study the behavior of a modified version of a permutation test, which we refer to as the covariate-adaptive permutation test, that only permutes treatment status for units within the same stratum. When applied to the usual two-sample t-statistic, we show that this test is exact for randomization schemes with π = 1/2 that additionally achieve what we refer to as "strong balance." For randomization schemes with π ≠ 1/2, this test may have limiting rejection probability under the null hypothesis strictly greater than the nominal level. When applied to a suitably adjusted version of the two-sample t-statistic, however, we show that this test is exact for all randomization schemes that achieve "strong balance," including those with π ≠ 1/2. A simulation study confirms the practical relevance of our theoretical results. We conclude with recommendations for empirical practice and an empirical illustration. Supplementary materials for this article are available online.
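A hedged sketch of the covariate-adaptive permutation test described above, permuting treatment only within strata; the data-generating process and stratum count are invented for illustration:

```python
# Sketch of a covariate-adaptive (within-stratum) permutation test for the
# two-sample t-statistic. Treatment labels are permuted only within each
# stratum, mirroring how covariate-adaptive designs assign treatment.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 200
strata = rng.integers(0, 4, size=n)                  # 4 baseline strata
treat = np.zeros(n, dtype=int)
for s in range(4):                                   # ~half treated per stratum
    idx = np.flatnonzero(strata == s)
    treat[rng.choice(idx, size=len(idx) // 2, replace=False)] = 1
y = 0.3 * treat + 0.5 * strata + rng.normal(size=n)  # outcome with stratum effect

def t_stat(y, treat):
    return stats.ttest_ind(y[treat == 1], y[treat == 0]).statistic

observed = t_stat(y, treat)
draws = []
for _ in range(2000):
    perm = treat.copy()
    for s in range(4):
        idx = np.flatnonzero(strata == s)
        perm[idx] = rng.permutation(perm[idx])       # permute within stratum only
    draws.append(t_stat(y, perm))

p_value = np.mean(np.abs(draws) >= abs(observed))
print(f"observed t = {observed:.3f}, permutation p = {p_value:.4f}")
```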
Simulation data for the analysis of Bayesian posterior significance and effect size indices for the two-sample t-test to support reproducible medical research
Objectives: The data presented herein represent the simulated datasets of a recently conducted larger study which investigated the behaviour of Bayesian indices of significance and effect size as alternatives to traditional p-values. The study considered the setting of Student's and Welch's two-sample t-tests, often used in medical research, and investigated the influence of sample size, noise, and the selected prior hyperparameters, as well as the sensitivity to type I errors. The posterior indices used included the Bayes factor, the region of practical equivalence, the probability of direction, the MAP-based p-value and the e-value in the Full Bayesian Significance Test. The simulation study was conducted in the statistical programming language R. Data description: The R script files for simulating the datasets used in the study are presented in this article. These script files can both simulate the raw datasets and run the analyses. As researchers may face different effect sizes, noise levels or priors in their domain than the ones studied in the original paper, the scripts extend the original results by allowing all analyses of interest to be recreated in different contexts. They should therefore be relevant to other researchers.
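The original scripts are in R and are not reproduced here; the following Python sketch only illustrates the shape of such a simulation grid, with assumed effect sizes and noise levels:

```python
# Sketch of the kind of simulation grid the data description refers to:
# two-sample datasets generated across effect sizes and noise levels.
# (This rendering and the grid values are illustrative assumptions,
# not the authors' code.)
import numpy as np

rng = np.random.default_rng(3)
effect_sizes = [0.0, 0.2, 0.5, 0.8]     # assumed Cohen's d grid
noise_levels = [1.0, 2.0]               # assumed error SDs
n_per_group, n_replications = 50, 100

datasets = {}
for d in effect_sizes:
    for sd in noise_levels:
        reps = [
            (rng.normal(0.0, sd, n_per_group),        # control group
             rng.normal(d * sd, sd, n_per_group))     # shifted treatment group
            for _ in range(n_replications)
        ]
        datasets[(d, sd)] = reps

print(f"simulated {sum(len(v) for v in datasets.values())} datasets")
```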
Statistical Evidence in Experimental Psychology: An Empirical Comparison Using 855 t Tests
Statistical inference in psychology has traditionally relied heavily on p-value significance testing. This approach to drawing conclusions from data, however, has been widely criticized, and two types of remedies have been advocated. The first proposal is to supplement p values with complementary measures of evidence, such as effect sizes. The second is to replace p-value inference with Bayesian measures of evidence, such as the Bayes factor. The authors provide a practical comparison of p values, effect sizes, and default Bayes factors as measures of statistical evidence, using 855 recently published t tests in psychology. The comparison yields two main results. First, although p values and default Bayes factors almost always agree about which hypothesis is better supported by the data, the measures often disagree about the strength of this support; for 70% of the data sets for which the p value falls between .01 and .05, the default Bayes factor indicates that the evidence is only anecdotal. Second, effect sizes can provide additional evidence beyond p values and default Bayes factors. The authors conclude that the Bayesian approach is comparatively prudent, preventing researchers from overestimating the evidence in favor of an effect.
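A default Bayes factor of the kind compared in this paper can be computed by numerical integration; this sketch follows the JZS (Cauchy-prior) formulation of Rouder et al. (2009) and is an unofficial rendering, not the authors' analysis code:

```python
# Sketch: a default (JZS) Bayes factor for a two-sample t-test with a
# Cauchy(0, r) prior on the standardized effect size, computed by
# numerical integration over the prior on g. Values are illustrative.
import numpy as np
from scipy import integrate

def jzs_bf10(t, n1, n2, r=1.0):
    """BF10 for a two-sample t-test under the JZS default prior."""
    nu = n1 + n2 - 2
    n_eff = n1 * n2 / (n1 + n2)
    null_like = (1 + t**2 / nu) ** (-(nu + 1) / 2)

    def integrand(g):
        prior = r / np.sqrt(2 * np.pi) * g ** (-1.5) * np.exp(-r**2 / (2 * g))
        like = (1 + n_eff * g) ** (-0.5) * \
               (1 + t**2 / ((1 + n_eff * g) * nu)) ** (-(nu + 1) / 2)
        return like * prior

    alt_like, _ = integrate.quad(integrand, 0, np.inf)
    return alt_like / null_like

# A "just significant" t can correspond to only anecdotal evidence (BF10 ~ 2).
print(f"BF10 = {jzs_bf10(t=2.2, n1=30, n2=30):.2f}")
```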
Analysis of Bayesian posterior significance and effect size indices for the two-sample t-test to support reproducible medical research
Background: The replication crisis hit the medical sciences about a decade ago, but today most of the flaws inherent in null hypothesis significance testing (NHST) have still not been solved. While the drawbacks of p-values have been detailed in endless venues, for clinical research only a few attractive alternatives have been proposed to replace p-values and NHST. Bayesian methods are one of them, and they are gaining increasing attention in medical research, as their advantages include the description of model parameters in terms of probability, as well as the incorporation of prior information, in contrast to the frequentist framework. While Bayesian methods are not the only remedy to the situation, there is increasing agreement that they are an essential way to avoid common misconceptions and false interpretation of study results. The requirements for applying Bayesian statistics have transitioned from detailed programming knowledge to simple point-and-click programs like JASP. Still, the multitude of Bayesian significance and effect measures which contrast with the gold standard of significance in medical research, the p-value, causes a lack of agreement on which measure to report. Methods: Therefore, in this paper, we conduct an extensive simulation study to compare common Bayesian significance and effect measures which can be obtained from a posterior distribution. In it, we analyse the behaviour of these measures for one of the most important statistical procedures in medical research, and in particular clinical trials, the two-sample Student's (and Welch's) t-test. Results: The results show that some measures cannot state evidence for both the null and the alternative. While the different indices behave similarly with increasing sample size and noise, the prior modelling influences the obtained results, and extreme priors allow for cherry-picking similar to p-hacking in the frequentist paradigm. The indices behave quite differently regarding their ability to control the type I error rate and to detect an existing effect. Conclusion: Based on the results, two of the commonly used indices can be recommended for more widespread use in clinical and biomedical research, as they improve the type I error control compared to the classic two-sample t-test and enjoy multiple other desirable properties.
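Two of the posterior indices mentioned, the probability of direction and the ROPE, are easy to sketch from posterior draws; the normal posterior approximation and the ROPE bounds of ±0.1 below are simplifying assumptions:

```python
# Sketch: probability of direction (pd) and region of practical
# equivalence (ROPE) computed from posterior draws of the mean
# difference between two groups. Data are simulated for illustration.
import numpy as np

rng = np.random.default_rng(4)
a = rng.normal(0.4, 1.0, 50)          # treatment group (simulated)
b = rng.normal(0.0, 1.0, 50)          # control group (simulated)

# Approximate posterior for the mean difference under a vague prior:
diff = a.mean() - b.mean()
se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
posterior = rng.normal(diff, se, size=20_000)

pd = max((posterior > 0).mean(), (posterior < 0).mean())
rope = ((posterior > -0.1) & (posterior < 0.1)).mean()

print(f"probability of direction: {pd:.3f}")
print(f"mass inside ROPE [-0.1, 0.1]: {rope:.3f}")
```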
T-Friedman Test: A New Statistical Test for Multiple Comparison with an Adjustable Conservativeness Measure
To show that a given algorithm is superior to benchmark algorithms, statistical hypothesis tests are commonly applied to experimental results on a number of datasets. Some statistical hypothesis tests draw more conservative conclusions than others, yet it has not been possible to characterize the degree of conservativeness of such a test quantitatively. Building on existing nonparametric statistical tests, this paper proposes a new statistical test for multiple comparison, named the t-Friedman test, which combines the t test with the Friedman test. The confidence level of the t test is adopted as a measure of conservativeness of the proposed t-Friedman test: a larger confidence level implies a higher degree of conservativeness, and vice versa. Based on synthetic results generated by Monte Carlo simulations with predefined distributions, the performance of several state-of-the-art multiple comparison tests and post hoc procedures is first qualitatively analyzed. The influences of the type of predefined distribution, the number of benchmark algorithms, and the number of datasets are explored in the experiments, and the proposed conservativeness measure is validated and verified. Finally, some suggestions for the application of these nonparametric statistical tests are provided.
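The t-Friedman combination itself is not reproduced here, but the classical Friedman test it builds on is available in SciPy; the algorithm scores below are fabricated for illustration:

```python
# Sketch: the Friedman test for comparing k algorithms over n datasets,
# the nonparametric building block of the proposed t-Friedman test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n_datasets = 15
alg_a = rng.normal(0.82, 0.04, n_datasets)   # accuracy of algorithm A
alg_b = rng.normal(0.80, 0.04, n_datasets)   # accuracy of algorithm B
alg_c = rng.normal(0.74, 0.04, n_datasets)   # accuracy of algorithm C

stat, p = stats.friedmanchisquare(alg_a, alg_b, alg_c)
print(f"Friedman chi-square = {stat:.2f}, p = {p:.4f}")
# A significant p suggests at least one algorithm ranks differently;
# post hoc procedures then locate the differing pair(s).
```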
Homoscedasticity: an overlooked critical assumption for linear regression
Linear regression is widely used in biomedical and psychosocial research. A critical assumption that is often overlooked is homoscedasticity. Unlike normality, the other assumption on data distribution, homoscedasticity is often taken for granted when fitting linear regression models. However, contrary to popular belief, this assumption actually has a bigger impact on the validity of linear regression results than normality does. In this report, we use Monte Carlo simulation studies to investigate and compare the effects of both assumptions on the validity of inference.
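A small Monte Carlo sketch of the point being made: with a true null slope and error variance growing with x, the usual OLS slope test can reject more often than the nominal level. The simulation settings are illustrative assumptions:

```python
# Sketch: Monte Carlo check of how heteroscedasticity distorts linear
# regression inference. The true slope is zero, but the error variance
# increases with x, so the conventional OLS standard error is off.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, n_sims, alpha = 50, 5000, 0.05
rejections = 0
for _ in range(n_sims):
    x = rng.uniform(0, 1, n)
    y = rng.normal(0.0, 0.2 + 2.0 * x)   # variance grows with x; slope is 0
    if stats.linregress(x, y).pvalue < alpha:
        rejections += 1

print(f"empirical type I error: {rejections / n_sims:.3f} (nominal {alpha})")
```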
Using the paired T-Test to compare suppliers
The manufacture of industrial products requires rigorous quality control, which is why products must comply with their specifications. It is therefore essential that two different suppliers deliver products within the same specifications. The aim of this article is to present a case study, carried out at a company in the south of the state of Rio de Janeiro, which used the paired t-test to compare two types of foam for hospital mattresses. The results showed that supplier 1 produces foam below the specified thickness of 11 cm while supplier 2 produces foam within the specified value, so supplier 1 would be rejected and supplier 2's services would be used.
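A sketch of the comparison described, with simulated stand-in measurements rather than the study's data; scipy.stats.ttest_rel performs the paired test, and each supplier is also checked against the 11 cm specification:

```python
# Sketch: paired thickness measurements from two foam suppliers, compared
# with a paired t-test and each tested against the 11 cm specification.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 20
supplier_1 = rng.normal(10.6, 0.2, n)   # tends to run below spec
supplier_2 = rng.normal(11.0, 0.2, n)   # on spec

paired = stats.ttest_rel(supplier_1, supplier_2)
print(f"paired t = {paired.statistic:.2f}, p = {paired.pvalue:.4f}")

for name, sample in [("supplier 1", supplier_1), ("supplier 2", supplier_2)]:
    res = stats.ttest_1samp(sample, popmean=11.0)
    print(f"{name} vs 11 cm spec: t = {res.statistic:.2f}, p = {res.pvalue:.4f}")
```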
Valid population inference for information-based imaging: From the second-level t-test to prevalence inference
In multivariate pattern analysis of neuroimaging data, 'second-level' inference is often performed by entering classification accuracies into a t-test vs chance level across subjects. We argue that while the random-effects analysis implemented by the t-test does provide population inference if applied to activation differences, it fails to do so in the case of classification accuracy or other 'information-like' measures, because the true value of such measures can never be below chance level. This constraint changes the meaning of the population-level null hypothesis being tested, which becomes equivalent to the global null hypothesis that there is no effect in any subject in the population. Consequently, rejecting it only allows one to infer that there are some subjects in which there is an information effect, but not that the effect generalizes, rendering the procedure effectively equivalent to a fixed-effects analysis. This statement is supported by theoretical arguments as well as simulations. We review possible alternative approaches to population inference for information-based imaging, converging on the idea that it should not target the mean, but the prevalence of the effect in the population. One method to do so, 'permutation-based information prevalence inference using the minimum statistic', is described in detail and applied to empirical data.
Highlights:
• A second-level t-test applied to accuracies in MVPA does not provide population inference.
• The same holds for other measures used in information-based imaging.
• The reason is that the true value of 'information-like' measures cannot be below chance level.
• This is in contrast to the use of the t-test in univariate analysis, which does support generalization.
• Population inference in MVPA can be achieved by targeting the effect prevalence instead of the mean.
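A deliberately simplified sketch of the minimum-statistic idea, with simulated first-level accuracies standing in for real within-subject permutations; the full method in the paper involves additional steps:

```python
# Simplified sketch of permutation-based prevalence inference using the
# minimum statistic: each subject contributes an observed accuracy plus a
# set of within-subject (label-permutation) accuracies, and the global
# null is tested with the minimum accuracy across subjects.
import numpy as np

rng = np.random.default_rng(8)
n_subjects, n_perms = 12, 1000

# First-level results (simulated): observed accuracy per subject and a
# null accuracy distribution per subject, chance level = 0.5.
observed = rng.normal(0.58, 0.03, n_subjects)
null_perms = rng.normal(0.50, 0.03, (n_subjects, n_perms))

# Second level: compare the observed minimum across subjects with the
# permutation distribution of minima (one within-subject draw per subject).
obs_min = observed.min()
idx = rng.integers(0, n_perms, size=(n_perms, n_subjects))
draws = null_perms[np.arange(n_subjects), idx].min(axis=1)
p_global_null = (np.sum(draws >= obs_min) + 1) / (n_perms + 1)

print(f"min accuracy = {obs_min:.3f}, p (global null) = {p_global_null:.4f}")
```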