Catalogue Search | MBRL
Explore the vast range of titles available.
6,974 result(s) for "T-tests"
Why Psychologists Should by Default Use Welch’s t-test Instead of Student’s t-test
2017
When comparing two independent groups, psychology researchers commonly use Student's t-test. Assumptions of normality and homogeneity of variance underlie this test. When these conditions are not met, Student's t-test can be severely biased and lead to invalid statistical inferences. Moreover, we argue that the assumption of equal variances will seldom hold in psychological research, and choosing between Student's t-test and Welch's t-test based on the outcome of a test of the equality of variances often fails to provide an appropriate answer. We show that Welch's t-test provides better control of Type I error rates when the assumption of homogeneity of variance is not met, and that it loses little robustness compared to Student's t-test when the assumptions are met. We argue that Welch's t-test should be used as a default strategy.
Publisher's Note: A correction article relating to this paper has been published and can be found at https://www.rips-irsp.com/articles/10.5334/irsp.661/.
Journal Article
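The two statistics compared in the abstract above can be sketched in a few lines of NumPy (an illustrative sketch, not the authors' code; the function names are hypothetical):

```python
import numpy as np

def student_t(x, y):
    # Pooled-variance Student's t-statistic and its degrees of freedom.
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    t = (x.mean() - y.mean()) / np.sqrt(sp2 * (1 / nx + 1 / ny))
    return t, nx + ny - 2

def welch_t(x, y):
    # Welch's t-statistic with Welch-Satterthwaite degrees of freedom;
    # no equal-variance assumption.
    nx, ny = len(x), len(y)
    vx, vy = x.var(ddof=1) / nx, y.var(ddof=1) / ny
    t = (x.mean() - y.mean()) / np.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df
```

With equal group sizes the two statistics coincide, but Welch's smaller degrees of freedom widen the reference distribution when variances differ; `scipy.stats.ttest_ind(x, y, equal_var=False)` computes the same Welch test.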
Application of student's t-test, analysis of variance, and covariance
by Pandey, Gaurav; Mishra, Priyadarshni; Mishra, Prabhaker
in Analysis of covariance; Analysis of variance; Body mass index
2019
Student's t-test (t-test), analysis of variance (ANOVA), and analysis of covariance (ANCOVA) are statistical methods used in hypothesis testing for the comparison of means between groups. Student's t-test is used to compare the means of two groups, whereas ANOVA is used to compare the means of three or more groups. ANOVA first yields a common P value; a significant P value indicates that for at least one pair of groups the mean difference is statistically significant. To identify the significant pair(s), multiple comparison procedures are used. An ANOVA with one categorical independent variable is called a one-way ANOVA, and one with two categorical independent variables a two-way ANOVA. When at least one covariate is used to adjust the dependent variable, ANOVA becomes ANCOVA. When the sample size is small, the mean is strongly affected by outliers, so a sufficient sample size is necessary when using these methods.
Journal Article
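The one-way ANOVA described above reduces to comparing between-group and within-group variability; a minimal NumPy sketch (illustrative, with a hypothetical function name):

```python
import numpy as np

def one_way_anova(groups):
    # One-way ANOVA F statistic: ratio of between-group to
    # within-group mean squares.
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = np.concatenate(groups).mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within
```

For two groups the F statistic equals the square of Student's t, which is why ANOVA is described as the generalization of the t-test to three or more groups.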
Inference Under Covariate-Adaptive Randomization
2018
This article studies inference for the average treatment effect in randomized controlled trials with covariate-adaptive randomization. Here, by covariate-adaptive randomization, we mean randomization schemes that first stratify according to baseline covariates and then assign treatment status so as to achieve "balance" within each stratum. Our main requirement is that the randomization scheme assigns treatment status within each stratum so that the fraction of units being assigned to treatment within each stratum has a well-behaved distribution centered around a proportion π as the sample size tends to infinity. Such schemes include, for example, Efron's biased-coin design and stratified block randomization. When testing the null hypothesis that the average treatment effect equals a prespecified value in such settings, we first show the usual two-sample t-test is conservative in the sense that it has limiting rejection probability under the null hypothesis no greater than, and typically strictly less than, the nominal level. We show, however, that a simple adjustment to the usual standard error of the two-sample t-test leads to a test that is exact in the sense that its limiting rejection probability under the null hypothesis equals the nominal level. Next, we consider the usual t-test (on the coefficient on treatment assignment) in a linear regression of outcomes on treatment assignment and indicators for each of the strata. We show that this test is exact for the important special case of randomization schemes with π = 1/2, but is otherwise conservative. We again provide a simple adjustment to the standard errors that yields an exact test more generally. Finally, we study the behavior of a modified version of a permutation test, which we refer to as the covariate-adaptive permutation test, that only permutes treatment status for units within the same stratum. When applied to the usual two-sample t-statistic, we show that this test is exact for randomization schemes with π = 1/2 that additionally achieve what we refer to as "strong balance." For randomization schemes with π ≠ 1/2, this test may have limiting rejection probability under the null hypothesis strictly greater than the nominal level. When applied to a suitably adjusted version of the two-sample t-statistic, however, we show that this test is exact for all randomization schemes that achieve "strong balance," including those with π ≠ 1/2. A simulation study confirms the practical relevance of our theoretical results. We conclude with recommendations for empirical practice and an empirical illustration. Supplementary materials for this article are available online.
Journal Article
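The within-stratum permutation idea in the abstract above can be illustrated with a small NumPy sketch. This is a simplified illustration using the raw difference in means as the test statistic; the function name and details are hypothetical, not the authors' procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

def within_stratum_permutation_p(outcome, treat, strata, n_perm=2000):
    # Monte Carlo p-value for the difference in means, permuting
    # treatment labels only within each stratum so the stratified
    # design is respected.
    def diff_in_means(t):
        return outcome[t == 1].mean() - outcome[t == 0].mean()

    observed = diff_in_means(treat)
    extreme = 0
    for _ in range(n_perm):
        perm = treat.copy()
        for s in np.unique(strata):
            idx = np.where(strata == s)[0]
            perm[idx] = rng.permutation(perm[idx])
        if abs(diff_in_means(perm)) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    return (extreme + 1) / (n_perm + 1)
```

Because labels are shuffled only inside each stratum, the within-stratum treatment fractions are preserved under every permutation.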
Statistical Evidence in Experimental Psychology: An Empirical Comparison Using 855 t Tests
by Lee, Michael D.; Iverson, Geoffrey J.; Matzke, Dora
in Anecdotal research; Bayesian analysis; Bayesian Statistics
2011
Statistical inference in psychology has traditionally relied heavily on p-value significance testing. This approach to drawing conclusions from data, however, has been widely criticized, and two types of remedies have been advocated. The first proposal is to supplement p values with complementary measures of evidence, such as effect sizes. The second is to replace inference with Bayesian measures of evidence, such as the Bayes factor. The authors provide a practical comparison of p values, effect sizes, and default Bayes factors as measures of statistical evidence, using 855 recently published t tests in psychology. The comparison yields two main results. First, although p values and default Bayes factors almost always agree about what hypothesis is better supported by the data, the measures often disagree about the strength of this support; for 70% of the data sets for which the p value falls between .01 and .05, the default Bayes factor indicates that the evidence is only anecdotal. Second, effect sizes can provide additional evidence to p values and default Bayes factors. The authors conclude that the Bayesian approach is comparatively prudent, preventing researchers from overestimating the evidence in favor of an effect.
Journal Article
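The "default Bayes factor" referenced in the abstract above is commonly the JZS Bayes factor of Rouder et al. (2009). A minimal sketch for the two-sample case, assuming SciPy is available (illustrative, not the authors' code):

```python
import numpy as np
from scipy.integrate import quad

def jzs_bf10(t, n1, n2):
    # Two-sample JZS (default) Bayes factor BF10 following Rouder et
    # al. (2009): Cauchy prior on effect size, written as a scale
    # mixture over g with an inverse-chi-square(1) prior.
    n_eff = n1 * n2 / (n1 + n2)
    nu = n1 + n2 - 2

    def integrand(g):
        if g < 1e-10:  # integrand vanishes at the origin
            return 0.0
        return ((1 + n_eff * g) ** -0.5
                * (1 + t ** 2 / ((1 + n_eff * g) * nu)) ** (-(nu + 1) / 2)
                * (2 * np.pi) ** -0.5 * g ** -1.5 * np.exp(-1 / (2 * g)))

    marginal_alt, _ = quad(integrand, 0, np.inf)
    marginal_null = (1 + t ** 2 / nu) ** (-(nu + 1) / 2)
    return marginal_alt / marginal_null
```

BF10 below 1 favours the null; this is how a p value just under .05 can correspond to merely anecdotal evidence for the alternative.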
Analysis of Bayesian posterior significance and effect size indices for the two-sample t-test to support reproducible medical research
2020
Background
The replication crisis hit the medical sciences about a decade ago, but most of the flaws inherent in null hypothesis significance testing (NHST) have still not been solved. While the drawbacks of p-values have been detailed in endless venues, for clinical research only a few attractive alternatives have been proposed to replace p-values and NHST. Bayesian methods are one of them, and they are gaining increasing attention in medical research, as their advantages include the description of model parameters in terms of probability, as well as the incorporation of prior information, in contrast to the frequentist framework. While Bayesian methods are not the only remedy to the situation, there is increasing agreement that they are an essential way to avoid common misconceptions and false interpretations of study results. The requirements for applying Bayesian statistics have transitioned from detailed programming knowledge to simple point-and-click programs like JASP. Still, the multitude of Bayesian significance and effect measures that contrast with the gold standard of significance in medical research, the p-value, causes a lack of agreement on which measure to report.
Methods
Therefore, in this paper, we conduct an extensive simulation study to compare common Bayesian significance and effect measures which can be obtained from a posterior distribution. In it, we analyse the behaviour of these measures for one of the most important statistical procedures in medical research and in particular clinical trials, the two-sample Student’s (and Welch’s) t-test.
Results
The results show that some measures cannot state evidence for both the null and the alternative. While the different indices behave similarly with respect to increasing sample size and noise, the prior modelling influences the obtained results, and extreme priors allow for cherry-picking similar to p-hacking in the frequentist paradigm. The indices behave quite differently in their ability to control the type I error rate and to detect an existing effect.
Conclusion
Based on the results, two of the commonly used indices can be recommended for more widespread use in clinical and biomedical research, as they improve the type I error control compared to the classic two-sample t-test and enjoy multiple other desirable properties.
Journal Article
Simulation data for the analysis of Bayesian posterior significance and effect size indices for the two-sample t-test to support reproducible medical research
2020
Objectives
The data presented herein represents the simulated datasets of a recently conducted larger study which investigated the behaviour of Bayesian indices of significance and effect size as alternatives to traditional p-values. The study considered the setting of Student’s and Welch’s two-sample t-test often used in medical research. It investigated the influence of the sample size, noise, the selected prior hyperparameters and the sensitivity to type I errors. The posterior indices used included the Bayes factor, the region of practical equivalence, the probability of direction, the MAP-based p-value and the e-value in the Full Bayesian Significance Test. The simulation study was conducted in the statistical programming language R.
Data description
The R script files for simulation of the datasets used in the study are presented in this article. These script files can both simulate the raw datasets and run the analyses. As researchers may face different effect sizes, noise levels or priors in their domain than the ones studied in the original paper, the scripts extend the original results by allowing researchers to recreate all analyses of interest in different contexts. Therefore, they should be relevant to other researchers.
Journal Article
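Two of the posterior indices named above, the probability of direction and the region of practical equivalence (ROPE), can be sketched directly from posterior draws. The function names are hypothetical, and the ±0.1 ROPE bounds are a common default for standardized effects, not necessarily the study's choice:

```python
import numpy as np

def probability_of_direction(samples):
    # Probability of direction: share of posterior mass whose sign
    # matches the dominant sign (ranges from 0.5 to 1).
    p_positive = (samples > 0).mean()
    return max(p_positive, 1 - p_positive)

def rope_fraction(samples, low=-0.1, high=0.1):
    # Fraction of posterior mass inside a region of practical
    # equivalence around zero.
    return ((samples >= low) & (samples <= high)).mean()
```

Both quantities are simple functionals of the posterior, which is why they can be computed from any set of posterior samples regardless of how the model was fitted.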
Long-range and local circuits for top-down modulation of visual cortex processing
2014
Top-down modulation of sensory processing allows the animal to select inputs most relevant to current tasks. We found that the cingulate (Cg) region of the mouse frontal cortex powerfully influences sensory processing in the primary visual cortex (V1) through long-range projections that activate local γ-aminobutyric acid–ergic (GABAergic) circuits. Optogenetic activation of Cg neurons enhanced V1 neuron responses and improved visual discrimination. Focal activation of Cg axons in V1 caused a response increase at the activation site but a decrease at nearby locations (center-surround modulation). Whereas somatostatin-positive GABAergic interneurons contributed preferentially to surround suppression, vasoactive intestinal peptide-positive interneurons were crucial for center facilitation. Long-range corticocortical projections thus act through local microcircuits to exert spatially specific top-down modulation of sensory processing.
Journal Article
PHASE TRANSITION AND REGULARIZED BOOTSTRAP IN LARGE-SCALE t-TESTS WITH FALSE DISCOVERY RATE CONTROL
2014
Applying the Benjamini and Hochberg (B-H) method to multiple Student's t-tests is a popular technique for gene selection in microarray data analysis. Given the nonnormality of the population, the true p-values of the hypothesis tests are typically unknown. Hence it is common to use the standard normal distribution N(0, 1), Student's t distribution t_{n-1}, or the bootstrap method to estimate the p-values. In this paper, we prove that when the population has a finite 4th moment and the dimension m and the sample size n satisfy log m = o(n^{1/3}), the B-H method controls the false discovery rate (FDR) and the false discovery proportion (FDP) at a given level α asymptotically with p-values estimated from the N(0, 1) or t_{n-1} distribution. However, a phase transition phenomenon occurs when log m ≥ c₀n^{1/3}. In this case, the FDR and the FDP of the B-H method may be larger than α or even converge to one. In contrast, the bootstrap calibration is accurate for log m = o(n^{1/2}) as long as the underlying distribution has sub-Gaussian tails. However, such a light-tailed condition cannot generally be weakened. The simulation study shows that the bootstrap calibration is very conservative for heavy-tailed distributions. To solve this problem, a regularized bootstrap correction is proposed and is shown to be robust to the tails of the distributions. The simulation study shows that the regularized bootstrap method performs better than its usual counterpart.
Journal Article
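The B-H step-up procedure that the abstract analyzes can be sketched as follows (a plain NumPy illustration with given, rather than estimated, p-values):

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    # B-H step-up procedure: find the largest k such that the k-th
    # smallest p-value satisfies p_(k) <= k * alpha / m, then reject
    # the hypotheses with the k smallest p-values.
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    k = below.nonzero()[0].max() + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True  # mask in the original input order
    return reject
```

The paper's point is that the p-values fed into this procedure are themselves estimates, and the choice of estimation method (normal, t, or bootstrap calibration) determines whether FDR control survives as m grows.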
Phonetic Feature Encoding in Human Superior Temporal Gyrus
by Chang, Edward F.; Mesgarani, Nima; Cheung, Connie
in Acoustic spectra; Acoustics; Auditory Cortex - anatomy & histology
2014
During speech perception, linguistic elements such as consonants and vowels are extracted from a complex acoustic speech signal. The superior temporal gyrus (STG) participates in high-order auditory processing of speech, but how it encodes phonetic information is poorly understood. We used high-density direct cortical surface recordings in humans while they listened to natural, continuous speech to reveal the STG representation of the entire English phonetic inventory. At single electrodes, we found response selectivity to distinct phonetic features. Encoding of acoustic properties was mediated by a distributed population response. Phonetic features could be directly related to tuning for spectrotemporal acoustic cues, some of which were encoded in a nonlinear fashion or by integration of multiple cues. These findings demonstrate the acoustic-phonetic representation of speech in human STG.
Journal Article
Bayesian Hodges-Lehmann tests for statistical equivalence in the two-sample setting: Power analysis, type I error rates and equivalence boundary selection in biomedical research
2021
Background
Null hypothesis significance testing (NHST) is among the most frequently employed methods in the biomedical sciences. However, the problems of NHST and p-values have been discussed widely and various Bayesian alternatives have been proposed. Some proposals focus on equivalence testing, which aims at testing an interval hypothesis instead of a precise hypothesis. An interval hypothesis includes a small range of parameter values instead of a single null value, and the idea goes back to Hodges and Lehmann. As researchers can always expect to observe some (although often negligibly small) effect size, interval hypotheses are more realistic for biomedical research. However, the selection of an equivalence region (the interval boundaries) often seems arbitrary and several Bayesian approaches to equivalence testing coexist.
Methods
A new proposal is made for determining the equivalence region of Bayesian equivalence tests based on objective criteria such as the type I error rate and power. Existing approaches to Bayesian equivalence testing in the two-sample setting are discussed, with a focus on the Bayes factor and the region of practical equivalence (ROPE). A simulation study derives the results necessary to apply the new method in the two-sample setting, which is among the most frequently carried out procedures in biomedical research.
Results
Bayesian Hodges-Lehmann tests for statistical equivalence differ in their sensitivity to the prior modeling, their power, and the associated type I error rates. The relationship between type I error rates, power and sample sizes for existing Bayesian equivalence tests is identified in the two-sample setting. The results allow the equivalence region to be determined with the new method by incorporating such objective criteria. Importantly, the results show that not only can prior selection influence the type I error rate and power, but the relationship is even reversed for the Bayes factor and ROPE-based equivalence tests.
Conclusion
Based on the results, researchers can select between the existing Bayesian Hodges-Lehmann tests for statistical equivalence and determine the equivalence region based on objective criteria, thus improving the reproducibility of biomedical research.
Journal Article