Catalogue Search | MBRL
Explore the vast range of titles available.
1,126 result(s) for "Posterior probability"
Fast Coalescent-Based Computation of Local Branch Support from Quartet Frequencies
2016
Species tree reconstruction is complicated by effects of incomplete lineage sorting, commonly modeled by the multi-species coalescent model (MSC). While there has been substantial progress in developing methods that estimate a species tree given a collection of gene trees, less attention has been paid to fast and accurate methods of quantifying support. In this article, we propose a fast algorithm to compute quartet-based support for each branch of a given species tree with regard to a given set of gene trees. We then show how the quartet support can be used in the context of the MSC to compute (1) the local posterior probability (PP) that the branch is in the species tree and (2) the length of the branch in coalescent units. We evaluate the precision and recall of the local PP on a wide set of simulated and biological datasets, and show that it has very high precision and improved recall compared with multi-locus bootstrapping. The estimated branch lengths are highly accurate when gene tree estimation error is low, but are underestimated when gene tree estimation error increases. Computation of both the branch length and local PP is implemented as new features in ASTRAL.
Journal Article
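The local PP computation lends itself to a compact numerical illustration. The sketch below is a simplified stand-in, not ASTRAL's exact estimator: under the MSC, a branch of length d (in coalescent units) yields a matching quartet frequency of 1 − (2/3)e^(−d) while each alternative topology gets half the remainder, and the sketch integrates a multinomial likelihood over d with an assumed Exp(1) prior; the quartet counts are invented for illustration.

```python
# Simplified sketch of quartet-based local support under the MSC (not
# ASTRAL's exact estimator). For the true topology around a branch of
# length d coalescent units, the matching quartet frequency is
# theta = 1 - (2/3) * exp(-d) >= 1/3; each alternative gets (1 - theta) / 2.
# The grid bounds and the Exp(1) prior on d are illustrative assumptions.
import numpy as np
from scipy.integrate import trapezoid

def local_pp(counts, prior_rate=1.0):
    """counts: gene-tree quartet-topology counts (n1, n2, n3) around a branch."""
    counts = np.asarray(counts, dtype=float)
    d = np.linspace(1e-6, 20.0, 4000)              # grid over branch length
    theta = 1.0 - (2.0 / 3.0) * np.exp(-d)
    log_post = np.empty((3, d.size))
    for i in range(3):                             # topology i assumed true
        probs = np.tile((1.0 - theta) / 2.0, (3, 1))
        probs[i] = theta
        log_lik = (counts[:, None] * np.log(probs)).sum(axis=0)
        log_post[i] = log_lik - prior_rate * d     # multinomial x Exp(prior_rate)
    shift = log_post.max()                         # guard against underflow
    mass = np.array([trapezoid(np.exp(lp - shift), d) for lp in log_post])
    return mass / mass.sum()                       # equal 1/3 prior on topologies

print(local_pp([60, 25, 15]))   # high support for the dominant topology
```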
Controversy and Debate: Questionable utility of the relative risk in clinical research: Paper 1: A call for change to practice
by Thalib, Lukman; Furuya-Kanamori, Luis; Xu, Chang
in Binary effect measure, Clinical trial, Clinical trials
2022
In clinical trials, the relative risk or risk ratio (RR) is a mainstay for reporting the magnitude of an intervention's effect. The RR is the ratio of the probability of an outcome in the intervention group to its probability in the control group, and thus measures the change in the likelihood of an event linked to a given intervention. It is widely used because it is considered to have “portability” across varying outcome prevalence, especially when the outcome is rare. There is, however, a more fundamental problem with this ratio, which this paper aims to demonstrate.
We used mathematical derivation to determine whether the RR is a measure of effect magnitude alone (i.e., whether a larger absolute value always indicates a stronger effect), and to determine its relationship to the prevalence of an outcome. We confirm the derivation results with a follow-up analysis of 140,620 trials scraped from the Cochrane Library.
We demonstrate that the RR varies for reasons other than the magnitude of the effect, because it is a ratio of two posterior probabilities, both of which depend on the baseline prevalence of the outcome. In addition, we demonstrate that the RR shifts toward its null value with increasing outcome prevalence, regardless of the strength of the association between intervention and outcome. The odds ratio (OR), the other commonly used ratio, measures solely the effect magnitude; it has no relationship to the prevalence of the outcome in a study, nor does it overestimate the RR as is commonly thought.
The results demonstrate the need to (1) end the primary use of the RR in clinical trials and meta-analyses, as its direct interpretation is not meaningful, (2) replace the RR with the OR, and (3) interpret results in absolute terms (e.g., the number needed to treat) using only the post-intervention risk recalculated from the OR for any expected level of baseline risk. These results will have far-reaching implications, such as reducing misleading results from clinical trials and meta-analyses and ushering in a new era in the reporting of such trials and meta-analyses in practice.
Journal Article
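The paper's central claim about the RR and prevalence can be checked with a few lines of arithmetic: hold the OR fixed and recompute the RR as the baseline (control-group) risk rises. The numbers below are arbitrary illustrations.

```python
# Worked illustration: hold the odds ratio fixed and watch the RR drift
# toward its null value of 1 as baseline prevalence rises.
# p0 = control-group risk; with a fixed OR, the intervention-group risk
# is p1 = OR*p0 / (1 - p0 + OR*p0), and RR = p1 / p0.
odds_ratio = 2.0
for p0 in [0.01, 0.05, 0.20, 0.50, 0.80, 0.95]:
    p1 = odds_ratio * p0 / (1.0 - p0 + odds_ratio * p0)
    print(f"baseline risk {p0:4.2f}:  RR = {p1 / p0:.3f}")
# The RR runs from about 1.980 at 1% prevalence down to about 1.026 at
# 95% prevalence, even though the effect (OR = 2) never changes.
```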
Inferential Statistics as Descriptive Statistics: There Is No Replication Crisis if We Don't Expect Replication
by Greenland, Sander; Trafimow, David; Amrhein, Valentin
in Adopting More Holistic Approaches, Assumptions, Auxiliary hypotheses
2019
Statistical inference often fails to replicate. One reason is that many results may be selected for drawing inference because some threshold of a statistic like the P-value was crossed, leading to biased reported effect sizes. Nonetheless, considerable non-replication is to be expected even without selective reporting, and generalizations from single studies are rarely if ever warranted. Honestly reported results must vary from replication to replication because of varying assumption violations and random variation; excessive agreement itself would suggest deeper problems, such as failure to publish results in conflict with group expectations or desires. A general perception of a "replication crisis" may thus reflect failure to recognize that statistical tests not only test hypotheses, but countless assumptions and the entire environment in which research takes place. Because of all the uncertain and unknown assumptions that underpin statistical inferences, we should treat inferential statistics as highly unstable local descriptions of relations between assumptions and data, rather than as providing generalizable inferences about hypotheses or models. And that means we should treat statistical results as being much more incomplete and uncertain than is currently the norm. Acknowledging this uncertainty could help reduce the allure of selective reporting: Since a small P-value could be large in a replication study, and a large P-value could be small, there is simply no need to selectively report studies based on statistical results. Rather than focusing our study reports on uncertain conclusions, we should thus focus on describing accurately how the study was conducted, what problems occurred, what data were obtained, what analysis methods were used and why, and what output those methods produced.
Journal Article
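The claim that a small P-value could be large in a replication (and vice versa) is easy to demonstrate by simulation. The design below (two groups of 25, a true 0.5 SD mean difference) is an arbitrary choice for illustration.

```python
# Simulate exact replications of the same two-group study with a real
# effect and look at how much the P-value varies from run to run.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
pvals = np.array([
    ttest_ind(rng.normal(0.0, 1.0, 25),       # control group
              rng.normal(0.5, 1.0, 25)).pvalue  # true 0.5 SD difference
    for _ in range(10_000)
])
print("share of replications with P < 0.05:", (pvals < 0.05).mean().round(2))
print("quartiles of P:", np.quantile(pvals, [0.25, 0.5, 0.75]).round(3))
# Even with a genuine effect, honestly reported P-values range from tiny
# to large across replications; variation is the norm, not a crisis.
```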
Does group-based trajectory modeling estimate spurious trajectories?
by Rousseau, Marie-Claude; O’Loughlin, Jennifer; Mésidor, Miceline
in Average posterior probability, Bayes Theorem, Classification
2022
Background
Group-based trajectory modelling (GBTM) is increasingly used to identify subgroups of individuals with similar patterns. In this paper, we use simulated and real-life data to illustrate that GBTM is susceptible to generating spurious findings in some circumstances.
Methods
Six plausible scenarios, two of which mimicked published analyses, were simulated. Models with 1 to 10 trajectory subgroups were estimated, and the model that minimized the Bayesian information criterion (BIC) was selected. For each scenario, we assessed whether the method identified the correct number of trajectories, the correct shapes of the trajectories, and the mean number of participants in each trajectory subgroup. The performance of the average posterior probability, relative entropy, and mismatch criteria for assessing classification adequacy was compared.
Results
Among the six scenarios, the correct number of trajectories was identified in two, the correct shapes in four, and the mean number of participants in each trajectory subgroup in only one. Relative entropy and mismatch outperformed the average posterior probability in detecting spurious trajectories.
Conclusion
Researchers should be aware that GBTM can generate spurious findings, especially when the average posterior probability is used as the sole criterion to evaluate model fit. Several model adequacy criteria should be used to assess classification adequacy.
Journal Article
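For readers who want to compute the classification-adequacy measures the paper compares, here is a minimal sketch using the standard definitions of the average posterior probability and relative entropy (the mismatch criterion is omitted); the example matrix is invented.

```python
# Two classification-adequacy measures for GBTM, computed from an N x K
# matrix of posterior membership probabilities.
import numpy as np

def average_posterior_probability(post):
    """APP per subgroup: mean posterior probability among the individuals
    assigned (by highest posterior) to that subgroup; values near 1 are good."""
    assign = post.argmax(axis=1)
    return np.array([post[assign == k, k].mean() for k in range(post.shape[1])])

def relative_entropy(post):
    """Entropy-based separation index in [0, 1]; values near 1 indicate
    well-separated subgroups."""
    n, k = post.shape
    h = -np.sum(post * np.log(np.clip(post, 1e-12, None)))
    return 1.0 - h / (n * np.log(k))

post = np.array([[0.98, 0.02], [0.95, 0.05], [0.04, 0.96], [0.02, 0.98]])
print(average_posterior_probability(post))   # [0.965, 0.97]
print(round(relative_entropy(post), 2))      # ~0.80: clear separation
```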
A tutorial on Bayesian model-averaged meta-analysis in JASP
by Haaf, Julia M.; Gronau, Quentin F.; Berkhout, Sophie W.
in Bayes Theorem, Behavioral Science and Psychology, Child
2024
Researchers conduct meta-analyses in order to synthesize information across different studies. Compared to standard meta-analytic methods, Bayesian model-averaged meta-analysis offers several practical advantages including the ability to quantify evidence in favor of the absence of an effect, the ability to monitor evidence as individual studies accumulate indefinitely, and the ability to draw inferences based on multiple models simultaneously. This tutorial introduces the concepts and logic underlying Bayesian model-averaged meta-analysis and illustrates its application using the open-source software JASP. As a running example, we perform a Bayesian meta-analysis on language development in children. We show how to conduct a Bayesian model-averaged meta-analysis and how to interpret the results.
Journal Article
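The model-averaging logic the tutorial teaches can be shown with toy arithmetic: convert Bayes factors (here invented numbers against a common null) into posterior model probabilities, then average quantities across models instead of conditioning on one winner. This sketch is illustrative arithmetic only, not JASP's interface or its exact model set.

```python
# Toy Bayesian model averaging over the four meta-analytic models
# (fixed/random effects x null/effect). All numbers are hypothetical.
import numpy as np

models = ["fixed H0", "fixed H1", "random H0", "random H1"]
bf_vs_null = np.array([1.0, 6.0, 1.2, 9.0])   # hypothetical Bayes factors
prior = np.full(4, 0.25)                       # equal prior model probabilities
post = bf_vs_null * prior
post /= post.sum()                             # posterior model probabilities

effect_given_model = np.array([0.0, 0.21, 0.0, 0.24])  # hypothetical estimates
print("P(effect present | data) =", post[[1, 3]].sum().round(3))
print("model-averaged effect    =", (post * effect_given_model).sum().round(3))
# Averaging over models weights each estimate by how plausible its model
# is, rather than betting everything on a single selected model.
```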
Bayesian selection of misspecified models is overconfident and may cause spurious posterior probabilities for phylogenetic trees
2018
The Bayesian method is noted to produce spuriously high posterior probabilities for phylogenetic trees in analysis of large datasets, but the precise reasons for this overconfidence are unknown. In general, the performance of Bayesian selection of misspecified models is poorly understood, even though this is of great scientific interest since models are never true in real data analysis. Here we characterize the asymptotic behavior of Bayesian model selection and show that when the competing models are equally wrong, Bayesian model selection exhibits surprising and polarized behaviors in large datasets, supporting one model with full force while rejecting the others. If one model is slightly less wrong than the other, the less wrong model will eventually win when the amount of data increases, but the method may become overconfident before it becomes reliable. We suggest that this extreme behavior may be a major factor for the spuriously high posterior probabilities for evolutionary trees. The philosophical implications of our results for the application of Bayesian model selection to evaluate opposing scientific hypotheses are yet to be explored, as are the behaviors of non-Bayesian methods in similar situations.
Journal Article
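The polarization the authors describe is easy to reproduce in a toy setting. In the sketch below (an illustration, not the paper's analysis), data come from N(0, 1) while the two candidate models, N(0.5, 1) and N(−0.5, 1), are equally wrong; with equal priors, the posterior probability of model 1 follows from the log-likelihood difference.

```python
# Two equally wrong models: as n grows, the posterior probability of one
# of them polarizes toward 0 or 1 instead of settling at 1/2.
import numpy as np
from scipy.stats import norm
from scipy.special import expit

rng = np.random.default_rng(1)
for n in [10, 100, 1000, 10000]:
    x = rng.normal(0.0, 1.0, n)               # true model: N(0, 1)
    ll1 = norm.logpdf(x, loc=0.5).sum()        # equally wrong model M1
    ll2 = norm.logpdf(x, loc=-0.5).sum()       # equally wrong model M2
    p1 = expit(ll1 - ll2)                      # posterior P(M1), equal priors
    print(f"n = {n:6d}:  P(M1 | data) = {p1:.4f}")
```

In this setup the log Bayes factor equals the sum of the observations, which wanders like a random walk of scale sqrt(n); across repeated runs P(M1 | data) is therefore driven to 0 or 1 as n grows, with full "confidence" in whichever wrong model chance favors.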
Bayes Factors Unmask Highly Variable Information Content, Bias, and Extreme Influence in Phylogenomic Analyses
2017
As the application of genomic data in phylogenetics has become routine, a number of cases have arisen where alternative data sets strongly support conflicting conclusions. This sensitivity to analytical decisions has prevented firm resolution of some of the most recalcitrant nodes in the tree of life. To better understand the causes and nature of this sensitivity, we analyzed several phylogenomic data sets using an alternative measure of topological support (the Bayes factor) that both demonstrates and averts several limitations of more frequently employed support measures (such as Markov chain Monte Carlo estimates of posterior probabilities). Bayes factors reveal important, previously hidden, differences across six "phylogenomic" data sets collected to resolve the phylogenetic placement of turtles within Amniota. These data sets vary substantially in their support for well-established amniote relationships, particularly in the proportion of genes that contain extreme amounts of information as well as the proportion that strongly reject these uncontroversial relationships. All six data sets contain little information to resolve the phylogenetic placement of turtles relative to other amniotes. Bayes factors also reveal that a very small number of extremely influential genes (less than 1% of genes in a data set) can fundamentally change significant phylogenetic conclusions. In one example, these genes are shown to contain previously unrecognized paralogs. This study demonstrates both that the resolution of difficult phylogenomic problems remains sensitive to seemingly minor analysis details and that Bayes factors are a valuable tool for identifying and solving these challenges.
Journal Article
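To make the screening idea concrete: a per-gene Bayes factor can be approximated by the difference in a gene's marginal log-likelihoods under two competing topologies, and ranking genes by |2 ln BF| exposes extreme influence. The sketch below uses invented stand-in numbers, not the paper's data or its marginal-likelihood machinery.

```python
# Rank genes by the magnitude of their (approximate) per-gene Bayes
# factor between two hypotheses. gene_lnl_h1 / gene_lnl_h2 stand in for
# per-gene marginal log-likelihood estimates; the values are invented.
import numpy as np

rng = np.random.default_rng(2)
gene_lnl_h1 = rng.normal(0.2, 1.0, 300)         # stand-ins for real estimates
gene_lnl_h2 = rng.normal(0.0, 1.0, 300)
gene_lnl_h1[:3] += 40.0                          # a few hugely influential genes

two_ln_bf = 2.0 * (gene_lnl_h1 - gene_lnl_h2)   # 2 ln BF per gene
print("total 2 ln BF:", two_ln_bf.sum().round(1))
outliers = np.argsort(np.abs(two_ln_bf))[::-1][:3]
print("most influential genes:", outliers, two_ln_bf[outliers].round(1))
# Removing a handful of extreme genes can flip the overall conclusion,
# which is the kind of sensitivity the paper documents.
```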
A Solution to the Ecological Inference Problem
2013
This book provides a solution to the ecological inference problem, which has plagued users of statistical methods for over seventy-five years: How can researchers reliably infer individual-level behavior from aggregate (ecological) data? In political science, this question arises when individual-level surveys are unavailable (for instance, local or comparative electoral politics), unreliable (racial politics), insufficient (political geography), or infeasible (political history). This ecological inference problem also confronts researchers in numerous areas of major significance in public policy, and other academic disciplines, ranging from epidemiology and marketing to sociology and quantitative history. Although many have attempted to make such cross-level inferences, scholars agree that all existing methods yield very inaccurate conclusions about the world. In this volume, Gary King lays out a unique--and reliable--solution to this venerable problem.
King begins with a qualitative overview, readable even by those without a statistical background. He unifies the apparently diverse findings in the methodological literature, so that only one aggregation problem remains to be solved, and then presents his solution, along with empirical evaluations that include over 16,000 comparisons of his estimates from real aggregate data to the known individual-level answers. The method works in practice.
King's solution to the ecological inference problem will enable empirical researchers to investigate substantive questions that have heretofore proved unanswerable, and move forward fields of inquiry in which progress has been stifled by this problem.
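One ingredient of the problem can be made concrete in a few lines: the deterministic bounds that aggregate margins place on individual-level rates (the classical method of bounds, which King's statistical solution builds on). The function below is an illustrative sketch for the 2 × 2 case.

```python
# Deterministic bounds in a 2 x 2 ecological table: for a precinct where
# a fraction x of voters belongs to group A and overall turnout is t,
# the unobserved group-A turnout rate is constrained by the margins alone.
def turnout_bounds(x, t):
    """Bounds on group-A turnout given group share x and total turnout t."""
    lower = max(0.0, (t - (1.0 - x)) / x)   # even if everyone else voted
    upper = min(1.0, t / x)                  # even if only group A voted
    return lower, upper

print(turnout_bounds(x=0.3, t=0.5))  # (0.0, 1.0): margins say nothing here
print(turnout_bounds(x=0.8, t=0.7))  # (0.625, 0.875): margins bite
```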
A note on Platt’s probabilistic outputs for support vector machines
by Lin, Hsuan-Tien; Lin, Chih-Jen; Weng, Ruby C.
in Applied sciences, Artificial intelligence, Computer science; control theory; systems
2007
Platt's probabilistic outputs for Support Vector Machines (Platt, in Smola et al., eds., Advances in Large Margin Classifiers, MIT Press, 2000) have been popular for applications that require posterior class probabilities. In this note, we propose an improved algorithm that theoretically converges and avoids numerical difficulties. A simple and ready-to-use pseudo code is included.
Journal Article
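The subject of the note is compact enough to sketch. Below is a minimal implementation of Platt-style sigmoid calibration, fitted by maximum likelihood on Platt's regularized targets with a numerically stable objective; it illustrates the idea rather than reproducing the paper's improved algorithm (which uses a safeguarded Newton iteration), and the toy data are invented.

```python
# Platt-style calibration: fit P(y=1 | f) = 1 / (1 + exp(A*f + B)) to
# SVM decision values f and labels y in {0, 1}.
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def fit_platt(f, y):
    n_pos, n_neg = y.sum(), len(y) - y.sum()
    # Soft targets from Platt's prior correction (avoids 0/1 extremes).
    t = np.where(y == 1, (n_pos + 1.0) / (n_pos + 2.0), 1.0 / (n_neg + 2.0))

    def nll(params):
        A, B = params
        z = A * f + B
        # Stable cross-entropy: sum of log(1 + e^z) - (1 - t) * z.
        return np.sum(np.logaddexp(0.0, z) - (1.0 - t) * z)

    x0 = np.array([0.0, np.log((n_neg + 1.0) / (n_pos + 1.0))])
    return minimize(nll, x0, method="BFGS").x    # returns (A, B)

def predict_proba(f, A, B):
    return expit(-(A * f + B))                   # P(y=1 | f), overflow-safe

# Toy usage with invented decision values.
rng = np.random.default_rng(0)
f = np.concatenate([rng.normal(1.0, 1.0, 200), rng.normal(-1.0, 1.0, 200)])
y = np.concatenate([np.ones(200, dtype=int), np.zeros(200, dtype=int)])
A, B = fit_platt(f, y)
print(A, B, predict_proba(np.array([2.0, 0.0, -2.0]), A, B).round(3))
```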
A Fast Likelihood Method to Reconstruct and Visualize Ancestral Scenarios
by Iwasaki, Wataru; Ishikawa, Sohta A; Zhukova, Anna
in Datasets, Decision theory, Drug resistance
2019
The reconstruction of ancestral scenarios is widely used to study the evolution of characters along phylogenetic trees. One commonly uses the marginal posterior probabilities of the character states, or the joint reconstruction of the most likely scenario. However, marginal reconstructions provide users with state probabilities, which are difficult to interpret and visualize, whereas joint reconstructions select a unique state for every tree node and thus do not reflect the uncertainty of inferences. We propose a simple and fast approach, which lies between these two extremes. We use decision-theory concepts (namely, the Brier score) to associate each node in the tree with a set of likely states. A unique state is predicted in tree regions with low uncertainty, whereas several states are predicted in uncertain regions, typically around the tree root. To visualize the results, we cluster the neighboring nodes associated with the same states and use graph visualization tools. The method is implemented in the PastML program and web server. The results on simulated data demonstrate the accuracy and robustness of the approach. PastML was applied to the phylogeography of Dengue serotype 2 (DENV2), and the evolution of drug resistances in a large HIV data set. These analyses took a few minutes and provided convincing results. PastML retrieved the main transmission routes of human DENV2 and showed the uncertainty of the human-sylvatic DENV2 geographic origin. With HIV, the results show that resistance mutations mostly emerge independently under treatment pressure, but resistance clusters are found, corresponding to transmissions among untreated patients.
Journal Article
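The Brier-score device described above can be sketched directly: for a node's marginal posterior state probabilities, keep the top-k states whose uniform prediction vector is closest to the posteriors in squared (Brier-style) distance. This is a paraphrase of the described idea, not PastML's code; the example probabilities are invented.

```python
# Pick a set of likely states for a tree node from its marginal
# posterior state probabilities, using a Brier-style distance.
import numpy as np

def likely_state_set(posteriors):
    """Return the top-k states whose uniform prediction vector (1/k on
    each, 0 elsewhere) minimizes squared distance to the posteriors."""
    p = np.asarray(posteriors, dtype=float)
    order = np.argsort(p)[::-1]                 # states by decreasing probability
    best_k, best_score = 1, np.inf
    for k in range(1, p.size + 1):
        pred = np.zeros_like(p)
        pred[order[:k]] = 1.0 / k               # spread prediction over top k
        score = np.sum((pred - p) ** 2)         # Brier-style distance
        if score < best_score:
            best_k, best_score = k, score
    return order[:best_k]

print(likely_state_set([0.90, 0.07, 0.03]))        # low uncertainty: one state
print(likely_state_set([0.40, 0.38, 0.12, 0.10]))  # uncertain: two states kept
```

This behaves as the abstract describes: a single state is predicted where the posterior is concentrated, and several states are kept in uncertain regions.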