Catalogue Search | MBRL

Bayesian Long Branch Attraction Bias and Corrections

by Susko, Edward in Bayes Theorem, Bayesian analysis, Bias

2015

Previous work on the star-tree paradox has shown that Bayesian methods suffer from a long branch attraction bias. That work is extended to settings involving more taxa and partially resolved trees. The long branch attraction bias is confirmed to arise more broadly and an additional source of bias is found. A by-product of the analysis is methods that correct for biases toward particular topologies. The corrections can be easily calculated using existing Bayesian software. Posterior support for a set of two or more trees can thus be supplemented with corrected versions to cross-check or replace results. Simulations show the corrections to be highly effective.

Journal Article

On the Distributions of Bootstrap Support and Posterior Distributions for a Star Tree

by Susko, Edward in Algorithms, Approximation, Bootstrap method

2008

Several authors have recently noted that when data are generated from a star topology, posterior probabilities can often be very large, even with arbitrarily large sequence lengths. This is counter to intuition, which suggests convergence to the limit of equal probability for each topology. Here the limiting distributions of bootstrap support and posterior probabilities are obtained for a four-taxon star tree. Theoretical results are given, providing confirmation that this counterintuitive phenomenon holds for both posterior probabilities and bootstrap support. For large samples the limiting results for posterior probabilities are the same regardless of the prior. With equal-length terminal edges, the limiting distribution is similar but not the same across different choices for the lengths of the edges. In contrast to previous results, the case of unequal lengths of terminal edges is considered. With two long edges, the posterior probability of the tree with long edges together tends to be much larger. Using the neighbor-joining algorithm, with equal edge lengths, the distribution of bootstrap support tends to be qualitatively comparable to posterior probabilities. As with posterior probabilities, when two of the edges are long, bootstrap support for the tree with long branches together tends to be large. The bias is less pronounced, however, as the distribution of bootstrap support gets close to uniform for this tree, whereas posterior probabilities are much more likely to be large. Our findings for maximum likelihood estimation are based entirely on simulation and in contrast suggest that bootstrap support tends to be fairly constant across edge-length choices.

Journal Article

An updated phylogeny of the Alphaproteobacteria reveals that the parasitic Rickettsiales and Holosporales have independent origins

by Muñoz-Gómez, Sergio A; Slamovits, Claudio H; Lang, B Franz in Alphaproteobacteria, Alphaproteobacteria - classification, Alphaproteobacteria - genetics

2019

The Alphaproteobacteria is an extraordinarily diverse and ancient group of bacteria. Previous attempts to infer its deep phylogeny have been plagued with methodological artefacts. To overcome this, we analyzed a dataset of 200 single-copy and conserved genes and employed diverse strategies to reduce compositional artefacts. Such strategies include using novel dataset-specific profile mixture models and recoding schemes, and removing sites, genes and taxa that are compositionally biased. We show that the Rickettsiales and Holosporales (both groups of intracellular parasites of eukaryotes) are not sisters to each other, but instead, the Holosporales has a derived position within the Rhodospirillales. A synthesis of our results also leads to an updated proposal for the higher-level taxonomy of the Alphaproteobacteria. Our robust consensus phylogeny will serve as a framework for future studies that aim to place mitochondria, and novel environmental diversity, within the Alphaproteobacteria. The Alphaproteobacteria form one of the most abundant groups of bacteria on Earth, and one that is closely linked to all complex forms of life. Many bacteria within this class live inside the cells of other organisms. For example, mitochondria – the powerhouses of animal, plant and other eukaryotic cells – evolved from bacteria within this group. Other alphaproteobacteria act as parasites or beneficial symbionts within cells. The history of life on Earth can be thought of as a tree, with each branch representing the evolution of a new species from a common ancestor. But for many bacteria, the earliest stages of their evolutionary history are so tangled and complex that their origin remains largely unknown. For example, efforts to study the earliest history of the Alphaproteobacteria have been plagued with errors and artefacts. 
The extreme variation in the genetic sequences of different bacteria in the group makes it particularly challenging to uncover relationships between the species. To overcome this problem, Muñoz-Gómez et al. focused on a set of 200 genes that occur in all alphaproteobacteria, and used a range of strategies to reduce potential errors in the data. The results propose a new general structure for the evolutionary tree of the Alphaproteobacteria. This shows that two groups of alphaproteobacteria that were thought to be closely related to each other – the parasites Rickettsiales and Holosporales – are unrelated. Instead, these groups evolved independently from different free-living alphaproteobacteria. The abundance and diversity of the Alphaproteobacteria mean that the improved understanding of their evolutionary origins could influence the work of a wide range of scientists. Further research could help to shed light on how parasitic bacteria interact with the cells they invade; reveal how bacteria evolved certain abilities, such as the ability to photosynthesize; and uncover the precise origin of mitochondria.

Journal Article

Performance of Topology Tests under Extreme Selection Bias

by Susko, Edward; Markowski, Etai in Analysis, Bias, Chi-square test

2024

Tree tests like the Kishino–Hasegawa (KH) test and chi-square test suffer a selection bias that tests like the Shimodaira–Hasegawa (SH) test and approximately unbiased test were intended to correct. We investigate tree-testing performance in the presence of severe selection bias. The SH test is found to be very conservative and, surprisingly, its uncorrected analog, the KH test, has low Type I error even in the presence of extreme selection bias, leading to a recommendation that the SH test be abandoned. A chi-square test is found to usually behave well but to require correction in extreme cases. We show how topology testing procedures can be used to get support values for splits and compare the likelihood-based support values to the approximate likelihood ratio test (aLRT) support values. We find that the aLRT support values are reasonable even in settings with severe selection bias that they were not designed for. We also show how they can be used to construct tests of topologies and, in doing so, point out a multiple comparisons issue that should be considered when looking at support values for splits.

Journal Article

Long Branch Attraction Biases in Phylogenetics

by Susko, Edward; Roger, Andrew J. in Phylogeny, Points of View, Trees

2021

Long branch attraction (LBA) is a prevalent form of bias in phylogenetic estimation but the reasons for it are only partially understood. We argue here that it is largely due to differences in the sizes of the model spaces corresponding to different trees. Trees with long branches together allow much more flexible internal branch length parameter estimation. Consequently, although each tree has the same number of parameters, trees with long branches together have larger effective model spaces. The problem of LBA becomes particularly pronounced with partitioned data. Formulation of tree estimation as model selection leads us to propose bootstrap bias corrections as cross-checks on estimation when long branches end up being estimated together.

Journal Article

Tests for Two Trees Using Likelihood Methods

by Susko, Edward in Error correction, Likelihood ratio, Statistical analysis

2014

This article considers two similar likelihood-based test statistics for comparing two fixed trees, the Kishino-Hasegawa (KH) test statistic and the likelihood ratio (LR) statistic, as well as a number of different methods for determining thresholds to declare a significant result. An explanation is given for why the KH test, which uses the KH test statistic and normal theory thresholds, need not give correct type I error probabilities under the appropriate null hypothesis. Simulations show that the KH test tends to give much smaller type I error probabilities than expected. The article presents a computationally efficient normal-theory parametric bootstrap method for determining better KH test statistic thresholds. For the LR statistic, existing mixture of chi-squares results for determining thresholds are extended to cases in which a tree with two or three zero edge-lengths exhibits the two trees being compared. The resulting chi-bar test and use of the KH test statistic with normal bootstrap are shown through simulation to give good performance but are more difficult to implement than the KH test. Two conservative approaches are presented which require only log likelihoods and simple chi-square thresholds. While they did not perform as well as chi-bar and normal bootstrap methods in the simulations considered, they gave better performance than the KH test and have just as simple an implementation. As a by-product of parametric bootstrap considerations, an adjustment to the Swofford-Olsen-Waddell-Hillis (SOWH) test is proposed.

Journal Article
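
As a rough illustration of the KH test statistic with normal-theory thresholds described in the abstract above, the sketch below computes the statistic from per-site log likelihoods of two fixed trees. The function name `kh_test`, the toy input values, and the choice of a two-sided p-value are assumptions made for illustration and are not taken from the article. The abstract itself cautions that this normal-theory version need not give correct type I error probabilities.

```python
import math

def kh_test(site_lnl_a, site_lnl_b):
    """Normal-theory KH-style comparison of two fixed trees (illustrative sketch).

    site_lnl_a, site_lnl_b: per-site log likelihoods under tree A and tree B.
    Returns (delta, z, p) where delta is the total log-likelihood difference,
    z is delta divided by its estimated standard error, and p is a two-sided
    normal p-value (the two-sided choice is an assumption of this sketch).
    """
    if len(site_lnl_a) != len(site_lnl_b):
        raise ValueError("both trees need log likelihoods for the same sites")
    diffs = [a - b for a, b in zip(site_lnl_a, site_lnl_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two sites to estimate a variance")
    delta = sum(diffs)
    mean = delta / n
    # Sample variance of the per-site differences.
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0.0:
        raise ValueError("per-site differences have zero variance")
    # Standard error of the sum of n per-site differences.
    se = math.sqrt(n * var)
    z = delta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return delta, z, p

# Toy per-site log likelihoods (made-up numbers, not real data).
delta, z, p = kh_test([-1.0, -2.0, -3.0, -4.0], [-1.5, -2.0, -3.5, -3.5])
```

The inputs would normally come from per-site log-likelihood output of a phylogenetics program; how to obtain them depends on the software and is not shown here.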

Posterior Summarization in Bayesian Phylogenetics Using Tracer 1.7

by Baele, Guy; Xie, Dong; Suchard, Marc A. in Bayes Theorem, Bayesian analysis, Bayesian theory

2018

Bayesian inference of phylogeny using Markov chain Monte Carlo (MCMC) plays a central role in understanding evolutionary history from molecular sequence data. Visualizing and analyzing the MCMC-generated samples from the posterior distribution is a key step in any non-trivial Bayesian inference. We present the software package Tracer (version 1.7) for visualizing and analyzing the MCMC trace files generated through Bayesian phylogenetic inference. Tracer provides kernel density estimation, multivariate visualization, demographic trajectory reconstruction, conditional posterior distribution summary, and more. Tracer is open-source and available at http://beast.community/tracer.

Journal Article

On the Use of Information Criteria for Model Selection in Phylogenetics

by Susko, Edward; Roger, Andrew J in Approximation, Bayesian analysis, Bias

2020

The information criteria Akaike information criterion (AIC), AICc, and Bayesian information criterion (BIC) are widely used for model selection in phylogenetics; however, their theoretical justification and performance have not been carefully examined in this setting. Here, we investigate these methods under simple and complex phylogenetic models. We show that AIC can give a biased estimate of its intended target, the expected predictive log likelihood (EPLnL) or, equivalently, expected Kullback–Leibler divergence between the estimated model and the true distribution for the data. Reasons for bias include commonly occurring issues such as small edge-lengths or, in mixture models, small weights. The use of partitioned models is another issue that can cause problems with information criteria. We show that for partitioned models, a different BIC correction is required for it to be a valid approximation to a Bayes factor. The commonly used AICc correction is not clearly defined in partitioned models and can actually create a substantial bias when the number of parameters gets large, as is the case with larger trees and partitioned models. Bias-corrected cross-validation corrections are shown to provide better approximations to EPLnL than AIC. We also illustrate how EPLnL, the estimation target of AIC, can sometimes favor an incorrect model and give reasons why selection of incorrectly under-partitioned models might be desirable in partitioned model settings.

Journal Article
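
For readers unfamiliar with the three criteria named in the abstract above, the sketch below computes them from their standard textbook definitions. The function name `information_criteria` and the example numbers are assumptions for illustration. Taking the sample size `n` to be the number of alignment sites is a common convention assumed here, and the sketch does not implement the partitioned-model corrections the abstract refers to.

```python
import math

def information_criteria(log_lik, k, n):
    """Standard AIC, AICc and BIC for a fitted model (illustrative sketch).

    log_lik: maximized log likelihood
    k:       number of free parameters
    n:       sample size (assumed here to be the number of alignment sites)
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined unless n > k + 1")
    aic = 2 * k - 2 * log_lik
    # Small-sample correction; grows quickly as k approaches n.
    aicc = aic + (2 * k * (k + 1)) / (n - k - 1)
    bic = k * math.log(n) - 2 * log_lik
    return {"AIC": aic, "AICc": aicc, "BIC": bic}

# Made-up example: log likelihood -1000 with 10 parameters and 500 sites.
scores = information_criteria(-1000.0, 10, 500)
```

Lower values indicate the preferred model under each criterion. The size of the AICc correction term relative to AIC gives a feel for why the abstract warns about bias when the number of parameters is large.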

Modeling Site Heterogeneity with Posterior Mean Site Frequency Profiles Accelerates Accurate Phylogenomic Estimation

by Susko, Edward; Minh, Bui Quang; Roger, Andrew J. in Amino Acid Substitution, Amino acids, Bayesian analysis

2018

Proteins have distinct structural and functional constraints at different sites that lead to site-specific preferences for particular amino acid residues as the sequences evolve. Heterogeneity in the amino acid substitution process between sites is not modeled by commonly used empirical amino acid exchange matrices. Such model misspecification can lead to artefacts in phylogenetic estimation such as long-branch attraction. Although sophisticated site-heterogeneous mixture models have been developed to address this problem in both Bayesian and maximum likelihood (ML) frameworks, their formidable computational time and memory usage severely limit their use in large phylogenomic analyses. Here we propose a posterior mean site frequency (PMSF) method as a rapid and efficient approximation to full empirical profile mixture models for ML analysis. The PMSF approach assigns a conditional mean amino acid frequency profile to each site calculated based on a mixture model fitted to the data using a preliminary guide tree. These PMSF profiles can then be used for in-depth tree-searching in place of the full mixture model. Compared with widely used empirical mixture models with k classes, our implementation of PMSF in IQ-TREE (http://www.iqtree.org) speeds up the computation by approximately k/1.5-fold and requires a small fraction of the RAM. Furthermore, this speedup allows, for the first time, full nonparametric bootstrap analyses to be conducted under complex site-heterogeneous models on large concatenated data matrices. Our simulations and empirical data analyses demonstrate that PMSF can effectively ameliorate long-branch attraction artefacts. In some empirical and simulation settings PMSF provided more accurate estimates of phylogenies than the mixture models from which they derive.

Journal Article
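
The conditional mean frequency profile mentioned in the abstract above can be sketched for a single site as a posterior-weighted average of the mixture classes' profiles. This is a minimal illustration under stated assumptions, not the IQ-TREE implementation: the function name `posterior_mean_profile`, the two-state toy alphabet (real analyses use 20 amino acids), and the input numbers are all hypothetical, and the per-class site likelihoods are assumed to have been computed already on the guide tree.

```python
def posterior_mean_profile(weights, class_profiles, site_likelihoods):
    """Posterior mean frequency profile for one site (illustrative sketch).

    weights:          mixture weight of each class
    class_profiles:   one frequency vector per class
    site_likelihoods: likelihood of this site's data under each class,
                      assumed precomputed on a guide tree
    """
    # Posterior probability of each class given the site data (Bayes' rule).
    joint = [w * lik for w, lik in zip(weights, site_likelihoods)]
    total = sum(joint)
    if total <= 0.0:
        raise ValueError("site has zero likelihood under every class")
    posterior = [j / total for j in joint]
    # Average the class profiles, weighting each by its posterior probability.
    n_states = len(class_profiles[0])
    return [
        sum(p * profile[s] for p, profile in zip(posterior, class_profiles))
        for s in range(n_states)
    ]

# Toy example: two classes over a two-state alphabet (made-up numbers).
profile = posterior_mean_profile(
    weights=[0.5, 0.5],
    class_profiles=[[0.9, 0.1], [0.2, 0.8]],
    site_likelihoods=[0.03, 0.01],
)
```

In this toy case the first class receives posterior probability 0.75, so the resulting profile sits closer to that class's frequencies while still summing to one.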

Bayes factor biases for non-nested models and corrections

by Susko, Edward in Bayes factor, Bayesian analysis, Bias

2017

With the advent of simulation-based methods to obtain samples from posteriors and due to increases in computational power, Bayesian methods are increasingly applied to complex problems, sometimes providing the only available methods where likelihood implementations are difficult. As a consequence a large body of research in science and social science increasingly utilizes Bayesian tools, often applying them with default settings. A fundamental problem of interest is model selection, and Bayes factors provide a natural approach to Bayesian model selection. Using Laplace approximations and illustrative examples we demonstrate that Bayes factors can have strong biases toward particular models even in non-nested settings with the same number of parameters. Several easily implemented corrections are shown to provide effective cross-checks to default Bayes factors.

Journal Article
