Catalogue Search | MBRL

Rethinking phylogenetic comparative methods

by Pennell, Matthew W.; Zenil-Ferguson, Rosana; Uyeda, Josef C. in Case studies, Classification - methods, Hypothesis testing

2018

As a result of the process of descent with modification, closely related species tend to be similar to one another in a myriad different ways. In statistical terms, this means that traits measured on one species will not be independent of traits measured on others. Since their introduction in the 1980s, phylogenetic comparative methods (PCMs) have been framed as a solution to this problem. In this article, we argue that this way of thinking about PCMs is deeply misleading. Not only has this sowed widespread confusion in the literature about what PCMs are doing but has led us to develop methods that are susceptible to the very thing we sought to build defenses against—unreplicated evolutionary events. Through three Case Studies, we demonstrate that the susceptibility to singular events is indeed a recurring problem in comparative biology that links several seemingly unrelated controversies. In each Case Study, we propose a potential solution to the problem. While the details of our proposed solutions differ, they share a common theme: unifying hypothesis testing with data-driven approaches (which we term “phylogenetic natural history”) to disentangle the impact of singular evolutionary events from that of the factors we are investigating. More broadly, we argue that our field has, at times, been sloppy when weighing evidence in support of causal hypotheses. We suggest that one way to refine our inferences is to re-imagine phylogenies as probabilistic graphical models; adopting this way of thinking will help clarify precisely what we are testing and what evidence supports our claims.

Journal Article


Long Branch Attraction Biases in Phylogenetics

by Susko, Edward; Roger, Andrew J. in Phylogeny, Points of View, Trees

2021

Long branch attraction (LBA) is a prevalent form of bias in phylogenetic estimation but the reasons for it are only partially understood. We argue here that it is largely due to differences in the sizes of the model spaces corresponding to different trees. Trees with long branches together allow much more flexible internal branch length parameter estimation. Consequently, although each tree has the same number of parameters, trees with long branches together have larger effective model spaces. The problem of LBA becomes particularly pronounced with partitioned data. Formulation of tree estimation as model selection leads us to propose bootstrap bias corrections as cross-checks on estimation when long branches end up being estimated together.

Journal Article


Defining Coalescent Genes

by Doyle, Jeff J. in Chloroplasts, Genes, Genomes

2022

The species tree paradigm that dominates current molecular systematic practice infers species trees from collections of sequences under assumptions of the multispecies coalescent (MSC), that is, that there is free recombination between the sequences and no (or very low) recombination within them. These coalescent genes (c-genes) are thus defined in an historical rather than molecular sense and can in theory be as large as an entire genome or as small as a single nucleotide. A debate about how to define c-genes centers on the contention that nuclear gene sequences used in many coalescent analyses undergo too much recombination, such that their introns comprise multiple c-genes, violating a key assumption of the MSC. Recently a similar argument has been made for the genes of plastid (e.g., chloroplast) and mitochondrial genomes, which for the last 30 or more years have been considered to represent a single c-gene for the purposes of phylogeny reconstruction because they are nonrecombining in an historical sense. Consequently, it has been suggested that these genomes should be analyzed using coalescent methods that treat their genes (over 70 protein-coding genes in the case of most plastid genomes, or plastomes) as independent estimates of species phylogeny, in contrast to the usual practice of concatenation, which is appropriate for generating gene trees. However, although recombination certainly occurs in the plastome, as has been recognized since the 1970s, it is unlikely to be phylogenetically relevant. This is because such historically effective recombination can only occur when plastomes with incongruent histories are brought together in the same plastid. However, plastids sort rapidly into different cell lineages and rarely fuse. Thus, because of plastid biology, the plastome is a more canonical c-gene than is the average multi-intron mammalian nuclear gene. The plastome should thus continue to be treated as a single estimate of the underlying species phylogeny, as should the mitochondrial genome. The implications of this long-held insight of molecular systematics for studies in the phylogenomic era are explored.

Journal Article


Estimating Diversification Rates on Incompletely Sampled Phylogenies

by Chang, Jonathan; Alfaro, Michael E.; Rabosky, Daniel L. in Biodiversity, Classification - methods, Heterogeneity

2020

Molecular phylogenies are a key source of information about the tempo and mode of species diversification. However, most empirical phylogenies do not contain representatives of all species, such that diversification rates are typically estimated from incompletely sampled data. Most researchers recognize that incomplete sampling can lead to biased rate estimates, but the statistical properties of methods for accommodating incomplete sampling remain poorly known. In this point of view, we demonstrate theoretical concerns with the widespread use of analytical sampling corrections for sparsely sampled phylogenies of higher taxonomic groups. In particular, corrections based on “sampling fractions” can lead to low statistical power to infer rate variation when it is present, depending on the likelihood function used for inference. In the extreme, the sampling fraction correction can lead to spurious patterns of diversification that are driven solely by unbalanced sampling across the tree in concert with low overall power to infer shifts. Stochastic polytomy resolution provides an alternative to sampling fraction approaches that avoids some of these biases. We show that stochastic polytomy resolvers can greatly improve the power of common analyses to estimate shifts in diversification rates. We introduce a new stochastic polytomy resolution method (Taxonomic Addition for Complete Trees [TACT]) that uses birth–death-sampling estimators across an ultrametric phylogeny to estimate branching times for unsampled taxa, with taxonomic information to compatibly place new taxa onto a backbone phylogeny. We close with practical recommendations for diversification inference under several common scenarios of incomplete sampling.

Journal Article


Best Practices for Justifying Fossil Calibrations

by Müller, Johannes; Smith, Krister T.; van Tuinen, Marcel in Animals, Calibration, Classification - methods

2012

Journal Article


The Multispecies Coalescent Over-Splits Species in the Case of Geographically Widespread Taxa

by Hillis, David M.; Chambers, E. Anne in Animals, Bayesian analysis, Classification - methods

2020

Many recent species delimitation studies rely exclusively on limited analyses of genetic data analyzed under the multispecies coalescent (MSC) model, and results from these studies often are regarded as conclusive support for taxonomic changes. However, most MSC-based species delimitation methods have well-known and often unmet assumptions. Uncritical application of these genetic-based approaches (without due consideration of sampling design, the effects of a priori group designations, isolation by distance, cytoplasmic–nuclear mismatch, and population structure) can lead to over-splitting of species. Here, we argue that in many common biological scenarios, researchers must be particularly cautious regarding these limitations, especially in cases of well-studied, geographically variable, and parapatrically distributed species complexes. We consider these points with respect to a historically controversial species group, the American milksnakes (Lampropeltis triangulum complex), using genetic data from a recent analysis (Ruane et al. 2014). We show that over-reliance on the program Bayesian Phylogenetics and Phylogeography, without adequate consideration of its assumptions and of sampling limitations, resulted in over-splitting of species in this study. Several of the hypothesized species of milksnakes instead appear to represent arbitrary slices of continuous geographic clines. We conclude that the best available evidence supports three, rather than seven, species within this complex. More generally, we recommend that coalescent-based species delimitation studies incorporate thorough analyses of geographic variation and carefully examine putative contact zones among delimited species before making taxonomic changes.

Journal Article


The Spectre of Too Many Species

by Leaché, Adam D.; Yang, Ziheng; Zhu, Tianqi in Allopatric populations, allopatry, Bayes Theorem

2019

Recent simulation studies examining the performance of Bayesian species delimitation as implemented in the BPP program have suggested that BPP may detect population splits but not species divergences and that it tends to over-split when data of many loci are analyzed. Here, we confirm these results and provide the mathematical justifications. We point out that the distinction between population and species splits made in the protracted speciation model (PSM) has no influence on the generation of gene trees and sequence data, which explains why no method can use such data to distinguish between population splits and speciation. We suggest that the PSM is unrealistic as its mechanism for assigning species status assumes instantaneous speciation, contradicting prevailing taxonomic practice. We confirm the suggestion, based on simulation, that in the case of speciation with gene flow, Bayesian model selection as implemented in BPP tends to detect population splits when the amount of data (the number of loci) increases. We discuss the use of a recently proposed empirical genealogical divergence index (gdi) for species delimitation and illustrate that parameter estimates produced by a full likelihood analysis as implemented in BPP provide much more reliable inference under the gdi than the approximate method PHRAPL. We distinguish between Bayesian model selection and parameter estimation and suggest that the model selection approach is useful for identifying sympatric cryptic species, while the parameter estimation approach may be used to implement empirical criteria for determining species status among allopatric populations.

Journal Article


Phylogenetic Conflicts, Combinability, and Deep Phylogenomics in Plants

by Walker-Hale, Nathanael; Brown, Joseph W.; Smith, Stephen A. in Angiosperms, Classification - methods, Datasets

2020

Studies have demonstrated that pervasive gene tree conflict underlies several important phylogenetic relationships where different species tree methods produce conflicting results. Here, we present a means of dissecting the phylogenetic signal for alternative resolutions within a data set in order to resolve recalcitrant relationships and, importantly, identify what the data set is unable to resolve. These procedures extend upon methods for isolating conflict and concordance involving specific candidate relationships and can be used to identify systematic error and disambiguate sources of conflict among species tree inference methods. We demonstrate these on a large phylogenomic plant data set. Our results support the placement of Amborella as sister to the remaining extant angiosperms, Gnetales as sister to pines, and the monophyly of extant gymnosperms. Several other contentious relationships, including the resolution of relationships within the bryophytes and the eudicots, remain uncertain given the low number of supporting gene trees. To address whether concatenation of filtered genes amplified phylogenetic signal for relationships, we implemented a combinatorial heuristic to test combinability of genes. We found that nested conflicts limited the ability of data filtering methods to fully ameliorate conflicting signal amongst gene trees. These analyses confirmed that the underlying conflicting signal does not support broad concatenation of genes. Our approach provides a means of dissecting a specific data set to address deep phylogenetic relationships while also identifying the inferential boundaries of the data set.

Journal Article


From Integrative Taxonomy to Species Description: One Step Beyond

by Puillandre, N.; Pante, E.; Schoelinck, C. in Animals, Classification - methods, Journal Impact Factor

2015

Journal Article


Analysis of Paralogs in Target Enrichment Data Pinpoints Multiple Ancient Polyploidy Events in Alchemilla s.l. (Rosaceae)

by Yang, Ya; Morales-Briones, Diego F; Liston, A in Alchemilla, Allopolyploidy, Automation

2022

Target enrichment is becoming increasingly popular for phylogenomic studies. Although baits for enrichment are typically designed to target single-copy genes, paralogs are often recovered with increased sequencing depth, sometimes from a significant proportion of loci, especially in groups experiencing whole-genome duplication (WGD) events. Common approaches for processing paralogs in target enrichment data sets include random selection, manual pruning, and mainly, the removal of entire genes that show any evidence of paralogy. These approaches are prone to errors in orthology inference or removing large numbers of genes. By removing entire genes, valuable information that could be used to detect and place WGD events is discarded. Here, we used an automated approach for orthology inference in a target enrichment data set of 68 species of Alchemilla s.l. (Rosaceae), a widely distributed clade of plants primarily from temperate climate regions. Previous molecular phylogenetic studies and chromosome numbers both suggested ancient WGDs in the group. However, both the phylogenetic location and putative parental lineages of these WGD events remain unknown. By taking paralogs into consideration and inferring orthologs from target enrichment data, we identified four nodes in the backbone of Alchemilla s.l. with an elevated proportion of gene duplication. Furthermore, using a gene-tree reconciliation approach, we established the autopolyploid origin of the entire Alchemilla s.l. and the nested allopolyploid origin of four major clades within the group. Here, we showed the utility of automated tree-based orthology inference methods, previously designed for genomic or transcriptomic data sets, to study complex scenarios of polyploidy and reticulate evolution from target enrichment data sets.

Journal Article
