Catalogue Search | MBRL

by Aandahl, Zach , Brook, Barry W. , Richards, Shane A. in Bias , CONCEPTS & SYNTHESIS , cross validation

2023

Specifying, assessing, and selecting among candidate statistical models is fundamental to ecological research. Commonly used approaches to model selection are based on predictive scores and include information criteria such as Akaike’s information criterion, and cross validation. Based on data splitting, cross validation is particularly versatile because it can be used even when it is not possible to derive a likelihood (e.g., many forms of machine learning) or count parameters precisely (e.g., mixed-effects models). However, much of the literature on cross validation is technical and spread across statistical journals, making it difficult for ecological analysts to assess and choose among the wide range of options. Here we provide a comprehensive, accessible review that explains important—but often overlooked—technical aspects of cross validation for model selection, such as: bias correction, estimation uncertainty, choice of scores, and selection rules to mitigate overfitting. We synthesize the relevant statistical advances to make recommendations for the choice of cross-validation technique and we present two ecological case studies to illustrate their application. In most instances, we recommend using exact or approximate leave-one-out cross validation to minimize bias, or otherwise k-fold with bias correction if k < 10. To mitigate overfitting when using cross validation, we recommend calibrated selection via our recently introduced modified one-standard-error rule. We advocate for the use of predictive scores in model selection across a range of typical modeling goals, such as exploration, hypothesis testing, and prediction, provided that models are specified in accordance with the stated goal. We also emphasize, as others have done, that inference on parameter estimates is biased if preceded by model selection and instead requires a carefully specified single model or further technical adjustments.
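As a rough illustration of the kind of procedure this abstract reviews, the sketch below scores polynomial models of increasing degree by leave-one-out cross validation (k-fold with k equal to the sample size) and then applies the ordinary one-standard-error rule. The simulated data, the function name `kfold_mse`, and the choice of squared error as the score are all assumptions made for the example. The rule shown is the plain one-standard-error rule, not the authors' modified version, and no bias correction is applied.

```python
import numpy as np

def kfold_mse(x, y, degree, k):
    """Estimate mean squared prediction error of a polynomial fit by k-fold CV.

    Returns (mean score, standard error of the mean score).
    """
    n = len(x)
    folds = np.array_split(np.arange(n), k)  # contiguous folds; data assumed already shuffled
    sq_errors = np.empty(n)
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        coeffs = np.polyfit(x[train], y[train], degree)  # fit on the training part only
        sq_errors[test_idx] = (np.polyval(coeffs, x[test_idx]) - y[test_idx]) ** 2
    return sq_errors.mean(), sq_errors.std(ddof=1) / np.sqrt(n)

# Hypothetical data: the true relationship is linear with Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 60)
y = 1.0 + 2.0 * x + rng.normal(0, 0.3, 60)

# Setting k = n gives leave-one-out cross validation.
scores = {d: kfold_mse(x, y, d, k=len(x)) for d in range(0, 5)}
best = min(scores, key=lambda d: scores[d][0])

# Ordinary one-standard-error rule: pick the simplest model whose score
# lies within one standard error of the best score.
threshold = scores[best][0] + scores[best][1]
chosen = min(d for d in scores if scores[d][0] <= threshold)
print(best, chosen)
```

Preferring the simplest model within one standard error, rather than the raw minimum, is one way to guard against selecting an overfitted model when score differences are within estimation noise.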

Journal Article

Improved Maximum Parsimony Models for Phylogenetic Networks

by van Iersel, Leo , Jones, Mark , Scornavacca, Celine in algorithms , Bioinformatics , biological models

2018

Phylogenetic networks are well suited to represent evolutionary histories comprising reticulate evolution. Several methods aiming at reconstructing explicit phylogenetic networks have been developed in the last two decades. In this article, we propose a new definition of maximum parsimony for phylogenetic networks that makes it possible to model biological scenarios that cannot be modeled by the definitions currently present in the literature (namely, the “hardwired” and “softwired” parsimony). Building on this new definition, we provide several algorithmic results that lay the foundations for new parsimony-based methods for phylogenetic network reconstruction.

Journal Article

Inferring Phylogenetic Networks Using PhyloNet

by Wen, Dingqiao , Zhu, Jiafan , Nakhleh, Luay in Bayes Theorem , Bayesian analysis , Bayesian theory

2018

PhyloNet was released in 2008 as a software package for representing and analyzing phylogenetic networks. At the time of its release, the main functionalities in PhyloNet consisted of measures for comparing network topologies and a single heuristic for reconciling gene trees with a species tree. Since then, PhyloNet has grown significantly. The software package now includes a wide array of methods for inferring phylogenetic networks from data sets of unlinked loci while accounting for both reticulation (e.g., hybridization) and incomplete lineage sorting. In particular, PhyloNet now allows for maximum parsimony, maximum likelihood, and Bayesian inference of phylogenetic networks from gene tree estimates. Furthermore, Bayesian inference directly from sequence data (sequence alignments or biallelic markers) is implemented. Maximum parsimony is based on an extension of the “minimizing deep coalescences” criterion to phylogenetic networks, whereas maximum likelihood and Bayesian inference are based on the multispecies network coalescent. All methods allow for multiple individuals per species. As computing the likelihood of a phylogenetic network is computationally hard, PhyloNet allows for evaluation and inference of networks using a pseudolikelihood measure. PhyloNet summarizes the results of the various analyses and generates phylogenetic networks in the extended Newick format that is readily viewable by existing visualization software.

Journal Article

New Algorithms and Methods to Estimate Maximum-Likelihood Phylogenies: Assessing the Performance of PhyML 3.0

by Lefort, Vincent , Hordijk, Wim , Gascuel, Olivier in Accuracy , Algorithms , Amino acids

2010

PhyML is phylogeny software based on the maximum-likelihood principle. Early PhyML versions used a fast algorithm performing nearest neighbor interchanges to improve a reasonable starting tree topology. Since the original publication (Guindon S., Gascuel O. 2003. A simple, fast and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52:696–704), PhyML has been widely used (>2500 citations in ISI Web of Science) because of its simplicity and a fair compromise between accuracy and speed. In the meantime, research around PhyML has continued, and this article describes the new algorithms and methods implemented in the program. First, we introduce a new algorithm to search the tree space with user-defined intensity using subtree pruning and regrafting topological moves. The parsimony criterion is used here to filter out the least promising topology modifications with respect to the likelihood function. The analysis of a large collection of real nucleotide and amino acid data sets of various sizes demonstrates the good performance of this method. Second, we describe a new test to assess the support of the data for internal branches of a phylogeny. This approach extends the recently proposed approximate likelihood-ratio test and relies on a nonparametric, Shimodaira-Hasegawa-like procedure. A detailed analysis of real alignments sheds light on the links between this new approach and the more classical nonparametric bootstrap method. Overall, our tests show that the latest version (3.0) of PhyML is fast, accurate, stable, and ready to use. A Web server and binary files are available from http://www.atgc-montpellier.fr/phyml/.

Journal Article

Method for Inferring the Rate of Evolution of Homologous Characters that Can Potentially Improve Phylogenetic Inference, Resolve Deep Divergence and Correct Systematic Biases

by McInerney, James O. , Cummins, Carla A. in Animals , Bias , Biological Evolution

2011

Current phylogenetic methods attempt to account for evolutionary rate variation across characters in a matrix. This is generally achieved by the use of sophisticated evolutionary models, combined with dense sampling of large numbers of characters. However, systematic biases and superimposed substitutions make this task very difficult. Model adequacy can sometimes be achieved at the cost of adding large numbers of free parameters, with each parameter being optimized according to some criterion, resulting in increased computation times and large variances in the model estimates. In this study, we develop a simple approach that estimates the relative evolutionary rate of each homologous character. The method that we describe uses the similarity between characters as a proxy for evolutionary rate. In this article, we work on the premise that if the character-state distribution of a homologous character is similar to many other characters, then this character is likely to be relatively slowly evolving. If the character-state distribution of a homologous character is not similar to many or any of the rest of the characters in a data set, then it is likely to be the result of rapid evolution. We show that in some test cases, at least, the premise can hold and the inferences are robust. Importantly, the method does not use a "starting tree" to make the inference and therefore is tree independent. We demonstrate that this approach can work as well as a maximum likelihood (ML) approach, though the ML method needs to have a known phylogeny, or at least a very good estimate of that phylogeny. We then demonstrate some uses for this method of analysis, including the improvement in phylogeny reconstruction for both deep-level and recent relationships and overcoming systematic biases such as base composition bias. Furthermore, we compare this approach to two well-established methods for reweighting or removing characters. These other methods are tree-based and we show that they can be systematically biased. We feel this method can be useful for phylogeny reconstruction, understanding evolutionary rate variation, and for understanding selection variation on different characters.

Journal Article

New Heuristic Methods for Joint Species Delimitation and Species Tree Inference

by O'Meara, Brian C. in Algorithms , Animals , Basidiomycota - classification

2010

Species delimitation and species tree inference are difficult problems in cases of recent divergence, especially when different loci have different histories. This paper quantifies the difficulty of jointly finding the division of samples to species and estimating a species tree without constraining the possible assignments a priori. It introduces a parametric and a nonparametric method, including new heuristic search strategies, to do this delimitation and tree inference using individual gene trees as input. The new methods were evaluated using thousands of simulations and 4 empirical data sets. These analyses suggest that the new methods, especially the nonparametric one, may provide useful insights for systematists working at the species level with molecular data. However, they still often return incorrect results.

Journal Article

Bounding the Softwired Parsimony Score of a Phylogenetic Network

by Döcker, Janosch , Linz, Simone , Wicke, Kristina in Algorithms , Alignment , Approximation

2024

In comparison to phylogenetic trees, phylogenetic networks are more suitable to represent complex evolutionary histories of species whose past includes reticulation such as hybridisation or lateral gene transfer. However, the reconstruction of phylogenetic networks remains challenging and computationally expensive due to their intricate structural properties. For example, the small parsimony problem, which is solvable in polynomial time for phylogenetic trees, becomes NP-hard on phylogenetic networks under softwired and parental parsimony, even for a single binary character and structurally constrained networks. To calculate the parsimony score of a phylogenetic network N, these two parsimony notions consider different exponential-size sets of phylogenetic trees that can be extracted from N and infer the minimum parsimony score over all trees in the set. In this paper, we ask: What is the maximum difference between the parsimony score of any phylogenetic tree that is contained in the set of considered trees and a phylogenetic tree whose parsimony score equates to the parsimony score of N? Given a gap-free sequence alignment of multi-state characters and a rooted binary level-k phylogenetic network, we use the novel concept of an informative blob to show that this difference is bounded by k + 1 times the softwired parsimony score of N. In particular, the difference is independent of the alignment length and the number of character states. We show that an analogous bound can be obtained for the softwired parsimony score of semi-directed networks, while under parental parsimony, on the other hand, such a bound does not hold.
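To make the tree case mentioned in this abstract concrete, here is a minimal sketch of Fitch's algorithm, a standard polynomial-time method for the small parsimony problem on a rooted binary tree with a single character. The nested-tuple tree encoding, the function name `fitch_score`, and the example character states are assumptions for illustration. The last lines take the minimum score over a hand-listed pair of trees, which are simply assumed to be the trees displayed by some network; enumerating the displayed trees of a real network is not shown.

```python
def fitch_score(tree, states):
    """Return (candidate state set, parsimony score) for a rooted binary tree.

    tree:   a leaf name (str) or a pair (left, right) of subtrees.
    states: dict mapping each leaf name to its character state.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_score(tree[0], states)
    right_set, right_cost = fitch_score(tree[1], states)
    common = left_set & right_set
    if common:
        # Children can agree on a state: no extra change needed here.
        return common, left_cost + right_cost
    # Children disagree: one state change is required on an edge below this node.
    return left_set | right_set, left_cost + right_cost + 1

# Hypothetical single binary character on four taxa.
states = {"a": 0, "b": 1, "c": 0, "d": 1}

# Two trees assumed (for this example only) to be displayed by the same network.
tree_1 = (("a", "b"), ("c", "d"))
tree_2 = (("a", "c"), ("b", "d"))

score_1 = fitch_score(tree_1, states)[1]
score_2 = fitch_score(tree_2, states)[1]

# Softwired-style score: minimum over the considered set of trees.
softwired = min(score_1, score_2)
print(score_1, score_2, softwired)  # 2 1 1
```

Each tree is scored in time linear in its size, but the set of trees to minimize over can be exponentially large in the number of reticulations, which is why brute-force enumeration does not scale.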

Journal Article

Why Concatenation Fails Near the Anomaly Zone

by Hahn, Matthew W. , Mendes, Fábio K. in Asymmetry , Best use , Branches

2018

Genome-scale sequencing has been of great benefit in recovering species trees but has not provided final answers. Despite the rapid accumulation of molecular sequences, resolving short and deep branches of the tree of life has remained a challenge and has prompted the development of new strategies that can make the best use of available data. One such strategy, the concatenation of gene alignments, can be successful when coupled with many tree estimation methods, but has also been shown to fail when there are high levels of incomplete lineage sorting. Here, we focus on the failure of likelihood-based methods in retrieving a rooted, asymmetric four-taxon species tree from concatenated data when the species tree is in or near the anomaly zone, a region of parameter space where the most common gene tree does not match the species tree because of incomplete lineage sorting. First, we use coalescent theory to prove that most informative sites will support the species tree in the anomaly zone, and that as a consequence maximum-parsimony succeeds in recovering the species tree from concatenated data. We further show that maximum-likelihood tree estimation from concatenated data fails both inside and outside the anomaly zone, and that this failure cannot be easily predicted from the topology of the most common gene tree. We demonstrate that likelihood-based methods often fail in a region partially overlapping the anomaly zone, likely because of the lower relative cost of substitutions on discordant gene tree branches that are absent from the species tree. Our results confirm and extend previous reports on the performance of these methods applied to concatenated data from a rooted, asymmetric four-taxon species tree, and highlight avenues for future work improving the performance of methods aimed at recovering species trees.

Journal Article

Molecular systematics and remodelling of Chirita and associated genera (Gesneriaceae)

by Weber, Anton , Möller, Michael , Middleton, David J. in Bayesian inference analysis , Biological taxonomies , Capsules

2011

The polyphyletic genus Chirita is remodelled after an extensive molecular phylogenetic study of species assigned to it and to other associated genera. Most of Chirita sect. Chirita and the monotypic Hemiboeopsis are amalgamated with Henckelia sect. Henckelia, resulting in a very differently circumscribed genus Henckelia and the synonymisation of Chirita. The remaining species of Chirita sect. Chirita are accommodated in the revived genus Damrongia. Chirita sect. Liebigia is recognised as the genus Liebigia. Chirita sect. Microchirita is recognised as the genus Microchirita. Chirita sect. Gibbosaccus is, together with Chiritopsis and Wentsaiboea, included in the originally monotypic and now enormously expanded genus Primulina. The necessary combinations are made and a general list showing the present accommodation of the species previously described under Chirita, Chiritopsis, Hemiboeopsis, Primulina and Wentsaiboea is provided.

Journal Article

Complete Generic-Level Phylogenetic Analyses of Palms (Arecaceae) with Comparisons of Supertree and Supermatrix Approaches

by Forest, Félix , Uhl, Natalie W. , Baker, William J. in Arecaceae , Arecaceae - classification , Arecaceae - genetics

2009

Supertree and supermatrix methods have great potential in the quest to build the tree of life and yet they remain controversial, with most workers opting for one approach or the other, but rarely both. Here, we employed both methods to construct phylogenetic trees of all genera of palms (Arecaceae/Palmae), an iconic angiosperm family of great economic importance. We assembled a supermatrix consisting of 16 partitions, comprising DNA sequence data, plastid restriction fragment length polymorphism data, and morphological data for all genera, from which a highly resolved and well-supported phylogenetic tree was built despite abundant missing data. To construct supertrees, we used variants of matrix representation with parsimony (MRP) analysis based on input trees generated directly from subsamples of the supermatrix. All supertrees were highly resolved. Standard MRP with bootstrap-weighted matrix elements performed most effectively in this case, generating trees with the greatest congruence with the supermatrix tree and fewest clades unsupported by any input tree. Nonindependence due to input trees based on combinations of data partitions was an acceptable trade-off for improvements in supertree performance. Irreversible MRP and the use of strictly independent input trees only provided no obvious benefits. Contrary to previous claims, we found that unsupported clades are not infrequent under some MRP implementations, with up to 13% of clades lacking support from any input tree in some irreversible MRP supertrees. To build a formal synthesis, we assessed the cross-corroboration between supermatrix trees and the variant supertrees using semistrict consensus, enumerating shared clades and compatible clades. The semistrict consensus of the supermatrix tree and the most congruent supertree contained 160 clades (of a maximum of 204), 137 of which were present in both trees. The relationships recovered by these trees strongly support the current phylogenetic classification of palms. We evaluate 2 composite supertree support measures (rQS and V) and conclude that it is more informative to report numbers of input trees that support or conflict with a given supertree clade. This study demonstrates that supertree and supermatrix methods can provide effective, explicit, and complementary mechanisms for synthesizing disjointed phylogenetic evidence while emphasizing the need for further refinement of supertree methods.
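For readers unfamiliar with matrix representation with parsimony, the sketch below shows the standard unweighted coding step: each non-root clade of each input tree becomes one binary column, with 1 for taxa inside the clade, 0 for taxa in that tree but outside the clade, and ? for taxa absent from that tree. The nested-tuple tree encoding, the function names, and the two toy input trees are assumptions for illustration. The bootstrap weighting and irreversible variants discussed in the abstract are not implemented, and the parsimony search on the resulting matrix is left to dedicated software.

```python
def clades(tree):
    """Return (all leaves, list of non-root clades) for a nested-tuple tree."""
    found = []

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.append(leaves)  # record the leaf set under this internal node
        return leaves

    all_leaves = walk(tree)
    # The root clade contains every leaf and carries no grouping information.
    return all_leaves, [c for c in found if c != all_leaves]

def mrp_matrix(trees):
    """Code input trees as an MRP matrix: dict of taxon -> string of 0/1/?."""
    taxa = sorted(set().union(*(clades(t)[0] for t in trees)))
    rows = {taxon: "" for taxon in taxa}
    for tree in trees:
        leaves, tree_clades = clades(tree)
        for clade in tree_clades:
            for taxon in taxa:
                if taxon not in leaves:
                    rows[taxon] += "?"  # taxon missing from this input tree
                else:
                    rows[taxon] += "1" if taxon in clade else "0"
    return rows

# Two hypothetical input trees with partially overlapping taxon sets.
input_trees = [(("a", "b"), "c"), ("a", ("c", "d"))]
matrix = mrp_matrix(input_trees)
for taxon, row in matrix.items():
    print(taxon, row)
```

The ? entries are what allow input trees with different taxon sets to be combined in a single matrix.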

Journal Article
