Catalogue Search | MBRL

IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era

by Arndt von Haeseler , Bui, Quang Minh , Chernomor, Olga in Genomics , Inference , Intelligence

2020

IQ-TREE (http://www.iqtree.org, last accessed February 6, 2020) is a user-friendly and widely used software package for phylogenetic inference using maximum likelihood. Since the release of version 1 in 2014, we have continuously expanded IQ-TREE to integrate a plethora of new models of sequence evolution and efficient computational approaches of phylogenetic inference to deal with genomic data. Here, we describe notable features of IQ-TREE version 2 and highlight the key advantages over other software.

Journal Article

How much can reticulate evolution entangle plant systematics? Revisiting subfamilial classification of the Malvatheca clade (Malvaceae) on the basis of phylogenomics

by Carvalho-Sobrinho, Jefferson , Karimi, Nisa , Costa, Lucas in bombacoideae , malvoideae , matisioideae

2026

Reticulate evolution (RE), involving hybridization and related processes, generates network-like rather than strictly bifurcating relationships among lineages and can obscure phylogenetic relationships. Detecting ancient hybridization is particularly challenging, as genomic signals may erode over time. The Malvatheca clade (Malvaceae), marked by multiple paleopolyploidy events since its estimated origin about 66 million years ago, offers a useful model for examining RE. Its three subfamilies, Bombacoideae (with high chromosome numbers, mostly trees), Malvoideae (lower chromosome numbers, mostly herbs), and the recently described Matisioideae, show unresolved relationships, with several taxa of uncertain placement. We conducted a phylogenomic analysis of 69 Malvatheca species via complete plastomes, 35S rDNA cistrons, low-copy nuclear genes and comparative repeatome data. Most of the datasets consistently resolved four clades: (I) Bombacoideae, (II) Malvoideae, (III) Matisioideae, and (IV) a heterogeneous assemblage including representatives of Malvoideae, Matisioideae and several incertae sedis taxa. Chromosome numbers were negatively correlated with repeatome diversity: Bombacoideae presented higher counts but lower repeat diversity, possibly reflecting slower repeat evolution associated with woody growth forms. In contrast, clades III and IV showed marked heterogeneity in both chromosome number and repeat composition, which is consistent with a reticulate origin. Overall, our results show evidence that ancient hybridization and polyploidy shaped Malvatheca evolution. These results highlight that reticulation and genome dynamics, rather than taxonomic boundaries alone, are central to understanding the diversification of Malvatheca.

Journal Article

MEGA12: Molecular Evolutionary Genetic Analysis Version 12 for Adaptive and Green Computing

by Kumar, Sudhir , Tamura, Koichiro , Suleski, Michael in Analysis , Computing time , Datasets

2024

We introduce the 12th version of the Molecular Evolutionary Genetics Analysis (MEGA12) software. This latest version brings many significant improvements by reducing the computational time needed for selecting optimal substitution models and conducting bootstrap tests on phylogenies using maximum likelihood (ML) methods. These improvements are achieved by implementing heuristics that minimize likely unnecessary computations. Analyses of empirical and simulated datasets show substantial time savings by using these heuristics without compromising the accuracy of results. MEGA12 also links in an evolutionary sparse learning approach to identify fragile clades and associated sequences in evolutionary trees inferred through phylogenomic analyses. In addition, this version includes fine-grained parallelization for ML analyses, support for high-resolution monitors, and an enhanced Tree Explorer. MEGA12 can be downloaded from https://www.megasoftware.net.

Journal Article
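The bootstrap tests mentioned in the MEGA12 abstract rest on nonparametric resampling of alignment sites. The Python sketch below shows only that resampling step (columns drawn with replacement); it does not reproduce MEGA12's heuristics for skipping unnecessary computations, and the helper name `bootstrap_alignment` and the dict-of-sequences input format are illustrative assumptions.

```python
import random
from typing import Dict


def bootstrap_alignment(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Return one bootstrap pseudo-replicate of an aligned set of sequences.

    Hypothetical helper: alignment columns (sites) are sampled with
    replacement, so the replicate keeps the original length while some
    sites repeat and others are left out.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_sites = lengths.pop()
    columns = rng.choices(range(n_sites), k=n_sites)  # site indices, with replacement
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


if __name__ == "__main__":
    aln = {"taxon1": "ACGTAC", "taxon2": "ACGTTC", "taxon3": "AGGTTC"}
    replicate = bootstrap_alignment(aln, random.Random(42))
    for name, seq in replicate.items():
        print(name, seq)
```

Each replicate would then be analysed with the same ML settings as the original data, and a branch's bootstrap support is the fraction of replicate trees that contain it; that tree-inference step is not shown here.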

New Methods to Calculate Concordance Factors for Phylogenomic Datasets

by Bui, Quang Minh , Lanfear, Robert , Hahn, Matthew W in Trees

2020

We implement two measures for quantifying genealogical concordance in phylogenomic data sets: the gene concordance factor (gCF) and the novel site concordance factor (sCF). For every branch of a reference tree, gCF is defined as the percentage of “decisive” gene trees containing that branch. This measure is already in wide usage, but here we introduce a package that calculates it while accounting for variable taxon coverage among gene trees. sCF is a new measure defined as the percentage of decisive sites supporting a branch in the reference tree. gCF and sCF complement classical measures of branch support in phylogenetics by providing a full description of underlying disagreement among loci and sites. An easy-to-use implementation and tutorial are freely available in the IQ-TREE software package (http://www.iqtree.org/doc/Concordance-Factor, last accessed May 13, 2020).

Journal Article
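The gCF definition above can be illustrated with a small Python sketch in which each tree is represented by its set of bipartitions (splits). Note the key assumption: the rule used here to decide whether a gene tree is "decisive" for a branch (after restricting the branch to the gene tree's taxa, each side must still hold at least two taxa) is a simplification chosen for illustration and is not claimed to be the exact rule implemented in IQ-TREE. sCF is not covered.

```python
from typing import FrozenSet, Iterable, List, Tuple

Split = Tuple[FrozenSet[str], FrozenSet[str]]
GeneTree = Tuple[FrozenSet[str], List[Split]]  # (taxa present, splits of the gene tree)


def split(side_a: Iterable[str], side_b: Iterable[str]) -> Split:
    """Build a bipartition from two groups of taxon names."""
    return frozenset(side_a), frozenset(side_b)


def gene_concordance_factor(branch: Split, gene_trees: List[GeneTree]) -> float:
    """Percentage of decisive gene trees that contain `branch` (sketch).

    Simplified, hypothetical decisiveness rule: after dropping taxa absent
    from the gene tree, each side of the branch must keep at least two taxa.
    Returns NaN when no gene tree is decisive.
    """
    decisive = concordant = 0
    for taxa, gene_splits in gene_trees:
        side_a, side_b = branch[0] & taxa, branch[1] & taxa
        if len(side_a) < 2 or len(side_b) < 2:
            continue  # this gene tree cannot test the branch; skip it
        decisive += 1
        target = frozenset({side_a, side_b})  # unordered bipartition
        if any(frozenset(s) == target for s in gene_splits):
            concordant += 1
    return 100.0 * concordant / decisive if decisive else float("nan")


if __name__ == "__main__":
    branch = split("AB", "CDE")  # reference branch AB|CDE
    genes = [
        (frozenset("ABCDE"), [split("AB", "CDE"), split("ABC", "DE")]),  # contains AB|CDE
        (frozenset("ABCD"), [split("AC", "BD")]),  # decisive (AB|CD) but discordant
        (frozenset("ACD"), []),  # only A left on one side: not decisive
    ]
    print(gene_concordance_factor(branch, genes))  # 50.0
```

Taxon coverage enters only through the restriction `branch[0] & taxa`: a gene tree missing taxa is still used whenever the restricted branch remains testable, which is the situation the abstract describes as variable taxon coverage.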

Plastid phylogenomic analysis of green plants

by Ruhfel, Brad R. , Soltis, Pamela S. , Wong, Gane K.-S. in Amborella , Amino Acid Sequence , Amino Acids

2018

Premise of the Study: For the past one billion years, green plants (Viridiplantae) have dominated global ecosystems, yet many key branches in their evolutionary history remain poorly resolved. Using the largest analysis of Viridiplantae based on plastid genome sequences to date, we examined the phylogeny and implications for morphological evolution at key nodes.

Methods: We analyzed amino acid sequences from protein-coding genes from complete (or nearly complete) plastomes for 1879 taxa, including representatives across all major clades of Viridiplantae. Much of the data used was derived from transcriptomes from the One Thousand Plants Project (1KP); other data were taken from GenBank.

Key Results: Our results largely agree with previous plastid-based analyses. Noteworthy results include (1) the position of Zygnematophyceae as sister to land plants (Embryophyta), (2) a bryophyte clade (hornworts, mosses + liverworts), (3) Equisetum + Psilotaceae as sister to Marattiales + leptosporangiate ferns, (4) cycads + Ginkgo as sister to the remaining extant gymnosperms, within which Gnetophyta are placed within conifers as sister to non-Pinaceae (Gne-Cup hypothesis), and (5) Amborella, followed by water lilies (Nymphaeales), as successive sisters to all other extant angiosperms. Within angiosperms, there is support for Mesangiospermae, a clade that comprises magnoliids, Chloranthales, monocots, Ceratophyllum, and eudicots. The placements of Ceratophyllum and Dilleniaceae remain problematic. Within Pentapetalae, two major clades (superasterids and superrosids) are recovered.

Conclusions: This plastid data set provides an important resource for elucidating morphological evolution, dating divergence times in Viridiplantae, comparisons with emerging nuclear phylogenies, and analyses of molecular evolutionary patterns and dynamics of the plastid genome.

Journal Article

High-Throughput Genomic Data in Systematics and Phylogenetics

by Lemmon, Emily Moriarty , Lemmon, Alan R. in Accuracy , Biological taxonomies , Comparative analysis

2013

High-throughput genomic sequencing is rapidly changing the field of phylogenetics by decreasing the cost and increasing the quantity and rate of data collection by several orders of magnitude. This deluge of data is exerting tremendous pressure on downstream data-analysis methods, while also providing new opportunities for method development. In this review, we present (a) recent advances in laboratory methods for collection of high-throughput phylogenetic data and (b) challenges and constraints for phylogenetic analysis of these data. We compare the merits of multiple laboratory approaches, compare methods of data analysis, and offer recommendations for the most promising protocols and data-analysis workflows currently available for phylogenetics. We also discuss several strategies for increasing accuracy, with an emphasis on locus selection and proper model choice.

Journal Article

Revisiting metazoan phylogeny with genomic sampling of all phyla

by Combosch, David , Laumer, Christopher E. , Fernández, Rosa in Animals , Classification , Evolution

2019

Proper biological interpretation of a phylogeny can sometimes hinge on the placement of key taxa—or fail when such key taxa are not sampled. In this light, we here present the first attempt to investigate (though not conclusively resolve) animal relationships using genome-scale data from all phyla. Results from the site-heterogeneous CAT + GTR model recapitulate many established major clades, and strongly confirm some recent discoveries, such as a monophyletic Lophophorata, and a sister group relationship between Gnathifera and Chaetognatha, raising continued questions on the nature of the spiralian ancestor. We also explore matrix construction with an eye towards testing specific relationships; this approach uniquely recovers support for Panarthropoda, and shows that Lophotrochozoa (a subclade of Spiralia) can be constructed in strongly conflicting ways using different taxon- and/or orthologue sets. Dayhoff-6 recoding sacrifices information, but can also reveal surprising outcomes, e.g. full support for a clade of Lophophorata and Entoprocta + Cycliophora, a clade of Placozoa + Cnidaria, and raising support for Ctenophora as sister group to the remaining Metazoa, in a manner dependent on the gene and/or taxon sampling of the matrix in question. Future work should test the hypothesis that the few remaining uncertainties in animal phylogeny might reflect violations of the various stationarity assumptions used in contemporary inference methods.

Journal Article
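Dayhoff-6 recoding, mentioned in the abstract above, collapses the 20 amino acids into six groups of physico-chemically similar residues, commonly given as AGPST, DENQ, HKR, ILMV, FWY and C. The Python sketch below applies that grouping to a sequence; the state labels 0 to 5 and the choice to pass gaps and ambiguity codes through unchanged are illustrative assumptions, and the recoding files actually used in the study are not reproduced here.

```python
DAYHOFF6_GROUPS = ["AGPST", "DENQ", "HKR", "ILMV", "FWY", "C"]

# Map each standard amino acid to the index of its group, as a string "0".."5".
DAYHOFF6 = {aa: str(i) for i, group in enumerate(DAYHOFF6_GROUPS) for aa in group}


def recode_dayhoff6(protein: str) -> str:
    """Recode a protein sequence into Dayhoff-6 states (illustrative sketch).

    Characters outside the 20 standard amino acids (e.g. gaps '-', 'X', '?')
    are passed through unchanged, an assumption made for this example.
    """
    return "".join(DAYHOFF6.get(aa, aa) for aa in protein.upper())


if __name__ == "__main__":
    print(recode_dayhoff6("MKVLAC-"))  # 323305-
    # Information loss: D and E become indistinguishable after recoding.
    print(recode_dayhoff6("DE"))  # 11
```

This is what "sacrifices information" means in practice: substitutions within a group, such as D to E, become invisible after recoding. The usual motivation is to dampen saturation and compositional heterogeneity at the cost of resolution.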

ASTRAL-III: polynomial time species tree reconstruction from partially resolved gene trees

by Zhang, Chao , Rabiee, Maryam , Sayyari, Erfan in Algorithms , Animals , ASTRAL

2018

Background: Evolutionary histories can be discordant across the genome, and such discordances need to be considered in reconstructing the species phylogeny. ASTRAL is one of the leading methods for inferring species trees from gene trees while accounting for gene tree discordance. ASTRAL uses dynamic programming to search for the tree that shares the maximum number of quartet topologies with input gene trees, restricting itself to a predefined set of bipartitions.

Results: We introduce ASTRAL-III, which substantially improves the running time of ASTRAL-II and guarantees polynomial running time as a function of both the number of species (n) and the number of genes (k). ASTRAL-III limits the bipartition constraint set (X) to grow at most linearly with n and k. Moreover, it handles polytomies more efficiently than ASTRAL-II, exploits similarities between gene trees better, and uses several techniques to avoid searching parts of the search space that are mathematically guaranteed not to include the optimal tree. The asymptotic running time of ASTRAL-III in the presence of polytomies is $O((nk)^{1.726} D)$, where $D = O(nk)$ is the sum of degrees of all unique nodes in input trees. The running time improvements enable us to test whether contracting low-support branches in gene trees improves the accuracy by reducing noise. In extensive simulations, we show that removing branches with very low support (e.g., below 10%) improves accuracy while overly aggressive filtering is harmful. We observe on a biological avian phylogenomic dataset of 14K genes that contracting low-support branches greatly improves results.

Conclusions: ASTRAL-III is a faster version of the ASTRAL method for phylogenetic reconstruction and can scale up to 10,000 species. With ASTRAL-III, low-support branches can be removed, resulting in improved accuracy.

Journal Article
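ASTRAL's objective, the number of quartet topologies a candidate species tree shares with the input gene trees, can be made concrete with a brute-force Python sketch. This is not ASTRAL's dynamic-programming search over the constrained bipartition set X. It only scores one given species tree by enumerating every four-taxon subset, which is far slower than ASTRAL-III. Trees are represented as lists of splits, and `contract_low_support` is a hypothetical helper that mimics collapsing poorly supported gene-tree branches, as described in the abstract.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Optional, Tuple

Split = Tuple[FrozenSet[str], FrozenSet[str]]


def split(side_a: Iterable[str], side_b: Iterable[str]) -> Split:
    """Build a bipartition from two groups of taxon names."""
    return frozenset(side_a), frozenset(side_b)


def quartet_topology(splits: List[Split], quartet) -> Optional[FrozenSet[FrozenSet[str]]]:
    """Return the unrooted topology ab|cd that `splits` induce on a quartet, or None."""
    q = set(quartet)
    for side_a, side_b in splits:
        left, right = q & side_a, q & side_b
        if len(left) == 2 and len(right) == 2:
            return frozenset({frozenset(left), frozenset(right)})
    return None  # no split separates the quartet 2|2 (e.g. a polytomy)


def contract_low_support(splits_with_support, threshold: float = 10.0) -> List[Split]:
    """Hypothetical helper: drop (contract) branches whose support is below `threshold`."""
    return [s for s, support in splits_with_support if support >= threshold]


def quartet_score(species_splits: List[Split], gene_trees) -> int:
    """Count gene-tree quartets whose resolved topology matches the species tree."""
    score = 0
    for taxa, gene_splits in gene_trees:
        for quartet in combinations(sorted(taxa), 4):
            gene_q = quartet_topology(gene_splits, quartet)
            if gene_q is not None and gene_q == quartet_topology(species_splits, quartet):
                score += 1
    return score


if __name__ == "__main__":
    species = [split("AB", "CDE"), split("ABC", "DE")]  # ((A,B),C,(D,E))
    genes = [
        (frozenset("ABCDE"), [split("AB", "CDE"), split("ABC", "DE")]),  # agrees on all 5 quartets
        (frozenset("ABCD"), contract_low_support([(split("AC", "BD"), 4.0)])),  # branch contracted
    ]
    print(quartet_score(species, genes))  # 5
```

Contracting a branch simply removes its split, so every quartet that depended on it becomes unresolved and contributes nothing to the score; this is the sense in which filtering low-support branches can reduce noise.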

Disentangling Sources of Gene Tree Discordance in Phylogenomic Data Sets

by Kadereit, Gudrun , Tefarikis, Delphine T. , Yim, Won C. in Amaranthaceae , BASIC BIOLOGICAL SCIENCES , Data processing

2021

Gene tree discordance in large genomic data sets can be caused by evolutionary processes such as incomplete lineage sorting and hybridization, as well as model violation, and errors in data processing, orthology inference, and gene tree estimation. Species tree methods that identify and accommodate all sources of conflict are not available, but a combination of multiple approaches can help tease apart alternative sources of conflict. Here, using a phylotranscriptomic analysis in combination with reference genomes, we test a hypothesis of ancient hybridization events within the plant family Amaranthaceae s.l. that was previously supported by morphological, ecological, and Sanger-based molecular data. The data set included seven genomes and 88 transcriptomes, 17 generated for this study. We examined gene-tree discordance using coalescent-based species trees and network inference, gene tree discordance analyses, site pattern tests of introgression, topology tests, synteny analyses, and simulations. We found that a combination of processes might have generated the high levels of gene tree discordance in the backbone of Amaranthaceae s.l. Furthermore, we found evidence that three consecutive short internal branches produce anomalous trees contributing to the discordance. Overall, our results suggest that Amaranthaceae s.l. might be a product of an ancient and rapid lineage diversification, and remains, and probably will remain, unresolved. This work highlights the potential problems of identifiability associated with the sources of gene tree discordance including, in particular, phylogenetic network methods. Our results also demonstrate the importance of thoroughly testing for multiple sources of conflict in phylogenomic analyses, especially in the context of ancient, rapid radiations. We provide several recommendations for exploring conflicting signals in such situations.

Journal Article
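The abstract does not name the site pattern tests used. One widely used example is Patterson's D statistic (the ABBA-BABA test), sketched below in Python for four aligned sequences: P1, P2, P3 and an outgroup, with $D = (n_{ABBA} - n_{BABA}) / (n_{ABBA} + n_{BABA})$. Under incomplete lineage sorting alone, ABBA and BABA sites are expected in roughly equal numbers, so a consistent excess of one pattern is read as a signal of introgression. The function name and input format are assumptions, and significance testing (usually a block jackknife) is omitted.

```python
from typing import Tuple

NUCLEOTIDES = set("ACGT")


def abba_baba(p1: str, p2: str, p3: str, outgroup: str) -> Tuple[int, int, float]:
    """Count ABBA and BABA sites and return (n_abba, n_baba, D). Illustrative sketch.

    Only biallelic sites with unambiguous nucleotides are used; the outgroup
    state is taken as ancestral ('A') and the other state as derived ('B').
    D is NaN when there are no ABBA or BABA sites.
    """
    if not len(p1) == len(p2) == len(p3) == len(outgroup):
        raise ValueError("sequences must be aligned to the same length")
    n_abba = n_baba = 0
    for a, b, c, o in zip(p1.upper(), p2.upper(), p3.upper(), outgroup.upper()):
        site = {a, b, c, o}
        if not site <= NUCLEOTIDES or len(site) != 2:
            continue  # skip gaps, ambiguity codes, invariant and multi-allelic sites
        if a == o and b == c != o:
            n_abba += 1  # P2 and P3 share the derived state
        elif b == o and a == c != o:
            n_baba += 1  # P1 and P3 share the derived state
    total = n_abba + n_baba
    d = (n_abba - n_baba) / total if total else float("nan")
    return n_abba, n_baba, d


if __name__ == "__main__":
    print(abba_baba("AAGAG", "GGAAG", "GGGAA", "AAAAA"))  # (2, 1, 0.3333333333333333)
```

A positive D indicates an excess of ABBA sites, which under the test's assumptions points to gene flow between P2 and P3 rather than between P1 and P3.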

MACSE v2: Toolkit for the Alignment of Coding Sequences Accounting for Frameshifts and Stop Codons

by Chantret, Nathalie , Ranwez, Vincent , Douzery, Emmanuel J P in Algorithms , Alignment , Amino acids

2018

Multiple sequence alignment is a prerequisite for many evolutionary analyses. Multiple Alignment of Coding Sequences (MACSE) is a multiple sequence alignment program that explicitly accounts for the underlying codon structure of protein-coding nucleotide sequences. This distinctive feature allows it to build reliable codon alignments even in the presence of frameshifts. This facilitates downstream analyses such as selection pressure estimation based on the ratio of nonsynonymous to synonymous substitutions. Here, we present MACSE v2, a major update with an improved version of the initial algorithm enriched with a complete toolkit to handle multiple alignments of protein-coding sequences. A graphical interface now provides user-friendly access to the different subprograms.

Journal Article
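The abstract notes that codon-aware alignment matters for analyses based on nonsynonymous versus synonymous substitutions. A small Python sketch makes the reason concrete: it compares two coding sequences codon by codon under the standard genetic code and counts synonymous versus nonsynonymous codon differences. A single-base deletion (a frameshift) shifts every downstream codon, so a naive codon-by-codon comparison reports spurious nonsynonymous changes, which is the kind of artefact a frameshift-aware aligner is meant to avoid. This is neither MACSE's algorithm nor a dN/dS estimator (it does not normalise by the numbers of synonymous and nonsynonymous sites or correct for multiple hits); the function names are hypothetical.

```python
from itertools import product
from typing import List, Tuple

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codons(seq: str) -> List[str]:
    """Split a nucleotide sequence into complete codons, dropping a trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def count_codon_differences(ref: str, query: str) -> Tuple[int, int]:
    """Return (synonymous, nonsynonymous) counts of differing codon pairs (hypothetical helper)."""
    syn = nonsyn = 0
    for c1, c2 in zip(codons(ref), codons(query)):
        if c1 == c2 or c1 not in CODON_TABLE or c2 not in CODON_TABLE:
            continue  # identical codons, or codons containing gaps/ambiguous bases
        if CODON_TABLE[c1] == CODON_TABLE[c2]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


if __name__ == "__main__":
    ref = "ATGAAACTGGGT"  # M K L G
    print(count_codon_differences(ref, "ATGAAGCTGGGT"))  # (1, 0): AAA->AAG, both Lys
    print(count_codon_differences(ref, "ATGAACTGGGT"))  # (0, 2): one-base deletion shifts the frame
```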
