Catalogue Search | MBRL

MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 for Bigger Datasets

by Kumar, Sudhir; Tamura, Koichiro; Stecher, Glen in Datasets, Evolutionary genetics, Family trees

2016

We present the latest version of the Molecular Evolutionary Genetics Analysis (MEGA) software, which contains many sophisticated methods and tools for phylogenomics and phylomedicine. In this major upgrade, MEGA has been optimized for use on 64-bit computing systems for analyzing larger datasets. Researchers can now explore and analyze tens of thousands of sequences in MEGA. The new version also provides an advanced wizard for building timetrees and includes new functionality to automatically predict gene duplication events in gene family trees. The 64-bit MEGA is available in two interfaces: graphical and command line. The graphical user interface (GUI) is a native Microsoft Windows application that can also be used on Mac OS X. The command-line MEGA is available as native applications for Windows, Linux, and Mac OS X, intended for high-throughput and scripted analyses. Both versions are available free of charge from www.megasoftware.net.

Journal Article
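The MEGA7 entry mentions automatic prediction of gene duplication events in gene family trees. As a rough illustration of how such a prediction can work without a species tree, the sketch below applies the generic species-overlap rule: an internal node is labeled a duplication when the same species occurs in both of its child subtrees. This rule is an assumption for illustration, not MEGA's documented algorithm; the nested-tuple tree format and the function name `find_duplications` are hypothetical.

```python
from typing import List, Set, Tuple, Union

# Hypothetical tree format: a leaf is a species name, an internal node is a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def find_duplications(tree: Tree) -> List[Tree]:
    """Return internal nodes whose two child subtrees share at least one species.

    This is the generic species-overlap heuristic, used here only as an example.
    """
    duplications: List[Tree] = []

    def species_under(node: Tree) -> Set[str]:
        if isinstance(node, str):        # leaf: a single species label
            return {node}
        left, right = node
        left_species = species_under(left)
        right_species = species_under(right)
        if left_species & right_species:  # same species on both sides -> duplication
            duplications.append(node)
        return left_species | right_species

    species_under(tree)
    return duplications


# Two paralogous copies, each found in human and mouse, plus a chicken outgroup.
family = ((("human", "mouse"), ("human", "mouse")), "chicken")
print(len(find_duplications(family)))  # 1
```

A species tree, when available, allows a more rigorous reconciliation; the overlap rule is simply the cheapest option when only the gene tree is known.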

MACSE v2: Toolkit for the Alignment of Coding Sequences Accounting for Frameshifts and Stop Codons

by Chantret, Nathalie; Ranwez, Vincent; Douzery, Emmanuel J P in Algorithms, Alignment, Amino acids

2018

Multiple sequence alignment is a prerequisite for many evolutionary analyses. Multiple Alignment of Coding Sequences (MACSE) is a multiple sequence alignment program that explicitly accounts for the underlying codon structure of protein-coding nucleotide sequences. This distinctive feature allows it to build reliable codon alignments even in the presence of frameshifts, which facilitates downstream analyses such as selection pressure estimation based on the ratio of nonsynonymous to synonymous substitutions. Here, we present MACSE v2, a major update with an improved version of the initial algorithm, enriched with a complete toolkit to handle multiple alignments of protein-coding sequences. A graphical interface now provides user-friendly access to the different subprograms.

Journal Article
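MACSE v2 produces codon alignments intended for analyses based on the ratio of nonsynonymous to synonymous substitutions. The sketch below is a deliberately simplified illustration of that distinction: it walks two already codon-aligned sequences, skips gapped codons, and classifies codons that differ at exactly one nucleotide as synonymous or nonsynonymous under the standard genetic code. It is not MACSE code and not a full dN/dS estimator (which would also count synonymous and nonsynonymous sites); `classify_single_differences` is a hypothetical helper.

```python
# Standard genetic code (NCBI translation table 1), built from the TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_single_differences(seq1: str, seq2: str) -> tuple[int, int]:
    """Count (synonymous, nonsynonymous) codons that differ at exactly one site."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("expected two aligned sequences whose length is a multiple of 3")
    synonymous = nonsynonymous = 0
    for pos in range(0, len(seq1), 3):
        c1, c2 = seq1[pos:pos + 3].upper(), seq2[pos:pos + 3].upper()
        if c1 not in CODON_TABLE or c2 not in CODON_TABLE:
            continue  # skip codons containing gaps or ambiguity codes
        if sum(x != y for x, y in zip(c1, c2)) != 1:
            continue  # keep the sketch simple: single-nucleotide differences only
        if CODON_TABLE[c1] == CODON_TABLE[c2]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


print(classify_single_differences("ATGCTGAAA---GGT", "ATGCTAAGAGCTGGC"))  # (2, 1)
```

Here CTG/CTA (Leu/Leu) and GGT/GGC (Gly/Gly) are synonymous, AAA/AGA (Lys/Arg) is nonsynonymous, and the gapped codon is ignored, which is why a frameshift-aware codon alignment matters for such counts.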

Fast Genome-Wide Functional Annotation through Orthology Assignment by eggNOG-Mapper

by Huerta-Cepas, Jaime; Coelho, Luis Pedro; Forslund, Kristoffer in Annotations, Genomes, Homology

2017

Orthology assignment is ideally suited for functional inference. However, because predicting orthology is computationally intensive at large scale, and most pipelines are relatively inaccessible (e.g., new assignments are only available through database updates), less precise homology-based functional transfer is still the default for (meta-)genome annotation. We therefore developed eggNOG-mapper, a tool for functional annotation of large sets of sequences based on fast orthology assignments using precomputed clusters and phylogenies from the eggNOG database. To validate our method, we benchmarked Gene Ontology (GO) predictions against two widely used homology-based approaches: BLAST and InterProScan. Orthology filters applied to BLAST results reduced the rate of false-positive assignments by 11% and increased the ratio of experimentally validated terms recovered over all terms assigned per protein by 15%. Compared with InterProScan, eggNOG-mapper achieved similar proteome coverage and precision while predicting, on average, 41 more terms per protein and increasing the rate of experimentally validated terms recovered over total term assignments per protein by 35%. EggNOG-mapper predictions ranked among the top five methods in all three GO categories on the CAFA2 NK-partial benchmark. Finally, we evaluated eggNOG-mapper for functional annotation of metagenomics data, where it performed better than InterProScan. eggNOG-mapper runs ∼15× faster than BLAST and at least 2.5× faster than InterProScan. The tool is available standalone and as an online service at http://eggnog-mapper.embl.de.

Journal Article
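The eggNOG-mapper abstract reports that orthology filters applied to BLAST results reduce false-positive GO assignments. A minimal sketch of that idea, assuming each database hit has already been mapped to an orthologous group: only hits in the same group as the best-scoring hit contribute annotation terms. The `Hit` class, the function `orthology_filtered_terms`, the group labels, and the terms are all hypothetical; eggNOG-mapper itself relies on precomputed eggNOG clusters and phylogenies rather than this simple rule.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    subject: str     # database protein ID (hypothetical)
    bitscore: float  # score reported by the homology search


def orthology_filtered_terms(hits: list[Hit],
                             subject_group: dict[str, str],
                             subject_terms: dict[str, set[str]]) -> set[str]:
    """Collect terms only from hits that share the best hit's orthologous group."""
    if not hits:
        return set()
    best = max(hits, key=lambda h: h.bitscore)
    group = subject_group.get(best.subject)
    if group is None:
        # Best hit has no group assignment: fall back to its own terms only.
        return set(subject_terms.get(best.subject, set()))
    terms: set[str] = set()
    for hit in hits:
        if subject_group.get(hit.subject) == group:
            terms |= subject_terms.get(hit.subject, set())
    return terms


hits = [Hit("P1", 250.0), Hit("P2", 240.0), Hit("P3", 180.0)]
groups = {"P1": "OG1", "P2": "OG1", "P3": "OG2"}  # P3 is a homolog outside the group
annotations = {"P1": {"GO:A"}, "P2": {"GO:B"}, "P3": {"GO:C"}}
print(sorted(orthology_filtered_terms(hits, groups, annotations)))  # ['GO:A', 'GO:B']
```

A purely homology-based transfer would also have copied `GO:C` from P3; the group check is what removes it.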

Using PhyloSuite for molecular phylogeny and tree‐based analyses

by Zou, Hong; Zhang, Dong; Xiang, Chuan‐Yu in Algorithms, annotation, concatenation

2023

Phylogenetic analysis has entered the genomics (multilocus) era. For less experienced researchers, mastering the large number of software programs required for a multilocus‐based phylogenetic reconstruction can be daunting and time‐consuming. PhyloSuite, a software package with a user‐friendly GUI, was designed to make this process more accessible by integrating the multiple programs needed for multilocus and single‐gene phylogenies and further streamlining the whole process. In this protocol, we explain how to conduct each step of the phylogenetic pipeline and tree‐based analyses in PhyloSuite. We also present a new version of PhyloSuite (v1.2.3), in which we fixed some bugs, made some optimizations, and introduced new functions, including a number of tree‐based analyses such as signal‐to‐noise calculation, saturation analysis, and spurious species identification. The step‐by‐step protocol includes background information (i.e., what the step does), reasons (i.e., why to do the step), and operations (i.e., how to do it). This protocol will help researchers quick‐start their way through multilocus phylogenetic analysis, especially those interested in conducting organelle‐based analyses. Highlights: A new release of PhyloSuite, capable of conducting tree‐based analyses. Detailed guidelines for each step of phylogenetic and tree‐based analyses, following the “What, Why, and How” structure. This protocol will help beginners learn how to conduct multilocus phylogenetic analyses and help experienced scientists improve their efficiency.

Journal Article
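PhyloSuite v1.2.3 adds saturation analysis among its tree-based analyses. One common way to look for substitution saturation, assumed here rather than taken from PhyloSuite's implementation, is to compare uncorrected p-distances with model-corrected distances for every pair of sequences: as sequences saturate, the p-distance levels off while the corrected distance keeps growing. The sketch uses the Jukes–Cantor (JC69) correction; the function names and the toy alignment are hypothetical.

```python
import math
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Uncorrected proportion of differing sites, ignoring positions with a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned")
    sites = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)


def jc69_distance(p: float) -> float:
    """Jukes-Cantor (JC69) corrected distance; infinite once p >= 0.75."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def saturation_points(alignment: dict[str, str]) -> list[tuple[str, str, float, float]]:
    """(taxon1, taxon2, p-distance, JC69 distance) for every pair of sequences."""
    points = []
    for (n1, s1), (n2, s2) in combinations(alignment.items(), 2):
        p = p_distance(s1, s2)
        points.append((n1, n2, p, jc69_distance(p)))
    return points


alignment = {"taxon1": "ACGTACGTAC", "taxon2": "ACGTACGTTC", "taxon3": "TCGAACGGTC"}
for t1, t2, p, d in saturation_points(alignment):
    print(f"{t1} vs {t2}: p = {p:.2f}, JC69 = {d:.3f}")
```

Plotting p against the JC69 distance for all pairs gives the usual saturation plot: points that bend toward a horizontal plateau suggest saturated data.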

Fast, scalable generation of high‐quality protein multiple sequence alignments using Clustal Omega

by Thompson, Julie D; Lopez, Rodrigo; McWilliam, Hamish in Algorithms, Alignment, Amino Acid Sequence

2011

Multiple sequence alignments are fundamental to many sequence analysis methods. Most alignments are computed using the progressive alignment heuristic. These methods are starting to become a bottleneck in some analysis pipelines when faced with data sets of many thousands of sequences. Some methods allow computation of larger data sets while sacrificing quality, and others produce high‐quality alignments but scale badly with the number of sequences. In this paper, we describe a new program called Clustal Omega, which can align virtually any number of protein sequences quickly and delivers accurate alignments. The accuracy of the package on smaller test cases is similar to that of the high‐quality aligners. On larger data sets, Clustal Omega outperforms other packages in terms of execution time and quality. Clustal Omega also has powerful features for adding sequences to and exploiting information in existing alignments, making use of the vast amount of precomputed information in public databases like Pfam. The new program Clustal Omega can align virtually any number of protein sequences quickly and has powerful features for adding sequences to existing precomputed alignments.

Journal Article
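Progressive aligners such as Clustal Omega first estimate pairwise distances in order to build a guide tree. The sketch below computes a simple k-mer (k-tuple) distance of the kind used for fast guide-tree construction. The exact formula, the default k=2, and the function names are assumptions for illustration; they do not reproduce Clustal Omega's own distance measure or guide-tree method.

```python
from collections import Counter
from itertools import combinations


def kmer_counts(seq: str, k: int) -> Counter:
    """Multiset of all overlapping k-mers in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a: str, b: str, k: int = 2) -> float:
    """1 - shared k-mers / k-mers in the shorter sequence (0 if all are shared)."""
    denom = min(len(a), len(b)) - k + 1
    if denom <= 0:
        raise ValueError("both sequences must be at least k residues long")
    # Counter & Counter keeps the minimum count of each k-mer present in both.
    shared = sum((kmer_counts(a, k) & kmer_counts(b, k)).values())
    return 1.0 - shared / denom


def distance_matrix(seqs: dict[str, str], k: int = 2) -> dict[tuple[str, str], float]:
    """Pairwise k-mer distances, the usual input for building a guide tree."""
    return {(x, y): kmer_distance(seqs[x], seqs[y], k) for x, y in combinations(seqs, 2)}


seqs = {"s1": "MKVLAAGIV", "s2": "MKVLSAGIV", "s3": "GWTDPQRLE"}
for pair, dist in distance_matrix(seqs).items():
    print(pair, round(dist, 3))
```

Counting shared words needs no alignment at all, which is why this kind of measure is attractive when the number of sequences, and hence pairs, becomes very large.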

Progressive Cactus is a multiple-genome aligner for the thousand-genome era

by Jarvis, Erich D.; Fiddes, Ian T.; Armstrong, Joel in 631/114/2785, 631/114/739, 631/114/794

2020

New genome assemblies have been arriving at a rapidly increasing pace, thanks to decreases in sequencing costs and improvements in third-generation sequencing technologies [1–3]. For example, the number of vertebrate genome assemblies currently in the NCBI (National Center for Biotechnology Information) database [4] increased by more than 50% to 1,485 assemblies in the year from July 2018 to July 2019. In addition to this influx of assemblies from different species, new human de novo assemblies [5] are being produced, which enable the analysis of not only small polymorphisms but also complex, large-scale structural differences between human individuals and haplotypes. This coming era and its unprecedented amount of data offer the opportunity to uncover many insights into genome evolution, but also present challenges in adapting current analysis methods to the increased scale. Cactus [6], a reference-free multiple genome alignment program, has been shown to be highly accurate, but the existing implementation scales poorly with increasing numbers of genomes and struggles in regions of highly duplicated sequence. Here we describe progressive extensions to Cactus to create Progressive Cactus, which enables the reference-free alignment of tens to thousands of large vertebrate genomes while maintaining high alignment quality. We describe results from an alignment of more than 600 amniote genomes, which is to our knowledge the largest multiple vertebrate genome alignment created so far. The Progressive Cactus program can create reference-free alignments of hundreds of large vertebrate genomes efficiently, and was used for the alignment of more than 600 amniote genomes.

Journal Article
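Progressive Cactus scales by working along a guide tree, splitting the alignment into one subproblem per internal node that aligns the node's children and reconstructs the ancestor. Assuming that decomposition, the sketch below orders the subproblems so that every ancestor is processed after its children, and groups those that could run in parallel. The guide tree, genome names, and functions are hypothetical, and the real pipeline also brings in outgroup genomes, which are omitted here.

```python
def schedule_subproblems(children: dict[str, list[str]], root: str) -> list[tuple[str, list[str]]]:
    """Post-order list of (ancestor, inputs): each ancestor appears after its children."""
    order: list[tuple[str, list[str]]] = []

    def visit(node: str) -> None:
        kids = children.get(node, [])
        for kid in kids:
            visit(kid)
        if kids:  # leaves are input genomes, not subproblems
            order.append((node, kids))

    visit(root)
    return order


def parallel_rounds(children: dict[str, list[str]], root: str) -> list[list[str]]:
    """Group ancestors by height; subproblems in the same round never depend on each other."""
    by_height: dict[int, list[str]] = {}

    def height(node: str) -> int:
        kids = children.get(node, [])
        if not kids:
            return 0
        h = 1 + max(height(kid) for kid in kids)
        by_height.setdefault(h, []).append(node)
        return h

    height(root)
    return [by_height[h] for h in sorted(by_height)]


guide_tree = {"Anc0": ["Anc1", "Anc2"], "Anc1": ["human", "chimp"], "Anc2": ["mouse", "rat"]}
print(schedule_subproblems(guide_tree, "Anc0"))
# [('Anc1', ['human', 'chimp']), ('Anc2', ['mouse', 'rat']), ('Anc0', ['Anc1', 'Anc2'])]
print(parallel_rounds(guide_tree, "Anc0"))  # [['Anc1', 'Anc2'], ['Anc0']]
```

Because each subproblem only involves a handful of genomes, the cost grows roughly with the number of internal nodes rather than with all genomes at once, which is the intuition behind the approach's scalability.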

Current Methods for Automated Filtering of Multiple Sequence Alignments Frequently Worsen Single-Gene Phylogenetic Inference

by Ledergerber, Christian; Herrero, Javier; Gil, Manuel in Algorithms, Classification - methods, Comparative analysis

2015

Phylogenetic inference is generally performed on the basis of multiple sequence alignments (MSA). Because errors in an alignment can lead to errors in tree estimation, there is strong interest in identifying and removing unreliable parts of the alignment. In recent years several automated filtering approaches have been proposed, but despite their popularity, a systematic and comprehensive comparison of different alignment filtering methods on real data has been lacking. Here, we extend and apply recently introduced phylogenetic tests of alignment accuracy on a large number of gene families and contrast the performance of unfiltered versus filtered alignments in the context of single-gene phylogeny reconstruction. Based on multiple genome-wide empirical and simulated data sets, we show that the trees obtained from filtered MSAs are on average worse than those obtained from unfiltered MSAs. Furthermore, alignment filtering often leads to an increase in the proportion of well-supported branches that are actually wrong. We confirm that our findings hold for a wide range of parameters and methods. Although our results suggest that light filtering (up to 20% of alignment positions) has little impact on tree accuracy and may save some computation time, contrary to widespread practice, we do not generally recommend the use of current alignment filtering methods for phylogenetic inference. By providing a way to rigorously and systematically measure the impact of filtering on alignments, the methodology set forth here will guide the development of better filtering algorithms.

Journal Article
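The study reports that light filtering (up to 20% of alignment positions) has little impact on tree accuracy, while cautioning against current filtering methods in general. To make the 20% figure concrete, the sketch below is a hypothetical gap-based column filter that never removes more than a fixed fraction of columns. It is not one of the methods evaluated in the paper, and the thresholds and the name `light_gap_filter` are illustrative assumptions.

```python
def light_gap_filter(alignment: list[str], max_gap_fraction: float = 0.5,
                     max_removed_fraction: float = 0.2) -> list[str]:
    """Drop the gappiest columns, but never more than max_removed_fraction of all columns."""
    if not alignment:
        return []
    n_cols = len(alignment[0])
    if any(len(seq) != n_cols for seq in alignment):
        raise ValueError("sequences must be aligned (equal length)")
    gap_frac = [sum(seq[c] == "-" for seq in alignment) / len(alignment) for c in range(n_cols)]
    # Candidate columns above the gap threshold, gappiest first.
    candidates = sorted((c for c in range(n_cols) if gap_frac[c] > max_gap_fraction),
                        key=lambda c: gap_frac[c], reverse=True)
    budget = int(max_removed_fraction * n_cols)  # hard cap, e.g. 20% of columns
    drop = set(candidates[:budget])
    return ["".join(ch for c, ch in enumerate(seq) if c not in drop) for seq in alignment]


msa = ["ACGT-A--GT", "ACGTTA--GT", "AC-TTA-CGT"]
print(light_gap_filter(msa))  # ['ACGT-AGT', 'ACGTTAGT', 'AC-TTAGT']
```

Capping the number of removed columns keeps the filter "light" regardless of how gappy the alignment is; whether such filtering helps at all is exactly what the study questions.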

DECIPHER: harnessing local sequence context to improve protein multiple sequence alignment

by Wright, Erik S. in Algorithms , Amino Acid Sequence , Benchmarks

2015

Background: Alignment of large and diverse sequence sets is a common task in biological investigations, yet there remains considerable room for improvement in alignment quality. Multiple sequence alignment programs tend to reach maximal accuracy when aligning only a few sequences, and their accuracy then diminishes steadily as more sequences are added. This drop in accuracy can be partly attributed to a build-up of error and ambiguity as more sequences are aligned. Most high-throughput sequence alignment algorithms do not use contextual information, under the assumption that sites are independent. This study examines the extent to which local sequence context can be exploited to improve the quality of large multiple sequence alignments. Results: Two predictors based on local sequence context were assessed: (i) single-sequence secondary structure predictions, and (ii) modulation of gap costs according to the surrounding residues. The results indicate that context-based predictors have appreciable information content that can be used to create more accurate alignments. Furthermore, local context becomes more informative as the number of sequences increases, enabling more accurate protein alignments of large empirical benchmarks. These discoveries became the basis for DECIPHER, a new context-aware program for sequence alignment, which outperformed other programs on large sequence sets. Conclusions: Predicting secondary structure based on local sequence context is an efficient means of breaking the independence assumption in alignment. Since secondary structure is more conserved than primary sequence, it can be leveraged to improve the alignment of distantly related proteins. Moreover, secondary structure predictions increase in accuracy as more sequences are used in the prediction. This enables the scalable generation of large sequence alignments that maintain high accuracy even on diverse sequence sets.
The DECIPHER R package and source code are freely available for download at DECIPHER.cee.wisc.edu and from the Bioconductor repository.

Journal Article
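DECIPHER's abstract describes modulating gap costs according to the surrounding residues. The sketch below shows only the general mechanism: a Needleman–Wunsch global alignment score in which the cost of gapping each residue depends on a small window around it. The context rule (cheaper gaps near G, P, S, N, D), the parameter values, and the function names are hypothetical; this is not DECIPHER's scoring scheme, which is implemented in the R package.

```python
def context_gap_cost(seq: str, i: int, base: float = -4.0, loop_bonus: float = 2.0,
                     loop_residues: str = "GPSND") -> float:
    """Hypothetical rule: gapping residue i is cheaper when its +/-2 window is loop-like."""
    window = seq[max(0, i - 2): i + 3]
    loop_fraction = sum(ch in loop_residues for ch in window) / len(window)
    return base + loop_bonus * loop_fraction


def context_aware_score(a: str, b: str, match: float = 2.0, mismatch: float = -1.0) -> float:
    """Needleman-Wunsch global score with position-specific (linear) gap costs."""
    gap_a = [context_gap_cost(a, i) for i in range(len(a))]
    gap_b = [context_gap_cost(b, j) for j in range(len(b))]
    n, m = len(a), len(b)
    S = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = S[i - 1][0] + gap_a[i - 1]
    for j in range(1, m + 1):
        S[0][j] = S[0][j - 1] + gap_b[j - 1]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + sub,       # align a[i-1] with b[j-1]
                          S[i - 1][j] + gap_a[i - 1],  # a[i-1] opposite a gap
                          S[i][j - 1] + gap_b[j - 1])  # b[j-1] opposite a gap
    return S[n][m]


# The same 3-residue insertion costs less when it sits in a loop-like context.
print(context_aware_score("ACDEFHIK", "ACDGPSEFHIK"))
print(context_aware_score("ACDEFHIK", "ACDWLVEFHIK"))
```

Making gap costs depend on context breaks the usual assumption that every position is scored independently, which is the property the abstract highlights.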

LCSkPOA: enabling banded semi-global partial order alignments via efficient and accurate backbone generation through extended LCSk

by Weerakoon, Minindu; Saunders, Christopher T.; Heaton, Haynes in Algorithms, Amino acid sequence, Bioinformatics

2025

Background: Most multiple sequence alignment and string-graph alignment algorithms focus on global alignment, but many applications exist for semi-global and local string-graph alignment. Long reads require enormous amounts of memory and runtime to fill out large dynamic programming tables. Effective algorithms exist for finding the backbone of an alignment, and thus defining its band, such as the longest common subsequence with k-mer matches (LCSk++), but they do not work with graphs. This study introduces an adaptation of LCSk++ tailored for graph structures, focusing particularly on Partial Order Alignment (POA) graphs. POA graphs, which are directed acyclic graphs, represent multiple sequence alignments and effectively capture the relationships between sequences. State-of-the-art methods such as ABPOA and SPOA improve upon POA; ABPOA incorporates banding while SPOA does not, and although both leverage SIMD for faster matrix calculations, neither uses parallel processing. Our approach addresses these limitations by extending the LCSk++ algorithm to handle the complexities of graph-based alignment while incorporating SIMD, banding, and parallel processing for enhanced efficiency. Results: Our extended LCSk++ algorithm integrates dynamic programming and graph traversal techniques to detect conserved regions within POA graphs, termed the LCSk++ backbone. This backbone enables precise banding of the POA matrix for all alignment modes (global, semi-global, and local). Unlike ABPOA, which only allows banded global alignment, our approach offers broader flexibility and significantly improves consensus sequence construction. While supporting more alignment modes than ABPOA, it also outperforms SPOA’s global alignment, with substantial memory savings (up to 98%) and significant run-time reductions (up to 25x), particularly for long sequences (> 30,000 bp).
Our method maintains high alignment accuracy and proves effective across various string lengths and datasets, including synthetic and PacBio HiFi reads. Parallel processing further improves runtime, achieving speedups of up to 150x on conventional PCs. Conclusion: The extended LCSk++ algorithm for graph structures offers a substantial advancement in sequence alignment technology. It effectively reduces memory consumption and optimizes run times without compromising alignment quality, providing a robust solution for all alignment modes (global, local, and semi-global) in POA. This method enhances the utility of POA in critical applications such as multiple sequence alignment for phylogeny construction and graph-based reference alignment.

Journal Article
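LCSkPOA builds on LCSk++, which scores a chain of shared substrings of length at least k that can serve as an alignment backbone and so define a band. For two plain strings rather than a POA graph, the quantity can be computed with a straightforward quadratic dynamic program: each cell either skips a character or appends a common substring of length at least k that ends at that cell. This is a textbook-style sketch for strings only, not the graph algorithm, the sparse LCSk++ implementation, or the SIMD/parallel code described in the paper; `lcsk_plus_plus` is a hypothetical name.

```python
def lcsk_plus_plus(a: str, b: str, k: int) -> int:
    """Total length of the best ordered chain of common substrings, each of length >= k."""
    n, m = len(a), len(b)
    suffix = [[0] * (m + 1) for _ in range(n + 1)]  # length of common suffix ending at (i, j)
    dp = [[0] * (m + 1) for _ in range(n + 1)]      # best chain score for prefixes a[:i], b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                suffix[i][j] = suffix[i - 1][j - 1] + 1
            best = max(dp[i - 1][j], dp[i][j - 1])   # skip a character in a or in b
            # Append a matching substring of any length >= k that ends here.
            for length in range(k, suffix[i][j] + 1):
                best = max(best, dp[i - length][j - length] + length)
            dp[i][j] = best
    return dp[n][m]


print(lcsk_plus_plus("ACGTACGT", "ACGTTACGT", 3))  # 8
print(lcsk_plus_plus("ACGGT", "ACTGT", 3))         # 0: no common substring of length 3
```

Tracing back through `dp` would recover the matched substrings themselves; in a banded aligner those anchors mark where the diagonal band should run.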

Introducing difference recurrence relations for faster semi-global alignment of long sequences

by Kasahara, Masahiro; Suzuki, Hajime in Algorithms, Alignment, Base sequence

2018

Background: The read length of single-molecule DNA sequencers is reaching 1 Mb. Popular alignment software tools widely used for analyzing such long reads often take advantage of single-instruction multiple-data (SIMD) operations to accelerate calculation of dynamic programming (DP) matrices in the Smith–Waterman–Gotoh (SWG) algorithm with a fixed alignment start position at the origin. Nonetheless, 16-bit or 32-bit integers are necessary for storing the values in a DP matrix when the sequences to be aligned are long, which prevents use of the full SIMD width of modern processors. Results: We propose a faster semi-global alignment algorithm, “difference recurrence relations,” that runs 2.1 times faster than the state-of-the-art algorithm. Instead of calculating and storing all the values in a DP matrix directly, our algorithm computes and stores mainly the differences between the values of adjacent cells in the matrix. Although the SWG algorithm and our algorithm output exactly the same result, our algorithm mainly involves 8-bit integer operations, enabling us to exploit the full width of SIMD operations (e.g., 32 parallel 8-bit operations) on modern processors. We also developed a library, libgaba, so that developers can easily integrate our algorithm into alignment programs. Conclusions: Our novel algorithm and optimized library implementation will help accelerate nucleotide long-read analysis algorithms that use pairwise alignment stages. The library is implemented in the C programming language and available at https://github.com/ocxtal/libgaba .

Journal Article
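The core idea of the difference recurrence relations is to track differences between adjacent DP cells instead of absolute scores, because those differences stay within a small range. The sketch below illustrates this with a simplified global alignment using a linear gap penalty, not the semi-global Smith–Waterman–Gotoh (affine gap) recurrences implemented in libgaba. It keeps, row by row, the horizontal differences dH = S[i][j] - S[i][j-1] and vertical differences dV = S[i][j] - S[i-1][j], then recovers the final score, which matches a direct DP. The scoring values and function names are assumptions.

```python
def direct_score(a: str, b: str, match: int = 2, mismatch: int = -3, gap: int = -5) -> int:
    """Reference global alignment score with a full (n+1) x (m+1) DP matrix."""
    n, m = len(a), len(b)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + s, S[i - 1][j] + gap, S[i][j - 1] + gap)
    return S[n][m]


def difference_score(a: str, b: str, match: int = 2, mismatch: int = -3,
                     gap: int = -5) -> tuple[int, int, int]:
    """Same score computed only from adjacent-cell differences.

    Returns (score, smallest difference seen, largest difference seen). With
    gap <= 0 and mismatch <= match, every difference lies in [gap, match - gap].
    """
    n, m = len(a), len(b)
    dh = [gap] * (m + 1)  # dh[j] = S[i][j] - S[i][j-1]; starts as row 0
    lo = hi = gap
    last_col_total = 0    # running sum of S[i][m] - S[i-1][m]
    for i in range(1, n + 1):
        dv = gap          # S[i][0] - S[i-1][0] on the left boundary
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # d = S[i][j] - S[i-1][j-1], expressed with differences only
            d = max(s, dh[j] + gap, dv + gap)
            # New dH = d - old dV (left); new dV = d - old dH (up). RHS uses old values.
            dh[j], dv = d - dv, d - dh[j]
            lo, hi = min(lo, dh[j], dv), max(hi, dh[j], dv)
        last_col_total += dv
    return m * gap + last_col_total, lo, hi


a, b = "ACGTTGCA", "ACGTGCA"
score, lo, hi = difference_score(a, b)
print(direct_score(a, b), score, (lo, hi))
```

With these scores the differences stay within [-5, 7], a range that fits comfortably in 8-bit integers no matter how long the sequences are, whereas the absolute scores grow with sequence length.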
