Catalogue Search | MBRL

GetOrganelle: a fast and versatile toolkit for accurate de novo assembly of organelle genomes

by Li, De-Zhu , Jin, Jian-Jun , Yang, Jun-Bo in Animal Genetics and Genomics , Assembler , Assembly graph

2020

GetOrganelle is a state-of-the-art toolkit to accurately assemble organelle genomes from whole genome sequencing data. It recruits organelle-associated reads using a modified “baiting and iterative mapping” approach, conducts de novo assembly, filters and disentangles the assembly graph, and produces all possible configurations of circular organelle genomes. For 50 published plant datasets, we are able to reassemble the circular plastomes from 47 datasets using GetOrganelle. GetOrganelle assemblies are more accurate than published and/or NOVOPlasty-reassembled plastomes as assessed by mapping. We also assemble complete mitochondrial genomes using GetOrganelle. GetOrganelle is freely released under a GPL-3 license ( https://github.com/Kinggerm/GetOrganelle ).

Journal Article

Share this book

Add to My Shelf

Oatk: a de novo assembly tool for complex plant organelle genomes

by Zhou, Chenxi , Brown, Max , Blaxter, Mark in Animal Genetics and Genomics , Assembly graph , Bioinformatics

2025

Plant organelle genomes, particularly large mitochondrial genomes with complex repeats, present significant challenges for assembly. The advent of long-read sequencing enables the assembly of complete genomes, but problems of resolving alternative structures remain. Here we introduce a novel tool that employs a syncmer-based assembler for rapid assembly graph construction, integrates a profile-HMM database for robust organelle identification, and leverages a new search method to find the best supported path through the assembly graph. We describe high-quality organelle assemblies for 195 plant species, demonstrating improvements over other methods, and providing multiple insights into structural complexity, heteroplasmy, and DNA exchange between organelles.

Journal Article

Share this book

Add to My Shelf

METAMVGL: a multi-view graph-based metagenomic contig binning algorithm by integrating assembly and paired-end graphs

by Zhang, Zhenmiao , Zhang, Lu in Algorithms , Analysis , Assembly

2021

Background Due to the complexity of microbial communities, de novo assembly on next generation sequencing data is commonly unable to produce complete microbial genomes. Metagenome assembly binning becomes an essential step that could group the fragmented contigs into clusters to represent microbial genomes based on contigs’ nucleotide compositions and read depths. These features work well on the long contigs, but are not stable for the short ones. Contigs can be linked by sequence overlap (assembly graph) or by the paired-end reads aligned to them (PE graph), where the linked contigs have high chance to be derived from the same clusters. Results We developed METAMVGL, a multi-view graph-based metagenomic contig binning algorithm by integrating both assembly and PE graphs. It could strikingly rescue the short contigs and correct the binning errors from dead ends. METAMVGL learns the two graphs’ weights automatically and predicts the contig labels in a uniform multi-view label propagation framework. In experiments, we observed METAMVGL made use of significantly more high-confidence edges from the combined graph and linked dead ends to the main graph. It also outperformed many state-of-the-art contig binning algorithms, including MaxBin2, MetaBAT2, MyCC, CONCOCT, SolidBin and GraphBin on the metagenomic sequencing data from simulation, two mock communities and Sharon infant fecal samples. Conclusions Our findings demonstrate METAMVGL outstandingly improves the short contig binning and outperforms the other existing contig binning tools on the metagenomic sequencing data from simulation, mock communities and infant fecal samples.

Journal Article

Share this book

Add to My Shelf

STRONG: metagenomics strain resolution on assembly graphs

by Quince, Christopher , James, Robert , Eren, A. Murat in Algorithms , Animal Genetics and Genomics , Assembly graph

2021

We introduce STrain Resolution ON assembly Graphs (STRONG), which identifies strains de novo, from multiple metagenome samples. STRONG performs coassembly, and binning into metagenome assembled genomes (MAGs), and stores the coassembly graph prior to variant simplification. This enables the subgraphs and their unitig per-sample coverages, for individual single-copy core genes (SCGs) in each MAG, to be extracted. A Bayesian algorithm, BayesPaths, determines the number of strains present, their haplotypes or sequences on the SCGs, and abundances. STRONG is validated using synthetic communities and for a real anaerobic digestor time series generates haplotypes that match those observed from long Nanopore reads.

Journal Article

Share this book

Add to My Shelf

Graphasing: phasing diploid genome assembly graphs with single-cell strand sequencing

by Henglin, Mir , Koren, Sergey , Ebert, Peter in Animal Genetics and Genomics , Assembly graph , Bioinformatics

2024

Haplotype information is crucial for biomedical and population genetics research. However, current strategies to produce de novo haplotype-resolved assemblies often require either difficult-to-acquire parental data or an intermediate haplotype-collapsed assembly. Here, we present Graphasing, a workflow which synthesizes the global phase signal of Strand-seq with assembly graph topology to produce chromosome-scale de novo haplotypes for diploid genomes. Graphasing readily integrates with any assembly workflow that both outputs an assembly graph and has a haplotype assembly mode. Graphasing performs comparably to trio phasing in contiguity, phasing accuracy, and assembly quality, outperforms Hi-C in phasing accuracy, and generates human assemblies with over 18 chromosome-spanning haplotypes.

Journal Article

Share this book

Add to My Shelf

SPAligner: alignment of long diverged molecular sequences to assembly graphs

by Dvorkina, Tatiana , Antipov, Dmitry , Nurk, Sergey in Algorithms , Alignment , Amino acids

2020

Background Graph-based representation of genome assemblies has been recently used in different contexts — from improved reconstruction of plasmid sequences and refined analysis of metagenomic data to read error correction and reference-free haplotype reconstruction. While many of these applications heavily utilize the alignment of long nucleotide sequences to assembly graphs, first general-purpose software tools for finding such alignments have been released only recently and their deficiencies and limitations are yet to be discovered. Moreover, existing tools can not perform alignment of amino acid sequences, which could prove useful in various contexts — in particular the analysis of metagenomic sequencing data. Results In this work we present a novel SPAligner (Saint-Petersburg Aligner) tool for aligning long diverged nucleotide and amino acid sequences to assembly graphs. We demonstrate that SPAligner is an efficient solution for mapping third generation sequencing reads onto assembly graphs of various complexity and also show how it can facilitate the identification of known genes in complex metagenomic datasets. Conclusions Our work will facilitate accelerating the development of graph-based approaches in solving sequence to genome assembly alignment problem. SPAligner is implemented as a part of SPAdes tools library and is available on Github.

Journal Article

Share this book

Add to My Shelf

SpLitteR: diploid genome assembly using TELL-Seq linked-reads and assembly graphs

by Pevzner, Pavel , Tolstoganov, Ivan , Korobeynikov, Anton in Animals , Assembly graph , Bar codes

2024

Recent advances in long-read sequencing technologies enabled accurate and contiguous assemblies of large genomes and metagenomes. However, even long and accurate high-fidelity (HiFi) reads do not resolve repeats that are longer than the read lengths. This limitation negatively affects the contiguity of diploid genome assemblies since two haplomes share many long identical regions. To generate the telomere-to-telomere assemblies of diploid genomes, biologists now construct their HiFi-based phased assemblies and use additional experimental technologies to transform them into more contiguous diploid assemblies. The barcoded linked-reads, generated using an inexpensive TELL-Seq technology, provide an attractive way to bridge unresolved repeats in phased assemblies of diploid genomes. We developed the SpLitteR tool for diploid genome assembly using linked-reads and assembly graphs and benchmarked it against state-of-the-art linked-read scaffolders ARKS and SLR-superscaffolder using human HG002 genome and sheep gut microbiome datasets. The benchmark showed that SpLitteR scaffolding results in 1.5-fold increase in NGA50 compared to the baseline LJA assembly and other scaffolders while introducing no additional misassemblies on the human dataset. We developed the SpLitteR tool for assembly graph phasing and scaffolding using barcoded linked-reads. We benchmarked SpLitteR on assembly graphs produced by various long-read assemblers and have demonstrated that TELL-Seq reads facilitate phasing and scaffolding in these graphs. This benchmarking demonstrates that SpLitteR improves upon the state-of-the-art linked-read scaffolders in the accuracy and contiguity metrics. SpLitteR is implemented in C++ as a part of the freely available SPAdes package and is available at https://github.com/ablab/spades/releases/tag/splitter-preprint.

Journal Article

Share this book

Add to My Shelf

Ultraplexing: increasing the efficiency of long-read sequencing for hybrid assembly with k-mer-based multiplexing

by Meyer, Sebastian A. , Dilthey, Alexander T. , Kaasch, Achim J. in Accuracy , Animal Genetics and Genomics , Assembly graph

2020

Hybrid genome assembly has emerged as an important technique in bacterial genomics, but cost and labor requirements limit large-scale application. We present Ultraplexing, a method to improve per-sample sequencing cost and hands-on time of Nanopore sequencing for hybrid assembly by at least 50% compared to molecular barcoding while maintaining high assembly quality. Ultraplexing requires the availability of Illumina data and uses inter-sample genetic variability to assign reads to isolates, which obviates the need for molecular barcoding. Thus, Ultraplexing can enable significant sequencing and labor cost reductions in large-scale bacterial genome projects.

Journal Article

Share this book

Add to My Shelf

Ariadne: synthetic long read deconvolution using assembly graphs

by Barakzai, Waris N. , Danko, David C. , Belchikov, Natan in Algorithms , Animal Genetics and Genomics , Ariadne

2023

Synthetic long read sequencing techniques such as UST’s TELL-Seq and Loop Genomics’ LoopSeq combine 3 ′ barcoding with standard short-read sequencing to expand the range of linkage resolution from hundreds to tens of thousands of base-pairs. However, the lack of a 1:1 correspondence between a long fragment and a 3 ′ unique molecular identifier confounds the assignment of linkage between short reads. We introduce Ariadne, a novel assembly graph-based synthetic long read deconvolution algorithm, that can be used to extract single-species read-clouds from synthetic long read datasets to improve the taxonomic classification and de novo assembly of complex populations, such as metagenomes.

Journal Article

Share this book

Add to My Shelf

Improving metagenomic binning results with overlapped bins using assembly graphs

by Mallawaarachchi, Vijini G. , Lin, Yu , Wickramarachchi, Anuradha S. in Algorithms , Assembly , Assembly graphs

2021

Background Metagenomic sequencing allows us to study the structure, diversity and ecology in microbial communities without the necessity of obtaining pure cultures. In many metagenomics studies, the reads obtained from metagenomics sequencing are first assembled into longer contigs and these contigs are then binned into clusters of contigs where contigs in a cluster are expected to come from the same species. As different species may share common sequences in their genomes, one assembled contig may belong to multiple species. However, existing tools for binning contigs only support non-overlapped binning, i.e., each contig is assigned to at most one bin (species). Results In this paper, we introduce GraphBin2 which refines the binning results obtained from existing tools and, more importantly, is able to assign contigs to multiple bins. GraphBin2 uses the connectivity and coverage information from assembly graphs to adjust existing binning results on contigs and to infer contigs shared by multiple species. Experimental results on both simulated and real datasets demonstrate that GraphBin2 not only improves binning results of existing tools but also supports to assign contigs to multiple bins. Conclusion GraphBin2 incorporates the coverage information into the assembly graph to refine the binning results obtained from existing binning tools. GraphBin2 also enables the detection of contigs that may belong to multiple species. We show that GraphBin2 outperforms its predecessor GraphBin on both simulated and real datasets. GraphBin2 is freely available at https://github.com/Vini2/GraphBin2 .

Journal Article

Share this book

Add to My Shelf

Language Selector

MBRLGlobalSearch

Language Selector

Catalogue Search | MBRL

Search Results Heading

Explore the vast range of titles available.

MBRLSearchResults

MBRLHappinessMeter