Catalogue Search | MBRL
134 result(s) for "Pevzner, Pavel"
Assembly of long, error-prone reads using repeat graphs
2019
Accurate genome assembly is hampered by repetitive regions. Although long single molecule sequencing reads are better able to resolve genomic repeats than short-read data, most long-read assembly algorithms do not provide the repeat characterization necessary for producing optimal assemblies. Here, we present Flye, a long-read assembly algorithm that generates arbitrary paths in an unknown repeat graph, called disjointigs, and constructs an accurate repeat graph from these error-riddled disjointigs. We benchmark Flye against five state-of-the-art assemblers and show that it generates better or comparable assemblies, while being an order of magnitude faster. Flye nearly doubled the contiguity of the human genome assembly (as measured by the NGA50 assembly quality metric) compared with existing assemblers.
Flye improves the speed and accuracy of genome assembly by using repeat graphs to resolve repeat regions.
Journal Article
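The Flye abstract measures contiguity with NGA50. NGA50 is the length L such that aligned blocks of length L or longer together cover at least half of the reference genome. The sketch below computes it under one stated assumption: the contigs have already been aligned to the reference and broken at misassemblies (as an evaluation tool such as QUAST does), so only the resulting aligned block lengths are passed in. The function name nga50 is illustrative and not taken from Flye.

```python
def nga50(aligned_block_lengths, reference_length):
    """Return NGA50 for pre-computed aligned block lengths.

    Assumption: blocks come from aligning contigs to the reference and
    splitting them at misassemblies. Returns None when the blocks cover
    less than half of the reference (NGA50 is then undefined).
    """
    half = reference_length / 2
    covered = 0
    # Walk blocks from longest to shortest until half the reference is covered.
    for length in sorted(aligned_block_lengths, reverse=True):
        covered += length
        if covered >= half:
            return length
    return None
```

For example, blocks of 50, 30, 20 and 10 bp against a 200 bp reference give an NGA50 of 20, because 50 + 30 + 20 first reaches half of the reference. Dividing by the reference length rather than the assembly length is what distinguishes the "G" variants from plain N50.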
Assembly of long error-prone reads using de Bruijn graphs
by Lin, Yu; Pevzner, Pavel A.; Kolmogorov, Mikhail
in Biological Sciences, Biophysics and Computational Biology, Computer science
2016
The recent breakthroughs in assembling long error-prone reads were based on the overlap-layout-consensus (OLC) approach and did not utilize the strengths of the alternative de Bruijn graph approach to genome assembly. Moreover, these studies often assume that applications of the de Bruijn graph approach are limited to short and accurate reads and that the OLC approach is the only practical paradigm for assembling long error-prone reads. We show how to generalize de Bruijn graphs for assembling long error-prone reads and describe the ABruijn assembler, which combines the de Bruijn graph and the OLC approaches and results in accurate genome reconstructions.
Journal Article
Automated assembly of centromeres from ultra-long error-prone reads
2020
Centromeric variation has been linked to cancer and infertility, but centromere sequences contain multiple tandem repeats and can only be assembled manually from long error-prone reads. Here we describe the centroFlye algorithm for centromere assembly using long error-prone reads, and apply it to assemble human centromeres on chromosomes 6 and X. Our analyses reveal putative breakpoints in the manual reconstruction of the human X centromere, demonstrate that the human X chromosome is partitioned into repeat subfamilies and provide initial insights into centromere evolution. We anticipate that centroFlye could be applied to automatically close remaining multimegabase gaps in the reference human genome.
CentroFlye resolves tandem repeats to assemble human centromeres from nanopore reads.
Journal Article
viralFlye: assembling viruses and identifying their hosts from long-read metagenomics data
by Antipov, Dmitry; Pevzner, Pavel A.; Kolmogorov, Mikhail
in Animal Genetics and Genomics, Assembly, Bioinformatics
2022
Although the use of long-read sequencing improves the contiguity of assembled viral genomes compared to short-read methods, assembling complex viral communities remains an open problem. We describe the viralFlye tool for identification and analysis of metagenome-assembled viruses in long-read assemblies. We show that it significantly improves viral assemblies and demonstrate that long reads result in a much larger array of predicted virus-host associations as compared to short-read assemblies. We demonstrate that the identification of novel CRISPR arrays in bacterial genomes from a newly assembled metagenomic sample provides information for predicting novel hosts for novel viruses.
Journal Article
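The viralFlye abstract links CRISPR arrays in bacterial genomes to virus-host prediction. The common idea behind such links is that a CRISPR spacer is a fragment of a virus the host has encountered, so a viral contig that contains a host's spacer suggests that host. The sketch below uses exact substring matching on both strands; this is a simplifying assumption (practical tools usually tolerate mismatches), and the function names and sample identifiers are hypothetical rather than viralFlye's interface.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string (ACGT, either case)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def predict_hosts(spacers_by_host, viral_contigs):
    """Map each viral contig ID to hosts whose spacers occur in it.

    Assumption: exact matches only; a spacer may occur on either strand.
    """
    links = {}
    for virus_id, contig in viral_contigs.items():
        rc_contig = reverse_complement(contig)
        hosts = sorted(
            host
            for host, spacers in spacers_by_host.items()
            if any(s in contig or s in rc_contig for s in spacers)
        )
        if hosts:
            links[virus_id] = hosts
    return links
```

With the hypothetical spacer ACGTTG from host "bacA", a contig containing it directly ("TTACGTTGAA") and one containing its reverse complement ("TTCAACGTTT") are both linked to "bacA".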
How to apply de Bruijn graphs to genome assembly
by Compeau, Phillip E C; Pevzner, Pavel A; Tesler, Glenn
in 631/114/2785/2302, 631/61/514, 639/705/1041
2011
A mathematical concept known as a de Bruijn graph turns the formidable challenge of assembling a contiguous genome from billions of short sequencing reads into a tractable computational problem. The development of algorithmic ideas for next-generation sequencing is examined.
Journal Article
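The de Bruijn graph idea summarized above turns assembly into finding an Eulerian path: every k-mer becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix, and a path using each edge once spells a string with the same k-mer content. The sketch below assumes idealized input, namely the exact k-mer composition of an error-free genome with one k-mer per occurrence. Real reads contain errors and uneven coverage, and repeats can make the Eulerian path non-unique, so this illustrates the concept rather than a working assembler. Helper names are illustrative.

```python
from collections import defaultdict


def kmer_composition(text, k):
    """All k-mers of text, one per occurrence (idealized perfect coverage)."""
    return [text[i:i + k] for i in range(len(text) - k + 1)]


def de_bruijn_graph(kmers):
    """Nodes are (k-1)-mers; each k-mer adds an edge prefix -> suffix."""
    graph = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm; assumes an Eulerian path exists."""
    balance = defaultdict(int)  # out-degree minus in-degree per node
    for u, vs in graph.items():
        balance[u] += len(vs)
        for v in vs:
            balance[v] -= 1
    # Start at the node with one extra outgoing edge, else anywhere (cycle).
    start = next((n for n, b in balance.items() if b == 1), None)
    if start is None:
        start = next(iter(graph))
    remaining = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: emit node
    return path[::-1]


def assemble(kmers):
    """Spell the string along an Eulerian path of the de Bruijn graph."""
    path = eulerian_path(de_bruijn_graph(kmers))
    return path[0] + "".join(node[-1] for node in path[1:])
```

For example, assemble(kmer_composition("AACGTTC", 3)) returns "AACGTTC". For a repetitive genome, the result is guaranteed only to share the original's k-mer composition, which is exactly the ambiguity repeats introduce.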
Dereplication of microbial metabolites through database search of mass spectra
2018
Natural products have traditionally been rich sources for drug discovery. In order to clear the road toward the discovery of unknown natural products, biologists need dereplication strategies that identify known ones. Here we report DEREPLICATOR+, an algorithm that improves on the previous approaches for identifying peptidic natural products, and extends them for identification of polyketides, terpenes, benzenoids, alkaloids, flavonoids, and other classes of natural products. We show that DEREPLICATOR+ can search all spectra in the recently launched Global Natural Products Social molecular network and identify an order of magnitude more natural products than previous dereplication efforts. We further demonstrate that DEREPLICATOR+ enables cross-validation of genome-mining and peptidogenomics/glycogenomics results.
New natural products can be identified via mass spectrometry by excluding all known ones from the analysis, a process called dereplication. Here, the authors extend a previously published dereplication algorithm to different classes of secondary metabolites.
Journal Article
Multiplex de Bruijn graphs enable genome assembly from long, high-fidelity reads
by Antipov, Dmitry; Pevzner, Pavel A.; Kolmogorov, Mikhail
in 631/114/2785/2302, 631/61/212/2302, Accuracy
2022
Although most existing genome assemblers are based on de Bruijn graphs, the construction of these graphs for large genomes and large k-mer sizes has remained elusive. This algorithmic challenge has become particularly pressing with the emergence of long, high-fidelity (HiFi) reads that have been recently used to generate a semi-manual telomere-to-telomere assembly of the human genome. To enable automated assemblies of long, HiFi reads, we present the La Jolla Assembler (LJA), a fast algorithm using the Bloom filter, sparse de Bruijn graphs and disjointig generation. LJA reduces the error rate in HiFi reads by three orders of magnitude, constructs the de Bruijn graph for large genomes and large k-mer sizes and transforms it into a multiplex de Bruijn graph with varying k-mer sizes. Compared to state-of-the-art assemblers, our algorithm not only achieves five-fold fewer misassemblies but also generates more contiguous assemblies. We demonstrate the utility of LJA via the automated assembly of a human genome that completely assembled six chromosomes.
A multiplex de Bruijn graph algorithm allows high-accuracy genome assembly from long, high-fidelity reads.
Journal Article
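The LJA abstract lists a Bloom filter among its components. A Bloom filter stores set membership in a fixed bit array: each item sets several hashed bit positions, so lookups never give false negatives but may give false positives, which is why it is attractive for holding very large k-mer sets in little memory. The sketch below is a generic Bloom filter used for k-mers, not LJA's implementation. Deriving the hash functions by salting SHA-256 with an index, and the bit-array size and hash count in the usage example, are arbitrary choices made for illustration.

```python
import hashlib
import math


class BloomFilter:
    """Probabilistic set: no false negatives, tunable false-positive rate."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting SHA-256 with the index.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        # True only if every hashed bit is set.
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))


def expected_false_positive_rate(num_bits, num_hashes, num_items):
    """Standard approximation (1 - e^(-k*n/m))^k."""
    return (1 - math.exp(-num_hashes * num_items / num_bits)) ** num_hashes


def kmers(read, k):
    return (read[i:i + k] for i in range(len(read) - k + 1))


# Usage: store the 5-mers of two toy reads.
bf = BloomFilter(num_bits=1 << 16, num_hashes=3)
for read in ["ACGTACGGTCA", "GGTCAATTG"]:
    for kmer in kmers(read, 5):
        bf.add(kmer)
```

Every stored k-mer is reported as present. A k-mer that was never added is usually, but not always, reported absent, at a rate close to expected_false_positive_rate for the chosen sizes.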
Generating lineage-resolved, complete metagenome-assembled genomes from complex microbial communities
by Pevzner, Pavel A.; Shin, Sung Bong; Smith, Timothy P. L.
in 631/114/2785, 631/208/728, 631/326/325/2482
2022
Microbial communities might include distinct lineages of closely related organisms that complicate metagenomic assembly and prevent the generation of complete metagenome-assembled genomes (MAGs). Here we show that deep sequencing using long (HiFi) reads combined with Hi-C binning can address this challenge even for complex microbial communities. Using existing methods, we sequenced the sheep fecal metagenome and identified 428 MAGs with more than 90% completeness, including 44 MAGs in single circular contigs. To resolve closely related strains (lineages), we developed MAGPhase, which separates lineages of related organisms by discriminating variant haplotypes across hundreds of kilobases of genomic sequence. MAGPhase identified 220 lineage-resolved MAGs in our dataset. The ability to resolve closely related microbes in complex microbial communities improves the identification of biosynthetic gene clusters and the precision of assigning mobile genetic elements to host genomes. We identified 1,400 complete and 350 partial biosynthetic gene clusters, most of which are novel, as well as 424 (298) potential host–viral (host–plasmid) associations using Hi-C data.
Metagenome sequencing can now distinguish closely related microbes using long reads and haplotype phasing.
Journal Article
metaFlye: scalable long-read metagenome assembly using repeat graphs
by Behsaz, Bahar; Pevzner, Pavel A.; Gurevich, Alexey
in 631/114/2785/2302, 631/326/2565/2142, Algorithms
2020
Long-read sequencing technologies have substantially improved the assemblies of many isolate bacterial genomes as compared to fragmented short-read assemblies. However, assembling complex metagenomic datasets remains difficult even for state-of-the-art long-read assemblers. Here we present metaFlye, which addresses important long-read metagenomic assembly challenges, such as uneven bacterial composition and intra-species heterogeneity. First, we benchmarked metaFlye using simulated and mock bacterial communities and showed that it consistently produces assemblies with better completeness and contiguity than state-of-the-art long-read assemblers. Second, we performed long-read sequencing of the sheep microbiome and applied metaFlye to reconstruct 63 complete or nearly complete bacterial genomes within single contigs. Finally, we show that long-read assembly of human microbiomes enables the discovery of full-length biosynthetic gene clusters that encode biomedically important natural products.
Long-read metagenomics offers a valuable approach for profiling bacterial communities. This work presents a long-read assembler, metaFlye, that specifically addresses the challenges of assembling metagenomes.
Journal Article
Single-molecule protein identification by sub-nanopore sensors
by Pevzner, Pavel A.; Kolmogorov, Mikhail; Timp, Gregory
in Algorithms, Artificial intelligence, Bacteria
2017
Recent advances in top-down mass spectrometry have enabled identification of intact proteins, but this technology still faces challenges. For example, top-down mass spectrometry suffers from a lack of sensitivity since the ion counts for a single fragmentation event are often low. In contrast, nanopore technology is exquisitely sensitive to single intact molecules, but so far it has been successfully applied only to DNA sequencing. Here, we explore the potential of sub-nanopores for single-molecule protein identification (SMPI) and describe an algorithm for identification of the electrical current blockade signal (nanospectrum) resulting from the translocation of a denatured, linearly charged protein through a sub-nanopore. The analysis of identification p-values suggests that the current technology is already sufficient for matching nanospectra against small protein databases, e.g., protein identification in bacterial proteomes.
Journal Article
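The sub-nanopore abstract judges identifications by their p-values. One generic way to attach a p-value to a match score is empirical: compare the observed score with scores obtained under a null model, for example matches against decoy proteins. The abstract does not say how its p-values were computed, so the sketch below is an assumption. It shows only the standard add-one empirical estimate, and the scoring of nanospectra is left abstract.

```python
def empirical_p_value(observed_score, null_scores):
    """Estimate P(null score >= observed) with an add-one correction.

    Assumption: null_scores come from some null model (e.g., decoy
    matches); the +1 terms keep the estimate strictly above zero.
    """
    exceed = sum(1 for s in null_scores if s >= observed_score)
    return (exceed + 1) / (len(null_scores) + 1)
```

For an observed score of 0.9 and null scores [0.1, 0.2, 0.95, 0.3], one null score is at least as high, giving (1 + 1) / (4 + 1) = 0.4. With only four null scores the smallest attainable value is 0.2, so resolving small p-values requires many null scores.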