Catalogue Search | MBRL
Explore the vast range of titles available.
878 result(s) for "Long reads"
Trycycler: consensus long-read assemblies for bacterial genomes
by Méric, Guillaume; Holt, Kathryn E.; Hawkey, Jane
in Animal Genetics and Genomics, automation, Bacterial genomics
2021
While long-read sequencing allows for the complete assembly of bacterial genomes, long-read assemblies contain a variety of errors. Here, we present Trycycler, a tool which produces a consensus assembly from multiple input assemblies of the same genome. Benchmarking showed that Trycycler assemblies contained fewer errors than assemblies constructed with a single tool. Post-assembly polishing further reduced errors and Trycycler+polishing assemblies were the most accurate genomes in our study. As Trycycler requires manual intervention, its output is not deterministic. However, we demonstrated that multiple users converge on similar assemblies that are consistently more accurate than those produced by automated assembly tools.
Journal Article
Accurate long-read de novo assembly evaluation with Inspector
by Chong, Zechen; Zhang, Yixin; Wang, Amy Y.
in Accuracy, Algorithms, Animal Genetics and Genomics
2021
Long-read de novo genome assembly continues to advance rapidly. However, there is a lack of effective tools to accurately evaluate the assembly results, especially for structural errors. We present Inspector, a reference-free long-read de novo assembly evaluator which faithfully reports types of errors and their precise locations. Notably, Inspector can correct the assembly errors based on consensus sequences derived from raw reads covering erroneous regions. Based on in silico and long-read assembly results from multiple long-read data and assemblers, we demonstrate that in addition to providing generic metrics, Inspector can accurately identify both large-scale and small-scale assembly errors.
Journal Article
DNA methylation-calling tools for Oxford Nanopore sequencing: a survey and human epigenome-wide evaluation
by Cheng, Albert; Rosikiewicz, Wojciech; Li, Sheng
in 5-Methylcytosine - analysis, Accuracy, Animal Genetics and Genomics
2021
Background
Nanopore long-read sequencing technology greatly expands the capacity of long-range, single-molecule DNA-modification detection. A growing number of analytical tools have been developed to detect DNA methylation from nanopore sequencing reads. Here, we assess the performance of different methylation-calling tools to provide a systematic evaluation to guide researchers performing human epigenome-wide studies.
Results
We compare seven analytic tools for detecting DNA methylation from nanopore long-read sequencing data generated from human natural DNA at a whole-genome scale. We evaluate the per-read and per-site performance of CpG methylation prediction across different genomic contexts, CpG site coverage, and computational resources consumed by each tool. The seven tools exhibit different performances across the evaluation criteria. We show that methylation prediction at regions with discordant DNA methylation patterns, intergenic regions, low CG density regions, and repetitive regions has room for improvement across all tools. Furthermore, we demonstrate that 5hmC levels at least partly contribute to the discrepancy between bisulfite and nanopore sequencing. Lastly, we provide an online DNA methylation database (https://nanome.jax.org) to display the DNA methylation levels detected by nanopore sequencing and bisulfite sequencing data across different genomic contexts.
Conclusions
Our study is the first systematic benchmark of computational methods for detection of mammalian whole-genome DNA modifications in nanopore sequencing. We provide a broad foundation for cross-platform standardization and an evaluation of analytical tools designed for genome-scale modified base detection using nanopore sequencing.
Journal Article
Comprehensive characterization of single-cell full-length isoforms in human and mouse with long-read sequencing
by Peng, Hongke; Amarasinghe, Shanika L.; Su, Shian
in Alternative Splicing, Animal Genetics and Genomics, Animals
2021
A modified Chromium 10x droplet-based protocol that subsamples cells for both short-read and long-read (nanopore) sequencing together with a new computational pipeline (FLAMES) is developed to enable isoform discovery, splicing analysis, and mutation detection in single cells. We identify thousands of unannotated isoforms and find conserved functional modules that are enriched for alternative transcript usage in different cell types and species, including ribosome biogenesis and mRNA splicing. Analysis at the transcript level allows data integration with scATAC-seq on individual promoters, improved correlation with protein expression data, and linked mutations known to confer drug resistance to transcriptome heterogeneity.
Journal Article
The pan‐genome of the cultivated soybean (PanSoy) reveals an extraordinarily conserved gene content
by Torkamaneh, Davoud; Belzile, François; Lemay, Marc‐André
in biotechnology, Cultivation, de novo assembly
2021
Summary
Studies on structural variation in plants have revealed the inadequacy of a single reference genome for an entire species and suggest that it is necessary to build a species‐representative genome called a pan‐genome to better capture the extent of both structural and nucleotide variation. Here, we present a pan‐genome of cultivated soybean (Glycine max), termed PanSoy, constructed using the de novo genome assembly of 204 phylogenetically and geographically representative improved accessions selected from the larger GmHapMap collection. PanSoy uncovers 108 Mb (~11%) of novel nonreference sequences encompassing 3621 protein‐coding genes (including 1659 novel genes) absent from the soybean ‘Williams 82’ reference genome. Nonetheless, the core genome represents an exceptionally large proportion of the genome, with >90.6% of genes being shared by >99% of the accessions. A majority of PAVs encompassing genes could be confirmed with long‐read sequencing on a subset of accessions. The PanSoy is a major step towards capturing the extent of genetic variation in cultivated soybean and provides a resource for soybean genomics research and breeding.
Journal Article
A chromosome‐scale assembly of allotetraploid Brassica juncea (AABB) elucidates comparative architecture of the A and B genomes
2021
Summary
Brassica juncea (AABB), commonly referred to as mustard, is a natural allopolyploid of two diploid species—B. rapa (AA) and B. nigra (BB). We report a highly contiguous genome assembly of an oleiferous type of B. juncea variety Varuna, an archetypical Indian gene pool line of mustard, with ~100× PacBio single‐molecule real‐time (SMRT) long reads providing contigs with an N50 value of >5 Mb. Contigs were corrected for the misassemblies and scaffolded with BioNano optical mapping. We also assembled a draft genome of B. nigra (BB) variety Sangam using Illumina short‐read sequencing and Oxford Nanopore long reads and used it to validate the assembly of the B genome of B. juncea. Two different linkage maps of B. juncea, containing a large number of genotyping‐by‐sequencing markers, were developed and used to anchor scaffolds/contigs to the 18 linkage groups of the species. The resulting chromosome‐scale assembly of B. juncea Varuna is a significant improvement over the previous draft assembly of B. juncea Tumida, a vegetable type of mustard. The assembled genome was characterized for transposons, centromeric repeats, gene content and gene block associations. In comparison to the A genome, the B genome contains a significantly higher content of LTR/Gypsy retrotransposons, distinct centromeric repeats and a large number of B. nigra specific gene clusters that break the gene collinearity between the A and the B genomes. The B. juncea Varuna assembly will be of major value to the breeding work on oleiferous types of mustard that are grown extensively in south Asia and elsewhere.
Journal Article
PacBio sequencing of Glomeromycota rDNA
by Kolaříková, Zuzana; Kohout, Petr; Slavíková, Renata
in Arbuscular mycorrhizas, Archaeosporales, data collection
2021
• There is no consensus barcoding region for determination of arbuscular mycorrhizal fungal (AMF) taxa. To overcome this obstacle, we have developed an approach to sequence an AMF marker within the ribosome-encoding operon (rDNA) that covers all three widely applied variable molecular markers.
• Using a nested PCR approach specific to AMF, we amplified a part (c. 2.5 kb) of the rDNA spanning the majority of the small subunit rRNA (SSU) gene, the complete internal transcribed spacer (ITS) region and a part of the large subunit (LSU) rRNA gene. The PCR products were sequenced on the PacBio platform utilizing Single Molecule Real Time (SMRT) sequencing.
• Employing this method for selected environmental DNA samples, we were able to describe complex AMF communities consisting of various glomeromycotan lineages.
• We demonstrate the applicability of this new 2.5 kb approach to provide robust phylogenetic assignment of AMF lineages without known sequences from pure cultures and to consolidate information about AMF taxon distributions coming from three widely used barcoding regions into one integrative dataset.
Journal Article
Charting the complexity of the activated sludge microbiome through a hybrid sequencing strategy
by Wang, Depeng; Cheng, Suk Hang; Zheng, Chunmiao
in Accuracy, Activated sludge, Activated sludge microbiome
2021
Background
Long-read sequencing has shown its tremendous potential to address genome assembly challenges, e.g., achieving the first telomere-to-telomere assembly of a gapless human chromosome. However, many issues remain unresolved when leveraging error-prone long reads to characterize high-complexity metagenomes, for instance, complete/high-quality genome reconstruction from highly complex systems.
Results
Here, we developed an iterative haplotype-resolved hierarchical clustering-based hybrid assembly (HCBHA) approach that capitalizes on a hybrid (error-prone long reads and high-accuracy short reads) sequencing strategy to reconstruct (near-) complete genomes from highly complex metagenomes. Using the HCBHA approach, we first phase short and long reads from the highly complex metagenomic dataset into different candidate bacterial haplotypes, then perform hybrid assembly of each bacterial genome individually. We reconstructed 557 metagenome-assembled genomes (MAGs) with an average N50 of 574 Kb from a deeply sequenced, highly complex activated sludge (AS) metagenome. These high-contiguity MAGs contained 14 closed genomes and 111 high-quality (HQ) MAGs including full-length rRNA operons, which accounted for 61.1% of the microbial community. Leveraging the near-complete genomes, we also profiled the metabolic potential of the AS microbiome and identified 2153 biosynthetic gene clusters (BGCs) encoded within the recovered AS MAGs.
Conclusion
Our results established the feasibility of an iterative haplotype-resolved HCBHA approach to reconstruct (near-) complete genomes from highly complex ecosystems, providing new insights into “complete metagenomics”. The retrieved high-contiguity MAGs illustrated that various biosynthetic gene clusters (BGCs) were harbored in the AS microbiome. The high diversity of BGCs highlights the potential to discover new natural products biosynthesized by the AS microbial community, aside from the traditional function (e.g., organic carbon and nitrogen removal) in wastewater treatment.
Journal Article
JAFFAL: detecting fusion genes with long-read transcriptome sequencing
by Sadras, Teresa; Göke, Jonathan; Chen, Ying
in Algorithms, Animal Genetics and Genomics, Bioinformatics
2022
In cancer, fusions are important diagnostic markers and targets for therapy. Long-read transcriptome sequencing allows the discovery of fusions with their full-length isoform structure. However, due to higher sequencing error rates, fusion finding algorithms designed for short reads do not work. Here we present JAFFAL, to identify fusions from long-read transcriptome sequencing. We validate JAFFAL using simulations, cell lines, and patient data from Nanopore and PacBio. We apply JAFFAL to single-cell data and find fusions spanning three genes demonstrating transcripts detected from complex rearrangements. JAFFAL is available at https://github.com/Oshlack/JAFFA/wiki.
Journal Article
A high‐quality Brassica napus genome reveals expansion of transposable elements, subgenome evolution and disease resistance
2021
Summary
Rapeseed (Brassica napus L.) is a recent allotetraploid crop, which is well known for its high oil production. Here, we report a high‐quality genome assembly of a typical semi‐winter rapeseed cultivar, 'Zhongshuang11' (hereafter 'ZS11'), using a combination of single‐molecule sequencing and chromosome conformation capture (Hi‐C) techniques. Most of the high‐confidence sequences (93.1%) were anchored to the individual chromosomes with a total of 19 centromeres identified, matching the exact chromosome count of B. napus. The repeat sequences in the A and C subgenomes in B. napus expanded significantly from 500 000 years ago, especially over the last 100 000 years. These young and recently amplified LTR‐RTs showed dispersed chromosomal distribution but significantly preferentially clustered into centromeric regions. We exhaustively annotated the nucleotide‐binding leucine‐rich repeat (NLR) gene repertoire, yielding a total of 597 NLR genes in B. napus genome and 17.4% of which are paired (head‐to‐head arrangement). Based on the resequencing data of 991 B. napus accessions, we have identified 18 759 245 single nucleotide polymorphisms (SNPs) and detected a large number of genomic regions under selective sweep among the three major ecotype groups (winter, semi‐winter and spring) in B. napus. We found 49 NLR genes and five NLR gene pairs colocated in selective sweep regions with different ecotypes, suggesting a rapid diversification of NLR genes during the domestication of B. napus. The high quality of our B. napus 'ZS11' genome assembly could serve as an important resource for the study of rapeseed genomics and reveal the genetic variations associated with important agronomic traits.
Journal Article