Catalogue Search | MBRL

Opportunities and challenges in long-read sequencing data analysis

by Amarasinghe, Shanika L. , Su, Shian , Dong, Xueyi in Accuracy , Animal Genetics and Genomics , Animals

2020

Long-read technologies are overcoming early limitations in accuracy and throughput, broadening their application domains in genomics. Dedicated analysis tools that take into account the characteristics of long-read data are thus required, but the fast pace of development of such tools can be overwhelming. To assist in the design and analysis of long-read sequencing projects, we review the current landscape of available tools and present an online interactive database, long-read-tools.org, to facilitate their browsing. We further focus on the principles of error correction, base modification detection, and long-read transcriptomics analysis and highlight the challenges that remain.

Journal Article

Homopolish: a method for the removal of systematic errors in nanopore sequencing by homologous polishing

by Shih, Pei-Wen , Huang, Yao-Ting , Liu, Po-Yu in Accuracy , Algorithms , Animal Genetics and Genomics

2021

Nanopore sequencing has been widely used for the reconstruction of microbial genomes. Owing to higher error rates, errors on the genome are corrected via neural networks trained by Nanopore reads. However, the systematic errors usually remain uncorrected. This paper designs a model that is trained by homologous sequences for the correction of Nanopore systematic errors. The developed program, Homopolish, outperforms Medaka and HELEN in bacteria, viruses, fungi, and metagenomic datasets. When combined with Medaka/HELEN, the genome quality can exceed Q50 on R9.4 flow cells. We show that Nanopore-only sequencing can produce high-quality microbial genomes sufficient for downstream analysis.

Journal Article

NextDenovo: an efficient error correction and accurate assembly tool for noisy long reads

by Wang, Depeng , Sandoval, José R. , Hu, Jiang in Algorithms , Animal Genetics and Genomics , Bioinformatics

2024

Long-read sequencing data, particularly those derived from the Oxford Nanopore sequencing platform, tend to exhibit high error rates. Here, we present NextDenovo, an efficient error correction and assembly tool for noisy long reads, which achieves a high level of accuracy in genome assembly. We apply NextDenovo to assemble 35 diverse human genomes from around the world using Nanopore long-read data. These genomes allow us to identify the landscape of segmental duplication and gene copy number variation in modern human populations. The use of NextDenovo should pave the way for population-scale long-read assembly using Nanopore long-read data.

Journal Article

Recent advances in the detection of base modifications using the Nanopore sequencer

by Seki, Masahide , Liu, Xu in Chromatin , Chromosomes , Deoxyribonucleic acid

2020

DNA and RNA modifications have important functions, including the regulation of gene expression. Existing methods based on short-read sequencing for the detection of modifications show difficulty in determining the modification patterns of single chromosomes or an entire transcript sequence. Furthermore, the kinds of modifications for which detection methods are available are very limited. The Nanopore sequencer is a single-molecule, long-read sequencer that can directly sequence RNA as well as DNA. Moreover, the Nanopore sequencer detects modifications on long DNA and RNA molecules. In this review, we mainly focus on base modification detection in the DNA and RNA of mammals using the Nanopore sequencer. We summarize current studies of modifications using the Nanopore sequencer, detection tools using statistical tests or machine learning, and applications of this technology, such as analyses of open chromatin, DNA replication, and RNA metabolism.

Journal Article

RNA modifications detection by comparative Nanopore direct RNA sequencing

by Migliori, Valentina , Capitanchik, Charlotte , Toolan-Kerr, Patrick in 38/91 , 631/114/794 , 631/1647/48

2021

RNA molecules undergo a vast array of chemical post-transcriptional modifications (PTMs) that can affect their structure and interaction properties. In recent years, a growing number of PTMs have been successfully mapped to the transcriptome using experimental approaches relying on high-throughput sequencing. Oxford Nanopore direct-RNA sequencing has been shown to be sensitive to RNA modifications. We developed and validated Nanocompore, a robust analytical framework that identifies modifications from these data. Our strategy compares an RNA sample of interest against a non-modified control sample, not requiring a training set and allowing the use of replicates. We show that Nanocompore can detect different RNA modifications with position accuracy in vitro, and we apply it to profile m6A in vivo in yeast and human RNAs, as well as in targeted non-coding RNAs. We confirm our results with orthogonal methods and provide novel insights on the co-occurrence of multiple modified residues on individual RNA molecules. Nanopore direct RNA sequencing data contain information about the presence of RNA modifications, but their detection poses substantial challenges. Here the authors introduce Nanocompore, a new methodology for modification detection from Nanopore data.

Journal Article

Sequencing accuracy and systematic errors of nanopore direct RNA sequencing

by Smyth, Redmond P. , Liu-Wei, Wang , van der Toorn, Wiep in Accuracy , Adenine , Algorithms

2024

Background: Direct RNA sequencing (dRNA-seq) on the Oxford Nanopore Technologies (ONT) platforms can produce reads covering up to full-length gene transcripts, while containing decipherable information about RNA base modifications and poly-A tail lengths. Although many published studies have been expanding the potential of dRNA-seq, its sequencing accuracy and error patterns remain understudied.
Results: We present the first comprehensive evaluation of sequencing accuracy and characterisation of systematic errors in dRNA-seq data from diverse organisms and synthetic in vitro transcribed RNAs. We found that for sequencing kits SQK-RNA001 and SQK-RNA002, the median read accuracy ranged from 87% to 92% across species, and deletions significantly outnumbered mismatches and insertions. Due to their high abundance in the transcriptome, heteropolymers and short homopolymers were the major contributors to the overall sequencing errors. We also observed systematic biases across all species at the levels of single nucleotides and motifs. In general, cytosine/uracil-rich regions were more likely to be erroneous than guanines and adenines. By examining raw signal data, we identified the underlying signal-level features potentially associated with the error patterns and their dependency on sequence contexts. While read quality scores can be used to approximate error rates at base and read levels, failure to detect DNA adapters may be a source of errors and data loss. By comparing distinct basecallers, we reason that some sequencing errors are attributable to signal insufficiency rather than algorithmic (basecalling) artefacts. Lastly, we generated dRNA-seq data using the latest SQK-RNA004 sequencing kit released at the end of 2023 and found that although the overall read accuracy increased, the systematic errors remain largely identical compared to the previous kits.
Conclusions: As the first systematic investigation of dRNA-seq errors, this study offers a comprehensive overview of reproducible error patterns across diverse datasets, identifies potential signal-level insufficiency, and lays the foundation for error correction methods.

Journal Article

Ultrarapid Nanopore Genome Sequencing in a Critical Care Setting

by Chubb, Henry , Gorzynski, John E , Christle, Jeffrey W in Adolescent , Bioinformatics , Child, Preschool

2022

Because a genetic diagnosis can guide clinical management and improve prognosis in critically ill patients, much effort has gone into developing methods that result in rapid, reliable results. The authors describe extremely rapid sequencing and analysis of the genomes of 12 patients, 5 of whom received a diagnosis.

Journal Article

NanoVar: accurate characterization of patients’ genomic structural variants using low-depth nanopore sequencing

by Tham, Cheng Yong , Goh, Yufen , Wang, Wilson in Animal Genetics and Genomics , Bioinformatics , Biomarkers

2020

The recent advent of third-generation sequencing technologies brings promise for better characterization of genomic structural variants by virtue of having longer reads. However, long-read applications are still constrained by their high sequencing error rates and low sequencing throughput. Here, we present NanoVar, an optimized structural variant caller utilizing low-depth (8X) whole-genome sequencing data generated by Oxford Nanopore Technologies. NanoVar exhibits higher structural variant calling accuracy when benchmarked against current tools using low-depth simulated datasets. In patient samples, we successfully validate structural variants characterized by NanoVar and uncover normal alternative sequences or alleles which are present in healthy individuals.

Journal Article

Comprehensive benchmark and architectural analysis of deep learning models for nanopore sequencing basecalling

by de Ridder, Jeroen , Pagès-Gallego, Marc in Algorithms , Animal Genetics and Genomics , Basecalling

2023

Background: Nanopore-based DNA sequencing relies on basecalling the electric current signal. Basecalling requires neural networks to achieve competitive accuracies. To improve sequencing accuracy further, new models are continuously proposed with new architectures. However, benchmarking is currently not standardized, and evaluation metrics and datasets used are defined on a per-publication basis, impeding progress in the field. This makes it impossible to distinguish data-driven from model-driven improvements.
Results: To standardize the process of benchmarking, we unified existing benchmarking datasets and defined a rigorous set of evaluation metrics. We benchmarked the latest seven basecaller models by recreating and analyzing their neural network architectures. Our results show that overall Bonito’s architecture is the best for basecalling. We find, however, that species bias in training can have a large impact on performance. Our comprehensive evaluation of 90 novel architectures demonstrates that different models excel at reducing different types of errors and using recurrent neural networks (long short-term memory) and a conditional random field decoder are the main drivers of high performing models.
Conclusions: We believe that our work can facilitate the benchmarking of new basecaller tools and that the community can further expand on this work.

Journal Article

Comprehensive characterization of single-cell full-length isoforms in human and mouse with long-read sequencing

by Peng, Hongke , Amarasinghe, Shanika L. , Su, Shian in Alternative Splicing , Animal Genetics and Genomics , Animals

2021

A modified Chromium 10x droplet-based protocol that subsamples cells for both short-read and long-read (nanopore) sequencing together with a new computational pipeline (FLAMES) is developed to enable isoform discovery, splicing analysis, and mutation detection in single cells. We identify thousands of unannotated isoforms and find conserved functional modules that are enriched for alternative transcript usage in different cell types and species, including ribosome biogenesis and mRNA splicing. Analysis at the transcript level allows data integration with scATAC-seq on individual promoters, improved correlation with protein expression data, and linked mutations known to confer drug resistance to transcriptome heterogeneity.

Journal Article
