Catalogue Search | MBRL
440 result(s) for "Correction of sequencing errors"
Hybrid-hybrid correction of errors in long reads with HERO
by Kang, Xiongbin; Xu, Jialu; Schönhuth, Alexander
in Accuracy, Algorithms, Animal Genetics and Genomics
2023
Although generally superior, hybrid approaches for correcting errors in third-generation sequencing (TGS) reads, using next-generation sequencing (NGS) reads, mistake haplotype-specific variants for errors in polyploid and mixed samples. We suggest HERO, as the first “hybrid-hybrid” approach, to make use of both de Bruijn graphs and overlap graphs for optimal catering to the particular strengths of NGS and TGS reads. Extensive benchmarking experiments demonstrate that HERO improves indel and mismatch error rates by on average 65% (27∼95%) and 20% (4∼61%). Using HERO prior to genome assembly significantly improves the assemblies in the majority of the relevant categories.
Journal Article
Distinguishing low frequency mutations from RT-PCR and sequence errors in viral deep sequencing data
by Orton, Richard J; King, Donald P; Morelli, Marco J
in Animal Genetics and Genomics, Biomedical and Life Sciences, Gene Frequency
2015
Background
RNA viruses have high mutation rates and exist within their hosts as large, complex and heterogeneous populations, comprising a spectrum of related but non-identical genome sequences. Next generation sequencing is revolutionising the study of viral populations by enabling the ultra deep sequencing of their genomes, and the subsequent identification of the full spectrum of variants within the population. Identification of low frequency variants is important for our understanding of mutational dynamics, disease progression, immune pressure, and for the detection of drug resistant or pathogenic mutations. However, the current challenge is to accurately model the errors in the sequence data and distinguish real viral variants, particularly those that exist at low frequency, from errors introduced during sequencing and sample processing, which can both be substantial.
Results
We have created a novel set of laboratory control samples that are derived from a plasmid containing a full-length viral genome with extremely limited diversity in the starting population. One sample was sequenced without PCR amplification whilst the other samples were subjected to increasing amounts of RT and PCR amplification prior to ultra-deep sequencing. This enabled the level of error introduced by the RT and PCR processes to be assessed and minimum frequency thresholds to be set for true viral variant identification. We developed a genome-scale computational model of the sample processing and NGS calling process to gain a detailed understanding of the errors at each step, which predicted that RT and PCR errors are more likely to occur at some genomic sites than others. The model can also be used to investigate whether the number of observed mutations at a given site of interest is greater than would be expected from processing errors alone in any NGS data set. After providing basic sample processing information and the site’s coverage and quality scores, the model utilises the fitted RT-PCR error distributions to simulate the number of mutations that would be observed from processing errors alone.
Conclusions
These data sets and models provide an effective means of separating true viral mutations from those erroneously introduced during sample processing and sequencing.
Journal Article
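The simulation step described in the entry above can be illustrated with a minimal sketch. It assumes a single per-read error rate at the site, which is a deliberate simplification: the authors' model uses fitted RT-PCR error distributions together with coverage and quality scores, none of which are reproduced here. The function names, the `error_rate` parameter, and the example numbers are hypothetical.

```python
import random


def simulate_error_counts(coverage, error_rate, n_sim, seed=0):
    """Simulate how many reads at one site carry an error, n_sim times.

    Assumption: each of the `coverage` reads is wrong independently with
    probability `error_rate` (a plain binomial stand-in, not the fitted
    RT-PCR error distributions described in the abstract).
    """
    rng = random.Random(seed)
    return [
        sum(1 for _ in range(coverage) if rng.random() < error_rate)
        for _ in range(n_sim)
    ]


def empirical_p_value(observed, simulated):
    """Fraction of simulations with at least `observed` erroneous reads.

    The +1 terms keep the estimate away from exactly zero.
    """
    exceed = sum(1 for count in simulated if count >= observed)
    return (exceed + 1) / (len(simulated) + 1)


if __name__ == "__main__":
    # Hypothetical site: 1000x coverage, 0.1% per-read processing error rate.
    sims = simulate_error_counts(coverage=1000, error_rate=0.001, n_sim=200)
    # Hypothetical observation: 15 reads carry the mutation at this site.
    print(empirical_p_value(15, sims))
```

A small p-value suggests that the observed count is larger than processing errors alone would produce under the assumed rate; how small is small enough is a threshold the analyst has to choose.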
Measurable (Minimal) Residual Disease in Myelodysplastic Neoplasms (MDS): Current State and Perspectives
2024
Myelodysplastic Neoplasms (MDS) have been traditionally studied through the assessment of blood counts, cytogenetics, and morphology. In recent years, the introduction of molecular assays has improved our ability to diagnose MDS. The role of Measurable (minimal) Residual Disease (MRD) in MDS is evolving, and molecular and flow cytometry techniques have been used in several studies. In this review, we will highlight the evolving concept of MRD in MDS, outline the various techniques utilized, and provide an overview of the studies reporting MRD and the correlation with outcomes.
Journal Article
Ultra-deep mutant spectrum profiling: improving sequencing accuracy using overlapping read pairs
by Allen, Jonathan E; Chen-Harris, Haiyin; Slezak, Tom R
in Accuracy, Analysis, Animal Genetics and Genomics
2013
Background
High throughput sequencing is beginning to make a transformative impact in the area of viral evolution. Deep sequencing has the potential to reveal the mutant spectrum within a viral sample at high resolution, thus enabling the close examination of viral mutational dynamics both within and between hosts. The challenge, however, is to accurately model the errors in the sequencing data and differentiate real viral mutations, particularly those that exist at low frequencies, from sequencing errors.
Results
We demonstrate that overlapping read pairs (ORP) -- generated by combining short fragment sequencing libraries and longer sequencing reads -- significantly reduce sequencing error rates and improve rare variant detection accuracy. Using this sequencing protocol and an error model optimized for variant detection, we are able to capture a large number of genetic mutations present within a viral population at ultra-low frequency levels (<0.05%).
Conclusions
Our rare variant detection strategies have important implications beyond viral evolution and can be applied to any basic and clinical research area that requires the identification of rare mutations.
Journal Article
Applications of High‐Fidelity Sequencing Protocol to RNA Viruses
by Sun, Ren; Nenastyeva, Ekaterina; Mangul, Serghei
in high‐fidelity sequencing protocol, next‐generation sequencing, post‐sequencing error correction techniques
2016
This chapter describes the high‐fidelity sequencing protocol used and introduces an approach for viral genome assembly (VGA) based on high‐fidelity sequencing data. It presents the performance of VGA and some other viral assemblers on simulated data, and describes the performance of VGA on real HIV data. The chapter compares different aligners to investigate the effect of their alignment on mapping statistics. Post‐sequencing error correction techniques are available for reads obtained by the regular protocol, offering the possibility of partially correcting sequencing errors at the cost of also removing some real biological mutations. HCV exhibits a more complex genomic architecture, with lower population diversity and longer conserved regions, than HIV. QuasiRecomb is designed to handle paired‐end read data and manages to produce full‐length viral genomes. The chapter also discusses one application of the high‐fidelity protocol: the evaluation of error correction methods for next‐generation sequencing (NGS) reads.
Book Chapter
Using a VOM model for reconstructing potential coding regions in EST sequences
2007
This paper presents a method for annotating coding and noncoding DNA regions by using variable order Markov (VOM) models. A main advantage in using VOM models is that their order may vary for different sequences, depending on the sequences’ statistics. As a result, VOM models are more flexible with respect to model parameterization and can be trained on relatively short sequences and on low-quality datasets, such as expressed sequence tags (ESTs). The paper presents a modified VOM model for detecting and correcting insertion and deletion sequencing errors that are commonly found in ESTs. In a series of experiments the proposed method is found to be robust to random errors in these sequences.
Journal Article
Sequencing accuracy and systematic errors of nanopore direct RNA sequencing
2024
Background
Direct RNA sequencing (dRNA-seq) on the Oxford Nanopore Technologies (ONT) platforms can produce reads covering up to full-length gene transcripts, while containing decipherable information about RNA base modifications and poly-A tail lengths. Although many published studies have been expanding the potential of dRNA-seq, its sequencing accuracy and error patterns remain understudied.
Results
We present the first comprehensive evaluation of sequencing accuracy and characterisation of systematic errors in dRNA-seq data from diverse organisms and synthetic in vitro transcribed RNAs. We found that for sequencing kits SQK-RNA001 and SQK-RNA002, the median read accuracy ranged from 87% to 92% across species, and deletions significantly outnumbered mismatches and insertions. Due to their high abundance in the transcriptome, heteropolymers and short homopolymers were the major contributors to the overall sequencing errors. We also observed systematic biases across all species at the levels of single nucleotides and motifs. In general, cytosine/uracil-rich regions were more likely to be erroneous than guanines and adenines. By examining raw signal data, we identified the underlying signal-level features potentially associated with the error patterns and their dependency on sequence contexts. While read quality scores can be used to approximate error rates at base and read levels, failure to detect DNA adapters may be a source of errors and data loss. By comparing distinct basecallers, we reason that some sequencing errors are attributable to signal insufficiency rather than algorithmic (basecalling) artefacts. Lastly, we generated dRNA-seq data using the latest SQK-RNA004 sequencing kit released at the end of 2023 and found that although the overall read accuracy increased, the systematic errors remain largely identical compared to the previous kits.
Conclusions
As the first systematic investigation of dRNA-seq errors, this study offers a comprehensive overview of reproducible error patterns across diverse datasets, identifies potential signal-level insufficiency, and lays the foundation for error correction methods.
Journal Article
NextDenovo: an efficient error correction and accurate assembly tool for noisy long reads
by Wang, Depeng; Sandoval, José R.; Hu, Jiang
in Algorithms, Animal Genetics and Genomics, Bioinformatics
2024
Long-read sequencing data, particularly those derived from the Oxford Nanopore sequencing platform, tend to exhibit high error rates. Here, we present NextDenovo, an efficient error correction and assembly tool for noisy long reads, which achieves a high level of accuracy in genome assembly. We apply NextDenovo to assemble 35 diverse human genomes from around the world using Nanopore long-read data. These genomes allow us to identify the landscape of segmental duplication and gene copy number variation in modern human populations. The use of NextDenovo should pave the way for population-scale long-read assembly using Nanopore long-read data.
Journal Article
NGmerge: merging paired-end reads via novel empirically-derived models of sequencing errors
2018
Background
Advances in Illumina DNA sequencing technology have produced longer paired-end reads that increasingly have sequence overlaps. These reads can be merged into a single read that spans the full length of the original DNA fragment, allowing for error correction and accurate determination of read coverage. Extant merging programs utilize simplistic or unverified models for the selection of bases and quality scores for the overlapping region of merged reads.
Results
We first examined the baseline quality score - error rate relationship using sequence reads derived from PhiX. In contrast to numerous published reports, we found that the quality scores produced by Illumina were not substantially inflated above the theoretical values, once the reference genome was corrected for unreported sequence variants. The PhiX reads were then used to create empirical models of sequencing errors in overlapping regions of paired-end reads, and these models were incorporated into a novel merging program, NGmerge. We demonstrate that NGmerge corrects errors and ambiguous bases better than other merging programs, and that it assigns quality scores for merged bases that accurately reflect the error rates. Our results also show that, contrary to published analyses, the sequencing errors of paired-end reads are not independent.
Conclusions
We provide a free and open-source program, NGmerge, that performs better than existing read merging programs. NGmerge is available on GitHub (https://github.com/harvardinformatics/NGmerge) under the MIT License; it is written in C and supported on Linux.
Journal Article
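The "theoretical values" against which the NGmerge entry above compares Illumina quality scores follow from the standard Phred definition, Q = -10 log10(p), where p is the probability that a base call is wrong. The sketch below converts in both directions and derives an empirical quality from mismatch counts against a reference. The function names and example counts are illustrative only; nothing here is taken from the NGmerge code.

```python
import math


def phred_to_error_prob(q):
    """Error probability implied by a Phred quality score Q."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred quality score implied by an error probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -10 * math.log10(p)


def empirical_quality(mismatches, bases):
    """Phred-scaled observed error rate for bases sharing a reported score.

    `mismatches` and `bases` are hypothetical counts, e.g. from aligning
    control reads to a known reference.
    """
    if bases <= 0 or mismatches <= 0:
        raise ValueError("need at least one base and one mismatch")
    return error_prob_to_phred(mismatches / bases)


if __name__ == "__main__":
    # A reported Q30 base should be wrong about once in 1000 calls.
    print(phred_to_error_prob(30))
    # Hypothetical bin: 10 mismatches among 10000 bases reported as Q30.
    print(empirical_quality(10, 10000))
```

If the empirical quality for a bin falls well below the reported score, the reported scores are inflated for that bin; if the two agree, they match the theoretical relationship.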
Kastor: a reference-based comparative approach for assessment and correction of gene-fragmenting errors in long-read assemblies of small genomes
by Lorv, Janet S.H.; McConkey, Brendan J.
in Accuracy, Animal Genetics and Genomics, Applications software
2025
Long read sequencing technologies provide an efficient approach to generating highly contiguous and informative assemblies. However, higher relative error rates can introduce frameshifts and premature stop codons that pseudogenize genes, hindering downstream analyses. We developed a software tool that detects gene-fragmenting errors in draft assemblies of small genomes through comparison with a curated set of reference genome sequences and raw read information. In our presented example, detected errors represent less than 0.05% of the genome, but when corrected reduced the rate of pseudogenes from 23.3 to 5.6% in example long read assemblies, comparable to the rate of pseudogenes in short read assemblies. We demonstrate that this software can detect assembly errors in long read assemblies generated from small genomes and correct them to de-fragment genes.
Journal Article