Catalogue Search | MBRL

NGmerge: merging paired-end reads via novel empirically-derived models of sequencing errors

by Gaspar, John M. in Algorithms, Analysis, Bioinformatics

2018

Background: Advances in Illumina DNA sequencing technology have produced longer paired-end reads that increasingly have sequence overlaps. These reads can be merged into a single read that spans the full length of the original DNA fragment, allowing for error correction and accurate determination of read coverage. Extant merging programs utilize simplistic or unverified models for the selection of bases and quality scores for the overlapping region of merged reads.

Results: We first examined the baseline relationship between quality score and error rate using sequence reads derived from PhiX. In contrast to numerous published reports, we found that the quality scores produced by Illumina were not substantially inflated above the theoretical values, once the reference genome was corrected for unreported sequence variants. The PhiX reads were then used to create empirical models of sequencing errors in overlapping regions of paired-end reads, and these models were incorporated into a novel merging program, NGmerge. We demonstrate that NGmerge corrects errors and ambiguous bases better than other merging programs, and that it assigns quality scores for merged bases that accurately reflect the error rates. Our results also show that, contrary to published analyses, the sequencing errors of paired-end reads are not independent.

Conclusions: We provide a free and open-source program, NGmerge, that performs better than existing read merging programs. NGmerge is available on GitHub (https://github.com/harvardinformatics/NGmerge) under the MIT License; it is written in C and supported on Linux.
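The "theoretical values" mentioned in the abstract refer to the standard Phred scale, in which a quality score Q corresponds to an error probability p = 10^(-Q/10). The sketch below illustrates only that conversion and how an observed quality could be computed from mismatch counts against a reference. The function names are hypothetical and this is not code from NGmerge.

```python
import math


def phred_to_error_prob(q):
    """Theoretical error probability for a Phred quality score q."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred quality score corresponding to an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


def empirical_quality(mismatches, total):
    """Observed quality from mismatch counts against a trusted reference.

    Hypothetical helper: 'mismatches' and 'total' would be tallied over all
    bases that were reported with the same quality score.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if mismatches == 0:
        return float("inf")  # no observed errors; quality is unbounded
    return error_prob_to_phred(mismatches / total)
```

For example, if 10 of 10,000 bases reported at Q30 disagree with the reference, the observed error rate is 0.001, which matches the theoretical value for Q30.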

Journal Article


CASPER: context-aware scheme for paired-end reads from high-throughput amplicon sequencing

by Yoon, Sungroh; Kwon, Sunyoung; Lee, Byunghan in Algorithms, Ambient intelligence, Assembly

2014

Merging the forward and reverse reads from paired-end sequencing is a critical task that can significantly improve the performance of downstream tasks, such as genome assembly and mapping, by providing them with virtually elongated reads. However, due to the inherent limitations of most paired-end sequencers, the chance of observing erroneous bases grows rapidly as the end of a read is approached, which becomes a critical hurdle for accurately merging paired-end reads. Although several sophisticated approaches to this problem exist, their merging quality often remains unsatisfactory. To address this issue, we present a context-aware scheme for paired-end reads (CASPER): a computational method to rapidly and robustly merge overlapping paired-end reads. Being particularly well suited to amplicon sequencing applications, CASPER is thoroughly tested with both simulated and real high-throughput amplicon sequencing data. According to our experimental results, CASPER significantly outperforms existing state-of-the-art paired-end merging tools in terms of accuracy and robustness. CASPER also exploits the parallelism in the task of paired-end merging and achieves an effective speedup through multithreading. CASPER is freely available for academic use at http://best.snu.ac.kr/casper.
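As a rough illustration of what merging overlapping paired-end reads involves, the sketch below reverse-complements the reverse read, searches for the longest suffix/prefix overlap within a mismatch tolerance, and keeps the higher-quality base wherever the reads disagree. This is a deliberately naive baseline, not the CASPER algorithm; the function names, the `min_overlap` and `max_mismatch_frac` parameters, and the choice of keeping the higher-quality base are all assumptions made for the example.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(fwd, fwd_qual, rev, rev_qual, min_overlap=4, max_mismatch_frac=0.1):
    """Naively merge a read pair; returns the merged sequence or None.

    fwd_qual and rev_qual are lists of integer quality scores, one per base.
    """
    # Bring the reverse read into the forward orientation.
    rc = reverse_complement(rev)
    rc_qual = rev_qual[::-1]

    # Try the longest candidate overlap first: suffix of fwd vs. prefix of rc.
    for olen in range(min(len(fwd), len(rc)), min_overlap - 1, -1):
        start = len(fwd) - olen
        mismatches = sum(a != b for a, b in zip(fwd[start:], rc[:olen]))
        if mismatches > max_mismatch_frac * olen:
            continue

        merged = list(fwd[:start])
        for i in range(olen):
            # Where the reads disagree, keep the base with the higher quality.
            if fwd_qual[start + i] >= rc_qual[i]:
                merged.append(fwd[start + i])
            else:
                merged.append(rc[i])
        merged.extend(rc[olen:])
        return "".join(merged)

    return None  # no acceptable overlap found
```

A real merger would also need to assign quality scores to the merged bases and handle cases where the fragment is shorter than the read length, both of which this sketch omits.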

Journal Article


A Read-Write Optimization Scheme for Flash Memory Storage Systems

by Yang, Yin; Li, Wen Yi; Wang, Kai in Blocking, Counting, Data storage

2014

In this paper, we propose a novel and efficient read-write optimization scheme for flash memory storage systems, which we have named RWF (Read-Write FTL). In the proposed scheme, we effectively connect the Logical Sector Number, Logical Block Number, Logical Page Number, Physical Page Number, and Physical Block Number. By uniting log blocks and physical blocks, RWF allows all blocks to be used for servicing update requests. Invalid blocks can be reclaimed properly and intensively, which avoids merging log blocks with physical blocks. Finally, simulation tests of RWF and comparisons with other schemes demonstrate that RWF can effectively solve data storage problems, greatly reduces the erase count of flash devices, and efficiently improves the performance of flash memory storage systems.

Journal Article
