Catalogue Search | MBRL

HiFiAdapterFilt, a memory efficient read processing pipeline, prevents occurrence of adapter sequence in PacBio HiFi reads and their negative impacts on genome assembly

by Corpuz, Renee L.; Sim, Sheina B.; Simmonds, Tyler J. in Adapter, Animal Genetics and Genomics, BASIC BIOLOGICAL SCIENCES

2022

Background: Pacific Biosciences HiFi read technology is currently the industry standard for high-accuracy long-read sequencing and has been widely adopted by large sequencing and assembly initiatives to generate de novo assemblies of non-model organisms. Although adapter contamination filtering is routine in traditional short-read analysis pipelines, it has not been widely adopted in HiFi workflows. Results: Analysis of 55 publicly available HiFi datasets revealed that a read-sanitation step, removing sequence artifacts derived from PacBio library preparation from read pools, is necessary because adapter sequences can be erroneously integrated into assemblies. Conclusions: Here we describe the nature of adapter-contaminated reads and their consequences in assembly, and present HiFiAdapterFilt, a simple and memory-efficient solution for removing adapter-contaminated reads prior to assembly.
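
As a rough illustration of the filtering idea in this abstract, the sketch below drops any read that contains an adapter sequence on either strand. It is not HiFiAdapterFilt's implementation: exact substring matching, the placeholder adapter string `GATTACAGGCTT` (not a real PacBio adapter), and the function names are all assumptions made for illustration.

```python
"""Simplified adapter filter: a sketch, not HiFiAdapterFilt's implementation."""

# Map each base to its complement (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def is_contaminated(read: str, adapter: str) -> bool:
    """Flag a read if the adapter occurs on either strand (exact match only)."""
    return adapter in read or reverse_complement(adapter) in read


def filter_reads(reads: dict, adapter: str) -> tuple:
    """Return (kept_reads, removed_ids); flagged reads are dropped whole, not trimmed."""
    kept, removed = {}, []
    for read_id, seq in reads.items():
        if is_contaminated(seq, adapter):
            removed.append(read_id)
        else:
            kept[read_id] = seq
    return kept, removed


# Hypothetical placeholder adapter; not a real PacBio adapter sequence.
ADAPTER = "GATTACAGGCTT"
reads = {
    "r1": "ACGTACGTACGT",          # clean
    "r2": "CCCCGATTACAGGCTTCCCC",  # adapter on the forward strand
    "r3": "TTTTAAGCCTGTAATCTTTT",  # reverse complement of the adapter
}
kept, removed = filter_reads(reads, ADAPTER)
print(sorted(kept), removed)  # ['r1'] ['r2', 'r3']
```

Removing whole reads rather than trimming them mirrors the abstract's wording ("removing adapter contaminated reads"); a real filter would also need to tolerate mismatches and partial adapter matches.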

Journal Article

TopoQual polishes circular consensus sequencing data and accurately predicts quality scores

by Weerakoon, Minindu; Lee, Sangjin; Heaton, Haynes in Accuracy, Algorithms, Bioinformatics

2025

Background: Pacific Biosciences (PacBio) circular consensus sequencing (CCS), also known as high fidelity (HiFi) technology, has revolutionized modern genomics by producing long (10+ kb) and highly accurate reads. This is achieved by sequencing circularized DNA molecules multiple times and combining the passes into a consensus sequence. Currently, the accuracy and quality value estimation provided by HiFi technology are more than sufficient for applications such as genome assembly and germline variant calling. However, the estimated quality scores are of limited accuracy for somatic variant calling on single reads. Results: To address the challenge of inaccurate quality scores for somatic variant calling, we introduce TopoQual, a novel tool designed to enhance the accuracy of base quality predictions. TopoQual leverages techniques including partial order alignment (POA), topologically parallel bases, and deep learning to polish consensus sequences. Our results demonstrate that TopoQual corrects approximately 31.9% of errors in PacBio consensus sequences. Additionally, it validates base qualities up to q59, which corresponds to one error in 0.9 million bases. These improvements will significantly enhance the reliability of somatic variant calling using HiFi data. Conclusion: TopoQual represents a significant advancement in genomics by improving the accuracy of base quality predictions for PacBio HiFi sequencing data. By correcting a substantial proportion of errors and achieving high base quality validation, TopoQual enables confident and accurate somatic variant calling. This tool not only addresses a critical limitation of current HiFi technology but also opens new possibilities for precise genomic analysis in various research and clinical applications.

Journal Article

High‐accuracy de novo assembly and SNP detection of chloroplast genomes using a SMRT circular consensus sequencing strategy

by Li, Ying; Qian, Jun; Song, Jingyuan in Assembly, Biological taxonomies, Chloroplast DNA

2014

A circular consensus sequencing (CCS) strategy involving single molecule, real‐time (SMRT) DNA sequencing technology was applied to de novo assembly and single nucleotide polymorphism (SNP) detection of chloroplast genomes. Chloroplast DNA was purified from enriched chloroplasts of pooled individuals to construct a shotgun library for each species. The sequencing reactions were performed on a PacBio RS platform. CCS sub‐reads were generated from polymerase reads that passed the native dumbbell‐shaped DNA templates multiple times. The complete chloroplast genome sequence was generated by mapping all reads to the draft sequence constructed in a step‐by‐step manner. The full‐chain, PCR‐free approach eliminates the possible context‐specific biases in library construction and sequencing reaction. The chloroplast genome was easily and completely assembled using the data generated from one SMRT Cell without requiring a reference genome. Comparisons of the three assembled Fritillaria genomes to 34.1 kb of validation Sanger sequences revealed 100% concordance, and the detected intraspecies SNPs at a minimum variant frequency of 15% were all confirmed. This simple approach with potential for parallel sequencing yields high‐quality chloroplast genomes for sensitive SNP detection and comparative analyses. We recommend this approach for its powerful applicability for evolutionary genetics and genomics studies in plants based on the sequences of chloroplast genomes.
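
The abstract reports intraspecies SNPs detected at a minimum variant frequency of 15% in pooled samples. The sketch below shows one simple way such a cutoff could be applied to per-position base counts; the input format and the function name `call_variants` are hypothetical, and the abstract does not describe the authors' actual calling procedure.

```python
from collections import Counter

MIN_VARIANT_FREQ = 0.15  # minimum variant frequency quoted in the abstract


def call_variants(base_counts: Counter, ref_base: str,
                  min_freq: float = MIN_VARIANT_FREQ) -> dict:
    """Return {alt_base: frequency} for non-reference bases at or above min_freq.

    base_counts: hypothetical per-position tally of bases observed in mapped reads.
    """
    depth = sum(base_counts.values())
    if depth == 0:
        return {}  # no reads cover this position
    return {
        base: count / depth
        for base, count in base_counts.items()
        if base != ref_base and count / depth >= min_freq
    }


# Example position: 100 reads, reference base A.
counts = Counter({"A": 80, "G": 18, "T": 2})
print(call_variants(counts, "A"))  # {'G': 0.18}
```

In a pooled sample, a frequency threshold like this separates genuine minor alleles from low-level noise; the 2% `T` in the example falls below the cutoff and is not reported.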

Journal Article

Annotated genome and transcriptome of the endangered Caribbean mountainous star coral (Orbicella faveolata) using PacBio long-read sequencing

by Muller, Erinn M.; Traylor-Knowles, Nikki; MacKnight, Nicholas J. in Analysis, Animal Genetics and Genomics, Anthropogenic factors

2024

Long-read sequencing is revolutionizing de-novo genome assemblies, with continued advancements making it more readily available for previously understudied, non-model organisms. Stony corals are one such example, with long-read de-novo genome assemblies now starting to be publicly available, opening the door for a wide array of ‘omics-based research. Here we present a new de-novo genome assembly for the endangered Caribbean star coral, Orbicella faveolata, using PacBio circular consensus reads. Our genome assembly improved the contiguity (51 versus 1,933 contigs) and complete and single copy BUSCO orthologs (93.6% versus 85.3%, database metazoa_odb10), compared to the currently available reference genome generated using short-read methodologies. Our new de-novo assembled genome also showed comparable quality metrics to other coral long-read genomes. Telomeric repeat analysis identified putative chromosomes in our scaffolded assembly, with these repeats at either one, or both ends, of scaffolded contigs. We identified 32,172 protein coding genes in our assembly through use of long-read RNA sequencing (ISO-seq) of additional O. faveolata fragments exposed to a range of abiotic and biotic treatments, and publicly available short-read RNA-seq data. With anthropogenic influences heavily affecting O. faveolata, as well as its increasing incorporation into reef restoration activities, this updated genome resource can be used for population genomics and other ‘omics analyses to aid in the conservation of this species.

Journal Article

Quality Control of the Traditional Patent Medicine Yimu Wan Based on SMRT Sequencing and DNA Barcoding

by Xu, Zhichao; Shi, Linchun; Jia, Jing in Chinese history, Chromatography, circular-consensus sequencing (CCS)

2017

Substandard traditional patent medicines may lead to global safety-related issues. Protecting consumers from the health risks associated with the integrity and authenticity of herbal preparations is of great concern. Of particular concern is quality control for traditional patent medicines. Here, we establish an effective approach for verifying the biological composition of traditional patent medicines based on single-molecule real-time (SMRT) sequencing and DNA barcoding. Yimu Wan (YMW), a classical herbal prescription recorded in the Chinese Pharmacopoeia, was chosen to test the method. Two reference YMW samples were used to establish a standard method for analysis, which was then applied to three different batches of commercial YMW samples. A total of 3703 and 4810 circular-consensus sequencing (CCS) reads from two reference and three commercial YMW samples were mapped to the ITS2 and regions, respectively. Moreover, comparison of intraspecific genetic distances based on SMRT sequencing data with reference data from Sanger sequencing revealed an ITS2 and intergenic spacer that exhibited high intraspecific divergence, with the sites of variation showing significant differences within species. Using the CCS strategy for SMRT sequencing analysis was adequate to guarantee the accuracy of identification. This study demonstrates the application of SMRT sequencing to detect the biological ingredients of herbal preparations. SMRT sequencing provides an affordable way to monitor the legality and safety of traditional patent medicines.

Journal Article

Selective expression of Pneumocystis antigens in different patients during a suspected outbreak of Pneumocystis pneumonia

by Richard, Sophie; Mühlethaler, Konrad; Meier, Caroline S. in Adult, Aged, Antigens

2025

The fungus Pneumocystis causes severe pneumonia in patients with weakened immune systems. It possesses a genetic system to vary the antigens at the surface of its cells that are presented to the immune system of the patient. We report for the first time that this system may have been implicated in the infections of renal transplant recipients involved in a suspected outbreak. Our observations suggest that the antigens presented might be selected to avoid the elimination of the fungus by the immune response specific to each patient. The resistance of the fungus to the immunosuppressant mycophenolate administered to these patients to prevent organ rejection probably also played a role in the infections during the suspected outbreak.

Journal Article

Chloroplast genome of Aconitum barbatum var. puberulum (Ranunculaceae) derived from CCS reads using the PacBio RS platform

by Li, Qiushi; Chen, Xiaochen; Li, Ying in Aconitum, Chloroplast genome, Chloroplasts

2015

The chloroplast genome (cp genome) of Aconitum barbatum var. puberulum was sequenced using a third-generation sequencing platform based on the single-molecule real-time (SMRT) sequencing approach. To our knowledge, this is the first reported complete cp genome of Aconitum, and we anticipate that it will have great value for phylogenetic studies of the Ranunculaceae family. In total, 23,498 CCS reads and 20,685,462 base pairs were generated; the mean read length was 880 bp, and the longest read was 2,261 bp. Genome coverage of 100% was achieved with a mean coverage of 132× and no gaps. The accuracy of the assembled genome is 99.973%; the assembly was validated using Sanger sequencing of six selected genes from the cp genome. The complete cp genome of A. barbatum var. puberulum is 156,749 bp in length, including a large single-copy region of 87,630 bp and a small single-copy region of 16,941 bp separated by two inverted repeats of 26,089 bp. The cp genome contains 130 genes, including 84 protein-coding genes, 34 tRNA genes and eight rRNA genes. Four forward, five inverted and eight tandem repeats were identified. According to the SSR analysis, the longest homopolymer run is a 20-T repeat. The results presented in this paper will facilitate phylogenetic studies and molecular authentication of Aconitum.
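
The size figures in this abstract can be cross-checked with simple arithmetic: the four regions of the quadripartite chloroplast genome (large single-copy, small single-copy, and two inverted repeats) should sum to the total length, and dividing total sequenced bases by read count and by genome length gives the mean read length and a nominal mean coverage. The abstract does not say exactly how its coverage figure was computed, so treating it as total bases divided by genome length is an assumption; the ratio below is merely consistent with the reported 132×.

```python
# Figures quoted in the Aconitum barbatum var. puberulum abstract.
LSC_BP = 87_630           # large single-copy region
SSC_BP = 16_941           # small single-copy region
IR_BP = 26_089            # each of the two inverted repeats
GENOME_BP = 156_749       # reported total genome length
TOTAL_BASES = 20_685_462  # total CCS bases generated
N_READS = 23_498          # number of CCS reads

# Quadripartite structure: LSC + SSC + two copies of the IR.
quadripartite_total = LSC_BP + SSC_BP + 2 * IR_BP

# Mean read length and a nominal mean coverage (total bases / genome length).
mean_read_len = TOTAL_BASES / N_READS
mean_coverage = TOTAL_BASES / GENOME_BP

print(quadripartite_total)        # 156749
print(f"{mean_read_len:.1f} bp")  # 880.3 bp
print(f"{mean_coverage:.1f}x")    # 132.0x
```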

Journal Article

Endosymbiotic Fungal Diversity and Dynamics of the Brown Planthopper across Developmental Stages, Tissues, and Sexes Revealed Using Circular Consensus Sequencing

by Cheng, Yichen; Li, Tianzhu; Li, Jiamei in Accuracy, Adults, Bar codes

2024

Endosymbiotic fungi play an important role in the growth and development of insects. Understanding the endosymbiont communities hosted by the brown planthopper (BPH; Nilaparvata lugens Stål), the most destructive pest in rice, is a prerequisite for controlling BPH rice infestations. However, the endosymbiont diversity and dynamics of the BPH remain poorly studied. Here, we used circular consensus sequencing (CCS) to obtain 87,131 OTUs (operational taxonomic units), which annotated 730 species of endosymbiotic fungi in the various developmental stages and tissues. We found that three yeast-like symbionts (YLSs), Polycephalomyces prolificus, Ophiocordyceps heteropoda, and Hirsutella proturicola, were dominant in almost all samples, which was especially pronounced in instar nymphs 4–5, female adults, and the fat bodies of female and male adult BPH. Interestingly, honeydew as the only in vitro sample had a unique community structure. Various diversity indices might indicate the different activity of endosymbionts in these stages and tissues. The biomarkers analyzed using LEfSe suggested some special functions of samples at different developmental stages of growth and the active functions of specific tissues in different sexes. Finally, we found that the incidence of occurrence of three species of Malassezia and Fusarium sp. was higher in males than in females in all comparison groups. In summary, our study provides a comprehensive survey of symbiotic fungi in the BPH, which complements the previous research on YLSs. These results offer new theoretical insights and practical implications for novel pest management strategies to understand the BPH–microbe symbiosis and devise effective pest control strategies.

Journal Article

Detection of Rare Thalassemia Variants Using Accurate Circular Consensus Long‐Read Sequencing

by Lian, Jingli; Yang, Xingkun; Chen, Shufen in Anemia, Asymptomatic, Bar codes

2026

Objective: The aim of this study is to evaluate the efficacy of accurate circular consensus long‐read sequencing in the detection of rare thalassemia. Methods: Conventional molecular analysis of globin genes has limitations because of the broad spectrum of genetic variants, complex genetics, and genotype–phenotype correlation. Accurate circular consensus long‐read sequencing is a novel tool, based on third‐generation sequencing, that detects complex variants in thalassemia genes. In this study, we identified suspected rare thalassemia carriers by hemoglobin analysis and conventional molecular analysis, and evaluated the efficacy of accurate circular consensus long‐read sequencing in the detection of rare thalassemia. Results: Building on traditional thalassemia gene screening, accurate circular consensus long‐read sequencing identified an additional 16 (17.67%) cases of clinically significant rare thalassemia variants, including 12 point variants and 4 deletion variants: HBB: (SEA)‐HPFH, HBB: c.268_281delAGTGAGCTGCACTG, HBB: (Chinese) Gγ + (Aγδβ)⁰, and HBA2:c.91‐93delGAG. Conclusion: Accurate circular consensus long‐read sequencing shows promise for detecting rare thalassemia gene variants and may improve the carrier detection rate.

Journal Article

High-throughput DNA sequencing errors are reduced by orders of magnitude using circle sequencing

by Sawyer, Sara L.; McBee, Ross M.; Andino, Raul in Biological Sciences, Circles, Computational Biology - methods

2013

A major limitation of high-throughput DNA sequencing is the high rate of erroneous base calls produced. For instance, Illumina sequencing machines produce errors at a rate of ∼0.1–1 × 10⁻² per base sequenced. These technologies typically produce billions of base calls per experiment, translating to millions of errors. We have developed a unique library preparation strategy, "circle sequencing," which allows for robust downstream computational correction of these errors. In this strategy, DNA templates are circularized, copied multiple times in tandem with a rolling circle polymerase, and then sequenced on any high-throughput sequencing machine. Each read produced is computationally processed to obtain a consensus sequence of all linked copies of the original molecule. Physically linking the copies ensures that each copy is independently derived from the original molecule and allows for efficient formation of consensus sequences. The circle-sequencing protocol precedes standard library preparations and is therefore suitable for a broad range of sequencing applications. We tested our method using the Illumina MiSeq platform and obtained errors in our processed sequencing reads at a rate as low as 7.6 × 10⁻⁶ per base sequenced, dramatically improving the error rate of Illumina sequencing and putting it on par with low-throughput, but highly accurate, Sanger sequencing. Circle sequencing also had substantially higher efficiency and lower cost than existing barcode-based schemes for correcting sequencing errors.
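
The consensus step described in this abstract can be illustrated with a simplified per-position majority vote. This sketch assumes the template length (the repeat period) is known and that the tandem copies are already aligned without insertions or deletions; the authors' actual processing is not described here, and the function names are hypothetical.

```python
from collections import Counter


def split_copies(read: str, period: int) -> list:
    """Cut a tandem read into complete copies of length `period`; a partial tail is dropped."""
    n_full = len(read) // period
    return [read[i * period:(i + 1) * period] for i in range(n_full)]


def consensus(copies: list) -> str:
    """Per-position majority vote over equal-length, pre-aligned copies.

    Ties resolve to the base encountered first (Counter.most_common ordering).
    """
    if not copies:
        raise ValueError("need at least one copy")
    if len({len(c) for c in copies}) != 1:
        raise ValueError("copies must have equal length (no indels assumed)")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*copies))


# Three copies of the template "ACGTAC", with errors at different positions in
# copies 2 and 3, followed by a partial fourth copy.
read = "ACGTAC" + "AGGTAC" + "ACGTTC" + "AC"
copies = split_copies(read, period=6)
print(copies)             # ['ACGTAC', 'AGGTAC', 'ACGTTC']
print(consensus(copies))  # ACGTAC
```

Because each simulated error falls at a different position in a different copy, it is outvoted by the other copies, which is the intuition behind physically linking independently derived copies of the same molecule.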

Journal Article
