Catalogue Search | MBRL

Comprehensive assessment of multiple biases in small RNA sequencing reveals significant differences in the performance of widely used methods

by Cross, Alan J. , Williams, Courtney , Brandon, Nicholas J. in Analysis , Animal Genetics and Genomics , Bias (Statistics)

2019

Background RNA sequencing offers advantages over other quantification methods for microRNA (miRNA), yet numerous biases make reliable quantification challenging. Previous evaluations of these biases have focused on adapter ligation bias with limited evaluation of reverse transcription bias or amplification bias. Furthermore, evaluations of the quantification of isomiRs (miRNA isoforms) or the influence of starting amount on performance have been very limited. No study had yet evaluated the quantification of isomiRs of altered length or compared the consistency of results derived from multiple moderate starting inputs. We therefore evaluated quantifications of miRNA and isomiRs using four library preparation kits, with various starting amounts, as well as quantifications following removal of duplicate reads using unique molecular identifiers (UMIs) to mitigate reverse transcription and amplification biases. Results All methods resulted in false isomiR detection; however, the adapter-free method tested was especially prone to false isomiR detection. We demonstrate that using UMIs improves accuracy and we provide a guide for input amounts to improve consistency. Conclusions Our data show differences and limitations of current methods, thus raising concerns about the validity of quantification of miRNA and isomiRs across studies. We advocate for the use of UMIs to improve accuracy and reliability of miRNA quantifications.

Journal Article

CRISPR/Cas9 screening using unique molecular identifiers

by Zhang, Jilin , Kivioja, Teemu , Turunen, Mikko in Cell Line , CRISPR , CRISPR-Cas Systems

2017

Loss‐of‐function screening by CRISPR/Cas9 gene knockout with pooled, lentiviral guide libraries is a widely applicable method for systematic identification of genes contributing to diverse cellular phenotypes. Here, Random Sequence Labels (RSLs) are incorporated into the guide library, which act as unique molecular identifiers (UMIs) to allow massively parallel lineage tracing and lineage dropout screening. RSLs greatly improve the reproducibility of results by increasing both the precision and the accuracy of screens. They reduce the number of cells needed to reach a set statistical power, or allow a more robust screen using the same number of cells. Synopsis Genetic screens with pooled, lentiviral CRISPR guide libraries allow the identification of genes contributing to cellular phenotypes. The precision and accuracy of such screens is dramatically improved by the incorporation of inert, random sequence labels (RSLs) into CRISPR guides. Compared to the conventional method, inclusion of RSLs generates considerably more information at an identical experimental scale and enables data analysis by simple statistics. RSL‐based analysis requires fewer cells per guide to reach a set statistical power. This is important if cell numbers are limiting, such as in very large, genome‐wide screens and/or screens in primary cells. RSLs can be used as Unique Molecular Identifiers (UMIs), allowing tracking of single cells and their progeny throughout a pooled screen. This yields hundreds of independent measurements per guide.

Journal Article

UTAP: User-friendly Transcriptome Analysis Pipeline

by Leshkowitz, Dena , Feldmesser, Ester , Safran, Marilyn in Algorithms , Bioinformatics , Bioinformatics workflow

2019

Background RNA-Seq technology is routinely used to characterize the transcriptome, and to detect gene expression differences among cell types, genotypes and conditions. Advances in short-read sequencing instruments such as Illumina Next-Seq have yielded easy-to-operate machines, with high throughput, at a lower price per base. However, processing this data requires bioinformatics expertise to tailor and execute specific solutions for each type of library preparation. Results In order to enable fast and user-friendly data analysis, we developed an intuitive and scalable transcriptome pipeline that executes the full process, starting from cDNA sequences derived by RNA-Seq [Nat Rev Genet 10:57-63, 2009] and bulk MARS-Seq [Science 343:776-779, 2014] and ending with sets of differentially expressed genes. Output files are placed in structured folders, and results summaries are provided in rich and comprehensive reports, containing dozens of plots, tables and links. Conclusion Our User-friendly Transcriptome Analysis Pipeline (UTAP) is an open source, web-based intuitive platform available to the biomedical research community, enabling researchers to efficiently and accurately analyse transcriptome sequence data.

Journal Article

Heterogeneity of genetic sequence within quasi-species of influenza virus revealed by single-molecule sequencing

by Noji, Hiroyuki , Tamao, Kenji , Tabata, Kazuhito in Analysis , Evolution , Evolution, Molecular

2026

Influenza viruses exhibit high mutation rates and extensive genetic diversity, which hinder effective vaccine development and facilitate immune evasion (Taubenberger and Morens, 2006; Barr et al., 2010). These mutations arise from the error-prone viral RNA-dependent RNA polymerase, generating highly heterogeneous viral populations within individual hosts that conform to the quasi-species model of a cloud of related genomes evolving under selection (Domingo et al., 2012). Accurate characterization of this intra-host diversity is crucial for understanding viral evolution and improving vaccine design, yet conventional RNA sequencing often fails to detect low-frequency variants because of technical errors during sample preparation and sequencing. Here, we implement a single unique molecular identifier strategy that reduces sequencing artifacts and achieves an error rate of ~10⁻⁵, enabling single-particle–level quantification of quasi-species diversity. Mutation frequencies greatly exceeding background error confirm their biological origin, while information-theoretic metrics such as Shannon entropy and Jensen–Shannon divergence reveal non-random mutation distributions under selective constraints. This framework supports detailed studies of intra-host viral evolution and may inform artificial intelligence-driven prediction of mutational trajectories and more effective influenza vaccine strategies.
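The two information-theoretic metrics named in this abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names and the example distributions are hypothetical, and the abstract does not say which distributions the study compared. Entropy is measured in bits, and the Jensen–Shannon divergence is computed as the entropy of the mixture minus the mean entropy of the two inputs.

```python
import math

def shannon_entropy(p):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (in bits) between two distributions of equal length."""
    mixture = [(a + b) / 2 for a, b in zip(p, q)]
    return shannon_entropy(mixture) - (shannon_entropy(p) + shannon_entropy(q)) / 2

# Hypothetical base frequencies (A, C, G, T) at one site in two viral populations.
site_a = [0.97, 0.01, 0.01, 0.01]
site_b = [0.25, 0.25, 0.25, 0.25]
print(shannon_entropy(site_a), shannon_entropy(site_b))
print(jensen_shannon_divergence(site_a, site_b))
```

With base-2 logarithms the divergence is bounded between 0 (identical distributions) and 1 (distributions with no shared support), which makes values comparable across sites.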

Journal Article

Elimination of PCR duplicates in RNA-seq and small RNA-seq using unique molecular identifiers

by Zamore, Phillip D. , Weng, Zhiping , Fu, Yu in Animal Genetics and Genomics , Animals , Binding sites

2018

Background RNA-seq and small RNA-seq are powerful, quantitative tools to study gene regulation and function. Common high-throughput sequencing methods rely on polymerase chain reaction (PCR) to expand the starting material, but not every molecule amplifies equally, causing some to be overrepresented. Unique molecular identifiers (UMIs) can be used to distinguish undesirable PCR duplicates derived from a single molecule and identical but biologically meaningful reads from different molecules. Results We have incorporated UMIs into RNA-seq and small RNA-seq protocols and developed tools to analyze the resulting data. Our UMIs contain stretches of random nucleotides whose lengths sufficiently capture diverse molecule species in both RNA-seq and small RNA-seq libraries generated from mouse testis. Our approach yields high-quality data while allowing unique tagging of all molecules in high-depth libraries. Conclusions Using simulated and real datasets, we demonstrate that our methods increase the reproducibility of RNA-seq and small RNA-seq data. Notably, we find that the amount of starting material and sequencing depth, but not the number of PCR cycles, determine PCR duplicate frequency. Finally, we show that computational removal of PCR duplicates based only on their mapping coordinates introduces substantial bias into data analysis.
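The distinction this abstract draws, between PCR duplicates of one molecule and identical reads from different molecules, can be sketched in a few lines of Python. This is an illustration only and not the authors' tool: the `Read` record and the function `dedup_counts` are hypothetical, and real pipelines also tolerate sequencing errors within the UMI.

```python
from collections import namedtuple

# Hypothetical read record: chromosome, 5' mapping position, strand, UMI sequence.
Read = namedtuple("Read", ["chrom", "pos", "strand", "umi"])

def dedup_counts(reads):
    """Return (coordinate-only count, coordinate+UMI count) of inferred molecules."""
    by_coord = {(r.chrom, r.pos, r.strand) for r in reads}
    by_coord_umi = {(r.chrom, r.pos, r.strand, r.umi) for r in reads}
    return len(by_coord), len(by_coord_umi)

reads = [
    Read("chr1", 100, "+", "ACGT"),  # molecule A
    Read("chr1", 100, "+", "ACGT"),  # PCR duplicate of molecule A
    Read("chr1", 100, "+", "TTAG"),  # distinct molecule with identical coordinates
    Read("chr2", 500, "-", "ACGT"),  # molecule at another locus
]
coord_only, coord_umi = dedup_counts(reads)
print(coord_only, coord_umi)  # prints: 2 3
```

In this toy input, coordinate-only deduplication collapses two genuinely distinct molecules at chr1:100 into one, which is the kind of undercounting the abstract describes as a source of bias.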

Journal Article

UMI-count modeling and differential expression analysis for single-cell RNA sequencing

by Finkelstein, David , Chen, Wenan , Wu, Gang in Algorithms , Animal Genetics and Genomics , binomial distribution

2018

Read counting and unique molecular identifier (UMI) counting are the principal gene expression quantification schemes used in single-cell RNA-sequencing (scRNA-seq) analysis. By using multiple scRNA-seq datasets, we reveal distinct distribution differences between these schemes and conclude that the negative binomial model is a good approximation for UMI counts, even in heterogeneous populations. We further propose a novel differential expression analysis algorithm based on a negative binomial model with independent dispersions in each group (NBID). Our results show that this properly controls the FDR and achieves better power for UMI counts when compared to other recently developed packages for scRNA-seq analysis.
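For readers unfamiliar with the model named here, the following Python sketch evaluates a negative binomial probability mass function in the common mean/dispersion parameterization, where the variance is mu + phi * mu². It is an assumption-laden illustration, not the NBID algorithm: the function name `nb_pmf`, the parameterization, and the example values are not taken from the abstract.

```python
import math

def nb_pmf(k, mu, phi):
    """P(K = k) for a negative binomial with mean mu > 0 and dispersion phi > 0.

    Assumed parameterization: variance = mu + phi * mu**2, size r = 1 / phi.
    """
    r = 1.0 / phi
    log_p = (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(r / (r + mu))
        + k * math.log(mu / (r + mu))
    )
    return math.exp(log_p)

# Two hypothetical groups with the same mean UMI count but different dispersions.
for phi in (0.1, 0.5):
    probs = [nb_pmf(k, mu=2.0, phi=phi) for k in range(200)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    print(phi, round(mean, 6), round(var, 6))
```

Allowing each group its own dispersion, as the abstract describes, corresponds here to fitting a separate `phi` per group rather than sharing one value.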

Journal Article

Gencore: an efficient tool to generate consensus reads for error suppressing and duplicate removing of NGS data

by Chen, Shifu , Li, Zhicheng , Chen, Yaru in Algorithms , Bioinformatics , Biomedical and Life Sciences

2019

Background Duplicate removal might be considered a well-resolved problem in next-generation sequencing (NGS) data processing. However, as NGS technology gains recognition in clinical applications, researchers are paying more attention to sequencing errors and prefer to remove them while performing deduplication. Recently, unique molecular identifier (UMI) technology has been developed to better identify sequencing reads derived from different DNA fragments. Most existing duplicate-removal tools cannot handle UMI-integrated data. Some modern tools can work with UMIs, but are usually slow and use too much memory. Furthermore, existing tools rarely report rich statistical results, which are very important for quality control and downstream analysis. These unmet requirements drove us to develop an ultra-fast, simple, lightweight but powerful tool for duplicate removal and sequencing-error suppression that handles UMIs and reports informative results. Results This paper presents gencore, an efficient tool for duplicate removal and sequencing-error suppression in NGS data. The tool clusters the mapped sequencing reads and merges the reads in each cluster to generate a single consensus read. While the consensus read is generated, random errors introduced by library construction and sequencing can be removed. This error-suppressing feature makes gencore well suited to detecting ultra-low-frequency mutations in deep sequencing data. When UMI technology is applied, gencore can use the UMIs to identify reads derived from the same original DNA fragment. Gencore reports statistical results in both HTML and JSON formats. The HTML report contains many interactive figures plotting coverage and duplication statistics. The JSON report contains all the statistical results and can be parsed by downstream programs.
Conclusions Compared with conventional tools such as Picard and SAMtools, gencore greatly reduces mapping mismatches in the output data, which are mostly caused by errors. Compared with newer tools such as UMI-Reducer and UMI-tools, gencore runs much faster, uses less memory, generates better consensus reads and provides simpler interfaces. To the best of our knowledge, gencore is the only duplicate-removal tool that generates both informative HTML and JSON reports. This tool is available at: https://github.com/OpenGene/gencore
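The idea of merging a cluster of duplicate reads into one consensus read can be sketched as a per-position majority vote. The Python below is a simplified stand-in and not gencore's algorithm, whose details the abstract does not give: the function name `consensus` is hypothetical, reads are assumed to be already clustered and of equal length, and base qualities are ignored.

```python
from collections import Counter

def consensus(reads):
    """Majority-vote consensus of equal-length reads from one duplicate cluster."""
    if not reads:
        raise ValueError("cluster is empty")
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("reads in a cluster must have equal length")
    # zip(*reads) yields one column of bases per position; keep the most common base.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))

cluster = ["ACGTAC", "ACGTAC", "ACTTAC"]  # third read carries a random error
print(consensus(cluster))  # prints: ACGTAC
```

An error present in only one read of the cluster is outvoted, whereas a true variant carried by the original fragment appears in every copy and survives.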

Journal Article

Quantitative Characterization of the T Cell Receptor Repertoire of Naïve and Memory Subsets Using an Integrated Experimental and Computational Pipeline Which Is Robust, Economical, and Versatile

by Joshi, Kroopa , Oakes, Theres , Byng-Maddick, Rachel in Antigens , Biopsy , Cloning

2017

The T cell receptor (TCR) repertoire can provide a personalized biomarker for infectious and non-infectious diseases. We describe a protocol for amplifying, sequencing, and analyzing TCRs which is robust, sensitive, and versatile. The key experimental step is ligation of a single-stranded oligonucleotide to the 3' end of the TCR cDNA. This allows amplification of all possible rearrangements using a single set of primers per locus. It also introduces a unique molecular identifier to label each starting cDNA molecule. This molecular identifier is used to correct for sequence errors and for effects of differential PCR amplification efficiency, thus producing more accurate measures of the true TCR frequency within the sample. This integrated experimental and computational pipeline is applied to the analysis of human memory and naïve subpopulations, and results in consistent measures of diversity and inequality. After error correction, the distribution of TCR sequence abundance in all subpopulations followed a power law over a wide range of values. The power law exponent differed between naïve and memory populations, but was consistent between individuals. The integrated experimental and analysis pipeline we describe is appropriate to studies of T cell responses in a broad range of physiological and pathological contexts.

Journal Article

Nanopore sequencing with unique molecular identifiers enables accurate mutation analysis and haplotyping in the complex lipoprotein(a) KIV-2 VNTR

by Coassin, Stefan , Kronenberg, Florian , Amstler, Stephan in Analysis , Bioinformatics , Biomedical and Life Sciences

2024

Background Repetitive genome regions, such as variable number of tandem repeats (VNTR) or short tandem repeats (STR), are major constituents of the uncharted dark genome and evade conventional sequencing approaches. The protein-coding LPA kringle IV type-2 (KIV-2) VNTR (5.6 kb per unit, 1–40 units per allele) is a medically highly relevant example with a particularly intricate structure, multiple haplotypes, intragenic homologies, and an intra-VNTR STR. It is the primary regulator of plasma lipoprotein(a) [Lp(a)] concentrations, an important cardiovascular risk factor. Lp(a) concentrations vary widely between individuals and ancestries. Multiple variants and functional haplotypes in the LPA gene and especially in the KIV-2 VNTR strongly contribute to this variance. Methods We evaluated the performance of amplicon-based nanopore sequencing with unique molecular identifiers (UMI-ONT-Seq) for SNP detection, haplotype mapping, VNTR unit consensus sequence generation, and copy number estimation via coverage-corrected haplotypes quantification in the KIV-2 VNTR. We used 15 human samples and low-level mixtures (0.5 to 5%) of KIV-2 plasmids as a validation set. We then applied UMI-ONT-Seq to extract KIV-2 VNTR haplotypes in 48 multi-ancestry 1000 Genome samples and analyzed at scale a poorly characterized STR within the KIV-2 VNTR. Results UMI-ONT-Seq detected KIV-2 SNPs down to 1% variant level with high sensitivity, specificity, and precision (0.977 ± 0.018; 1.000 ± 0.0005; 0.993 ± 0.02) and accurately retrieved the full-length haplotype of each VNTR unit. Human variant levels were highly correlated with next-generation sequencing (R² = 0.983) without bias across the whole variant level range. Six reads per UMI produced sequences of each KIV-2 unit with Q40 quality. 
The KIV-2 repeat number determined by coverage-corrected unique haplotype counting was in close agreement with droplet digital PCR (ddPCR), with 70% of the samples falling even within the narrow confidence interval of ddPCR. We then analyzed 62,679 intra-KIV-2 STR sequences and explored KIV-2 SNP haplotype patterns across five ancestries. Conclusions UMI-ONT-Seq accurately retrieves the SNP haplotype and precisely quantifies the VNTR copy number of each repeat unit of the complex KIV-2 VNTR region across multiple ancestries. This study utilizes the KIV-2 VNTR, presenting a novel and potent tool for comprehensive characterization of medically relevant complex genome regions at scale.

Journal Article

A Comprehensive Characterization of Small RNA Profiles by Massively Parallel Sequencing in Six Forensic Body Fluids/Tissue

by Liu, Zhiyong , Sun, Hongyu , Zang, Yu in Analysis , Bioinformatics , blood

2022

Body fluids/tissue identification (BFID) is an essential procedure in forensic practice, and RNA profiling has become one of the most important methods. Small non-coding RNAs, being expressed in high copy numbers and resistant to degradation, have great potential in BFID but have not been comprehensively characterized in common forensic stains. In this study, the miRNA, piRNA, snoRNA, and snRNA were sequenced in 30 forensically relevant samples (menstrual blood, saliva, semen, skin, venous blood, and vaginal secretion) using the BGI platform. Based on small RNA profiles, relative specific markers (RSM) and absolute specific markers (ASM) were defined, which can be used to identify a specific body fluid/tissue out of two or six, respectively. A total of 5204 small RNAs were discovered, including 1394 miRNAs (of which 236 were novel), 3157 piRNAs, 636 snoRNAs, and 17 snRNAs. RSMs for 15 pairwise body fluid/tissue groups were discovered by differential RNA analysis. In addition, 90 ASMs that were specifically expressed in a certain type of body fluid/tissue were screened; among them, snoRNAs are reported for the first time in forensic genetics. In brief, our study deepened the understanding of small RNA profiles in forensic stains and offered potential BFID markers that can be applied in different forensic scenarios.

Journal Article
