Catalogue Search | MBRL
35 result(s) for "Unique molecular identifiers"
Comprehensive assessment of multiple biases in small RNA sequencing reveals significant differences in the performance of widely used methods
by Cross, Alan J.; Williams, Courtney; Brandon, Nicholas J.
in Analysis, Animal Genetics and Genomics, Bias (Statistics)
2019
Background
RNA sequencing offers advantages over other quantification methods for microRNA (miRNA), yet numerous biases make reliable quantification challenging. Previous evaluations of these biases have focused on adapter ligation bias with limited evaluation of reverse transcription bias or amplification bias. Furthermore, evaluations of the quantification of isomiRs (miRNA isoforms) or the influence of starting amount on performance have been very limited. No study had yet evaluated the quantification of isomiRs of altered length or compared the consistency of results derived from multiple moderate starting inputs. We therefore evaluated quantifications of miRNA and isomiRs using four library preparation kits, with various starting amounts, as well as quantifications following removal of duplicate reads using unique molecular identifiers (UMIs) to mitigate reverse transcription and amplification biases.
Results
All methods resulted in false isomiR detection; however, the adapter-free method tested was especially prone to false isomiR detection. We demonstrate that using UMIs improves accuracy and we provide a guide for input amounts to improve consistency.
Conclusions
Our data show differences and limitations of current methods, thus raising concerns about the validity of quantification of miRNA and isomiRs across studies. We advocate for the use of UMIs to improve accuracy and reliability of miRNA quantifications.
Journal Article
CRISPR/Cas9 screening using unique molecular identifiers
2017
Loss‐of‐function screening by CRISPR/Cas9 gene knockout with pooled, lentiviral guide libraries is a widely applicable method for systematic identification of genes contributing to diverse cellular phenotypes. Here, Random Sequence Labels (RSLs) are incorporated into the guide library, which act as unique molecular identifiers (UMIs) to allow massively parallel lineage tracing and lineage dropout screening. RSLs greatly improve the reproducibility of results by increasing both the precision and the accuracy of screens. They reduce the number of cells needed to reach a set statistical power, or allow a more robust screen using the same number of cells.
Synopsis
Genetic screens with pooled, lentiviral CRISPR guide libraries allow the identification of genes contributing to cellular phenotypes. The precision and accuracy of such screens is dramatically improved by the incorporation of inert, random sequence labels (RSLs) into CRISPR guides.
Compared to the conventional method, inclusion of RSLs generates considerably more information at an identical experimental scale and enables data analysis by simple statistics.
RSL‐based analysis requires fewer cells per guide to reach a set statistical power. This is important if cell numbers are limiting, such as in very large, genome‐wide screens and/or screens in primary cells.
RSLs can be used as Unique Molecular Identifiers (UMIs), allowing tracking of single cells and their progeny throughout a pooled screen. This yields hundreds of independent measurements per guide.
Journal Article
UTAP: User-friendly Transcriptome Analysis Pipeline
by Leshkowitz, Dena; Feldmesser, Ester; Safran, Marilyn
in Algorithms, Bioinformatics, Bioinformatics workflow
2019
Background
RNA-Seq technology is routinely used to characterize the transcriptome, and to detect gene expression differences among cell types, genotypes and conditions. Advances in short-read sequencing instruments such as Illumina Next-Seq have yielded easy-to-operate machines, with high throughput, at a lower price per base. However, processing this data requires bioinformatics expertise to tailor and execute specific solutions for each type of library preparation.
Results
In order to enable fast and user-friendly data analysis, we developed an intuitive and scalable transcriptome pipeline that executes the full process, starting from cDNA sequences derived by RNA-Seq [Nat Rev Genet 10:57-63, 2009] and bulk MARS-Seq [Science 343:776-779, 2014] and ending with sets of differentially expressed genes. Output files are placed in structured folders, and results summaries are provided in rich and comprehensive reports, containing dozens of plots, tables and links.
Conclusion
Our User-friendly Transcriptome Analysis Pipeline (UTAP) is an open source, web-based intuitive platform available to the biomedical research community, enabling researchers to efficiently and accurately analyse transcriptome sequence data.
Journal Article
Heterogeneity of genetic sequence within quasi-species of influenza virus revealed by single-molecule sequencing
2026
Influenza viruses exhibit high mutation rates and extensive genetic diversity, which hinder effective vaccine development and facilitate immune evasion (Taubenberger and Morens, 2006; Barr et al., 2010). These mutations arise from the error-prone viral RNA-dependent RNA polymerase, generating highly heterogeneous viral populations within individual hosts that conform to the quasi-species model of a cloud of related genomes evolving under selection (Domingo et al., 2012). Accurate characterization of this intra-host diversity is crucial for understanding viral evolution and improving vaccine design, yet conventional RNA sequencing often fails to detect low-frequency variants because of technical errors during sample preparation and sequencing. Here, we implement a single unique molecular identifier strategy that reduces sequencing artifacts and achieves an error rate of ~10⁻⁵, enabling single-particle–level quantification of quasi-species diversity. Mutation frequencies greatly exceeding background error confirm their biological origin, while information-theoretic metrics such as Shannon entropy and Jensen–Shannon divergence reveal non-random mutation distributions under selective constraints. This framework supports detailed studies of intra-host viral evolution and may inform artificial intelligence-driven prediction of mutational trajectories and more effective influenza vaccine strategies.
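The information-theoretic metrics named in this abstract can be illustrated with a minimal sketch. It assumes each input is a probability distribution over the same set of categories (for example, nucleotide frequencies at one genome position), and the function names are hypothetical. It shows the standard definitions only and is not the authors' analysis code.

```python
import math

def shannon_entropy(p):
    """Shannon entropy in bits of a discrete distribution p (entries sum to 1)."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence in bits between two distributions of equal length.

    Computed as H(m) - (H(p) + H(q)) / 2, where m is the pointwise mean of p and q.
    """
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return shannon_entropy(m) - (shannon_entropy(p) + shannon_entropy(q)) / 2

# Hypothetical base frequencies (A, C, G, T) at one position in two samples.
uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.97, 0.01, 0.01, 0.01]
```

With base-2 logarithms the divergence lies between 0 (identical distributions) and 1 (distributions with no overlap), which makes values comparable across positions.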
Journal Article
Elimination of PCR duplicates in RNA-seq and small RNA-seq using unique molecular identifiers
by Zamore, Phillip D.; Weng, Zhiping; Fu, Yu
in Animal Genetics and Genomics, Animals, Binding sites
2018
Background
RNA-seq and small RNA-seq are powerful, quantitative tools to study gene regulation and function. Common high-throughput sequencing methods rely on polymerase chain reaction (PCR) to expand the starting material, but not every molecule amplifies equally, causing some to be overrepresented. Unique molecular identifiers (UMIs) can be used to distinguish undesirable PCR duplicates derived from a single molecule and identical but biologically meaningful reads from different molecules.
Results
We have incorporated UMIs into RNA-seq and small RNA-seq protocols and developed tools to analyze the resulting data. Our UMIs contain stretches of random nucleotides whose lengths sufficiently capture diverse molecule species in both RNA-seq and small RNA-seq libraries generated from mouse testis. Our approach yields high-quality data while allowing unique tagging of all molecules in high-depth libraries.
Conclusions
Using simulated and real datasets, we demonstrate that our methods increase the reproducibility of RNA-seq and small RNA-seq data. Notably, we find that the amount of starting material and sequencing depth, but not the number of PCR cycles, determine PCR duplicate frequency. Finally, we show that computational removal of PCR duplicates based only on their mapping coordinates introduces substantial bias into data analysis.
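The contrast between coordinate-only and UMI-aware duplicate removal can be sketched as follows. This is a toy illustration under stated assumptions: reads are represented as hypothetical `(chromosome, position, strand, umi)` tuples, UMIs are matched exactly with no error correction, and the function names are invented for this example. It is not the tool described in the abstract.

```python
def dedup_by_coordinate(reads):
    """Count reads after collapsing every read that shares a mapping position.

    UMIs are ignored, so distinct molecules at the same position are merged.
    """
    return len({(chrom, pos, strand) for chrom, pos, strand, _umi in reads})

def dedup_by_umi(reads):
    """Count reads after collapsing only reads that share position AND UMI."""
    return len({(chrom, pos, strand, umi) for chrom, pos, strand, umi in reads})

# Hypothetical aligned reads: (chromosome, position, strand, UMI).
reads = [
    ("chr1", 100, "+", "ACGT"),
    ("chr1", 100, "+", "ACGT"),  # PCR duplicate of the first read
    ("chr1", 100, "+", "TTAG"),  # different molecule at the same position
    ("chr1", 250, "-", "ACGT"),
]

coordinate_count = dedup_by_coordinate(reads)
umi_count = dedup_by_umi(reads)
```

In this toy input the coordinate-only count discards the read tagged `TTAG` even though it comes from a separate molecule, which is the kind of undercounting the abstract describes as bias.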
Journal Article
UMI-count modeling and differential expression analysis for single-cell RNA sequencing
by Finkelstein, David; Chen, Wenan; Wu, Gang
in Algorithms, Animal Genetics and Genomics, binomial distribution
2018
Read counting and unique molecular identifier (UMI) counting are the principal gene expression quantification schemes used in single-cell RNA-sequencing (scRNA-seq) analysis. By using multiple scRNA-seq datasets, we reveal distinct distribution differences between these schemes and conclude that the negative binomial model is a good approximation for UMI counts, even in heterogeneous populations. We further propose a novel differential expression analysis algorithm based on a negative binomial model with independent dispersions in each group (NBID). Our results show that this properly controls the FDR and achieves better power for UMI counts when compared to other recently developed packages for scRNA-seq analysis.
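The idea of a negative binomial model with a separate dispersion per group can be sketched as below. Two assumptions are made here and are not taken from the abstract: the distribution is parameterized by mean `mu` and dispersion `phi` so that the variance is `mu + phi * mu**2`, and the dispersion is estimated by a simple method of moments. The published NBID algorithm may differ in both respects, and the function names are hypothetical.

```python
import math
from statistics import mean, variance

def nb_pmf(k, mu, phi):
    """P(K = k) for a negative binomial with mean mu > 0 and dispersion phi > 0.

    Assumed parameterization: variance = mu + phi * mu**2.
    """
    r = 1.0 / phi          # "size" parameter
    p = r / (r + mu)       # success probability
    log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log1p(-p))
    return math.exp(log_pmf)

def moment_dispersion(counts):
    """Method-of-moments dispersion estimate, floored at 0 (Poisson-like)."""
    m = mean(counts)
    if m == 0:
        return 0.0
    v = variance(counts)   # sample variance; needs at least two counts
    return max((v - m) / (m * m), 0.0)

# Hypothetical UMI counts for one gene in two groups of cells.
group_a = [0, 2, 4, 10]
group_b = [3, 4, 5]
phi_a = moment_dispersion(group_a)  # estimated independently for each group
phi_b = moment_dispersion(group_b)
```

Estimating `phi` separately for each group, rather than sharing one value, lets a heterogeneous group carry a larger dispersion without inflating the variance assumed for a more uniform group.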
Journal Article
Gencore: an efficient tool to generate consensus reads for error suppressing and duplicate removing of NGS data
by Chen, Shifu; Li, Zhicheng; Chen, Yaru
in Algorithms, Bioinformatics, Biomedical and Life Sciences
2019
Background
Removing duplicates might be considered as a well-resolved problem in next-generation sequencing (NGS) data processing domain. However, as NGS technology gains more recognition in clinical application, researchers start to pay more attention to its sequencing errors, and prefer to remove these errors while performing deduplication operations. Recently, a new technology called unique molecular identifier (UMI) has been developed to better identify sequencing reads derived from different DNA fragments. Most existing duplicate removing tools cannot handle the UMI-integrated data. Some modern tools can work with UMIs, but are usually slow and use too much memory. Furthermore, existing tools rarely report rich statistical results, which are very important for quality control and downstream analysis. These unmet requirements drove us to develop an ultra-fast, simple, little-weighted but powerful tool for duplicate removing and sequence error suppressing, with features of handling UMIs and reporting informative results.
Results
This paper presents an efficient tool gencore for duplicate removing and sequence error suppressing of NGS data. This tool clusters the mapped sequencing reads and merges reads in each cluster to generate one single consensus read. While the consensus read is generated, the random errors introduced by library construction and sequencing can be removed. This error-suppressing feature makes gencore very suitable for the application of detecting ultra-low frequency mutations from deep sequencing data. When unique molecular identifier (UMI) technology is applied, gencore can use them to identify the reads derived from same original DNA fragment. Gencore reports statistical results in both HTML and JSON formats. The HTML format report contains many interactive figures plotting statistical coverage and duplication information. The JSON format report contains all the statistical results, and is interpretable for downstream programs.
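The cluster-then-merge idea described above can be sketched with a per-position majority vote. This is an illustration only, assuming reads in a cluster have equal length and are already grouped by a hypothetical key string that combines mapping position and UMI. It does not reproduce gencore's actual algorithm, scoring, or interface.

```python
from collections import Counter, defaultdict

def consensus(seqs):
    """Majority-vote consensus of equal-length sequences, one base per column."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*seqs))

def consensus_by_cluster(reads):
    """Group (cluster_key, sequence) pairs by key and merge each cluster."""
    clusters = defaultdict(list)
    for key, seq in reads:
        clusters[key].append(seq)
    return {key: consensus(seqs) for key, seqs in clusters.items()}

# Hypothetical reads keyed by "chromosome:position:UMI".
reads = [
    ("chr1:100:ACGT", "ACGTA"),
    ("chr1:100:ACGT", "ACGTA"),
    ("chr1:100:ACGT", "ACCTA"),  # one read carries a random error at base 3
    ("chr1:100:TTAG", "ACCTA"),  # separate molecule, kept as its own cluster
]

merged = consensus_by_cluster(reads)
```

The isolated error in the first cluster is outvoted, while the same base in the second cluster survives because it belongs to a different original fragment.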
Conclusions
Comparing to the conventional tools like Picard and SAMtools, gencore greatly reduces the output data’s mapping mismatches, which are mostly caused by errors. Comparing to some new tools like UMI-Reducer and UMI-tools, gencore runs much faster, uses less memory, generates better consensus reads and provides simpler interfaces. To our best knowledge, gencore is the only duplicate removing tool that generates both informative HTML and JSON reports. This tool is available at: https://github.com/OpenGene/gencore
Journal Article
Quantitative Characterization of the T Cell Receptor Repertoire of Naïve and Memory Subsets Using an Integrated Experimental and Computational Pipeline Which Is Robust, Economical, and Versatile
2017
The T cell receptor (TCR) repertoire can provide a personalized biomarker for infectious and non-infectious diseases. We describe a protocol for amplifying, sequencing, and analyzing TCRs which is robust, sensitive, and versatile. The key experimental step is ligation of a single-stranded oligonucleotide to the 3' end of the TCR cDNA. This allows amplification of all possible rearrangements using a single set of primers per locus. It also introduces a unique molecular identifier to label each starting cDNA molecule. This molecular identifier is used to correct for sequence errors and for effects of differential PCR amplification efficiency, thus producing more accurate measures of the true TCR frequency within the sample. This integrated experimental and computational pipeline is applied to the analysis of human memory and naive subpopulations, and results in consistent measures of diversity and inequality. After error correction, the distribution of TCR sequence abundance in all subpopulations followed a power law over a wide range of values. The power law exponent differed between naïve and memory populations, but was consistent between individuals. The integrated experimental and analysis pipeline we describe is appropriate to studies of T cell responses in a broad range of physiological and pathological contexts.
Journal Article
Nanopore sequencing with unique molecular identifiers enables accurate mutation analysis and haplotyping in the complex lipoprotein(a) KIV-2 VNTR
by Coassin, Stefan; Kronenberg, Florian; Amstler, Stephan
in Analysis, Bioinformatics, Biomedical and Life Sciences
2024
Background
Repetitive genome regions, such as variable number of tandem repeats (VNTR) or short tandem repeats (STR), are major constituents of the uncharted dark genome and evade conventional sequencing approaches. The protein-coding LPA kringle IV type-2 (KIV-2) VNTR (5.6 kb per unit, 1–40 units per allele) is a medically highly relevant example with a particularly intricate structure, multiple haplotypes, intragenic homologies, and an intra-VNTR STR. It is the primary regulator of plasma lipoprotein(a) [Lp(a)] concentrations, an important cardiovascular risk factor. Lp(a) concentrations vary widely between individuals and ancestries. Multiple variants and functional haplotypes in the LPA gene and especially in the KIV-2 VNTR strongly contribute to this variance.
Methods
We evaluated the performance of amplicon-based nanopore sequencing with unique molecular identifiers (UMI-ONT-Seq) for SNP detection, haplotype mapping, VNTR unit consensus sequence generation, and copy number estimation via coverage-corrected haplotypes quantification in the KIV-2 VNTR. We used 15 human samples and low-level mixtures (0.5 to 5%) of KIV-2 plasmids as a validation set. We then applied UMI-ONT-Seq to extract KIV-2 VNTR haplotypes in 48 multi-ancestry 1000 Genome samples and analyzed at scale a poorly characterized STR within the KIV-2 VNTR.
Results
UMI-ONT-Seq detected KIV-2 SNPs down to 1% variant level with high sensitivity, specificity, and precision (0.977 ± 0.018; 1.000 ± 0.0005; 0.993 ± 0.02) and accurately retrieved the full-length haplotype of each VNTR unit. Human variant levels were highly correlated with next-generation sequencing (R² = 0.983) without bias across the whole variant level range. Six reads per UMI produced sequences of each KIV-2 unit with Q40 quality. The KIV-2 repeat number determined by coverage-corrected unique haplotype counting was in close agreement with droplet digital PCR (ddPCR), with 70% of the samples falling even within the narrow confidence interval of ddPCR. We then analyzed 62,679 intra-KIV-2 STR sequences and explored KIV-2 SNP haplotype patterns across five ancestries.
Conclusions
UMI-ONT-Seq accurately retrieves the SNP haplotype and precisely quantifies the VNTR copy number of each repeat unit of the complex KIV-2 VNTR region across multiple ancestries. This study utilizes the KIV-2 VNTR, presenting a novel and potent tool for comprehensive characterization of medically relevant complex genome regions at scale.
Journal Article
A Comprehensive Characterization of Small RNA Profiles by Massively Parallel Sequencing in Six Forensic Body Fluids/Tissue
2022
Body fluids/tissue identification (BFID) is an essential procedure in forensic practice, and RNA profiling has become one of the most important methods. Small non-coding RNAs, being expressed in high copy numbers and resistant to degradation, have great potential in BFID but have not been comprehensively characterized in common forensic stains. In this study, the miRNA, piRNA, snoRNA, and snRNA were sequenced in 30 forensic relevant samples (menstrual blood, saliva, semen, skin, venous blood, and vaginal secretion) using the BGI platform. Based on small RNA profiles, relative specific markers (RSM) and absolute specific markers (ASM) were defined, which can be used to identify a specific body fluid/tissue out of two or six, respectively. A total of 5204 small RNAs were discovered including 1394 miRNAs (including 236 novel miRNA), 3157 piRNAs, 636 snoRNAs, and 17 snRNAs. RSMs for 15 pairwise body fluid/tissue groups were discovered by differential RNA analysis. In addition, 90 ASMs that were specifically expressed in a certain type of body fluid/tissue were screened, among them, snoRNAs were reported first in forensic genetics. In brief, our study deepened the understanding of small RNA profiles in forensic stains and offered potential BFID markers that can be applied in different forensic scenarios.
Journal Article