Catalogue Search | MBRL

Measuring quality of DNA sequence data via degradation

by Porter, Adam A.; Hauzel, Jason; Schaefer, Marcel in Analysis, Anomalies, Biology and Life Sciences

2022

We formulate and apply a novel paradigm for characterization of genome data quality, which quantifies the effects of intentional degradation of quality. The rationale is that the higher the initial quality, the more fragile the genome and the greater the effects of degradation. We demonstrate that this phenomenon is ubiquitous, and that quantified measures of degradation can be used for multiple purposes, illustrated by outlier detection. We focus on identifying outliers that may be problematic with respect to data quality, but might also be true anomalies or even attempts to subvert the database.

Journal Article

Measuring quality of DNA sequence data via degradation

by Adam A Porter, Alan F Karr, Jason Hauzel

2022

Journal Article

Measuring Quality of DNA Sequence Data via Degradation

by Porter, Adam A; Hauzel, Jason; Karr, Alan F in Anomalies, Data analysis, Degradation

2021

We propose and apply a novel paradigm for characterization of genome data quality, which quantifies the effects of intentional degradation of quality. The rationale is that the higher the initial quality, the more fragile the genome and the greater the effects of degradation. We demonstrate that this phenomenon is ubiquitous, and that quantified measures of degradation can be used for multiple purposes. We focus on identifying outliers that may be problematic with respect to data quality, but might also be true anomalies or even attempts to subvert the database.

Paper

Specified Certainty Classification, with Application to Read Classification for Reference-Guided Metagenomic Assembly

by Porter, Adam A; Hauzel, Jason; Menon, Prahlad

2021

Specified Certainty Classification (SCC) is a new paradigm for employing classifiers whose outputs carry uncertainties, typically in the form of Bayesian posterior probabilities. By allowing the classifier output to be less precise than one of a set of atomic decisions, SCC allows all decisions to achieve a specified level of certainty and provides insight into classifier behavior by examining all possible decisions. Our primary illustration is read classification for reference-guided genome assembly, but we demonstrate the breadth of SCC by also analyzing COVID-19 vaccination data.
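
The idea of accepting a less precise output in exchange for a guaranteed certainty level can be illustrated with a minimal sketch. It assumes, as one plausible reading of the abstract rather than the authors' published procedure, that a "less precise" decision is a set of atomic labels and that the certainty of a set is the sum of its members' posterior probabilities. The function name `smallest_certain_set` and the example labels are hypothetical.

```python
def smallest_certain_set(posterior, threshold):
    """Return (labels, mass): the fewest labels whose summed posterior
    probability reaches `threshold`, and the mass actually attained.

    Assumption (not taken from the paper): a non-atomic decision is a set
    of atomic labels, scored by the sum of their posterior probabilities.
    """
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must lie in (0, 1]")

    # Taking labels in order of decreasing probability yields the smallest
    # set that reaches the threshold.
    ranked = sorted(posterior.items(), key=lambda item: item[1], reverse=True)

    chosen, mass = [], 0.0
    for label, prob in ranked:
        chosen.append(label)
        mass += prob
        if mass >= threshold:
            break
    # If rounding keeps the total just below the threshold, all labels
    # are returned together with the mass that was reached.
    return frozenset(chosen), mass


# Hypothetical posterior for one read over three candidate reference genomes.
read_posterior = {"genome_A": 0.6, "genome_B": 0.3, "genome_C": 0.1}
labels_50, mass_50 = smallest_certain_set(read_posterior, 0.5)
labels_80, mass_80 = smallest_certain_set(read_posterior, 0.8)
```

At a threshold of 0.5 the atomic decision `genome_A` suffices; at 0.8 the sketch falls back to the two-label set, trading precision for certainty.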

Journal Article

Application of Markov Structure of Genomes to Outlier Identification and Read Classification

by Porter, Adam A; Hauzel, Jason; Karr, Alan F in Bioinformatics, Classification, Data analysis

2021

In this paper we apply the structure of genomes as second-order Markov processes, specified by the distributions of successive triplets of bases, to two bioinformatics problems: identification of outliers in genome databases and read classification in metagenomics. Both applications use real coronavirus and adenovirus data.
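
As a rough illustration of what "specified by the distributions of successive triplets of bases" means, the sketch below estimates the triplet distribution of a sequence and derives second-order transition probabilities from it. The function names and the toy sequence are hypothetical. The total variation distance at the end is only an illustrative way to compare two genomes and is not claimed to be the measure used in the paper.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
TRIPLETS = ["".join(t) for t in product(BASES, repeat=3)]


def triplet_distribution(seq):
    """Relative frequency of each of the 64 overlapping base triplets."""
    seq = seq.upper()
    valid = set(BASES)
    counts = Counter(
        seq[i:i + 3]
        for i in range(len(seq) - 2)
        if set(seq[i:i + 3]) <= valid  # skip windows containing N etc.
    )
    total = sum(counts.values())
    return {t: (counts[t] / total if total else 0.0) for t in TRIPLETS}


def transition_probabilities(dist):
    """P(next base | previous two bases), derived from triplet frequencies."""
    trans = {}
    for a, b in product(BASES, repeat=2):
        pair_total = sum(dist[a + b + c] for c in BASES)
        if pair_total > 0:  # pairs never observed get no row
            trans[a + b] = {c: dist[a + b + c] / pair_total for c in BASES}
    return trans


def total_variation(p, q):
    """Illustrative distance between two triplet distributions."""
    return 0.5 * sum(abs(p[t] - q[t]) for t in TRIPLETS)


toy = triplet_distribution("ACGTACGT")  # toy input, not real genome data
toy_transitions = transition_probabilities(toy)
```

A genome whose triplet distribution lies unusually far from the others in a database would be a candidate outlier under this kind of comparison.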

Paper

Specified Certainty Classification, with Application to Read Classification for Reference-Guided Metagenomic Assembly

by Porter, Adam A; Menon, Prahlad; Hauzel, Jason in Assembly, Classification, Classifiers

2021

Paper