Paper

BDQC: a general-purpose analytics tool for domain-blind validation of Big Data

2018
Overview
Translational biomedical research is generating exponentially more data: thousands of whole-genome sequences (WGS) are now available; brain data are doubling every two years. Analyses of Big Data, including imaging, genomic, phenotypic, and clinical data, present qualitatively new challenges as well as opportunities. Among the challenges is a proliferation in the ways analyses can fail, due largely to the increasing length and complexity of processing pipelines. Anomalies in input data, runtime resource exhaustion, or node failure in a distributed computation can all cause pipeline hiccups that are not necessarily obvious in the output. Flaws that can taint results may persist undetected in complex pipelines, a danger amplified by the fact that research is often concurrent with the development of the software on which it depends. On the positive side, the huge sample sizes increase statistical power, which in turn can yield new insight and motivate innovative analytic approaches. We have developed a framework for Big Data Quality Control (BDQC) including an extensible set of heuristic and statistical analyses that identify deviations in data without regard to its meaning (domain-blind analyses). BDQC takes advantage of large sample sizes to classify the samples, estimate distributions, and identify outliers. Such outliers may be symptoms of technology failure (e.g., truncated output of one step of a pipeline for a single genome) or may reveal unsuspected "signal" in the data (e.g., evidence of aneuploidy in a genome). We have applied the framework to validate real-world WGS analysis pipelines. BDQC successfully identified data outliers representing various failure classes, including genome analyses missing a whole chromosome or part thereof, hidden among thousands of intermediary output files. These failures could then be resolved by reanalyzing the affected samples. BDQC both identified hidden flaws and yielded new insights into the data.
BDQC is designed to complement quality software development practices. Applying BDQC at all pipeline stages brings multiple benefits. By verifying input correctness, it can help avoid expensive computations on flawed data. Analysis of intermediary and final results facilitates recovery from aberrant termination of processes. All these computationally inexpensive verifications reduce cryptic analytical artifacts that could otherwise preclude clinical-grade genome interpretation. BDQC is available at https://github.com/ini-bdds/bdqc.
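The idea of domain-blind outlier detection described in the overview can be illustrated with a minimal Python sketch. This is not BDQC's implementation or API: the function names (`robust_outliers`, `summarize`, `flag_anomalous`), the two summary statistics (byte count and line count), and the modified z-score rule with a 3.5 cutoff are all assumptions chosen for illustration. The sketch summarizes each file without interpreting its content, then flags files whose statistics deviate strongly from the rest of the collection.

```python
from statistics import median


def robust_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds `threshold`.

    Uses the median and the median absolute deviation (MAD), so a few
    extreme values do not distort the estimate of what is "typical".
    """
    med = median(values)
    abs_dev = [abs(v - med) for v in values]
    mad = median(abs_dev)
    if mad == 0:
        # More than half of the values equal the median: flag anything that differs.
        return [i for i, v in enumerate(values) if v != med]
    return [i for i, d in enumerate(abs_dev) if 0.6745 * d / mad > threshold]


def summarize(text):
    """Domain-blind summary: relies on no knowledge of what the content means."""
    return {"bytes": len(text.encode("utf-8")), "lines": len(text.splitlines())}


def flag_anomalous(files):
    """Given {name: text}, return the sorted names flagged on any statistic."""
    names = sorted(files)
    stats = [summarize(files[name]) for name in names]
    flagged = set()
    for key in ("bytes", "lines"):
        for i in robust_outliers([s[key] for s in stats]):
            flagged.add(names[i])
    return sorted(flagged)


# Hypothetical per-sample outputs of one pipeline step; one file is truncated.
files = {f"sample{i}.txt": "row\n" * (100 + i) for i in range(10)}
files["sample_trunc.txt"] = "row\n" * 12
print(flag_anomalous(files))  # ['sample_trunc.txt']
```

The median and MAD are used here instead of the mean and standard deviation because the outliers being sought would otherwise inflate the very spread they are measured against. A larger collection of files makes the estimate of "typical" more reliable, which is the sense in which large sample sizes help.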
Publisher
Cold Spring Harbor Laboratory Press, Cold Spring Harbor Laboratory