
Making inference with messy (citizen science) data

by Townsend, Philip A.; Martin, Karl J.; Anhalt-Depies, Christine; Frett, Susan; Clare, John D. J.; Van Deelen, Timothy R.; Stenglein, Jennifer L.; Singh, Aditya; Zuckerberg, Benjamin; Locke, Christina

in Accuracy / Algorithms / automated classification / Automation / citizen science / citizen scientists / Classification / Computer simulation / Crowdsourcing / data quality / Ecological monitoring / Empirical analysis / Error detection / false‐positive error / Inference / misclassification / Model accuracy / Occupancy / Regression analysis / Remediation / remote camera / Scientists / screening / Sensitivity analysis / species distribution model

2019


Journal Article


Overview

Measurement or observation error is common in ecological data: as citizen scientists and automated algorithms play larger roles processing growing volumes of data to address problems at large scales, concerns about data quality and strategies for improving it have received greater focus. However, practical guidance on fundamental data quality questions for data users or managers (how accurate do data need to be, and what is the best or most efficient way to improve them?) remains limited. We present a generalizable framework for evaluating data quality and identifying remediation practices. We demonstrate the framework using crowdsourced classifications of trail camera images to determine acceptable rates of misclassification and identify optimal remediation strategies for analysis with occupancy models. We used expert validation to estimate baseline classification accuracy and simulation to determine the sensitivity of two occupancy estimators (standard and false-positive extensions) to different empirical misclassification rates. We used regression techniques to identify important predictors of misclassification and prioritize remediation strategies. More than 93% of images were accurately classified, but simulation results suggested that most species were not identified accurately enough to permit distribution estimation at our predefined threshold for accuracy (<5% absolute bias). A model developed to screen incorrect classifications predicted misclassified images with >97% accuracy, enough to meet our accuracy threshold. Occupancy models that accounted for false-positive error provided still more accurate inference, even at high rates of misclassification (30%). Because simulation suggested that occupancy models were less sensitive to additional false-negative error, screening models or fitting occupancy models that account for false-positive error emerged as efficient data remediation solutions.
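
The simulation-based sensitivity check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. It assumes a single-season occupancy model with constant occupancy (psi) and detection (p) probabilities. Misclassification is modelled as a hypothetical per-survey false-positive rate `p_fp` that applies at every site. Only the standard estimator is fitted, by a coarse grid-search maximum likelihood, and the false-positive extension is not implemented. The function names, parameter values, and the reading of "<5% absolute bias" as 0.05 on the probability scale are all assumptions.

```python
import math
import random


def simulate_counts(n_sites, n_surveys, psi, p, p_fp, rng):
    """Per-site detection counts with hypothetical false-positive error.

    psi  -- probability that a site is occupied
    p    -- per-survey detection probability at occupied sites
    p_fp -- assumed per-survey probability of a spurious detection at any
            site (a stand-in for misclassified images)
    """
    counts = []
    for _ in range(n_sites):
        occupied = rng.random() < psi
        y = 0
        for _ in range(n_surveys):
            true_hit = occupied and rng.random() < p
            false_hit = rng.random() < p_fp
            if true_hit or false_hit:
                y += 1
        counts.append(y)
    return counts


def fit_standard_occupancy(counts, n_surveys, steps=100):
    """Grid-search MLE of (psi, p) for the standard model (no false positives)."""
    hist = [0] * (n_surveys + 1)
    for y in counts:
        hist[y] += 1
    best_ll, best = -math.inf, (None, None)
    for i in range(1, steps):
        psi = i / steps
        for j in range(1, steps):
            p = j / steps
            ll = 0.0
            for y, n in enumerate(hist):
                if n:
                    lik = psi * p**y * (1 - p) ** (n_surveys - y)
                    if y == 0:
                        lik += 1 - psi  # an all-zero history may be an empty site
                    ll += n * math.log(lik)
            if ll > best_ll:
                best_ll, best = ll, (psi, p)
    return best


def mean_bias(p_fp, psi=0.4, p=0.5, n_sites=500, n_surveys=5, reps=3, seed=1):
    """Average bias of the standard psi estimate across simulated replicates."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        counts = simulate_counts(n_sites, n_surveys, psi, p, p_fp, rng)
        psi_hat, _ = fit_standard_occupancy(counts, n_surveys)
        total += psi_hat - psi
    return total / reps


THRESHOLD = 0.05  # "<5% absolute bias", read here as 0.05 on the psi scale

if __name__ == "__main__":
    for rate in (0.0, 0.01, 0.05):
        b = mean_bias(rate)
        print(f"p_fp={rate:.2f}  bias={b:+.3f}  within threshold: {abs(b) < THRESHOLD}")
```

Sweeping `p_fp` in this way shows how quickly spurious detections push the standard estimator past a chosen bias threshold. A real analysis would substitute empirically estimated misclassification rates and would also fit a false-positive occupancy model for comparison.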

Combining simulation-based sensitivity analysis with empirical estimation of baseline error and its variability allows users and managers of potentially error-prone data to identify and fix problematic data more efficiently. It may be particularly helpful for “big data” efforts dependent upon citizen scientists or automated classification algorithms with many downstream users, but given the ubiquity of observation or measurement error, even conventional studies may benefit from focusing more attention upon data quality.
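
The empirical side of this pairing, a baseline error rate together with its variability, can be sketched with a binomial confidence interval on the misclassification rate in an expert-validated sample. The Wilson score interval used here is an illustrative choice and not necessarily the authors' method. The counts (14 errors in 200 validated images) and the tolerable rate of 0.05 are hypothetical placeholders, not values from the study.

```python
import math


def wilson_interval(errors, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0 or not 0 <= errors <= n:
        raise ValueError("need n > 0 and 0 <= errors <= n")
    phat = errors / n
    denom = 1 + z**2 / n
    centre = (phat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(phat * (1 - phat) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical validation result for one species class.
errors, validated = 14, 200
tolerable = 0.05  # hypothetical maximum rate, e.g. read off a sensitivity sweep

low, high = wilson_interval(errors, validated)
print(f"misclassification rate {errors / validated:.3f}, 95% CI [{low:.3f}, {high:.3f}]")
if high > tolerable:
    print("upper bound exceeds tolerable rate: prioritize this class for remediation")
```

With these placeholder counts the interval runs from about 0.042 to 0.114. Comparing the upper bound, rather than the point estimate, with the tolerable rate is one conservative way to account for sampling variability when deciding which classes to screen first.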


Publisher

Ecological Society of America
