Catalogue Search | MBRL

Making inference with messy (citizen science) data

by Townsend, Philip A. , Martin, Karl J. , Anhalt-Depies, Christine in Accuracy , Algorithms , automated classification

2019

Measurement or observation error is common in ecological data: as citizen scientists and automated algorithms play larger roles processing growing volumes of data to address problems at large scales, concerns about data quality and strategies for improving it have received greater focus. However, practical guidance pertaining to fundamental data quality questions for data users or managers—how accurate do data need to be and what is the best or most efficient way to improve it?—remains limited. We present a generalizable framework for evaluating data quality and identifying remediation practices, and demonstrate the framework using trail camera images classified using crowdsourcing to determine acceptable rates of misclassification and identify optimal remediation strategies for analysis using occupancy models. We used expert validation to estimate baseline classification accuracy and simulation to determine the sensitivity of two occupancy estimators (standard and false-positive extensions) to different empirical misclassification rates. We used regression techniques to identify important predictors of misclassification and prioritize remediation strategies. More than 93% of images were accurately classified, but simulation results suggested that most species were not identified accurately enough to permit distribution estimation at our predefined threshold for accuracy (<5% absolute bias). A model developed to screen incorrect classifications predicted misclassified images with >97% accuracy: enough to meet our accuracy threshold. Occupancy models that accounted for false-positive error provided even more accurate inference even at high rates of misclassification (30%). As simulation suggested occupancy models were less sensitive to additional false-negative error, screening models or fitting occupancy models accounting for false-positive error emerged as efficient data remediation solutions. 
Combining simulation-based sensitivity analysis with empirical estimation of baseline error and its variability allows users and managers of potentially error-prone data to identify and fix problematic data more efficiently. It may be particularly helpful for “big data” efforts dependent upon citizen scientists or automated classification algorithms with many downstream users, but given the ubiquity of observation or measurement error, even conventional studies may benefit from focusing more attention upon data quality.

Journal Article

Agricultural lands offer seasonal habitats to tigers in a human‐dominated and fragmented landscape in India

by Bailey, Larissa , Warrier, Rekha , Noon, Barry R. in Agricultural ecosystems , Agricultural land , Agricultural production

2020

Conserving wide‐ranging large carnivores in human‐dominated landscapes is contingent on acknowledging the conservation value of human‐modified lands. This is particularly true for tigers (Panthera tigris), now largely dependent on small and fragmented habitats, embedded within densely populated agroecosystems in India. Devising a comprehensive conservation strategy for the species requires an understanding of the temporal patterns of space use by tigers within these human‐modified areas. These areas are often characterized by altered prey communities, novel risks resulting from high human densities and seasonally dynamic vegetative cover. Understanding space use within these areas is vital to devising human‐tiger conflict prevention measures and for conserving landscape elements critical to maintain functional connectivity between populations. We documented seasonal space‐use patterns of tigers in agricultural lands surrounding protected areas in the Central Terai Landscape (CTL) in northern India. We estimated the probability of space use and its drivers by applying dynamic occupancy models that correct for false‐positive and false‐negative errors to tiger detection‐nondetection data within agricultural areas. These data were generated by conducting local interviews, sign surveys, and camera trapping within 94 randomly selected 2.5‐km2 grid cells. We found that agricultural areas were used with high probability in the winter (0.64; standard error [SE] 0.08), a period of high vegetative cover availability. The use of agricultural lands was lower in the summer (0.56; SE 0.09) and was lowest in the monsoon season (0.21; SE 0.07), tracking a decline in vegetative cover and available habitat across the landscape. Availability of vegetative cover and drainage features positively influenced space use, whereas use declined with increasing distance to protected areas and the extent of human settlements.
These findings highlight the role of agricultural areas in providing seasonal habitats for tigers and offer a basis for understanding where tigers and humans co‐occur in these landscapes. These findings help expand our current understanding of what constitutes large carnivore habitats to include human‐dominated agricultural areas. They underscore the need for greater integration of land‐sharing and land‐sparing initiatives to conserve large carnivores within human‐dominated agroecosystems.

Journal Article

Is it Possible to Individually Identify Red Foxes from Photographs?

by Denise Güthlin , Helmut Küchenhoff , Ilse Storch in Cameras , camera‐trap , Density estimation

2014

The individual identification of animals from photographs is increasingly used to obtain density estimates not only for animals with distinct natural markings but also for species with little or no distinct markings, such as coyotes (Canis latrans), pumas (Puma concolor) and tapirs (Tapirus terrestris). This lack of distinct natural markings may lead to large error rates in the assessment of photographs, and as a consequence result in poor abundance estimates. We conducted an experiment asking expert observers to identify individual red foxes (Vulpes vulpes) from a set of photographs taken by automatic camera-traps. Our objectives were to determine whether reliable individual identification of red foxes from photographs is possible and, if possible, to improve the identification process. Exact assessment of error rates in individual identification can only be achieved if photographs of known individuals are available. This is rarely the case; therefore, we used photographs of red foxes from different study sites and determined the lower limit of the proportion of false positive matches. Our analysis, based on 10 expert responses, suggested that individual identification of red foxes is not reliable. The number of individual foxes assessed by the observers varied between 4 and 23 individuals. The minimal proportion of false positive matches was very large (>50% of the photographs considered to be of the same individual were from 2 different study sites) and there was little agreement among experts on which photographs showed the same individuals. Hence, we caution against individual identification from photographs of red foxes and other animals with similar or less natural markings without further testing.

Journal Article

Reliable Detection of Loci Responsible for Local Adaptation: Inference of a Null Model through Trimming the Distribution of FST

by Lotterhos, Katie E. , Whitlock, Michael C. in Alleles , Demography , False positive errors

2015

Loci responsible for local adaptation are likely to have more genetic differentiation among populations than neutral loci. However, neutral loci can vary widely in their amount of genetic differentiation, even over the same geographic range. Unfortunately, the distribution of differentiation—as measured by an index such as FST—depends on the details of the demographic history of the populations in question, even without spatially heterogeneous selection. Many methods designed to detect FST outliers assume a specific model of demographic history, which can result in extremely high false positive rates for detecting loci under selection. We develop a new method that infers the distribution of FST for loci unlikely to be strongly affected by spatially diversifying selection, using data on a large set of loci with unknown selective properties. Compared to previous methods, this approach, called OutFLANK, has much lower false positive rates and comparable power, as shown by simulation.

Journal Article

Accuracy and reliability of forensic latent fingerprint decisions

by Ulery, Bradford T , Roberts, Maria Antonia , Buscaglia, JoAnn in Accuracy , Biological Sciences , Computer software

2011

The interpretation of forensic fingerprint evidence relies on the expertise of latent print examiners. The National Research Council of the National Academies and the legal and forensic sciences communities have called for research to measure the accuracy and reliability of latent print examiners' decisions, a challenging and complex problem in need of systematic analysis. Our research is focused on the development of empirical approaches to studying this problem. Here, we report on the first large-scale study of the accuracy and reliability of latent print examiners' decisions, in which 169 latent print examiners each compared approximately 100 pairs of latent and exemplar fingerprints from a pool of 744 pairs. The fingerprints were selected to include a range of attributes and quality encountered in forensic casework, and to be comparable to searches of an automated fingerprint identification system containing more than 58 million subjects. This study evaluated examiners on key decision points in the fingerprint examination process; procedures used operationally include additional safeguards designed to minimize errors. Five examiners made false positive errors for an overall false positive rate of 0.1%. Eighty-five percent of examiners made at least one false negative error for an overall false negative rate of 7.5%. Independent examination of the same comparisons by different participants (analogous to blind verification) was found to detect all false positive errors and the majority of false negative errors in this study. Examiners frequently differed on whether fingerprints were suitable for reaching a conclusion.

Journal Article

False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant

by Simmons, Joseph P. , Nelson, Leif D. , Simonsohn, Uri in Adult , Ambiguity , Biological and medical sciences

2011

In this article, we accomplish two things. First, we show that despite empirical psychologists' nominal endorsement of a low rate of false-positive findings (< .05), flexibility in data collection, analysis, and reporting dramatically increases actual false-positive rates. In many cases, a researcher is more likely to falsely find evidence that an effect exists than to correctly find evidence that it does not. We present computer simulations and a pair of actual experiments that demonstrate how unacceptably easy it is to accumulate (and report) statistically significant evidence for a false hypothesis. Second, we suggest a simple, low-cost, and straightforwardly effective disclosure-based solution to this problem. The solution involves six concrete requirements for authors and four guidelines for reviewers, all of which impose a minimal burden on the publication process.

Journal Article

Evaluation of false positive and false negative errors in targeted next generation sequencing

by Moon, Youngbeen , Kim, Young-Ho , Kim, Jong-Kwang in Animal Genetics and Genomics , Bioinformatics , Biomedical and Life Sciences

2025

Background: Next-generation sequencing (NGS) has become an indispensable diagnostic tool across various diseases. However, sequencing and analysis errors remain major barriers to clinical implementation. In cancer diagnostics, detecting low-level somatic variants is particularly challenging due to tumor heterogeneity and contamination from normal cells. Results: We assess targeted next-generation sequencing (T-NGS) performance using reference-standard DNA mixtures of homozygote hydatidiform mole and heterozygote blood DNA at varying ratios, analyzed by certified NGS providers. Analytical sensitivity differs by up to 13.9-fold, and false positive (FP) error rates vary up to 615-fold, depending on provider and pipeline. For identical raw data, DRAGEN and the in-house pipeline differ by up to 36.3-fold in FP error rates. Moderately recurrent FP-prone alleles, although representing only 5.37% of all FP sites, contribute to 36.7% of total FP errors in the Geninus in-house result. Among 22 discordant variant calls between DRAGEN and in-house analyses, more than half of them are not confirmed by single base extension assays, indicating likely false positives. Compared to DRAGEN, a conventional BWA + GATK Mutect2 pipeline maintains equivalent sensitivity but produces a 4-fold increase in FP errors, along with a notable enrichment of recurrent FP-prone alleles. Conclusions: T-NGS results from certified providers exhibit substantial variability in both sensitivity and FP error rates. Conventional pipelines not only increase FP errors but also accumulate recurrent FP-prone alleles. These findings underscore the urgent need for standardized pipelines and rigorous quality control measures to ensure the reliability of T-NGS in clinical diagnostics.

Journal Article

What Are Error Rates for Classifying Teacher and School Performance Using Value-Added Models?

by Schochet, Peter Z. , Chiang, Hanley S. in Academic Achievement , Achievement Gains , Analytical estimating

2013

This article addresses likely error rates for measuring teacher and school performance in the upper elementary grades using value-added models applied to student test score gain data. Using a realistic performance measurement system scheme based on hypothesis testing, the authors develop error rate formulas based on ordinary least squares and Empirical Bayes estimators. Empirical results suggest that value-added estimates are likely to be noisy using the amount of data that are typically used in practice. Type I and II error rates for comparing a teacher's performance to the average are likely to be about 25% with 3 years of data and 35% with 1 year of data. Corresponding error rates for overall false positive and negative errors are 10% and 20%, respectively. Lower error rates can be achieved if schools are the performance unit. The results suggest that policymakers must carefully consider likely system error rates when using value-added estimates to make high-stakes decisions regarding educators.

Journal Article

Integrating multiple data sources in species distribution modeling: a framework for data fusion

by Gardner, Beth , Singh, Susheela , Stauffer, Glenn in autocorrelation , Autoregressive models , biogeography

2017

The last decade has seen a dramatic increase in the use of species distribution models (SDMs) to characterize patterns of species' occurrence and abundance. Efforts to parameterize SDMs often create a tension between the quality and quantity of data available to fit models. Estimation methods that integrate both standardized and non-standardized data types offer a potential solution to the tradeoff between data quality and quantity. Recently several authors have developed approaches for jointly modeling two sources of data (one of high quality and one of lesser quality). We extend their work by allowing for explicit spatial autocorrelation in occurrence and detection error using a Multivariate Conditional Autoregressive (MVCAR) model and develop three models that share information in a less direct manner, resulting in more robust performance when the auxiliary data are of lesser quality. We describe these three new approaches ("Shared," "Correlation," "Covariates") for combining data sources and show their use in a case study of the Brown-headed Nuthatch in the Southeastern U.S. and through simulations. All three of the approaches that used the second data source improved out-of-sample predictions relative to a single data source ("Single"). When information in the second data source is of high quality, the Shared model performs the best, but the Correlation and Covariates models also perform well. When the information in the second data source is of lesser quality, the Correlation and Covariates models performed better, suggesting they are robust alternatives when little is known about auxiliary data collected opportunistically or through citizen scientists. Methods that allow for both data types to be used will maximize the useful information available for estimating species distributions.

Journal Article

Is the Replicability Crisis Overblown? Three Arguments Examined

by Pashler, Harold , Harris, Christine R. in Attitudes , Bias , Cognitive psychology

2012

We discuss three arguments voiced by scientists who view the current outpouring of concern about replicability as overblown. The first idea is that the adoption of a low alpha level (e.g., 5%) puts reasonable bounds on the rate at which errors can enter the published literature, making false-positive effects rare enough to be considered a minor issue. This, we point out, rests on statistical misunderstanding: The alpha level imposes no limit on the rate at which errors may arise in the literature (Ioannidis, 2005b). Second, some argue that whereas direct replication attempts are uncommon, conceptual replication attempts are common—providing an even better test of the validity of a phenomenon. We contend that performing conceptual rather than direct replication attempts interacts insidiously with publication bias, opening the door to literatures that appear to confirm the reality of phenomena that in fact do not exist. Finally, we discuss the argument that errors will eventually be pruned out of the literature if the field would just show a bit of patience. We contend that there are no plausible concrete scenarios to back up such forecasts and that what is needed is not patience, but rather systematic reforms in scientific practice.

Journal Article
