1,512 result(s) for "False positive errors"
Making inference with messy (citizen science) data
Measurement or observation error is common in ecological data: as citizen scientists and automated algorithms play larger roles processing growing volumes of data to address problems at large scales, concerns about data quality and strategies for improving it have received greater focus. However, practical guidance on fundamental data quality questions for data users or managers (how accurate do data need to be, and what is the best or most efficient way to improve them?) remains limited. We present a generalizable framework for evaluating data quality and identifying remediation practices, and demonstrate the framework using trail camera images classified through crowdsourcing to determine acceptable rates of misclassification and identify optimal remediation strategies for analysis using occupancy models. We used expert validation to estimate baseline classification accuracy and simulation to determine the sensitivity of two occupancy estimators (standard and false-positive extensions) to different empirical misclassification rates. We used regression techniques to identify important predictors of misclassification and prioritize remediation strategies. More than 93% of images were accurately classified, but simulation results suggested that most species were not identified accurately enough to permit distribution estimation at our predefined threshold for accuracy (<5% absolute bias). A model developed to screen incorrect classifications predicted misclassified images with >97% accuracy: enough to meet our accuracy threshold. Occupancy models that accounted for false-positive error provided still more accurate inference, even at high rates of misclassification (30%). Because simulation suggested that occupancy models were less sensitive to additional false-negative error, screening models or fitting occupancy models that account for false-positive error emerged as efficient data remediation solutions.
Combining simulation-based sensitivity analysis with empirical estimation of baseline error and its variability allows users and managers of potentially error-prone data to identify and fix problematic data more efficiently. It may be particularly helpful for “big data” efforts dependent upon citizen scientists or automated classification algorithms with many downstream users, but given the ubiquity of observation or measurement error, even conventional studies may benefit from focusing more attention upon data quality.
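The sensitivity question above (how much misclassification an estimator tolerates before its bias exceeds a threshold such as 5%) can be illustrated with a deliberately simple model. The sketch below is not either of the occupancy estimators the authors used; it computes the expected bias of the naive estimator (share of sites with at least one detection) when unoccupied sites can also produce false-positive detections. The names psi, p11 and p10 and all parameter values are assumptions for illustration.

```python
def naive_occupancy(psi, p11, p10, k):
    """Expected share of surveyed sites with at least one detection in k visits.

    psi: true occupancy probability
    p11: per-visit probability of a correct detection at an occupied site
    p10: per-visit probability of a false-positive detection at an unoccupied site
    """
    hit_if_occupied = 1 - (1 - p11) ** k
    hit_if_unoccupied = 1 - (1 - p10) ** k
    return psi * hit_if_occupied + (1 - psi) * hit_if_unoccupied


def naive_bias(psi, p11, p10, k):
    """Absolute bias of the naive 'detected at least once' occupancy estimate."""
    return abs(naive_occupancy(psi, p11, p10, k) - psi)


# Hypothetical scenario: 40% occupancy, 50% per-visit detection, 3 visits.
for p10 in (0.0, 0.01, 0.05, 0.10):
    print(f"p10={p10:.2f}  bias={naive_bias(0.4, 0.5, p10, 3):.3f}")
```

In this toy model, missed detections pull the naive estimate down while false positives push it up, so at low false-positive rates the two errors partly cancel and the net bias is not monotone in p10.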
Is it Possible to Individually Identify Red Foxes from Photographs?
The individual identification of animals from photographs is increasingly used to obtain density estimates not only for animals with distinct natural markings but also for species with little or no distinct markings, such as coyotes (Canis latrans), pumas (Puma concolor) and tapirs (Tapirus terrestris). This lack of distinct natural markings may lead to large error rates in the assessment of photographs and, as a consequence, poor abundance estimates. We conducted an experiment asking expert observers to identify individual red foxes (Vulpes vulpes) from a set of photographs taken by automatic camera-traps. Our objectives were to determine whether reliable individual identification of red foxes from photographs is possible and, if possible, to improve the identification process. Exact assessment of error rates in individual identification can only be achieved if photographs of known individuals are available. This is rarely the case; therefore, we used photographs of red foxes from different study sites and determined the lower limit of the proportion of false positive matches. Our analysis, based on 10 expert responses, suggested that individual identification of red foxes is not reliable. The number of individual foxes distinguished by the observers ranged from 4 to 23. The minimal proportion of false positive matches was very large (>50% of the photographs considered to be of the same individual were from 2 different study sites) and there was little agreement among experts on which photographs showed the same individuals. Hence, we caution against individual identification from photographs of red foxes and other animals with similar or fewer natural markings without further testing.
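The lower-bound logic above rests on one observation: two photographs from different study sites cannot show the same fox, so any such "match" is certainly a false positive, while false matches within a site go undetected. A minimal sketch of that idea follows. It counts matched photo pairs rather than photographs, so it is a pair-based variant and not necessarily the authors' exact statistic; the function name, data layout and example responses are hypothetical.

```python
from itertools import combinations


def cross_site_match_share(groups, site_of):
    """Share of within-group photo pairs whose photos come from different sites.

    groups:  list of lists; each inner list holds the photo IDs an observer
             judged to show one individual fox.
    site_of: dict mapping photo ID -> study site.
    Cross-site pairs cannot be the same animal, so this share is a lower
    bound on false-positive matches; same-site errors are not counted.
    """
    total = cross = 0
    for group in groups:
        for a, b in combinations(group, 2):
            total += 1
            if site_of[a] != site_of[b]:
                cross += 1
    return cross / total if total else 0.0


# Hypothetical observer response: two "individuals" built from five photos.
groups = [["p1", "p2"], ["p3", "p4", "p5"]]
sites = {"p1": "A", "p2": "B", "p3": "A", "p4": "A", "p5": "A"}
print(cross_site_match_share(groups, sites))  # 0.25
```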
Reliable Detection of Loci Responsible for Local Adaptation: Inference of a Null Model through Trimming the Distribution of FST
Loci responsible for local adaptation are likely to have more genetic differentiation among populations than neutral loci. However, neutral loci can vary widely in their amount of genetic differentiation, even over the same geographic range. Unfortunately, the distribution of differentiation, as measured by an index such as FST, depends on the details of the demographic history of the populations in question, even without spatially heterogeneous selection. Many methods designed to detect FST outliers assume a specific model of demographic history, which can result in extremely high false positive rates for detecting loci under selection. We develop a new method that infers the distribution of FST for loci unlikely to be strongly affected by spatially diversifying selection, using data on a large set of loci with unknown selective properties. Compared to previous methods, this approach, called OutFLANK, has much lower false positive rates and comparable power, as shown by simulation.
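The trimming idea can be sketched as follows: estimate the neutral baseline from the central part of the empirical FST distribution, so that strongly differentiated candidate loci do not inflate it, then flag loci beyond a neutral threshold. This is a simplified illustration, not OutFLANK. It assumes the classic Lewontin-Krakauer approximation, (n_pops - 1) * FST / mean(FST) ~ chi-square with n_pops - 1 degrees of freedom, and a Wilson-Hilferty approximation to the chi-square quantile; OutFLANK goes further and infers the shape of the neutral distribution from the trimmed data. The function names, trimming fraction and example values are hypothetical.

```python
from statistics import NormalDist, mean


def chi2_quantile(prob, df):
    """Wilson-Hilferty approximation to the chi-square quantile function."""
    z = NormalDist().inv_cdf(prob)
    h = 2.0 / (9.0 * df)
    return df * (1.0 - h + z * h ** 0.5) ** 3


def flag_fst_outliers(fst, n_pops, trim=0.05, alpha=0.01):
    """Flag loci whose FST exceeds a neutral threshold fitted to trimmed data.

    The neutral mean is estimated after dropping the lowest and highest
    `trim` fraction of loci. The neutral spread assumes the Lewontin-Krakauer
    approximation (n_pops - 1) * FST / mean ~ chi2(n_pops - 1).
    """
    values = sorted(fst)
    cut = int(len(values) * trim)
    core = values[cut:len(values) - cut]   # trimmed central portion
    neutral_mean = mean(core)
    df = n_pops - 1
    threshold = chi2_quantile(1.0 - alpha, df) * neutral_mean / df
    flagged = [i for i, f in enumerate(fst) if f > threshold]
    return threshold, flagged


# Hypothetical data: 20 loci with modest FST plus one highly differentiated locus.
fst = [0.03, 0.04, 0.05, 0.06, 0.07] * 4 + [0.50]
threshold, flagged = flag_fst_outliers(fst, n_pops=5)
print(round(threshold, 3), flagged)
```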
Accuracy and reliability of forensic latent fingerprint decisions
The interpretation of forensic fingerprint evidence relies on the expertise of latent print examiners. The National Research Council of the National Academies and the legal and forensic sciences communities have called for research to measure the accuracy and reliability of latent print examiners' decisions, a challenging and complex problem in need of systematic analysis. Our research is focused on the development of empirical approaches to studying this problem. Here, we report on the first large-scale study of the accuracy and reliability of latent print examiners' decisions, in which 169 latent print examiners each compared approximately 100 pairs of latent and exemplar fingerprints from a pool of 744 pairs. The fingerprints were selected to include a range of attributes and quality encountered in forensic casework, and to be comparable to searches of an automated fingerprint identification system containing more than 58 million subjects. This study evaluated examiners on key decision points in the fingerprint examination process; procedures used operationally include additional safeguards designed to minimize errors. Five examiners made false positive errors for an overall false positive rate of 0.1%. Eighty-five percent of examiners made at least one false negative error for an overall false negative rate of 7.5%. Independent examination of the same comparisons by different participants (analogous to blind verification) was found to detect all false positive errors and the majority of false negative errors in this study. Examiners frequently differed on whether fingerprints were suitable for reaching a conclusion.
False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant
In this article, we accomplish two things. First, we show that despite empirical psychologists' nominal endorsement of a low rate of false-positive findings (< .05), flexibility in data collection, analysis, and reporting dramatically increases actual false-positive rates. In many cases, a researcher is more likely to falsely find evidence that an effect exists than to correctly find evidence that it does not. We present computer simulations and a pair of actual experiments that demonstrate how unacceptably easy it is to accumulate (and report) statistically significant evidence for a false hypothesis. Second, we suggest a simple, low-cost, and straightforwardly effective disclosure-based solution to this problem. The solution involves six concrete requirements for authors and four guidelines for reviewers, all of which impose a minimal burden on the publication process.
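The abstract's central claim, that analytic flexibility pushes the real false-positive rate above the nominal .05, can be checked with a small simulation in the spirit of the authors' computer simulations (not a reproduction of them). Assumptions: two groups of 20 drawn from the same distribution, two dependent variables correlated at r = 0.5, and a researcher who reports a result if either DV or their average is significant. For simplicity the sketch uses a two-sided z-test with known unit variance instead of a t-test; the function name and all settings are ours.

```python
import math
import random


def simulate_false_positives(n_per_group=20, r=0.5, n_sims=10_000, seed=1):
    """Estimate type I error with and without a choice among dependent variables.

    Both groups come from the same distribution, so every "significant" result
    is a false positive. Each subject has two unit-variance DVs with correlation r.
    """
    rng = random.Random(seed)
    se = math.sqrt(2.0 / n_per_group)          # SE of a difference in means
    se_avg = se * math.sqrt((1.0 + r) / 2.0)   # SE when the DV is the mean of both DVs
    single_hits = flexible_hits = 0
    for _ in range(n_sims):
        means = []
        for _group in range(2):
            s1 = s2 = 0.0
            for _subject in range(n_per_group):
                z1 = rng.gauss(0.0, 1.0)
                z2 = r * z1 + math.sqrt(1.0 - r * r) * rng.gauss(0.0, 1.0)
                s1 += z1
                s2 += z2
            means.append((s1 / n_per_group, s2 / n_per_group))
        (a1, a2), (b1, b2) = means
        sig_dv1 = abs(a1 - b1) / se > 1.96
        sig_dv2 = abs(a2 - b2) / se > 1.96
        sig_avg = abs((a1 + a2) / 2 - (b1 + b2) / 2) / se_avg > 1.96
        single_hits += sig_dv1                           # analysis fixed in advance
        flexible_hits += sig_dv1 or sig_dv2 or sig_avg   # report whichever "works"
    return single_hits / n_sims, flexible_hits / n_sims


single, flexible = simulate_false_positives()
print(f"pre-specified DV: {single:.3f}   pick among three analyses: {flexible:.3f}")
```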
Improving Ecological Inference by Predicting Individual Ethnicity from Voter Registration Records
In both political behavior research and voting rights litigation, turnout and vote choice for different racial groups are often inferred using aggregate election results and racial composition. Over the past several decades, many statistical methods have been proposed to address this ecological inference problem. We propose an alternative method to reduce aggregation bias by predicting individual-level ethnicity from voter registration records. Building on the existing methodological literature, we use Bayes's rule to combine the Census Bureau's Surname List with various information from geocoded voter registration records. We evaluate the performance of the proposed methodology using approximately nine million voter registration records from Florida, where self-reported ethnicity is available. We find that it is possible to reduce the false positive rate among Black and Latino voters to 6% and 3%, respectively, while maintaining the true positive rate above 80%. Moreover, we use our predictions to estimate turnout by race and find that our estimates yield substantially less bias and lower root mean squared error than standard ecological inference estimates. We provide open-source software to implement the proposed methodology.
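The Bayes's-rule step can be sketched as a surname-plus-geography update: multiply the surname-based probability of each group by the probability of living in the voter's area given that group, then renormalize. This minimal illustration assumes surname and location are conditionally independent given race; it is not the authors' open-source software, omits the other registration-record information they use, and the category labels and probabilities are hypothetical.

```python
def bisg_posterior(p_race_given_surname, p_geo_given_race):
    """Combine surname-based and location-based evidence with Bayes's rule.

    p_race_given_surname: dict race -> P(race | surname)
    p_geo_given_race:     dict race -> P(residence in this area | race)
    Assumes surname and location are conditionally independent given race.
    Returns dict race -> P(race | surname, location).
    """
    joint = {race: prior * p_geo_given_race.get(race, 0.0)
             for race, prior in p_race_given_surname.items()}
    total = sum(joint.values())
    if total == 0:
        raise ValueError("evidence has zero probability under every category")
    return {race: value / total for race, value in joint.items()}


# Hypothetical inputs for one voter (not real Census figures).
surname = {"white": 0.6, "black": 0.3, "hispanic": 0.1}
geo = {"white": 0.001, "black": 0.004, "hispanic": 0.001}
print(bisg_posterior(surname, geo))
```

Here the location evidence moves the most likely category from "white" (0.6 on surname alone) to "black" (12/19, about 0.63), which is the kind of individual-level refinement the abstract relies on.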
Integrating multiple data sources in species distribution modeling: a framework for data fusion
The last decade has seen a dramatic increase in the use of species distribution models (SDMs) to characterize patterns of species' occurrence and abundance. Efforts to parameterize SDMs often create a tension between the quality and quantity of data available to fit models. Estimation methods that integrate both standardized and non-standardized data types offer a potential solution to the tradeoff between data quality and quantity. Recently several authors have developed approaches for jointly modeling two sources of data (one of high quality and one of lesser quality). We extend their work by allowing for explicit spatial autocorrelation in occurrence and detection error using a Multivariate Conditional Autoregressive (MVCAR) model and develop three models that share information in a less direct manner, resulting in more robust performance when the auxiliary data are of lesser quality. We describe these three new approaches ("Shared," "Correlation," "Covariates") for combining data sources and show their use in a case study of the Brown-headed Nuthatch in the Southeastern U.S. and through simulations. All three approaches that used the second data source improved out-of-sample predictions relative to a single data source ("Single"). When the second data source is of high quality, the Shared model performs best, but the Correlation and Covariates models also perform well. When the second data source is of lesser quality, the Correlation and Covariates models performed better, suggesting they are robust alternatives when little is known about auxiliary data collected opportunistically or through citizen scientists. Methods that allow for both data types to be used will maximize the useful information available for estimating species distributions.
Agricultural lands offer seasonal habitats to tigers in a human‐dominated and fragmented landscape in India
Conserving wide‐ranging large carnivores in human‐dominated landscapes is contingent on acknowledging the conservation value of human‐modified lands. This is particularly true for tigers (Panthera tigris), now largely dependent on small and fragmented habitats, embedded within densely populated agroecosystems in India. Devising a comprehensive conservation strategy for the species requires an understanding of the temporal patterns of space use by tigers within these human‐modified areas. These areas are often characterized by altered prey communities, novel risks resulting from high human densities and seasonally dynamic vegetative cover. Understanding space use within these areas is vital for devising human‐tiger conflict prevention measures and for conserving landscape elements critical to maintaining functional connectivity between populations. We documented seasonal space‐use patterns of tigers in agricultural lands surrounding protected areas in the Central Terai Landscape (CTL) in northern India. We estimated the probability of space use and its drivers by applying dynamic occupancy models that correct for false‐positive and false‐negative errors to tiger detection/non‐detection data within agricultural areas. These data were generated by conducting local interviews, sign surveys, and camera trapping within 94 randomly selected 2.5‐km² grid cells. We found that agricultural areas were used with high probability in the winter (0.64; standard error [SE] 0.08), a period of high vegetative cover availability. The use of agricultural lands was lower in the summer (0.56; SE 0.09) and was lowest in the monsoon season (0.21; SE 0.07), tracking a decline in vegetative cover and available habitat across the landscape. Availability of vegetative cover and drainage features positively influenced space use, whereas use declined with increasing distance to protected areas and the extent of human settlements.
These findings highlight the role of agricultural areas in providing seasonal habitats for tigers and offer a basis for understanding where tigers and humans co‐occur in these landscapes. These findings help expand our current understanding of what constitutes large carnivore habitats to include human‐dominated agricultural areas. They underscore the need for greater integration of land‐sharing and land‐sparing initiatives to conserve large carnivores within human‐dominated agroecosystems.
The False-positive to False-negative Ratio in Epidemiologic Studies
The ratio of false-positive to false-negative findings (FP:FN ratio) is an informative metric that warrants further evaluation. The FP:FN ratio varies greatly across different epidemiologic areas. In genetic epidemiology, it has varied from very high values (possibly even >100:1) for associations reported in candidate-gene studies to very low values (1:100 or lower) for associations with genome-wide significance. The substantial reduction over time in the FP:FN ratio in human genome epidemiology has corresponded to the routine adoption of stringent inferential criteria and comprehensive, agnostic reporting of all analyses. Most traditional fields of epidemiologic research more closely follow the practices of past candidate gene epidemiology, and thus have high FP:FN ratios. Further, FP and FN results do not necessarily entail the same consequences, and their relative importance may vary in different settings. This ultimately has implications for what is the acceptable FP:FN ratio and for how the results of published epidemiologic studies should be presented and interpreted.
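A back-of-the-envelope model shows why stringent significance thresholds move the FP:FN ratio so dramatically. Under the simplifying assumptions that a fraction prior_true of tested associations are real, every test uses level alpha with the stated power, and nothing is selectively reported (conditions the abstract notes are often violated), the expected ratio is (1 - prior_true) * alpha / (prior_true * (1 - power)). The function name and inputs below are hypothetical, not estimates from the article.

```python
def fp_fn_ratio(prior_true, alpha, power):
    """Expected false-positive to false-negative ratio among tested hypotheses."""
    expected_fp = (1.0 - prior_true) * alpha   # null associations declared significant
    expected_fn = prior_true * (1.0 - power)   # real associations that are missed
    return expected_fp / expected_fn


# Hypothetical: 1% of tested associations are real.
print(fp_fn_ratio(0.01, alpha=0.05, power=0.8))   # nominal 0.05 threshold
print(fp_fn_ratio(0.01, alpha=5e-8, power=0.5))   # genome-wide threshold
```

With these inputs the ratio is roughly 25:1 at the nominal threshold and on the order of 1:100,000 at the genome-wide threshold, even before accounting for the reporting practices the abstract discusses.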
Partial Distance Correlation with Methods for Dissimilarities
Distance covariance and distance correlation are scalar coefficients that characterize independence of random vectors in arbitrary dimension. Properties, extensions and applications of distance correlation have been discussed in the recent literature, but the problem of defining the partial distance correlation has remained an open question of considerable interest. The problem of partial distance correlation is more complex than partial correlation partly because the squared distance covariance is not an inner product in the usual linear space. For the definition of partial distance correlation, we introduce a new Hilbert space where the squared distance covariance is the inner product. We define the partial distance correlation statistics with the help of this Hilbert space, and develop and implement a test for zero partial distance correlation. Our intermediate results provide an unbiased estimator of squared distance covariance, and a neat solution to the problem of distance correlation for dissimilarities rather than distances.
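The U-centering construction behind the unbiased estimator and the partial statistic can be sketched in a few lines. The formulas follow our reading of the approach (U-centered distance matrices with zero diagonal, an inner product scaled by 1/(n(n-3)), and partial distance correlation obtained from bias-corrected distance correlations in the same way as a partial correlation). Function names are ours, samples are 1-D for brevity (replace abs with a Euclidean distance for vectors), the pure-Python O(n²) loops suit only small n, and the significance test mentioned in the abstract is not included.

```python
import math


def _u_center(d):
    """U-center a symmetric distance matrix (diagonal set to zero)."""
    n = len(d)
    if n < 4:
        raise ValueError("U-centering needs at least 4 observations")
    row = [sum(r) for r in d]          # row sums equal column sums (symmetric)
    total = sum(row)
    return [[0.0 if i == j else
             d[i][j] - row[i] / (n - 2) - row[j] / (n - 2)
             + total / ((n - 1) * (n - 2))
             for j in range(n)] for i in range(n)]


def _inner(a, b):
    """Inner product of U-centered matrices: unbiased squared distance covariance."""
    n = len(a)
    s = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n) if i != j)
    return s / (n * (n - 3))


def _centered(xs):
    return _u_center([[abs(u - v) for v in xs] for u in xs])


def dcor_u(x, y):
    """Bias-corrected distance correlation R*(x, y) for 1-D samples."""
    a, b = _centered(x), _centered(y)
    norm = math.sqrt(_inner(a, a) * _inner(b, b))
    return _inner(a, b) / norm if norm > 0 else 0.0


def pdcor(x, y, z):
    """Partial distance correlation R*(x, y; z)."""
    rxy, rxz, ryz = dcor_u(x, y), dcor_u(x, z), dcor_u(y, z)
    if rxz ** 2 >= 1 or ryz ** 2 >= 1:
        return 0.0   # convention when z fully determines x or y
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]
z = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]
print(round(dcor_u(x, y), 3), round(pdcor(x, y, z), 3))
```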