Catalogue Search | MBRL

Probabilistic Cause-of-Death Assignment Using Verbal Autopsies

by McCormick, Tyler H. , Li, Zehang Richard , Crampin, Amelia C. in Applications and Case Studies , Assignment , Assignment problem

2016

In regions without complete-coverage civil registration and vital statistics systems there is uncertainty about even the most basic demographic indicators. In such regions, the majority of deaths occur outside hospitals and are not recorded. Worldwide, fewer than one-third of deaths are assigned a cause, with the least information available from the most impoverished nations. In populations like this, verbal autopsy (VA) is a commonly used tool to assess cause of death and estimate cause-specific mortality rates and the distribution of deaths by cause. VA uses an interview with caregivers of the decedent to elicit data describing the signs and symptoms leading up to the death. This article develops a new statistical tool known as InSilicoVA to classify cause of death using information acquired through VA. InSilicoVA shares uncertainty between cause of death assignments for specific individuals and the distribution of deaths by cause across the population. Using side-by-side comparisons with both observed and simulated data, we demonstrate that InSilicoVA has distinct advantages compared to currently available methods. Supplementary materials for this article are available online.

Journal Article

Using Aggregated Relational Data to Feasibly Identify Network Structure without Network Data

by McCormick, Tyler H. , Pan, Mengjie , Breza, Emily

2020

Social network data are often prohibitively expensive to collect, limiting empirical network research. We propose an inexpensive and feasible strategy for network elicitation using Aggregated Relational Data (ARD): responses to questions of the form “how many of your links have trait k?” Our method uses ARD to recover parameters of a network formation model, which permits sampling from a distribution over node- or graph-level statistics. We replicate the results of two field experiments that used network data and draw similar conclusions with ARD alone.

Journal Article

Non-confirming replication of “Performance of InSilicoVA for assigning causes of death to verbal autopsies: multisite validation study using clinical diagnostic gold standards,” by Flaxman et al

by McCormick, Tyler H. , Clark, Samuel J. , Li, Zehang Richard in Algorithms , Analysis , Autopsies

2020

Background: A verbal autopsy (VA) is an interview conducted with the caregivers of someone who has recently died to describe the circumstances of the death. In recent years, several algorithmic methods have been developed to classify cause of death using VA data. The performance of one method, InSilicoVA, was evaluated in a study by Flaxman et al., published in BMC Medicine in 2018. The results of that study are different from those previously published by our group. Methods: Based on the description of methods in the Flaxman et al. study, we attempt to replicate the analysis to understand why the published results differ from those of our previous work. Results: We failed to reproduce the results published in Flaxman et al. Most of the discrepancies we find likely result from undocumented differences in data pre-processing, and/or values assigned to key parameters governing the behavior of the algorithm. Conclusion: This finding highlights the importance of making replication code available along with published results. All code necessary to replicate the work described here is freely available on GitHub.

Journal Article

INTERPRETABLE CLASSIFIERS USING RULES AND BAYESIAN ANALYSIS: BUILDING A BETTER STROKE PREDICTION MODEL

by McCormick, Tyler H. , Letham, Benjamin , Rudin, Cynthia in Bayesian analysis , classification , interpretability

2015

We aim to produce predictive models that are not only accurate, but are also interpretable to human experts. Our models are decision lists, which consist of a series of if...then... statements (e.g., if high blood pressure, then stroke) that discretize a high-dimensional, multivariate feature space into a series of simple, readily interpretable decision statements. We introduce a generative model called Bayesian Rule Lists that yields a posterior distribution over possible decision lists. It employs a novel prior structure to encourage sparsity. Our experiments show that Bayesian Rule Lists has predictive accuracy on par with the current top algorithms for prediction in machine learning. Our method is motivated by recent developments in personalized medicine, and can be used to produce highly accurate and interpretable medical scoring systems. We demonstrate this by producing an alternative to the CHADS₂ score, actively used in clinical practice for estimating the risk of stroke in patients that have atrial fibrillation. Our model is as interpretable as CHADS₂, but more accurate.

Journal Article

Automated versus physician assignment of cause of death for verbal autopsies: randomized trial of 9374 deaths in 117 villages in India

by Shah, Utkarsh , McCormick, Tyler H. , Li, Zehang Richard in Adult , Algorithms , Analysis

2019

Background: Verbal autopsies with physician assignment of cause of death (COD) are commonly used in settings where medical certification of deaths is uncommon. It remains unanswered if automated algorithms can replace physician assignment. Methods: We randomized verbal autopsy interviews for deaths in 117 villages in rural India to either physician or automated COD assignment. Twenty-four trained lay (non-medical) surveyors applied the allocated method using a laptop-based electronic system. Two of 25 physicians were allocated randomly to independently code the deaths in the physician assignment arm. Six algorithms (Naïve Bayes Classifier (NBC), King-Lu, InSilicoVA, InSilicoVA-NT, InterVA-4, and SmartVA) coded each death in the automated arm. The primary outcome was concordance with the COD distribution in the standard physician-assigned arm. Four thousand six hundred fifty-one (4651) deaths were allocated to physician (standard), and 4723 to automated arms. Results: The two arms were nearly identical in demographics and key symptom patterns. The average concordances of automated algorithms with the standard were 62%, 56%, and 59% for adult, child, and neonatal deaths, respectively. Automated algorithms showed inconsistent results, even for causes that are relatively easy to identify such as road traffic injuries. Automated algorithms underestimated the number of cancer and suicide deaths in adults and overestimated other injuries in adults and children. Across all ages, average weighted concordance with the standard was 62% (range 79–45%) with the best to worst ranking automated algorithms being InterVA-4, InSilicoVA-NT, InSilicoVA, SmartVA, NBC, and King-Lu. Individual-level sensitivity for causes of adult deaths in the automated arm was low between the algorithms but high between two independent physicians in the physician arm. Conclusions: While desirable, automated algorithms require further development and rigorous evaluation. Lay reporting of deaths paired with physician COD assignment of verbal autopsies, despite some limitations, remains a practicable method to document the patterns of mortality reliably for unattended deaths. Trial registration: ClinicalTrials.gov, NCT02810366. Submitted on 11 April 2016.

Journal Article

Methods for correcting inference based on outcomes predicted by machine learning

by McCormick, Tyler H. , Wang, Siruo , Lee, Jeffrey T. in Algorithms , Autopsies , Autopsy

2020

Many modern problems in medicine and public health leverage machine-learning methods to predict outcomes based on observable covariates. In a wide array of settings, predicted outcomes are used in subsequent statistical analysis, often without accounting for the distinction between observed and predicted outcomes. We call inference with predicted outcomes postprediction inference. In this paper, we develop methods for correcting statistical inference using outcomes predicted with arbitrarily complicated machine-learning models including random forests and deep neural nets. Rather than trying to derive the correction from first principles for each machine-learning algorithm, we observe that there is typically a low-dimensional and easily modeled representation of the relationship between the observed and predicted outcomes. We build an approach for postprediction inference that naturally fits into the standard machine-learning framework where the data are divided into training, testing, and validation sets. We train the prediction model in the training set, estimate the relationship between the observed and predicted outcomes in the testing set, and use that relationship to correct subsequent inference in the validation set. We show our postprediction inference (postpi) approach can correct bias and improve variance estimation and subsequent statistical inference with predicted outcomes. To show the broad range of applicability of our approach, we show postpi can improve inference in two distinct fields: modeling predicted phenotypes in repurposed gene expression data and modeling predicted causes of death in verbal autopsy data. Our method is available through an open-source R package: https://github.com/leekgroup/postpi.

Journal Article

LATENT SPACE MODELS FOR MULTIVIEW NETWORK DATA

by McCormick, Tyler H. , Salter-Townshend, Michael

2017

Social relationships consist of interactions along multiple dimensions. In social networks, this means that individuals form multiple types of relationships with the same person (e.g., an individual will not trust all of his/her acquaintances). Statistical models for these data require understanding two related types of dependence structure: (i) structure within each relationship type, or network view, and (ii) the association between views. In this paper, we propose a statistical framework that parsimoniously represents dependence between relationship types while also maintaining enough flexibility to allow individuals to serve different roles in different relationship types. Our approach builds on work on latent space models for networks [see, e.g., J. Amer. Statist. Assoc. 97 (2002) 1090–1098]. These models represent the propensity for two individuals to form edges as conditionally independent given the distance between the individuals in an unobserved social space. Our work departs from previous work in this area by representing dependence structure between network views through a multivariate Bernoulli likelihood, providing a representation of between-view association. This approach infers correlations between views not explained by the latent space model. Using our method, we explore 6 multiview network structures across 75 villages in rural southern Karnataka, India [Banerjee et al. (2013)].

Journal Article

Latent Surface Models for Networks Using Aggregated Relational Data

by McCormick, Tyler H. , Zheng, Tian in Aggregate data , Bayesian methods , Computation

2015

Despite increased interest across a range of scientific applications in modeling and understanding social network structure, collecting complete network data remains logistically and financially challenging, especially in the social sciences. This article introduces a latent surface representation of social network structure for partially observed network data. We derive a multivariate measure of expected (latent) distance between an observed actor and unobserved actors with given features. We also draw novel parallels between our work and dependent data in spatial and ecological statistics. We demonstrate the contribution of our model using a random digit-dial telephone survey and a multiyear prospective study of the relationship between network structure and the spread of infectious disease. The model proposed here is related to previous network models which represent high-dimensional structure through a projection to a low-dimensional latent geometric surface, encoding dependence as distance in the space. We develop a latent surface model for cases when complete network data are unavailable. We focus specifically on aggregated relational data (ARD) which measure network structure indirectly by asking respondents how many connections they have with members of a certain subpopulation (e.g., How many individuals do you know who are HIV positive?) and are easily added to existing surveys. Instead of conditioning on the (latent) distance between two members of the network, the latent surface model for ARD conditions on the expected distance between a survey respondent and the center of a subpopulation on a latent manifold surface. A spherical latent surface and angular distance across the sphere's surface facilitate tractable computation of this expectation. This model estimates relative homogeneity between groups in the population and variation in the propensity for interaction between respondents and group members. The model also estimates features of groups which are difficult to reach using standard surveys (e.g., the homeless). Supplementary materials for this article are available online.

Journal Article

An Expectation Conditional Maximization Approach for Gaussian Graphical Models

by McCormick, Tyler H. , Li, Zehang Richard in Advances in Sampling and Optimization , Algorithms , Bayesian analysis

2019

Bayesian graphical models are a useful tool for understanding dependence relationships among many variables, particularly in situations with external prior information. In high-dimensional settings, the space of possible graphs becomes enormous, rendering even state-of-the-art Bayesian stochastic search computationally infeasible. We propose a deterministic alternative to estimate Gaussian and Gaussian copula graphical models using an expectation conditional maximization (ECM) algorithm, extending the EM approach from Bayesian variable selection to graphical model estimation. We show that the ECM approach enables fast posterior exploration under a sequence of mixture priors, and can incorporate multiple sources of information. Supplementary materials for this article are available online.

Journal Article

Consistently estimating network statistics using aggregated relational data

by McCormick, Tyler H. , Pan, Mengjie , Breza, Emily in Community structure , Eigenvectors , Mathematical models

2023

Collecting complete network data is expensive, time-consuming, and often infeasible. Aggregated Relational Data (ARD), which ask respondents questions of the form “How many people with trait X do you know?”, provide a low-cost option when collecting complete network data is not possible. Rather than asking about connections between each pair of individuals directly, ARD collect the number of contacts the respondent knows with a given trait. Despite widespread use and a growing literature on ARD methodology, there is still no systematic understanding of when and why ARD should accurately recover features of the unobserved network. This paper provides such a characterization by deriving conditions under which statistics about the unobserved network (or functions of these statistics like regression coefficients) can be consistently estimated using ARD. We first provide consistent estimates of network model parameters for three commonly used probabilistic models: the beta-model with node-specific unobserved effects, the stochastic block model with unobserved community structure, and latent geometric space models with unobserved latent locations. A key observation is that cross-group link probabilities for a collection of (possibly unobserved) groups identify the model parameters, meaning ARD are sufficient for parameter estimation. With these estimated parameters, it is possible to simulate graphs from the fitted distribution and analyze the distribution of network statistics. We can then characterize conditions under which the simulated networks based on ARD will allow for consistent estimation of the unobserved network statistics, such as eigenvector centrality, or response functions by or of the unobserved network, such as regression coefficients.

Journal Article
