192 results for "Imputing"
A Practical Guide to Counterfactual Estimators for Causal Inference with Time-Series Cross-Sectional Data
This paper introduces a simple framework of counterfactual estimation for causal inference with time-series cross-sectional data, in which we estimate the average treatment effect on the treated by directly imputing counterfactual outcomes for treated observations. We discuss several novel estimators under this framework, including the fixed effects counterfactual estimator, the interactive fixed effects counterfactual estimator, and the matrix completion estimator. These estimators provide more reliable causal estimates than conventional two-way fixed effects models when treatment effects are heterogeneous or unobserved time-varying confounders exist. Moreover, we propose a new dynamic treatment effects plot, along with several diagnostic tests, to help researchers gauge the validity of the identifying assumptions. We illustrate these methods with two political economy examples and develop an open-source package, fect, in both R and Stata to facilitate implementation.
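The fixed effects counterfactual estimator lends itself to a compact illustration: fit a two-way fixed effects model on untreated observations only, impute the counterfactual outcome for each treated observation, and average the gaps. A minimal sketch in Python (not the fect package itself; the column names unit, time, y, treated are assumed):

```python
# Minimal sketch of the fixed effects counterfactual (imputation) estimator:
# fit a two-way fixed effects model on untreated observations only, impute the
# untreated outcome Y(0) for each treated observation, and average the gaps.
# Column names (unit, time, y, treated) are illustrative, not from the paper,
# and every treated unit/period is assumed to appear among the untreated rows.
import pandas as pd
import statsmodels.formula.api as smf

def att_fe_counterfactual(df: pd.DataFrame) -> float:
    controls = df[df["treated"] == 0]
    fe_fit = smf.ols("y ~ C(unit) + C(time)", data=controls).fit()
    treated = df[df["treated"] == 1]
    y0_hat = fe_fit.predict(treated)              # imputed counterfactual outcomes
    return float((treated["y"] - y0_hat).mean())  # average treatment effect on the treated
```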
Three-level meta-analysis of dependent effect sizes
Although dependence in effect sizes is ubiquitous, commonly used meta-analytic methods assume independent effect sizes. Because multilevel extensions of meta-analytic models are still not well known, we describe and illustrate three-level extensions of a mixed effects meta-analytic model that account for various sources of dependence within and across studies. We also present a three-level model for the common case where, within studies, multiple effect sizes are calculated using the same sample. Although this approach is relatively simple and does not require imputing values for the unknown sampling covariances, it has hardly been used, and its performance has not been empirically investigated. We therefore set up a simulation study, which shows that a three-level approach yields valid results in this situation as well: estimates of the treatment effects and the corresponding standard errors are unbiased.
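For readers unfamiliar with the three-level structure, the model described above can be sketched as follows (notation assumed, not taken from the paper):

```latex
% Three-level random-effects model (notation assumed, not the paper's):
% effect size i in study j; level 1 = sampling error, level 2 = within-study
% variation, level 3 = between-study heterogeneity.
\[
d_{ij} = \beta_0 + u_j + v_{ij} + e_{ij}, \qquad
u_j \sim N(0, \sigma^2_u), \quad
v_{ij} \sim N(0, \sigma^2_v), \quad
e_{ij} \sim N(0, \hat\sigma^2_{e_{ij}}),
\]
% with the sampling variances \hat\sigma^2_{e_{ij}} treated as known, so that
% dependence among effect sizes from the same study is captured by u_j without
% requiring the unknown sampling covariances.
```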
STOCK MARKET VOLATILITY AND MACROECONOMIC FUNDAMENTALS
We revisit the relation between stock market volatility and macroeconomic activity using a new class of component models that distinguish short-run from long-run movements. We formulate models in which the long-run component is driven by inflation and industrial production growth; in terms of pseudo out-of-sample prediction, these models are on par with more traditional time series volatility models at a one-quarter horizon and outperform them at longer horizons. Hence, imputing economic fundamentals into volatility models pays off in terms of long-horizon forecasting. We also find that macroeconomic fundamentals play a significant role even at short horizons.
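The abstract does not give the exact specification, but component volatility models of this kind typically separate a slowly moving long-run component driven by macro variables from a mean-reverting short-run component. One common form, sketched with assumed notation:

```latex
% Component volatility model in the spirit of this abstract (the exact
% specification is not given there; notation below is assumed). The conditional
% variance splits into a short-run component g_t and a long-run component tau_t
% driven by MIDAS-weighted macro series X (e.g., inflation, industrial production growth).
\begin{align*}
r_t &= \mu + \sqrt{\tau_t \, g_t}\,\varepsilon_t, & \varepsilon_t &\sim N(0,1),\\
g_t &= (1-\alpha-\beta) + \alpha \frac{(r_{t-1}-\mu)^2}{\tau_{t-1}} + \beta g_{t-1},\\
\log \tau_t &= m + \theta \sum_{k=1}^{K} \varphi_k(\omega)\, X_{t-k}.
\end{align*}
```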
Imputing Alzheimer's disease phenotype in individuals with mild cognitive impairment for inclusion in gene discovery tests
Background Genetic association studies of Alzheimer's disease (AD) typically compare cognitively unimpaired (CU) controls to clinically diagnosed AD cases, excluding individuals with mild cognitive impairment (MCI) due to uncertainty about progression. We hypothesize that estimating the probability of MCI individuals developing AD could enhance case‐control analyses and improve detection of genetic associations. Method Given a dataset of AD cases, CU controls, and MCI individuals, we propose a multi‐step approach: (1) use stepwise logistic regression on the AD and CU individuals (excluding MCI) to identify a “best model” based on non‐genetic covariates, e.g., clinical, demographic, and biomarker data; (2) apply this model to estimate the probability (pi) of AD for each MCI individual; (3) incorporate MCI individuals as “cases” in a genetic association test using one of two approaches. The first reclassifies MCI individuals with pi > t as cases and includes them in the association test statistic. The second resamples MCI individuals as cases probabilistically based on pi to generate an empirical distribution of test statistics. As a proof of principle, we applied this framework to assess the association of the APOE‐e4 allele with AD in 1420 individuals (448 AD cases, 714 CU controls, 258 MCI). Results As a benchmark, testing APOE‐e4 using only AD cases and CU controls yielded an odds ratio (OR) of 3.25 and test statistic Z = 9.088. Including all MCI individuals as cases reduced the effect and the statistic (OR = 1.97; Z = 6.628). Our imputation procedure identified a “best model” incorporating cohort, sex, age, pTau‐181, the memory box score from the Clinical Dementia Rating, and interactions between pTau‐181, age, and sex. The threshold model with t between 0.6 and 1 slightly improved the test statistic compared with using cases and controls alone; for example, t = 0.8 adds 24 MCI individuals to the cases and yields OR = 3.22 and Z = 9.174. The resampling approach performed better than including all MCI individuals but not as well as the threshold method. Conclusion Clinical, demographic, and biomarker data can be used to impute “caseness” for MCI individuals in genetic association tests. Even a small number of imputed MCI cases improved test statistics, suggesting greater benefits in larger datasets.
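The threshold variant of this procedure maps onto a few lines of code: fit a logistic model on AD cases versus CU controls, score the MCI individuals, and include those with probability above t as cases in the association test. A minimal sketch (column names and the per-allele logistic test are illustrative, with scikit-learn and statsmodels as stand-ins, not the authors' implementation):

```python
# Sketch of the threshold approach: estimate P(AD) for each MCI individual from
# non-genetic covariates, add those with pi > t as cases, and rerun the genetic
# association test. Column names (dx, apoe_e4_count) are illustrative.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegression

def threshold_association(df: pd.DataFrame, covars: list[str], t: float = 0.8):
    # Step 1: "best model" fit on AD cases vs CU controls (MCI excluded)
    fit_set = df[df["dx"].isin(["AD", "CU"])]
    clf = LogisticRegression(max_iter=1000).fit(
        fit_set[covars], (fit_set["dx"] == "AD").astype(int)
    )
    # Step 2: pi = estimated probability of AD for each MCI individual
    mci = df[df["dx"] == "MCI"].copy()
    mci["pi"] = clf.predict_proba(mci[covars])[:, 1]
    # Step 3: reclassify MCI with pi > t as cases and test the genetic marker
    cases = pd.concat([fit_set[fit_set["dx"] == "AD"], mci[mci["pi"] > t]])
    controls = fit_set[fit_set["dx"] == "CU"]
    y = np.r_[np.ones(len(cases)), np.zeros(len(controls))]
    x = sm.add_constant(np.r_[cases["apoe_e4_count"], controls["apoe_e4_count"]])
    res = sm.Logit(y, x).fit(disp=0)
    return np.exp(res.params[1]), res.tvalues[1]   # odds ratio and Z statistic
```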
Name-based demographic inference and the unequal distribution of misrecognition
Academics and companies increasingly draw on large datasets to understand the social world, and name-based demographic ascription tools are widely used to impute information that is often missing from these datasets. These approaches have drawn criticism on ethical, empirical and theoretical grounds. Using a survey of all authors listed on articles in sociology, economics and communication journals in Web of Science between 2015 and 2020, we compared self-identified demographics with name-based imputations of gender and race/ethnicity for 19,924 scholars across four gender ascription tools and four race/ethnicity ascription tools. We found substantial inequalities in how these tools misgender and misrecognize the race/ethnicity of authors, distributing erroneous ascriptions unevenly across other demographic traits. Because of the empirical and ethical consequences of these errors, scholars need to be cautious in their use of demographic imputation, and we recommend five principles for the responsible use of name-based demographic inference. Algorithmic gender and race/ethnicity inference tools based on author names have very high error rates for marginalized communities, which may produce misleading findings in many computational social science and sociology projects.
Imputation of Missing Cognitive Assessment Scores in Alzheimer's Disease: A Self‐Attention Based Deep Learning Approach
Background Missing data in longitudinal cognitive assessments may occur due to logistical (e.g., scheduling conflicts), personal (e.g., reduced motivation), and health‐related issues (e.g., deteriorated physical or cognitive function, anxiety), posing significant challenges for clinical research and patient monitoring. Deep learning‐based imputation approaches, which are agnostic to assumptions about data distribution and covariance structure, outperform traditional methods in imputing longitudinal missing data in other domains. We aim to evaluate their effectiveness in the domain of Alzheimer's Disease (AD), an area that has not been extensively explored. Method We developed a deep learning imputation approach using the self‐attention‐based imputation for time series (SAITS) model and de‐identified data accessed on the AD Data Initiative's AD Workbench. SAITS imputes missing values in multivariate time series by leveraging self‐attention mechanisms to capture both temporal dependencies and feature correlations; it is trained by jointly optimizing imputation and reconstruction tasks on training data. We trained and evaluated SAITS using the Mini‐Mental State Examination (MMSE) and Alzheimer's Disease Assessment Scale‐Cognitive (ADAS‐cog) test scores from the GERAS‐EU and GERAS‐US studies, collected every 6 months over 3 years (7 visits in total). We compared SAITS' performance with (1) two state‐of‐the‐art deep learning approaches, iTransformer and BRITS, and (2) two conventional approaches, mean imputation (averaging data from other visits of the same participant) and last observation carried forward (LOCF). We randomly split the full dataset into 80:20 training:test datasets. Deep learning models were optimized on the training set using 5‐fold cross‐validation and evaluated on the test set with 10% of values randomly masked/removed. Model performance was evaluated using mean absolute error (MAE) across 100 trials. Result Data from 1336 GERAS‐EU participants (age: 72.0 ± 11.6, 61.3% female) and 563 GERAS‐US participants (age: 71.0 ± 7.8, 52.7% female) were analyzed. SAITS performed best in most imputation tasks (Tables 1 and 2), achieving MAEs of 1.636 (GERAS‐EU) and 1.628 (GERAS‐US) when imputing ADAS‐cog total scores and 0.567 (GERAS‐EU) and 1.178 (GERAS‐US) when imputing MMSE total scores. Conclusion Deep learning techniques demonstrate great potential for imputing missing longitudinal cognitive data in AD. Future work will evaluate their effects on downstream tasks such as predicting cognitive decline.
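The evaluation protocol, randomly masking 10% of observed values and scoring imputations by MAE, is straightforward to reproduce for any imputer. A minimal sketch with an LOCF baseline (array layout and function names are assumptions, not the study's code):

```python
# Sketch of the masked-imputation evaluation used to compare imputers:
# randomly hide 10% of the observed entries, impute them, and score by MAE.
# `data` is assumed to be (participants, visits, features) with np.nan for missing.
import numpy as np

def locf_impute(x: np.ndarray) -> np.ndarray:
    """Last observation carried forward along the visit axis."""
    out = x.copy()
    for t in range(1, out.shape[1]):
        missing = np.isnan(out[:, t, :])
        out[:, t, :][missing] = out[:, t - 1, :][missing]
    return out

def masked_mae(data: np.ndarray, impute_fn, frac: float = 0.1, seed: int = 0) -> float:
    rng = np.random.default_rng(seed)
    observed = ~np.isnan(data)
    mask = observed & (rng.random(data.shape) < frac)   # entries hidden for scoring
    corrupted = data.copy()
    corrupted[mask] = np.nan
    imputed = impute_fn(corrupted)
    return float(np.nanmean(np.abs(imputed[mask] - data[mask])))

# Example: MAE of the LOCF baseline on one masking trial
# print(masked_mae(data, locf_impute))
```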
Constructing the Estimates: Computational Methods in Burden of Foodborne Diseases
Abstract In the first iteration of the global burden of foodborne diseases, various approaches for imputing missing incidence data at the country level were tested and evaluated; estimates in that iteration were presented at the subregional level without a time trend. For the second iteration we aimed to produce country- and year-level estimates. Literature searches typically do not provide epidemiological parameter estimates for every country-year combination, so extrapolation or imputation models are essential to construct a complete dataset of epidemiological parameters across space and time. To ensure parsimony and transparency, we adopted a hierarchical meta-regression model as the default method. The default model estimates epidemiological parameters across spatial and temporal dimensions while allowing for adjustments based on additional study-level covariates. The models were implemented in a Bayesian framework to properly account for uncertainty arising from the estimation process. After fitting the hierarchical meta-regression model to the available data, posterior predictive distributions were used to impute incidence values for countries and years with missing data. Countries deemed free from exposure through the food chain were excluded from the imputation model and assigned zero incidence and mortality. After imputation, the responsible taskforce and experts critically reassessed the data and estimates to ensure reliable results. We then assembled all pieces into the disease model and estimated the incidence, mortality, YLDs, YLLs and DALYs for each country-year. Because complexity varies across hazards, including differences in data availability, biological plausibility, and study quality, we tailored several elements of the modelling process: adapting the default model structure and parameters, adjusting for specific control variables, reviewing the studies selected for inclusion, and estimating some countries from the global level.
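A sketch of what such a default hierarchical meta-regression with posterior predictive imputation might look like (notation and covariate structure are assumed; the actual models are hazard-specific):

```latex
% Sketch of a hierarchical (country-within-subregion) meta-regression with
% posterior predictive imputation; notation and covariates are assumed,
% not taken from the abstract.
\begin{align*}
\log \lambda_i &= \alpha + u_{s(i)} + u_{c(i)} + \beta\, t_i + \gamma^\top x_i + \varepsilon_i,\\
u_s &\sim N(0, \sigma^2_s), \qquad u_c \sim N(0, \sigma^2_c),\\
\tilde{\lambda}_{c,t} &\sim p(\lambda_{c,t} \mid \text{data})
\quad \text{(posterior predictive draw for a missing country-year)}.
\end{align*}
```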
Deleting Unreported Innovation
The absence of observable innovation data for a firm often leads researchers to exclude that firm or classify it as a non-innovator. We assess the reliability of six methods for dealing with unreported innovation, using several different counterfactuals for firms without reported R&D or patents. These tests reveal that excluding firms without observable innovation, or imputing them as zero innovators and including a dummy variable, can lead to biased parameter estimates for observed innovation and other explanatory variables. Excluding firms without patents is especially problematic, leading to false-positive results in empirical tests. Our tests suggest using multiple imputation to handle unreported innovation.
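The recommended remedy, multiple imputation, can be sketched with scikit-learn's IterativeImputer as a stand-in (the paper does not prescribe a particular implementation; column names and the OLS specification are illustrative): generate m completed datasets, fit the regression on each, and pool with Rubin's rules.

```python
# Sketch of multiple imputation for unreported innovation measures:
# m stochastic imputations, one regression per completed dataset, Rubin's rules
# to pool. The imputer, column names and OLS specification are illustrative;
# the data frame is assumed to be all numeric.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

def mi_regression(df: pd.DataFrame, y_col: str, x_cols: list[str], m: int = 20) -> pd.DataFrame:
    params, variances = [], []
    for i in range(m):
        imputer = IterativeImputer(sample_posterior=True, random_state=i)
        completed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
        res = sm.OLS(completed[y_col], sm.add_constant(completed[x_cols])).fit()
        params.append(res.params.values)
        variances.append(res.bse.values ** 2)
    params, variances = np.array(params), np.array(variances)
    pooled = params.mean(axis=0)                    # pooled coefficients
    within = variances.mean(axis=0)                 # within-imputation variance
    between = params.var(axis=0, ddof=1)            # between-imputation variance
    se = np.sqrt(within + (1 + 1 / m) * between)    # Rubin's total variance
    return pd.DataFrame({"coef": pooled, "se": se}, index=["const"] + x_cols)
```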
Medial Temporal Lobe Flexibility as an Early Marker of Alzheimer’s Risk in African Americans with ABCA7‐80 Variant: Application to Multivariate Imputation by Chained Equations using Random Forest
Background Alzheimer's disease (AD) is characterized by progressive neurodegeneration and cognitive decline. Medial temporal lobe (MTL) flexibility, measured from dynamic functional connectivity in resting‐state fMRI, may serve as a biomarker for mild cognitive impairment (MCI) and AD. The ABCA7‐80 variant is associated with increased dementia risk, particularly among African Americans; however, few studies have examined its relationship with MTL flexibility. Furthermore, missing data remain a pervasive challenge in AD research, often driven by participant burden and health factors. This study investigates the association between ABCA7‐80 and MTL flexibility, using Multivariate Imputation by Chained Equations with random forests (MICEforest) to address a high amount of missing data. Method We included 656 participants enrolled in the Pathways to Healthy Aging in African Americans study. Participants underwent blood draws, MRI scans, and the Montreal Cognitive Assessment (MoCA). We first computed the partial correlation between the ABCA7‐80 gene and MTL flexibility, adjusting for age, sex, and education, in the 224 participants (mean age = 69.7 ± 7.2 years) with an MTL flexibility score (pairwise deletion). A linear regression model was then fitted to predict MTL flexibility from ABCA7‐80. The same analyses were repeated after MICEforest imputation on the full sample (n = 656; mean age = 69.5 ± 7.4 years). Result In the pairwise-deleted dataset, a nearly significant association was found between MTL flexibility and ABCA7‐80 (r = ‐0.141, p = 0.058) after adjusting for age, sex, education and MoCA score. After imputing the missing data, ABCA7‐80 high‐risk allele carriers showed significantly lower MTL flexibility (r = ‐0.082, p < 0.05). Our regression analysis on the imputed data showed that ABCA7‐80 predicts MTL flexibility after controlling for age, sex, education and cognition (R² = 0.026, p = 0.003). Conclusion Our findings reinforce the role of ABCA7‐80 as a genetic risk factor for AD and suggest reduced MTL flexibility as an early biomarker of vulnerability. Notably, MICEforest imputation strengthened our analyses, revealing a statistically significant correlation and regression relationship between MTL flexibility and the ABCA7‐80 gene. This highlights the importance of robust imputation strategies for maximizing the utility of neuroimaging data in aging and dementia research.
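MICE with random forests can be approximated by plugging a random forest into scikit-learn's IterativeImputer; the sketch below (a stand-in for the MICEforest procedure, with illustrative column names, not the study's code) also shows the downstream regression of MTL flexibility on ABCA7-80.

```python
# Sketch of chained-equations imputation with random forests, followed by the
# regression of MTL flexibility on ABCA7-80 with covariates. scikit-learn's
# IterativeImputer is used as a stand-in for MICEforest; column names are
# illustrative and assumed to be numerically coded.
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.ensemble import RandomForestRegressor
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

COLS = ["mtl_flexibility", "abca7_80", "age", "sex", "education", "moca"]

def impute_and_fit(df: pd.DataFrame):
    # Each variable with missing values is predicted from the others by a random
    # forest, cycling over variables until the imputations stabilize.
    imputer = IterativeImputer(
        estimator=RandomForestRegressor(n_estimators=100, random_state=0),
        max_iter=10,
        random_state=0,
    )
    completed = pd.DataFrame(imputer.fit_transform(df[COLS]), columns=COLS)
    return smf.ols(
        "mtl_flexibility ~ abca7_80 + age + sex + education + moca", data=completed
    ).fit()
```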
Predicting Race And Ethnicity To Ensure Equitable Algorithms For Health Care Decision Making
Algorithms are currently used to assist in a wide array of health care decisions. Despite the general utility of these health care algorithms, there is growing recognition that they may lead to unintended racially discriminatory practices, raising concerns about the potential for algorithmic bias. An intuitive precaution against such bias is to remove race and ethnicity information as an input to health care algorithms, mimicking the idea of "race-blind" decisions. However, we argue that this approach is misguided. Knowledge, not ignorance, of race and ethnicity is necessary to combat algorithmic bias. When race and ethnicity are observed, many methodological approaches can be used to enforce equitable algorithmic performance. When race and ethnicity information is unavailable, which is often the case, imputing them can expand opportunities not only to identify and assess algorithmic bias but also to combat it in both clinical and nonclinical settings. A valid imputation method, such as Bayesian Improved Surname Geocoding, can be applied to standard data collected by public and private payers and provider entities. We describe two applications in which imputation of race and ethnicity can help mitigate potential algorithmic biases: equitable disease screening algorithms using machine learning and equitable pay-for-performance incentives.
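Bayesian Improved Surname Geocoding combines surname-based priors with geography-based likelihoods via Bayes' rule. A minimal sketch of that calculation (the probability tables, normally built from Census surname and geography files, are assumed inputs; production work would use a maintained implementation):

```python
# Sketch of the BISG calculation: surname-based priors P(race | surname) updated
# with geography-based likelihoods P(geo | race) via Bayes' rule. The probability
# tables are assumed inputs derived from Census surname and geography files.
import pandas as pd

def bisg_posterior(surname_probs: pd.DataFrame, geo_given_race: pd.DataFrame,
                   surname: str, geo_id: str) -> pd.Series:
    """P(race | surname, geography) for one record.

    surname_probs:  index = surname,  columns = race/ethnicity groups, values = P(race | surname)
    geo_given_race: index = geo unit, columns = race/ethnicity groups, values = P(geo | race)
    """
    prior = surname_probs.loc[surname]        # P(race | surname)
    likelihood = geo_given_race.loc[geo_id]   # P(geo | race)
    unnormalized = prior * likelihood
    return unnormalized / unnormalized.sum()  # normalize over race/ethnicity groups
```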