Catalogue Search | MBRL

Framework for the treatment and reporting of missing data in observational studies: The Treatment And Reporting of Missing data in Observational Studies framework

by Hogan, Joseph W. , Goetghebeur, Els , Lee, Katherine J. in Adult , ALSPAC , Child

2021

Missing data are ubiquitous in medical research. Although there is increasing guidance on how to handle missing data, practice is changing slowly and misapprehensions abound, particularly in observational research. Importantly, the lack of transparency around methodological decisions is threatening the validity and reproducibility of modern research. We present a practical framework for handling and reporting the analysis of incomplete data in observational studies, which we illustrate using a case study from the Avon Longitudinal Study of Parents and Children. The framework consists of three steps: 1) Develop an analysis plan specifying the analysis model and how missing data are going to be addressed. An important consideration is whether a complete records analysis is likely to be valid, whether multiple imputation or an alternative approach is likely to offer benefits and whether a sensitivity analysis regarding the missingness mechanism is required; 2) Examine the data, checking the methods outlined in the analysis plan are appropriate, and conduct the preplanned analysis; and 3) Report the results, including a description of the missing data, details on how the missing data were addressed, and the results from all analyses, interpreted in light of the missing data and the clinical relevance. This framework seeks to support researchers in thinking systematically about missing data and transparently reporting the potential effect on the study results, thereby increasing the confidence in and reproducibility of research findings.
• Missing data are ubiquitous in medical research.
• Guidance is available, but missing data are still often not handled appropriately.
• We present a framework for handling and reporting analyses of incomplete data.
• This framework encourages researchers to think systematically about missing data.
• Adoption of this framework will increase the reproducibility of research findings.
• This article provides a much needed framework for handling and reporting the analysis of incomplete data in observational studies.
• The framework puts a strong emphasis on preplanning the statistical analysis and encourages transparency when reporting the results of a study.
• Adoption of this framework will increase the confidence in and reproducibility of research findings.

Journal Article

Multiple imputation of covariates by fully conditional specification: Accommodating the substantive model

by Bartlett, Jonathan W , Carpenter, James R , White, Ian R in Clinical research , Epidemiology , Interaction terms

2015

Missing covariate data commonly occur in epidemiological and clinical research, and are often dealt with using multiple imputation. Imputation of partially observed covariates is complicated if the substantive model is non-linear (e.g. Cox proportional hazards model), or contains non-linear (e.g. squared) or interaction terms, and standard software implementations of multiple imputation may impute covariates from models that are incompatible with such substantive models. We show how imputation by fully conditional specification, a popular approach for performing multiple imputation, can be modified so that covariates are imputed from models which are compatible with the substantive model. We investigate through simulation the performance of this proposal, and compare it with existing approaches. Simulation results suggest our proposal gives consistent estimates for a range of common substantive models, including models which contain non-linear covariate effects or interactions, provided data are missing at random and the assumed imputation models are correctly specified and mutually compatible. Stata software implementing the approach is freely available.

Journal Article

Ethnic Differences in the Prevalence of Type 2 Diabetes Diagnoses in the UK: Cross-Sectional Analysis of the Health Improvement Network Primary Care Database

by Pham, Tra My , Morris, Tim P , Carpenter, James R in Clinical medicine , Codes , Cross-sectional studies

2019

Type 2 diabetes mellitus is associated with high levels of disease burden, including increased mortality risk and significant long-term morbidity. The prevalence of diabetes differs substantially among ethnic groups. We examined the prevalence of type 2 diabetes diagnoses in the UK primary care setting. We analysed data from 404,318 individuals in The Health Improvement Network database, aged 0-99 years and permanently registered with general practices in London. The association between ethnicity and the prevalence of type 2 diabetes diagnoses in 2013 was estimated using a logistic regression model, adjusting for effect of age group, sex, and social deprivation. A multiple imputation approach utilising population-level information about ethnicity from the UK census was used for imputing missing data. Compared with those of White ethnicity (5.04%, 95% CI 4.95 to 5.13), the crude percentage prevalence of type 2 diabetes was higher in the Asian (7.69%, 95% CI 7.46 to 7.92) and Black (5.58%, 95% CI 5.35 to 5.81) ethnic groups, while lower in the Mixed/Other group (3.42%, 95% CI 3.19 to 3.66). After adjusting for differences in age group, sex, and social deprivation, all minority ethnic groups were more likely to have a diagnosis of type 2 diabetes compared with the White group (OR Asian versus White 2.36, 95% CI 2.26 to 2.47; OR Black versus White 1.65, 95% CI 1.56 to 1.73; OR Mixed/Other versus White 1.17, 95% CI 1.08 to 1.27). The prevalence of type 2 diabetes was higher in the Asian and Black ethnic groups, compared with the White group. Accurate estimates of ethnic prevalence of type 2 diabetes based on large datasets are important for facilitating appropriate allocation of public health resources, and for allowing population-level research to be undertaken examining disease trajectories among minority ethnic groups, that might help reduce inequalities.

Journal Article

Unleashing the full potential of digital outcome measures in clinical trials: eight questions that need attention

by Villar, Sofía S. , Carpenter, James R. , Tackney, Mia S. in Biomarkers , Biomedicine , Biosensors

2024

The use of digital health technologies to measure outcomes in clinical trials opens new opportunities as well as methodological challenges. Digital outcome measures may provide more sensitive and higher-frequency measurements but pose vital statistical challenges around how such outcomes should be defined and validated and how trials incorporating digital outcome measures should be designed and analysed. This article presents eight methodological questions, exploring issues such as the length of measurement period, choice of summary statistic and definition and handling of missing data as well as the potential for new estimands and new analyses to leverage the time series data from digital devices. The impact of key issues highlighted by the eight questions on the primary analysis of a trial is illustrated through a simulation study based on the 2019 Bellerophon INOPulse trial, which had time spent in moderate-to-vigorous physical activity (MVPA) as a digital outcome measure. These eight questions present broad areas where methodological guidance is needed to enable wider uptake of digital outcome measures in trials.

Journal Article

Meta-analytical methods to identify who benefits most from treatments: daft, deluded, or deft approach?

by Freeman, Suzanne C , Tierney, Jayne F , Morris, Tim P in Caregivers , Humans , Meta-analysis

2017

Identifying which individuals benefit most from particular treatments or other interventions underpins so-called personalised or stratified medicine. However, single trials are typically underpowered for exploring whether participant characteristics, such as age or disease severity, determine an individual’s response to treatment. A meta-analysis of multiple trials, particularly one where individual participant data (IPD) are available, provides greater power to investigate interactions between participant characteristics (covariates) and treatment effects. We use a published IPD meta-analysis to illustrate three broad approaches used for testing such interactions. Based on another systematic review of recently published IPD meta-analyses, we also show that all three approaches can be applied to aggregate data as well as IPD. We also summarise which methods of analysing and presenting interactions are in current use, and describe their advantages and disadvantages. We recommend that testing for interactions using within-trials information alone (the deft approach) becomes standard practice, alongside graphical presentation that directly visualises this.
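
The within-trials idea recommended above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes each trial has already produced its own estimate of the treatment-by-covariate interaction and a standard error, and it pools them with fixed-effect inverse-variance weights (an assumed choice; the abstract does not specify the pooling model). The function name and all numbers are hypothetical.

```python
import math


def pool_within_trial_interactions(estimates, std_errors):
    """Pool per-trial interaction estimates with inverse-variance weights.

    Only within-trial information enters the pooled estimate, because each
    input is an interaction estimated inside a single trial.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    return pooled, pooled_se, ci


# Hypothetical interaction estimates (for example on the log hazard ratio
# scale) and their standard errors from three trials.
trial_interactions = [0.10, -0.05, 0.20]
trial_std_errors = [0.10, 0.20, 0.10]

pooled, pooled_se, ci = pool_within_trial_interactions(
    trial_interactions, trial_std_errors
)
print(f"pooled interaction = {pooled:.3f}, SE = {pooled_se:.3f}")
print(f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Because every input is estimated inside one trial, differences between trials in average covariate values cannot leak into the pooled interaction, which is the property the abstract attributes to the deft approach.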

Journal Article

A four-step strategy for handling missing outcome data in randomised trials affected by a pandemic

by Kahan, Brennan C. , Cro, Suzie , Cornelius, Victoria R. in Betacoronavirus - physiology , Clinical trials , Comorbidity

2020

Background The coronavirus pandemic (Covid-19) presents a variety of challenges for ongoing clinical trials, including an inevitably higher rate of missing outcome data, with new and non-standard reasons for missingness. International drug trial guidelines recommend that trialists review plans for handling missing data in the conduct and statistical analysis, but clear recommendations are lacking. Methods We present a four-step strategy for handling missing outcome data in the analysis of randomised trials that are ongoing during a pandemic. We consider handling missing data arising due to (i) participant infection, (ii) treatment disruptions and (iii) loss to follow-up. We consider both settings where treatment effects for a ‘pandemic-free world’ and ‘world including a pandemic’ are of interest. Results In any trial, investigators should: (1) clarify the treatment estimand of interest with respect to the occurrence of the pandemic; (2) establish what data are missing for the chosen estimand; (3) perform primary analysis under the most plausible missing data assumptions; followed by (4) sensitivity analysis under alternative plausible assumptions. To obtain an estimate of the treatment effect in a ‘pandemic-free world’, participant data that are clinically affected by the pandemic (directly due to infection or indirectly via treatment disruptions) are not relevant and can be set to missing. For primary analysis, a missing-at-random assumption that conditions on all observed data that are expected to be associated with both the outcome and missingness may be most plausible. For the treatment effect in the ‘world including a pandemic’, all participant data are relevant and should be included in the analysis. For primary analysis, a missing-at-random assumption (potentially incorporating a pandemic time-period indicator and participant infection status) or a missing-not-at-random assumption with a poorer response may be most relevant, depending on the setting.
In all scenarios, sensitivity analysis under credible missing-not-at-random assumptions should be used to evaluate the robustness of results. We highlight controlled multiple imputation as an accessible tool for conducting sensitivity analyses. Conclusions Missing data problems will be exacerbated for trials active during the Covid-19 pandemic. This four-step strategy will facilitate clear thinking about the appropriate analysis for relevant questions of interest.
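
One common form of controlled multiple imputation is the delta-based approach, sketched below as a hedged illustration of the sensitivity analysis step rather than the authors' code. The sketch assumes a two-arm trial with a continuous outcome where higher values are better, imputes missing outcomes within each arm from a normal model (missing at random), and then shifts the imputed treatment-arm values by a fixed `delta` to represent a poorer response among participants with missing data. Results are combined with Rubin's rules. All function names, sample sizes and data values are hypothetical.

```python
import math
import random
import statistics


def impute_arm(observed, n_missing, delta, rng):
    """Impute one arm under a normal MAR model, then shift draws by delta."""
    n = len(observed)
    mean = statistics.fmean(observed)
    var = statistics.variance(observed)
    # Draw the model parameters from their approximate posterior so that
    # the imputations reflect parameter uncertainty.
    chi2 = rng.gammavariate((n - 1) / 2, 2)  # chi-squared with n - 1 df
    sigma2 = (n - 1) * var / chi2
    mu = rng.gauss(mean, math.sqrt(sigma2 / n))
    return [rng.gauss(mu, math.sqrt(sigma2)) + delta for _ in range(n_missing)]


def rubin_combine(estimates, variances):
    """Combine per-imputation estimates and variances with Rubin's rules."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)
    within = statistics.fmean(variances)
    between = statistics.variance(estimates)
    total_var = within + (1 + 1 / m) * between
    return q_bar, math.sqrt(total_var)


def delta_analysis(ctrl_obs, trt_obs, n_miss_ctrl, n_miss_trt, delta, m=50, seed=1):
    """Difference in means when imputed treatment-arm outcomes are shifted."""
    rng = random.Random(seed)
    estimates, variances = [], []
    for _ in range(m):
        ctrl = ctrl_obs + impute_arm(ctrl_obs, n_miss_ctrl, 0.0, rng)
        trt = trt_obs + impute_arm(trt_obs, n_miss_trt, delta, rng)
        estimates.append(statistics.fmean(trt) - statistics.fmean(ctrl))
        variances.append(
            statistics.variance(trt) / len(trt)
            + statistics.variance(ctrl) / len(ctrl)
        )
    return rubin_combine(estimates, variances)


# Hypothetical observed outcomes; 20 control and 30 treatment values missing.
data_rng = random.Random(0)
control_observed = [data_rng.gauss(10.0, 2.0) for _ in range(80)]
treatment_observed = [data_rng.gauss(11.0, 2.0) for _ in range(70)]

for delta in (0.0, -1.0, -2.0):
    effect, se = delta_analysis(control_observed, treatment_observed, 20, 30, delta)
    print(f"delta={delta:+.1f}: effect={effect:.2f}, SE={se:.2f}")
```

In this sketch `delta = 0` corresponds to the missing-at-random analysis, and more negative values encode progressively poorer responses among treatment-arm participants with missing outcomes. Reporting the treatment effect across a plausible range of `delta` shows how far the assumption must depart from missing at random before conclusions change.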

Journal Article

Missing covariate data in clinical research: when and when not to use the missing-indicator method for analysis

by Groenwold, Rolf H.H. , Carpenter, James R. , Altman, Douglas G. in Analysis , Analysis of covariance , Bias

2012

Journal Article

Appropriate inclusion of interactions was needed to avoid bias in multiple imputation

by Spratt, Michael , Tilling, Kate , Williamson, Elizabeth J. in Archives & records , Bias , Child development

2016

Missing data are a pervasive problem, often leading to bias in complete records analysis (CRA). Multiple imputation (MI) via chained equations is one solution, but its use in the presence of interactions is not straightforward. We simulated data with outcome Y dependent on binary explanatory variables X and Z and their interaction XZ. Six scenarios were simulated (Y continuous and binary, each with no interaction, a weak and a strong interaction), under five missing data mechanisms. We used directed acyclic graphs to identify when CRA and MI would each be unbiased. We evaluated the performance of CRA, MI without interactions, MI including all interactions, and stratified imputation. We also illustrated these methods using a simple example from the National Child Development Study (NCDS). MI excluding interactions was invalid and resulted in biased estimates and low coverage. When XZ was zero, MI excluding interactions gave unbiased estimates but overcoverage. MI including interactions and stratified MI gave equivalent, valid inference in all cases. In the NCDS example, MI excluding interactions incorrectly concluded there was no evidence for an important interaction. Epidemiologists carrying out MI should ensure that their imputation model(s) are compatible with their analysis model.

Journal Article

Multiple imputation using auxiliary imputation variables that only predict missingness can increase bias due to data missing not at random

by Cornish, Rosie P. , Tilling, Kate , Curnow, Elinor in Algorithms , ALSPAC , Analysis

2024

Background Epidemiological and clinical studies often have missing data, frequently analysed using multiple imputation (MI). In general, MI estimates will be biased if data are missing not at random (MNAR). Bias due to data MNAR can be reduced by including other variables (“auxiliary variables”) in imputation models, in addition to those required for the substantive analysis. Common advice is to take an inclusive approach to auxiliary variable selection (i.e. include all variables thought to be predictive of missingness and/or the missing values). There are no clear guidelines about the impact of this strategy when data may be MNAR. Methods We explore the impact of including an auxiliary variable predictive of missingness but, in truth, unrelated to the partially observed variable, when data are MNAR. We quantify, algebraically and by simulation, the magnitude of the additional bias of the MI estimator for the exposure coefficient (fitting either a linear or logistic regression model), when the (continuous or binary) partially observed variable is either the analysis outcome or the exposure. Here, “additional bias” refers to the difference in magnitude of the MI estimator when the imputation model includes (i) the auxiliary variable and the other analysis model variables; (ii) just the other analysis model variables, noting that both will be biased due to data MNAR. We illustrate the extent of this additional bias by re-analysing data from a birth cohort study. Results The additional bias can be relatively large when the outcome is partially observed and missingness is caused by the outcome itself, and even larger if missingness is caused by both the outcome and the exposure (when either the outcome or exposure is partially observed). Conclusions When using MI, the naïve and commonly used strategy of including all available auxiliary variables should be avoided. 
We recommend including the variables most predictive of the partially observed variable as auxiliary variables, where these can be identified through consideration of the plausible causal diagrams and missingness mechanisms, as well as data exploration (noting that associations with the partially observed variable in the complete records may be distorted due to selection bias).

Journal Article

Variation in colon cancer survival for patients living and receiving care in London, 2006–2013: does where you live matter?

by Quaresma, Manuela , Carpenter, James R , Turculet, Adrian in Algorithms , Bayes Theorem , Bayesian analysis

2022

Background Marked geographical disparities in survival from colon cancer have been consistently described in England. Similar patterns have been observed within London, almost mimicking a microcosm of the country’s survival patterns. This evidence has suggested that the area of residence plays an important role in the survival from cancer. Methods We analysed the survival from colon cancer of patients diagnosed in 2006–2013, in a pre-pandemic period, who were living in London at diagnosis and received care in a London hospital. We examined the patterns of patient pathways between the area of residence and the hospital of care using flow maps, and we investigated whether geographical variations in survival from colon cancer are associated with the hospital of care. To estimate survival, we applied a Bayesian excess hazard model which accounts for the hierarchical structure of the data. Results Geographical disparities in colon cancer survival disappeared once controlled for hospitals, and the disparities seemed to be augmented between hospitals. However, close examination of patient pathways revealed that the poorer survival observed in some hospitals was mostly associated with higher proportions of emergency diagnosis, while their performance was generally as expected for patients diagnosed through non-emergency routes. Discussion This study highlights the need to better coordinate primary and secondary care sectors in some areas of London to improve timely access to specialised clinicians and diagnostic tests. This challenge remains crucially relevant after the recent successive regroupings of Clinical Commissioning Groups (which grouped struggling areas together) and the observed exacerbation of disparities during the COVID-19 pandemic.

Journal Article
