Catalogue Search | MBRL
689 result(s) for "Internal data validation"
Internal validation of self-reported case numbers in hospital quality reports: preparing secondary data for health services research
by Ji, Limei; Geraedts, Max; de Cruppé, Werner
in Committees, Compliance, Cross-field validation
2024
Background
Health services research often relies on secondary data, necessitating quality checks for completeness, validity, and potential errors before use. Various methods address implausible data, including data elimination, statistical estimation, or value substitution from the same or another dataset. This study presents an internal validation process of a secondary dataset used to investigate hospital compliance with minimum caseload requirements (MCR) in Germany. The secondary data source validated is the German Hospital Quality Reports (GHQR), an official dataset containing structured self-reported data from all hospitals in Germany.
Methods
This study conducted an internal cross-field validation of MCR-related data in GHQR from 2016 to 2021. The validation process checked the validity of reported MCR caseloads, including data availability and consistency, by comparing the stated MCR caseload with further variables in the GHQR. Subsequently, implausible MCR caseload values were corrected using the most plausible values given in the same GHQR. The study also analysed the error sources and used reimbursement-related Diagnosis Related Groups Statistic data to assess the validation outcomes.
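The cross-field check described in the Methods can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the function name `validate_caseload`, the status labels, and the rule for choosing the "most plausible" value (most frequent value across fields, ties resolved towards the larger value) are all assumptions made for the example.

```python
from collections import Counter
from typing import Optional


def validate_caseload(
    stated: Optional[int], related: list[Optional[int]]
) -> tuple[Optional[int], str]:
    """Compare a stated caseload with related fields from the same report.

    Returns (value_to_use, status). The selection rule is hypothetical:
    the most frequent value wins, ties go to the larger value.
    """
    # Keep only the related fields that actually carry a value.
    candidates = [v for v in related if v is not None]

    # Nothing stated and nothing to cross-check against.
    if stated is None and not candidates:
        return None, "missing"

    # Stated value agrees with every available related field.
    if stated is not None and all(v == stated for v in candidates):
        return stated, "consistent"

    # Ambiguous case: pick the most plausible value from all fields.
    values = candidates + ([stated] if stated is not None else [])
    counts = Counter(values)
    best = max(counts, key=lambda v: (counts[v], v))
    status = "corrected" if best != stated else "ambiguous_kept"
    return best, status
```

Note that the record is never dropped: an implausible or missing stated value is replaced by a value from the same report, which mirrors the abstract's emphasis on keeping the dataset complete.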
Results
The analysis focused on four MCR procedures. 11.8–27.7% of the total MCR caseload values in the GHQR appeared ambiguous, and 7.9–23.7% were corrected. The correction added 0.7–3.7% of cases not previously stated as MCR caseloads and added 1.5–26.1% of hospital sites as MCR performing hospitals not previously stated in the GHQR. The main error source was this non-reporting of MCR caseloads, especially by hospitals with low case numbers. The basic plausibility control implemented by the Federal Joint Committee since 2018 has improved the MCR-related data quality over time.
Conclusions
This study employed a comprehensive approach to dataset internal validation that encompassed: (1) hospital association level data, (2) hospital site level data and (3) medical department level data, (4) report data spanning six years, and (5) logical plausibility checks. To ensure data completeness, we selected the most plausible values without eliminating incomplete or implausible data. For future practice, we recommend a validation process when using GHQR as a data source for MCR-related research. Additionally, an adapted plausibility control could help to improve the quality of MCR documentation.
Journal Article
Evaluation of clinical prediction models (part 2): how to undertake an external validation study
by Collins, Gary S; Ensor, Joie; Snell, Kym I E
in Artificial intelligence, Calibration, Clinical medicine
2024
External validation studies are an important but often neglected part of prediction model research. In this article, the second in a series on model evaluation, Riley and colleagues explain what an external validation study entails and describe the key steps involved, from establishing a high quality dataset to evaluating a model’s predictive performance and clinical usefulness.
Journal Article
Model selection using information criteria, but is the "best" model any good?
by Thomson, James R.; Duncan, Richard P.; Mac Nally, Ralph
in Adequacy, applied ecology, COMMENTARY
2018
1. Information criteria (ICs) are used widely for data summary and model building in ecology, especially in applied ecology and wildlife management. Although ICs are useful for distinguishing among rival candidate models, ICs do not necessarily indicate whether the "best" model (or a model-averaged version) is a good representation of the data or whether the model has useful "explanatory" or "predictive" ability. 2. As editors and reviewers, we have seen many submissions that did not evaluate whether the nominal "best" model(s) found using IC is a useful model in the above sense. 3. We scrutinized six leading ecological journals for papers that used IC to compare models. More than half of papers using IC for model comparison did not evaluate the adequacy of the best model(s) in either "explaining" or "predicting" the data. 4. Synthesis and applications. Authors need to evaluate the adequacy of the model identified as the "best" model by using information criteria methods to provide convincing evidence to readers and users that inferences from the best models are useful and reliable.
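The distinction between ranking models and judging adequacy can be illustrated with a small sketch. It assumes least-squares models with Gaussian errors, for which AIC can be written as n ln(RSS/n) + 2k up to an additive constant. The function names are hypothetical, and R² stands in here as one simple adequacy measure; the abstract does not prescribe a specific one.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residual_sum_of_squares(y, fitted):
    """Sum of squared differences between observed and fitted values."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def gaussian_aic(rss, n, k):
    """AIC for a least-squares fit, up to an additive constant.

    k is the number of estimated parameters; only differences in AIC
    between models fitted to the same data are meaningful.
    """
    return n * math.log(rss / n) + 2 * k


def r_squared(y, fitted):
    """Proportion of variance in y explained by the fitted values."""
    mean_y = sum(y) / len(y)
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - residual_sum_of_squares(y, fitted) / tss
```

In use, one would compute `gaussian_aic` for each candidate to pick the lowest-AIC model, then report `r_squared` (or an out-of-sample equivalent) for that model. A model can win the AIC comparison and still explain very little of the variation.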
Journal Article
A new framework to enhance the interpretation of external validation studies of clinical prediction models
by Nieboer, Daan; Debray, Thomas P.A.; Steyerberg, Ewout W.
in Case mix, Data Interpretation, Statistical, Epidemiology
2015
It is widely acknowledged that the performance of diagnostic and prognostic prediction models should be assessed in external validation studies with independent data from “different but related” samples as compared with that of the development sample. We developed a framework of methodological steps and statistical methods for analyzing and enhancing the interpretation of results from external validation studies of prediction models.
We propose to quantify the degree of relatedness between development and validation samples on a scale ranging from reproducibility to transportability by evaluating their corresponding case-mix differences. We subsequently assess the models' performance in the validation sample and interpret the performance in view of the case-mix differences. Finally, we may adjust the model to the validation setting.
We illustrate this three-step framework with a prediction model for diagnosing deep venous thrombosis using three validation samples with varying case mix. While one external validation sample merely assessed the model's reproducibility, two other samples rather assessed model transportability. The performance in all validation samples was adequate, and the model did not require extensive updating to correct for miscalibration or poor fit to the validation settings.
The proposed framework enhances the interpretation of findings at external validation of prediction models.
Journal Article
External validation of clinical prediction models using big datasets from e-health records or IPD meta-analysis: opportunities and challenges
by Altman, Doug G; Debray, Thomas P A; Collins, Gary S
in Calibration, Clinical medicine, Consortia
2016
Access to big datasets from e-health records and individual participant data (IPD) meta-analysis is signalling a new advent of external validation studies for clinical prediction models. In this article, the authors illustrate novel opportunities for external validation in big, combined datasets, while drawing attention to methodological challenges and reporting issues.
Journal Article
Early warning scores for detecting deterioration in adult hospital patients: systematic review and critical appraisal of methodology
2020
Objective
To provide an overview and critical appraisal of early warning scores for adult hospital patients.
Design
Systematic review.
Data sources
Medline, CINAHL, PsycInfo, and Embase until June 2019.
Eligibility criteria for study selection
Studies describing the development or external validation of an early warning score for adult hospital inpatients.
Results
13 171 references were screened and 95 articles were included in the review. 11 studies were development only, 23 were development and external validation, and 61 were external validation only. Most early warning scores were developed for use in the United States (n=13/34, 38%) and the United Kingdom (n=10/34, 29%). Death was the most frequent prediction outcome for development studies (n=10/23, 44%) and validation studies (n=66/84, 79%), with different time horizons (the most frequent was 24 hours). The most common predictors were respiratory rate (n=30/34, 88%), heart rate (n=28/34, 83%), oxygen saturation, temperature, and systolic blood pressure (all n=24/34, 71%). Age (n=13/34, 38%) and sex (n=3/34, 9%) were less frequently included. Key details of the analysis populations were often not reported in development studies (n=12/29, 41%) or validation studies (n=33/84, 39%). Small sample sizes and insufficient numbers of event patients were common in model development and external validation studies. Missing data were often discarded, with just one study using multiple imputation. Only nine of the early warning scores that were developed were presented in sufficient detail to allow individualised risk prediction. Internal validation was carried out in 19 studies, but recommended approaches such as bootstrapping or cross validation were rarely used (n=4/19, 22%). Model performance was frequently assessed using discrimination (development n=18/22, 82%; validation n=69/84, 82%), while calibration was seldom assessed (validation n=13/84, 15%). All included studies were rated at high risk of bias.
Conclusions
Early warning scores are widely used prediction models that are often mandated in daily clinical practice to identify early clinical deterioration in hospital patients. However, many early warning scores in clinical use were found to have methodological weaknesses. Early warning scores might not perform as well as expected and therefore they could have a detrimental effect on patient care. Future work should focus on following recommended approaches for developing and evaluating early warning scores, and investigating the impact and safety of using these scores in clinical practice.
Systematic review registration
PROSPERO CRD42017053324.
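The two performance aspects named in the results, discrimination and calibration, can be sketched as below. This is a minimal illustration with hypothetical function names: discrimination is computed as the c-statistic (the proportion of event/non-event pairs ranked correctly), and calibration is reduced to an observed/expected event ratio, which is only one of several calibration summaries.

```python
def c_statistic(y_true, y_prob):
    """Concordance statistic for binary outcomes (1 = event, 0 = no event).

    Counts pairs where the patient with the event received the higher
    predicted risk; tied predictions count as half a concordant pair.
    """
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    for p_event in events:
        for p_non_event in non_events:
            if p_event > p_non_event:
                concordant += 1.0
            elif p_event == p_non_event:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def observed_expected_ratio(y_true, y_prob):
    """Observed events divided by the sum of predicted risks.

    A value near 1 suggests predictions are right on average; it says
    nothing about calibration across the range of predicted risk.
    """
    return sum(y_true) / sum(y_prob)
```

A score can discriminate well and still be poorly calibrated, which is why reporting only the c-statistic, as most of the reviewed studies did, leaves part of the performance picture unassessed.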
Journal Article
Global, regional, and national age-sex specific mortality for 264 causes of death, 1980–2016: a systematic analysis for the Global Burden of Disease Study 2016
by Ruhago, George Mugambage; Herteliu, Claudiu; Roth, Gregory A
in Acquired immune deficiency syndrome, Adolescent, Adult
2017
Monitoring levels and trends in premature mortality is crucial to understanding how societies can address prominent sources of early death. The Global Burden of Disease 2016 Study (GBD 2016) provides a comprehensive assessment of cause-specific mortality for 264 causes in 195 locations from 1980 to 2016. This assessment includes evaluation of the expected epidemiological transition with changes in development and where local patterns deviate from these trends.
We estimated cause-specific deaths and years of life lost (YLLs) by age, sex, geography, and year. YLLs were calculated from the sum of each death multiplied by the standard life expectancy at each age. We used the GBD cause of death database composed of: vital registration (VR) data corrected for under-registration and garbage coding; national and subnational verbal autopsy (VA) studies corrected for garbage coding; and other sources including surveys and surveillance systems for specific causes such as maternal mortality. To facilitate assessment of quality, we reported on the fraction of deaths assigned to GBD Level 1 or Level 2 causes that cannot be underlying causes of death (major garbage codes) by location and year. Based on completeness, garbage coding, cause list detail, and time periods covered, we provided an overall data quality rating for each location with scores ranging from 0 stars (worst) to 5 stars (best). We used robust statistical methods including the Cause of Death Ensemble model (CODEm) to generate estimates for each location, year, age, and sex. We assessed observed and expected levels and trends of cause-specific deaths in relation to the Socio-demographic Index (SDI), a summary indicator derived from measures of average income per capita, educational attainment, and total fertility, with locations grouped into quintiles by SDI. Relative to GBD 2015, we expanded the GBD cause hierarchy by 18 causes of death for GBD 2016.
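The YLL definition given above (each death multiplied by the standard life expectancy at the age of death, then summed) can be sketched as follows. The function name and the life expectancy figures in the usage note are hypothetical and are not the GBD reference values.

```python
def years_of_life_lost(deaths_by_age, standard_life_expectancy):
    """Sum deaths at each age weighted by remaining standard life expectancy.

    deaths_by_age: mapping of age at death -> number of deaths
    standard_life_expectancy: mapping of age -> remaining years expected
    """
    return sum(
        count * standard_life_expectancy[age]
        for age, count in deaths_by_age.items()
    )
```

For example, with made-up inputs `{0: 10, 50: 4, 80: 2}` deaths and remaining life expectancies `{0: 86.0, 50: 37.0, 80: 10.0}`, the total is 10 × 86 + 4 × 37 + 2 × 10 = 1028 years, dominated by the deaths at the youngest age.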
The quality of available data varied by location. Data quality in 25 countries rated in the highest category (5 stars), while 48, 30, 21, and 44 countries were rated at each of the succeeding data quality levels. Vital registration or verbal autopsy data were not available in 27 countries, resulting in the assignment of a zero value for data quality. Deaths from non-communicable diseases (NCDs) represented 72·3% (95% uncertainty interval [UI] 71·2–73·2) of deaths in 2016 with 19·3% (18·5–20·4) of deaths in that year occurring from communicable, maternal, neonatal, and nutritional (CMNN) diseases and a further 8·43% (8·00–8·67) from injuries. Although age-standardised rates of death from NCDs decreased globally between 2006 and 2016, total numbers of these deaths increased; both numbers and age-standardised rates of death from CMNN causes decreased in the decade 2006–16—age-standardised rates of deaths from injuries decreased but total numbers varied little. In 2016, the three leading global causes of death in children under-5 were lower respiratory infections, neonatal preterm birth complications, and neonatal encephalopathy due to birth asphyxia and trauma, combined resulting in 1·80 million deaths (95% UI 1·59 million to 1·89 million). Between 1990 and 2016, a profound shift toward deaths at older ages occurred with a 178% (95% UI 176–181) increase in deaths in ages 90–94 years and a 210% (208–212) increase in deaths older than age 95 years. The ten leading causes by rates of age-standardised YLL significantly decreased from 2006 to 2016 (median annualised rate of change was a decrease of 2·89%); the median annualised rate of change for all other causes was lower (a decrease of 1·59%) during the same interval. Globally, the five leading causes of total YLLs in 2016 were cardiovascular diseases; diarrhoea, lower respiratory infections, and other common infectious diseases; neoplasms; neonatal disorders; and HIV/AIDS and tuberculosis. 
At a finer level of disaggregation within cause groupings, the ten leading causes of total YLLs in 2016 were ischaemic heart disease, cerebrovascular disease, lower respiratory infections, diarrhoeal diseases, road injuries, malaria, neonatal preterm birth complications, HIV/AIDS, chronic obstructive pulmonary disease, and neonatal encephalopathy due to birth asphyxia and trauma. Ischaemic heart disease was the leading cause of total YLLs in 113 countries for men and 97 countries for women. Comparisons of observed levels of YLLs by countries, relative to the level of YLLs expected on the basis of SDI alone, highlighted distinct regional patterns including the greater than expected level of YLLs from malaria and from HIV/AIDS across sub-Saharan Africa; diabetes mellitus, especially in Oceania; interpersonal violence, notably within Latin America and the Caribbean; and cardiomyopathy and myocarditis, particularly in eastern and central Europe. The level of YLLs from ischaemic heart disease was less than expected in 117 of 195 locations. Other leading causes of YLLs for which YLLs were notably lower than expected included neonatal preterm birth complications in many locations in both south Asia and southeast Asia, and cerebrovascular disease in western Europe.
The past 37 years have featured declining rates of communicable, maternal, neonatal, and nutritional diseases across all quintiles of SDI, with faster than expected gains for many locations relative to their SDI. A global shift towards deaths at older ages suggests success in reducing many causes of early death. YLLs have increased globally for causes such as diabetes mellitus or some neoplasms, and in some locations for causes such as drug use disorders, and conflict and terrorism. Increasing levels of YLLs might reflect outcomes from conditions that required high levels of care but for which effective treatments remain elusive, potentially increasing costs to health systems.
Funding: Bill & Melinda Gates Foundation.
Journal Article
Validation and impact of algorithms for identifying variables in observational studies of routinely collected data
2024
Among observational studies of routinely collected health data (RCD) for exploring treatment effects, algorithms are used to identify study variables. However, the extent to which algorithms are reliable and impact the credibility of effect estimates is far from clear. This study aimed to investigate the validation of algorithms for identifying study variables from RCD, and examine the impact of alternative algorithms on treatment effects.
We searched PubMed for observational studies published in 2018 that used RCD to explore drug treatment effects. Information regarding the reporting, validation, and interpretation of algorithms was extracted. We summarized the reporting and methodological characteristics of algorithms and validation. We also assessed the divergence in effect estimates given alternative algorithms by calculating the ratio of estimates of the primary vs. alternative analyses.
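The divergence measure mentioned above, the ratio of the primary estimate to the estimate from an alternative algorithm, can be sketched as follows. The abstract does not state what size of ratio counted as a differential effect, so the 20% tolerance and both function names are assumptions for illustration only.

```python
import math


def estimate_ratio(primary, alternative):
    """Ratio of two relative effect estimates (for example hazard ratios)."""
    if primary <= 0 or alternative <= 0:
        raise ValueError("relative effect estimates must be positive")
    return primary / alternative


def is_divergent(primary, alternative, tolerance=0.2):
    """Flag a pair of estimates whose ratio departs from 1 by more than
    the tolerance, measured symmetrically on the log scale.

    The default tolerance is a hypothetical threshold, not one taken
    from the study.
    """
    ratio = estimate_ratio(primary, alternative)
    return abs(math.log(ratio)) > math.log(1.0 + tolerance)
```

Working on the log scale treats a ratio of 0.8 and its reciprocal 1.25 as equally large departures, which is the usual convention for ratio measures.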
A total of 222 studies were included, of which 93 (41.9%) provided a complete list of algorithms for identifying participants, 36 (16.2%) for exposure, 132 (59.5%) for outcomes, and 15 (6.8%) for all study variables including population, exposure, and outcomes. Fifty-nine (26.6%) studies stated that the algorithms were validated, and 54 (24.3%) studies reported methodological characteristics of 66 validations, among which 61 validations in 49 studies were from the cross-referenced validation studies. Of those 66 validations, 22 (33.3%) reported sensitivity and 16 (24.2%) reported specificity. A total of 63.6% of studies reporting sensitivity and 56.3% reporting specificity used test-result-based sampling, an approach that potentially biases effect estimates. Twenty-eight (12.6%) studies used alternative algorithms to identify study variables, and 24 reported the effects estimated by primary analyses and sensitivity analyses. Of these, 20% had differential effect estimates when using alternative algorithms for identifying population, 18.2% for identifying exposure, and 45.5% for classifying outcomes. Only 32 (14.4%) studies discussed how the algorithms may affect treatment estimates.
In observational studies of RCD, the algorithms for variable identification were not regularly validated, and, even when they were validated, the methodological approach and performance of the validation were often poor. More seriously, different algorithms may yield differential treatment effects, but their impact is often ignored by researchers. Strong efforts, including recommendations, are warranted to improve good practice.
Journal Article
Identification of homelessness using health administrative data in Ontario, Canada following a national coding mandate: a validation study
2024
Conducting longitudinal health research about people experiencing homelessness poses unique challenges. Identification through administrative data permits large, cost-effective studies; however, case validity in Ontario is unknown after a 2018 Canada-wide policy change mandating homelessness coding in hospital databases. We validated case definitions for identifying homelessness using Ontario health administrative databases after introduction of this coding mandate.
We assessed 42 case definitions in a representative sample of people experiencing homelessness in Toronto (n = 640) from whom longitudinal housing history (ranging from 2018 to 2022) was obtained, and a randomly selected sample of presumably housed people (n = 128,000) in Toronto. We evaluated sensitivity, specificity, positive and negative predictive values, and positive likelihood ratios to select an optimal definition, and compared the resulting true positives against false positives and false negatives to identify potential causes of misclassification.
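The accuracy measures listed above all follow from a two-by-two table of the case definition against the reference standard. A minimal sketch is given below; the class and function names are hypothetical, and the counts in the usage note are invented for illustration and are not the study's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # flagged by the case definition, truly homeless
    fp: int  # flagged by the case definition, actually housed
    fn: int  # not flagged, truly homeless
    tn: int  # not flagged, actually housed


def accuracy_measures(c: ConfusionCounts) -> dict[str, float]:
    """Standard diagnostic accuracy measures from a 2x2 table."""
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    # LR+ is undefined when specificity is exactly 1; report infinity.
    lr_positive = (
        float("inf") if c.fp == 0 else sensitivity / (1.0 - specificity)
    )
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "lr_positive": lr_positive,
    }
```

With invented counts such as `ConfusionCounts(tp=50, fp=10, fn=50, tn=990)`, sensitivity is 0.50 and specificity is 0.99. Predictive values, unlike sensitivity and specificity, depend on how common the condition is in the sample, so they do not transfer directly between populations with different prevalence.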
The optimal case definition included any homelessness indicator during a hospital-based encounter within 180 days of a period of homelessness (sensitivity = 52.9%; specificity = 99.5%). For periods of homelessness with ≥1 hospital-based healthcare encounter, the optimal case definition had greatly improved sensitivity (75.1%) while retaining excellent specificity (98.5%). Review of false positives suggested that homeless status is sometimes erroneously carried forward in healthcare databases after an individual transitioned out of homelessness.
Case definitions to identify homelessness using Ontario health administrative data exhibit moderate to good sensitivity and excellent specificity. Sensitivity has more than doubled since the implementation of a national coding mandate. Mandatory collection and reporting of homelessness information within administrative data present invaluable opportunities for advancing research on the health and healthcare needs of people experiencing homelessness.
• Homelessness status in Canadian healthcare data historically had low sensitivity.
• Beginning April 2018, Canadian hospitals must code homelessness where it is documented on the chart.
• Case definitions now have moderate to good sensitivity, doubling from before 2018.
• Future potential improvements include updating status after homelessness ends.
• Canadian healthcare data are an important resource for studies about this population.
Journal Article
Global and regional mortality from 235 causes of death for 20 age groups in 1990 and 2010: a systematic analysis for the Global Burden of Disease Study 2010
by Aggarwal, Rakesh; Bolliger, Ian; Schwebel, David C
in accidents, Acquired immune deficiency syndrome, Adolescent
2012
Reliable and timely information on the leading causes of death in populations, and how these are changing, is a crucial input into health policy debates. In the Global Burden of Diseases, Injuries, and Risk Factors Study 2010 (GBD 2010), we aimed to estimate annual deaths for the world and 21 regions between 1980 and 2010 for 235 causes, with uncertainty intervals (UIs), separately by age and sex.
We attempted to identify all available data on causes of death for 187 countries from 1980 to 2010 from vital registration, verbal autopsy, mortality surveillance, censuses, surveys, hospitals, police records, and mortuaries. We assessed data quality for completeness, diagnostic accuracy, missing data, stochastic variations, and probable causes of death. We applied six different modelling strategies to estimate cause-specific mortality trends depending on the strength of the data. For 133 causes and three special aggregates we used the Cause of Death Ensemble model (CODEm) approach, which uses four families of statistical models testing a large set of different models using different permutations of covariates. Model ensembles were developed from these component models. We assessed model performance with rigorous out-of-sample testing of prediction error and the validity of 95% UIs. For 13 causes with low observed numbers of deaths, we developed negative binomial models with plausible covariates. For 27 causes for which death is rare, we modelled the higher level cause in the cause hierarchy of the GBD 2010 and then allocated deaths across component causes proportionately, estimated from all available data in the database. For selected causes (African trypanosomiasis, congenital syphilis, whooping cough, measles, typhoid and paratyphoid, leishmaniasis, acute hepatitis E, and HIV/AIDS), we used natural history models based on information on incidence, prevalence, and case-fatality. We separately estimated cause fractions by aetiology for diarrhoea, lower respiratory infections, and meningitis, as well as disaggregations by subcause for chronic kidney disease, maternal disorders, cirrhosis, and liver cancer. For deaths due to collective violence and natural disasters, we used mortality shock regressions. For every cause, we estimated 95% UIs that captured both parameter estimation uncertainty and uncertainty due to model specification where CODEm was used.
We constrained cause-specific fractions within every age-sex group to sum to total mortality based on draws from the uncertainty distributions.
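The constraint described above can be illustrated with a simple proportional rescaling applied to one draw. This is an assumption made for the sketch: the abstract says only that cause-specific fractions were constrained to sum to total mortality using draws from the uncertainty distributions, and does not specify the algorithm. The function name and inputs are hypothetical.

```python
def rescale_to_total(cause_deaths, total_deaths):
    """Scale cause-specific deaths in one age-sex group so they sum to
    the all-cause total for that group.

    Intended to be applied to a single draw; repeating it over many
    draws would carry the uncertainty through to the rescaled values.
    """
    modelled_sum = sum(cause_deaths.values())
    if modelled_sum <= 0:
        raise ValueError("cause-specific deaths must sum to a positive value")
    factor = total_deaths / modelled_sum
    return {cause: deaths * factor for cause, deaths in cause_deaths.items()}
```

Proportional rescaling preserves each cause's share of the modelled deaths while forcing the causes to add up to the independently estimated all-cause figure.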
In 2010, there were 52·8 million deaths globally. At the most aggregate level, communicable, maternal, neonatal, and nutritional causes were 24·9% of deaths worldwide in 2010, down from 15·9 million (34·1%) of 46·5 million in 1990. This decrease was largely due to decreases in mortality from diarrhoeal disease (from 2·5 to 1·4 million), lower respiratory infections (from 3·4 to 2·8 million), neonatal disorders (from 3·1 to 2·2 million), measles (from 0·63 to 0·13 million), and tetanus (from 0·27 to 0·06 million). Deaths from HIV/AIDS increased from 0·30 million in 1990 to 1·5 million in 2010, reaching a peak of 1·7 million in 2006. Malaria mortality also rose by an estimated 19·9% since 1990 to 1·17 million deaths in 2010. Tuberculosis killed 1·2 million people in 2010. Deaths from non-communicable diseases rose by just under 8 million between 1990 and 2010, accounting for two of every three deaths (34·5 million) worldwide by 2010. 8 million people died from cancer in 2010, 38% more than two decades ago; of these, 1·5 million (19%) were from trachea, bronchus, and lung cancer. Ischaemic heart disease and stroke collectively killed 12·9 million people in 2010, or one in four deaths worldwide, compared with one in five in 1990; 1·3 million deaths were due to diabetes, twice as many as in 1990. The fraction of global deaths due to injuries (5·1 million deaths) was marginally higher in 2010 (9·6%) compared with two decades earlier (8·8%). This was driven by a 46% rise in deaths worldwide due to road traffic accidents (1·3 million in 2010) and a rise in deaths from falls. Ischaemic heart disease, stroke, chronic obstructive pulmonary disease (COPD), lower respiratory infections, lung cancer, and HIV/AIDS were the leading causes of death in 2010. 
Ischaemic heart disease, lower respiratory infections, stroke, diarrhoeal disease, malaria, and HIV/AIDS were the leading causes of years of life lost due to premature mortality (YLLs) in 2010, similar to what was estimated for 1990, except for HIV/AIDS and preterm birth complications. YLLs from lower respiratory infections and diarrhoea decreased by 45–54% since 1990; ischaemic heart disease and stroke YLLs increased by 17–28%. Regional variations in leading causes of death were substantial. Communicable, maternal, neonatal, and nutritional causes still accounted for 76% of premature mortality in sub-Saharan Africa in 2010. Age standardised death rates from some key disorders rose (HIV/AIDS, Alzheimer's disease, diabetes mellitus, and chronic kidney disease in particular), but for most diseases, death rates fell in the past two decades; including major vascular diseases, COPD, most forms of cancer, liver cirrhosis, and maternal disorders. For other conditions, notably malaria, prostate cancer, and injuries, little change was noted.
Population growth, increased average age of the world's population, and largely decreasing age-specific, sex-specific, and cause-specific death rates combine to drive a broad shift from communicable, maternal, neonatal, and nutritional causes towards non-communicable diseases. Nevertheless, communicable, maternal, neonatal, and nutritional causes remain the dominant causes of YLLs in sub-Saharan Africa. Overlaid on this general pattern of the epidemiological transition, marked regional variation exists in many causes, such as interpersonal violence, suicide, liver cancer, diabetes, cirrhosis, Chagas disease, African trypanosomiasis, melanoma, and others. Regional heterogeneity highlights the importance of sound epidemiological assessments of the causes of death on a regular basis.
Funding: Bill & Melinda Gates Foundation.
Journal Article