Catalogue Search | MBRL

There is no such thing as a validated prediction model

by van Smeden, Maarten , Van Calster, Ben , Steyerberg, Ewout W. in Biomarkers , Biomedicine , Calibration

2023

Background: Clinical prediction models should be validated before implementation in clinical practice. But is favorable performance at internal validation or one external validation sufficient to claim that a prediction model works well in the intended clinical context? Main body: We argue to the contrary because (1) patient populations vary, (2) measurement procedures vary, and (3) populations and measurements change over time. Hence, we have to expect heterogeneity in model performance between locations and settings, and across time. It follows that prediction models are never truly validated. This does not imply that validation is not important. Rather, the current focus on developing new models should shift to a focus on more extensive, well-conducted, and well-reported validation studies of promising models. Conclusion: Principled validation strategies are needed to understand and quantify heterogeneity, monitor performance over time, and update prediction models when appropriate. Such strategies will help to ensure that prediction models stay up-to-date and safe to support clinical decision-making.

Journal Article

Don't be misled: 3 misconceptions about external validation of clinical prediction models

by Dunias, Zoë S. , de Hond, Anne , Kant, Ilse in Artificial intelligence , Clinical algorithm , Clinical prediction model

2024

Clinical prediction models provide risks of health outcomes that can inform patients and support medical decisions. However, most models never make it to actual implementation in practice. A commonly heard reason for this lack of implementation is that prediction models are often not externally validated. While we generally encourage external validation, we argue that an external validation is often neither sufficient nor required as an essential step before implementation. As such, any available external validation should not be perceived as a license for model implementation. We clarify this argument by discussing 3 common misconceptions about external validation. We argue that there is not one type of recommended validation design, not always a necessity for external validation, and sometimes a need for multiple external validations. The insights from this paper can help readers to consider, design, interpret, and appreciate external validation studies.

Journal Article

Model selection using information criteria, but is the "best" model any good?

by Thomson, James R. , Duncan, Richard P. , Mac Nally, Ralph in Adequacy , applied ecology , COMMENTARY

2018

1. Information criteria (ICs) are used widely for data summary and model building in ecology, especially in applied ecology and wildlife management. Although ICs are useful for distinguishing among rival candidate models, ICs do not necessarily indicate whether the "best" model (or a model-averaged version) is a good representation of the data or whether the model has useful "explanatory" or "predictive" ability. 2. As editors and reviewers, we have seen many submissions that did not evaluate whether the nominal "best" model(s) found using IC is a useful model in the above sense. 3. We scrutinized six leading ecological journals for papers that used IC to compare models. More than half of papers using IC for model comparison did not evaluate the adequacy of the best model(s) in either "explaining" or "predicting" the data. 4. Synthesis and applications. Authors need to evaluate the adequacy of the model identified as the "best" model by using information criteria methods to provide convincing evidence to readers and users that inferences from the best models are useful and reliable.

Journal Article

Establishment and verification of a surgical prognostic model for cervical spinal cord injury without radiological abnormality

by Cai, Xuan , Xu, Jia-Wei , Guo, Shuai in Analysis , Care and treatment , Clinical outcomes

2019

Some studies have suggested that early surgical treatment can effectively improve the prognosis of cervical spinal cord injury without radiological abnormality, but no research has focused on the development of a prognostic model of cervical spinal cord injury without radiological abnormality. This retrospective analysis included 43 patients with cervical spinal cord injury without radiological abnormality. Seven potential factors were assessed: age, sex, external force strength causing damage, duration of disease, degree of cervical spinal stenosis, Japanese Orthopaedic Association score, and physiological cervical curvature. A model was established using multiple binary logistic regression analysis. The model was evaluated by concordant profiling and the area under the receiver operating characteristic curve. Bootstrapping was used for internal validation. The prognostic model was as follows: logit(P) = −25.4545 + 21.2576VALUE + 1.2160SCORE − 3.4224TIME, where VALUE refers to the Pavlov ratio indicating the extent of cervical spinal stenosis, SCORE refers to the Japanese Orthopaedic Association score (0-17) after the operation, and TIME refers to the disease duration (from injury to operation). The area under the receiver operating characteristic curve for all patients was 0.8941 (95% confidence interval, 0.7930-0.9952). Three factors assessed in the predictive model were associated with patient outcomes: a great extent of cervical stenosis, a poor preoperative neurological status, and a long disease duration. These three factors could worsen patient outcomes. Moreover, the disease prognosis was considered good when logit(P) ≥ −2.5105. Overall, the model displayed a certain clinical value. This study was approved by the Biomedical Ethics Committee of the Second Affiliated Hospital of Xi'an Jiaotong University, China (approval number: 2018063) on May 8, 2018.
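The published equation can be evaluated directly. The following is a minimal sketch, assuming only the coefficients and the cutoff quoted in the abstract; the function names are hypothetical, and the unit of TIME is not stated in the abstract, so any input value for it is an assumption.

```python
import math

# Coefficients exactly as printed in the abstract.
INTERCEPT = -25.4545
COEF_VALUE = 21.2576   # VALUE: Pavlov ratio
COEF_SCORE = 1.2160    # SCORE: Japanese Orthopaedic Association score (0-17)
COEF_TIME = -3.4224    # TIME: disease duration (unit not given in the abstract)

# The abstract calls the prognosis good when logit(P) >= -2.5105.
GOOD_PROGNOSIS_CUTOFF = -2.5105


def logit_p(value, score, time):
    """Linear predictor logit(P) of the reported model."""
    return INTERCEPT + COEF_VALUE * value + COEF_SCORE * score + COEF_TIME * time


def probability(logit):
    """Inverse logit: convert the linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-logit))


def is_good_prognosis(value, score, time):
    """Apply the cutoff on logit(P) quoted in the abstract."""
    return logit_p(value, score, time) >= GOOD_PROGNOSIS_CUTOFF


if __name__ == "__main__":
    # Hypothetical patient, for illustration only.
    lp = logit_p(value=0.8, score=10, time=1)
    print(lp, probability(lp), is_good_prognosis(0.8, 10, 1))
```

Note that the cutoff is applied on the logit scale, so the probability conversion is only needed when a risk estimate is to be reported.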

Journal Article

Real-Time Clinical Decision Support Based on Recurrent Neural Networks for In-Hospital Acute Kidney Injury: External Validation and Model Interpretation

by Kim, Kipyo , Ryu, Ji-Young , Yang, Hyeonsik in Original Paper

2021

Acute kidney injury (AKI) is commonly encountered in clinical practice and is associated with poor patient outcomes and increased health care costs. Despite it posing significant challenges for clinicians, effective measures for AKI prediction and prevention are lacking. Previously published AKI prediction models mostly have a simple design without external validation. Furthermore, little is known about the process of linking model output and clinical decisions due to the black-box nature of neural network models. We aimed to present an externally validated recurrent neural network (RNN)-based continuous prediction model for in-hospital AKI and show applicable model interpretations in relation to clinical decision support. Study populations were all patients aged 18 years or older who were hospitalized for more than 48 hours between 2013 and 2017 in 2 tertiary hospitals in Korea (Seoul National University Bundang Hospital and Seoul National University Hospital). All demographic data, laboratory values, vital signs, and clinical conditions of patients were obtained from electronic health records of each hospital. We developed 2-stage hierarchical prediction models (model 1 and model 2) using RNN algorithms. The outcome variable for model 1 was the occurrence of AKI within 7 days from the present. Model 2 predicted the future trajectory of creatinine values up to 72 hours. The performance of each developed model was evaluated using the internal and external validation data sets. For the explainability of our models, different model-agnostic interpretation methods were used, including Shapley Additive Explanations, partial dependence plots, individual conditional expectation, and accumulated local effects plots. 
We included 69,081 patients in the training, 7675 in the internal validation, and 72,352 in the external validation cohorts for model development after excluding cases with missing data and those with an estimated glomerular filtration rate less than 15 mL/min/1.73 m2 or end-stage kidney disease. Model 1 predicted any AKI development with an area under the receiver operating characteristic curve (AUC) of 0.88 (internal validation) and 0.84 (external validation), and stage 2 or higher AKI development with an AUC of 0.93 (internal validation) and 0.90 (external validation). Model 2 predicted the future creatinine values within 3 days with mean-squared errors of 0.04-0.09 for patients with higher risks of AKI and 0.03-0.08 for those with lower risks. Based on the developed models, we showed AKI probability according to feature values in total patients and each individual with partial dependence, accumulated local effects, and individual conditional expectation plots. We also estimated the effects of feature modifications such as nephrotoxic drug discontinuation on future creatinine levels. We developed and externally validated a continuous AKI prediction model using RNN algorithms. Our model could provide real-time assessment of future AKI occurrences and individualized risk factors for AKI in general inpatient cohorts; thus, we suggest approaches to support clinical decisions based on prediction models for in-hospital AKI.

Journal Article

Practical guidance for validating the predictive performance in the presence of missing data: a guide for the clinical researcher

by de Grooth, H.J.S. , Cremer, O.L. , De Mul, N. in Bias , Data Interpretation, Statistical , Datasets

2026

Prediction models are widely used across all fields of medicine as tools to support patient counseling and guide treatment decisions. A key step before any prediction model can be implemented in clinical practice is internal validation, for which principles are well described in the literature. However, the application of these principles is challenging when complex models are used or when missing values are present in the predictor variables. Approaches for internal validation and handling of missing data often result in a multitude of datasets, such as multiple bootstrapped samples across multiple imputations. Analyzing such cross-multiplied datasets in a streamlined manner is not straightforward. This paper provides practical guidance and a structured R workflow to support clinical researchers in combining internal validation and imputation methods when building reliable prediction models.
• Prediction modeling faces three challenges: regularization, missingness, and validation.
• Tackling this involves combining multiple dataset layers, complicating analyses.
• We provide practical guidance and a structured R workflow for clinical researchers.
• Real-world data illustration shows the practical feasibility of MI-Boot.
• This workflow helps the clinical researcher to address the core challenges.

Journal Article

On Two Novel Parameters for Validation of Predictive QSAR Models

by Paul, Somnath , Pratim Roy, Partha , Roy, Kunal in Algorithms , Animals , External validation

2009

Validation is a crucial aspect of quantitative structure–activity relationship (QSAR) modeling. The present paper shows that traditionally used validation parameters (leave-one-out Q² for internal validation and predictive R² for external validation) may be supplemented with two novel parameters r_m² and R_p² for a stricter test of validation. The parameter r_m²(overall) penalizes a model for large differences between observed and predicted values of the compounds of the whole set (considering both training and test sets) while the parameter R_p² penalizes model R² for large differences between determination coefficient of nonrandom model and square of mean correlation coefficient of random models in case of a randomization test. Two other variants of r_m² parameter, r_m²(LOO) and r_m²(test), penalize a model more strictly than Q² and R²_pred respectively. Three different data sets of moderate to large size have been used to develop multiple models in order to indicate the suitability of the novel parameters in QSAR studies. The results show that in many cases the developed models could satisfy the requirements of conventional parameters (Q² and R²_pred) but fail to achieve the required values for the novel parameters r_m² and R_p². Moreover, these parameters also help in identifying the best models from among a set of comparable models. Thus, a test for these two parameters is suggested to be a more stringent requirement than the traditional validation parameters to decide acceptability of a predictive QSAR model, especially when a regulatory decision is involved.
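The abstract describes the two parameters only in words. The sketch below shows how they are commonly written in the QSAR literature; the exact formulas are an assumption here, since the abstract does not state them, and the function and argument names are hypothetical. The assumed forms are: the first parameter equals r² × (1 − √|r² − r₀²|), where r₀² is the squared correlation for a regression through the origin, and the second equals R² × √(R² − R_r²), where R_r² is the mean squared correlation of the randomized models.

```python
import math


def rm2(r2, r0_2):
    """Assumed form: r^2 * (1 - sqrt(|r^2 - r0^2|)).

    r2   : squared correlation between observed and predicted values
    r0_2 : squared correlation with the regression forced through the origin
    """
    return r2 * (1.0 - math.sqrt(abs(r2 - r0_2)))


def rp2(r2_model, r2_random_mean):
    """Assumed form: R^2 * sqrt(R^2 - Rr^2).

    r2_model       : R^2 of the nonrandom (real) model
    r2_random_mean : mean R^2 of models built on randomized responses
    """
    diff = r2_model - r2_random_mean
    if diff < 0:
        # The square root is undefined if random models fit better than the real one.
        raise ValueError("random models outperform the real model")
    return r2_model * math.sqrt(diff)
```

Under these assumed forms, both quantities shrink the conventional R² by a factor below 1, which is consistent with the abstract's description of them as stricter than Q² and predictive R².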

Journal Article

Empirical simulation of internal validation methods for prediction models: comparing k-fold cross-validation with bootstrap-based optimism correction

by Yan, Ruohua , Peng, Xiaoxia , Liu, Xiaohang in Acute Kidney Injury - epidemiology , Algorithms , Bias

2026

To systematically evaluate the performance of k-fold cross-validation and bootstrap-based optimism correction methods for internal validation of statistical and machine learning models. A total of 239,415 inpatients were extracted from an open access database named Medical Information Mart for Intensive Care IV, of which 39,145 were randomly sampled as a predefined reference dataset. Among the remaining simulation dataset with 200,000 inpatients, training sets with sample sizes ranging from 595 to 5946 were randomly selected, and multiple prediction models were developed in each training set using various modeling strategies, including logistic regression, least absolute shrinkage and selection operator regression, Naive Bayes, Support Vector Machine, K-Nearest Neighbors, Light Gradient Boosting Machine, and Random Forest. The dependent variable of the model was acute kidney injury (AKI), a binary outcome with an incidence of 18.5%, and the independent variables included 22 common predictors of AKI. For each model, 2-fold, 5-fold, and 10-fold cross-validation were used for internal validation to calculate area under the receiver-operating characteristic curve (AUC), which is a common metric for quantifying the overall ability of a model to discriminate between positive or negative classifications. In addition, the Harrell, .632, and .632+ AUC estimators were calculated for internal validation based on bootstrapping. The above simulation process was repeated 1000 times to obtain 1000 estimates of AUC for each internal validation method of each model. The model performance was simultaneously evaluated in the reference dataset to obtain an empirical AUC (analogous to the “gold standard”). Then, by comparing the 1000 AUC estimates with the empirical AUC, the accuracy of internal validation methods for different models was assessed. 
For parametric models, the .632+ estimator provided the most accurate estimates of AUC, followed by 10-fold cross-validation with only slight bias. In contrast, for nonparametric models, all bootstrap-based optimism correction methods significantly overestimated AUC, and the overestimation was not reduced by increasing the sample size. Most strikingly, 10-fold cross-validation demonstrated stable and good performance across all scenarios considered, regardless of the modeling strategy or sample size. The performance of bootstrap-based optimism correction methods can be affected by model complexity, although the .632+ estimator performs best in parameter models based on small-sample training. In comparison, 10-fold cross-validation is more robust and easier to implement. Therefore, it is recommended to prioritize 10-fold cross-validation as the internal validation method for prediction models. With the exponential growth of clinical prediction models, the methods for conducting internal validation of these models remain controversial. Both k-fold cross-validation and bootstrap-based optimism correction methods are recommended by guidance papers. However, the issue of whether they are applicable to all modeling strategies, especially machine learning algorithms, still lacks evidence. This study simulated various sample size scenarios based on real-world clinical data, and developed AKI prediction models based on parametric and nonparametric modeling strategies. Then, internal validation was performed for each model using different methods. The results showed that bootstrap-based optimism correction methods were suitable for parametric models. However, as the model complexity increased, the bias of bootstrap-based optimism correction methods increased accordingly. In contrast, 10-fold cross-validation performed well in all scenarios, regardless of the modeling strategy or sample size. 
Therefore, 10-fold cross-validation is recommended as a preferred method for internal validation of prediction models.

Key findings
• Through a comprehensive comparison of internal validation methods in statistical and machine learning models, this study found that cross-validation, especially the 10-fold cross-validation, demonstrated stable and good performance across a wide range of scenarios.
What this adds to what is known?
• Although bootstrap-based optimism correction methods provide accurate estimates of model performance in parametric models, they exhibit significant overestimation in nonparametric models.
What is the implication and what should change now?
• Given that cross-validation is more robust and computationally less costly to implement than the bootstrap-based methods, it should be recommended as a preferred method for internal validation of prediction models.
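The bootstrap estimators named in this abstract reduce to simple arithmetic once the per-sample AUCs are available. The following is a minimal sketch under stated assumptions: it uses the standard textbook definitions of the Harrell optimism correction and the .632 estimator, not code from the study, and all function names are hypothetical. The .632+ variant is omitted because it additionally needs a no-information performance value.

```python
from statistics import mean


def auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one
    (Mann-Whitney form); ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def harrell_corrected_auc(apparent_auc, boot_train_aucs, boot_orig_aucs):
    """Harrell optimism correction.

    boot_train_aucs[i]: AUC of the model refit on bootstrap sample i, evaluated on that sample
    boot_orig_aucs[i] : AUC of the same refit model, evaluated on the original data
    """
    optimism = mean(t - o for t, o in zip(boot_train_aucs, boot_orig_aucs))
    return apparent_auc - optimism


def auc_632(apparent_auc, mean_oob_auc):
    """The .632 estimator: weighted mix of apparent and out-of-bag performance."""
    return 0.368 * apparent_auc + 0.632 * mean_oob_auc
```

Fitting the model inside each bootstrap sample or cross-validation fold is left out on purpose; whichever learner is used, the complete modeling procedure has to be repeated within each resample for these estimates to be meaningful.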

Journal Article

Machine learning-based model for predicting the occurrence and mortality of nonpulmonary sepsis-associated ARDS

by Nie, Shinan , Sun, Zhaorui , Zhang, Suyan in 692/699/1785/3193 , 692/699/255 , Acute respiratory distress syndrome

2024

Objective: The objective was to establish a machine learning-based model for predicting the occurrence and mortality of nonpulmonary sepsis-associated ARDS. Methods: 80% of sepsis patients selected randomly from the MIMIC-IV database, without prior pulmonary conditions and with nonpulmonary infection sites, were used to construct prediction models through machine learning techniques (including K-nearest neighbour, extreme gradient boosting, support vector machine, deep neural network, and decision tree methods). The remaining 20% of patients were utilized to validate the model’s accuracy. Additionally, local data were employed for further model validation. Results: A total of 11,409 patients were included, with the most common type of infection being bloodstream infection. A total of 7,632 (66.9%) patients developed nonpulmonary sepsis-associated ARDS (NPS-ARDS). Patients with NPS-ARDS had significantly longer ICU stays (6.2 ± 5.2 days vs. 4.4 ± 3.7 days, p < 0.01) and higher 28-day mortality rates (19.5% vs. 14.9%, p < 0.01). Both internal and external validation demonstrated that the model constructed with the extreme gradient boosting method had high accuracy. In the internal validation, the model predicted NPS-ARDS and mortality in such patients with accuracies of 77.5% and 71.8%, respectively. In the external validation, the model predicted NPS-ARDS and mortality in these patients with accuracies of 78.0% and 81.4%, respectively. Conclusion: The model established via the extreme gradient boosting method can predict the occurrence and mortality of nonpulmonary sepsis-associated ARDS to a certain extent.

Journal Article

External validation: a simulation study to compare cross-validation versus holdout or external testing to assess the performance of clinical prediction models using PET data from DLBCL patients

by Eertink, Jakoba J , Heymans, Martijn W , Zwezerijnen, Gerben J. C in Calibration , Datasets , Prediction models

2022

Aim: Clinical prediction models need to be validated. In this study, we used simulation data to compare various internal and external validation approaches to validate models. Methods: Data of 500 patients were simulated using distributions of metabolic tumor volume, standardized uptake value, the maximal distance between the largest lesion and another lesion, WHO performance status and age of 296 diffuse large B cell lymphoma patients. These data were used to predict progression after 2 years based on an existing logistic regression model. Using the simulated data, we applied cross-validation, bootstrapping and holdout (n = 100). We simulated new external datasets (n = 100, n = 200, n = 500) and simulated stage-specific external datasets (1), varied the cut-off for high-risk patients (2) and the false positive and false negative rates (3) and simulated a dataset with EARL2 characteristics (4). All internal and external simulations were repeated 100 times. Model performance was expressed as the cross-validated area under the curve (CV-AUC ± SD) and calibration slope. Results: The cross-validation (0.71 ± 0.06) and holdout (0.70 ± 0.07) resulted in comparable model performances, but the model had a higher uncertainty using a holdout set. Bootstrapping resulted in a CV-AUC of 0.67 ± 0.02. The calibration slope was comparable for these internal validation approaches. Increasing the size of the test set resulted in more precise CV-AUC estimates and smaller SD for the calibration slope. For test datasets with different stages, the CV-AUC increased as Ann Arbor stages increased. As expected, changing the cut-off for high risk and false positive- and negative rates influenced the model performance, which is clearly shown by the low calibration slope. The EARL2 dataset resulted in similar model performance and precision, but calibration slope indicated overfitting. Conclusion: In case of small datasets, it is not advisable to use a holdout or a very small external dataset with similar characteristics. A single small testing dataset suffers from a large uncertainty. Therefore, repeated CV using the full training dataset is preferred instead. Our simulations also demonstrated that it is important to consider the impact of differences in patient population between training and test data, which may ask for adjustment or stratification of relevant variables.
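The repeated cross-validation recommended in this conclusion can be sketched as an index generator. This is a minimal illustration, assuming plain (unstratified) k-fold splits; the abstract does not give the number of folds or repeats, so `k`, `repeats`, and the function name are hypothetical.

```python
import random


def repeated_kfold(n, k, repeats, seed=0):
    """Yield (train_indices, test_indices) for `repeats` independent k-fold splits.

    Within one repeat every index lands in exactly one test fold, so all
    n cases are used for testing once per repeat.
    """
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)  # a fresh random partition for each repeat
        folds = [idx[i::k] for i in range(k)]
        for i, test in enumerate(folds):
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train, test
```

Averaging a performance measure over all yielded splits uses every case for both fitting and testing, which is the property the conclusion contrasts with a single small holdout set.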

Journal Article
