148 results for "Kors, Jan A."
External validation of existing dementia prediction models on observational health data
Background Many dementia prediction models have been developed, but only a few have been externally validated, which hinders clinical uptake and may pose a risk if unvalidated models are nonetheless applied to actual patients. Externally validating an existing prediction model is a difficult task that depends largely on the completeness of model reporting in the published article. In this study, we aim to externally validate existing dementia prediction models. To that end, we define model reporting criteria, review published studies, and externally validate three well-reported models using routinely collected health data from administrative claims and electronic health records. Methods We identified dementia prediction models that were developed between 2011 and 2020 and assessed whether they could be externally validated given a set of model criteria. In addition, we externally validated three of these models (Walters’ Dementia Risk Score, Mehta’s RxDx-Dementia Risk Index, and Nori’s ADRD dementia prediction model) on a network of six observational health databases from the United States, United Kingdom, Germany, and the Netherlands, including the original development databases of the models. Results We reviewed 59 dementia prediction models. All models reported the prediction method, development database, and target and outcome definitions. Less frequently reported were predictor definitions (52 models), the time window in which a predictor is assessed (21 models), predictor coefficients (20 models), and the time-at-risk (42 models). The validation of the model by Walters (development c-statistic: 0.84) showed moderate transportability (0.67–0.76 c-statistic). The Mehta model (development c-statistic: 0.81) transported well to some of the external databases (0.69–0.79 c-statistic). The Nori model (development AUROC: 0.69) transported well (0.62–0.68 AUROC) but performed modestly overall. Recalibration showed improvements for the Walters and Nori models, while recalibration of the Mehta model could not be assessed because its baseline hazard was not reported. Conclusion We observed that reporting is mostly insufficient to fully externally validate published dementia prediction models, and it is therefore uncertain how well these models would work in other clinical settings. We emphasize the importance of following established guidelines for reporting clinical prediction models. We recommend that reporting be more explicit and have external validation in mind if the model is meant to be applied in different settings.
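The transportability figures above are discrimination estimates. As a rough illustration (not the study's validation pipeline), the c-statistic is the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case:

```python
def c_statistic(risks_cases, risks_noncases):
    """Concordance statistic (equivalent to AUROC for a binary outcome):
    the fraction of case/non-case pairs in which the case received the
    higher predicted risk; ties count as half. Illustrative sketch only,
    not the code used in the study."""
    pairs = len(risks_cases) * len(risks_noncases)
    wins = sum((p > n) + 0.5 * (p == n)
               for p in risks_cases for n in risks_noncases)
    return wins / pairs
```

A model whose c-statistic drops from 0.84 on development data to 0.67–0.76 elsewhere discriminates noticeably less well in the new settings, which is what "moderate transportability" refers to.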
Impact of random oversampling and random undersampling on the performance of prediction models developed using observational health data
Background There is currently no consensus on the impact of class imbalance methods on the performance of clinical prediction models. We aimed to empirically investigate the impact of random oversampling and random undersampling, two commonly used class imbalance methods, on the internal and external validation performance of prediction models developed using observational health data. Methods We developed and externally validated prediction models for various outcomes of interest within a target population of people with pharmaceutically treated depression across four large observational health databases. We used three different classifiers (lasso logistic regression, random forest, XGBoost) and varied the target imbalance ratio. We evaluated the impact on model performance in terms of discrimination and calibration. Discrimination was assessed using the area under the receiver operating characteristic curve (AUROC) and calibration was assessed using calibration plots. Results We developed and externally validated a total of 1,566 prediction models. On internal and external validation, random oversampling and random undersampling generally did not result in higher AUROCs. Moreover, we found overestimated risks, although this miscalibration could largely be corrected by recalibrating the models towards the imbalance ratios in the original dataset. Conclusions Overall, we found that random oversampling or random undersampling generally does not improve the internal and external validation performance of prediction models developed in large observational health databases. Based on our findings, we do not recommend applying random oversampling or random undersampling when developing prediction models in large observational health databases.
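For intuition, random oversampling and random undersampling can be sketched in a few lines. This is a minimal illustration, not the study's implementation; the function name and the 1:1 target ratio are assumptions:

```python
import random

def rebalance(cases, controls, method="oversample", seed=0):
    """Rebalance a binary dataset to a 1:1 case:control ratio.

    'oversample' duplicates randomly chosen minority cases until they
    match the controls; 'undersample' randomly drops controls until
    they match the cases. Hypothetical sketch, not the study's code.
    """
    rng = random.Random(seed)
    cases, controls = list(cases), list(controls)
    if method == "oversample":
        while len(cases) < len(controls):
            cases.append(rng.choice(cases))
    else:  # undersample
        controls = rng.sample(controls, len(cases))
    return cases, controls
```

Both transformations change the apparent outcome prevalence seen by the classifier, which is why the study found inflated predicted risks and why recalibrating towards the original imbalance ratio largely restored calibration.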
Using Structured Codes and Free-Text Notes to Measure Information Complementarity in Electronic Health Records: Feasibility and Validation Study
Electronic health records (EHRs) consist of both structured data (eg, diagnostic codes) and unstructured data (eg, clinical notes). It is commonly believed that unstructured clinical narratives provide more comprehensive information. However, this assumption lacks large-scale validation and direct validation methods. This study aims to quantitatively compare the information in structured and unstructured EHR data and directly validate whether unstructured data offers more extensive information across a patient population. We analyzed both structured and unstructured data from patient records and visits in a large Dutch primary care EHR database between January 2021 and January 2024. Clinical concepts were identified from free-text notes using an extraction framework tailored for Dutch and compared with concepts from structured data. Concept embeddings were generated to measure semantic similarity between structured and extracted concepts through cosine similarity. A similarity threshold was systematically determined via annotated matches and minimized weighted Gini impurity. We then quantified the concept overlap between structured and unstructured data across various concept domains and patient populations. In a population of 1.8 million patients, only 13% of extracted concepts from patient records and 7% from individual visits had similar structured counterparts. Conversely, 42% of structured concepts in records and 25% in visits had similar matches in unstructured data. Condition concepts had the highest overlap, followed by measurements and drug concepts. Subpopulation visits, such as those with chronic conditions or psychological disorders, showed different proportions of data overlap, indicating varied reliance on structured versus unstructured data across clinical contexts. 
Our study demonstrates the feasibility of quantifying the information difference between structured and unstructured data, showing that the unstructured data provides important additional information in the studied database and populations. The annotated concept matches are made publicly available for the clinical natural language processing community. Despite some limitations, our proposed methodology proves versatile, and its application can lead to more robust and insightful observational clinical research.
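The core matching step, comparing a concept extracted from free text against structured concepts via embedding cosine similarity, can be sketched as follows. The 0.8 threshold is a placeholder; the study derived its threshold from annotated matches by minimizing weighted Gini impurity:

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def has_structured_counterpart(extracted_vec, structured_vecs, threshold=0.8):
    """True if any structured-concept embedding is similar enough to the
    concept extracted from the free-text notes. Sketch only; the vectors
    and threshold here are illustrative, not the study's."""
    return any(cosine(extracted_vec, s) >= threshold for s in structured_vecs)
```

Counting, per record, how many extracted concepts have a structured counterpart (and vice versa) yields overlap percentages like the 13% and 42% reported above.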
Heart rate variability is associated with left ventricular systolic, diastolic function and incident heart failure in the general population
Background Heart rate variability (HRV) has mostly shown associations with systolic dysfunction and, more recently, with diastolic dysfunction in heart failure (HF) patients, but the role of the sympathetic nervous system in changes of left ventricular (LV) systolic and diastolic function and in new-onset HF has not been extensively studied. Methods Among 3157 men and 4405 women free of HF and atrial fibrillation, retrospectively included from the population-based Rotterdam Study, we used linear mixed models to examine associations of the root mean square of successive RR-interval differences and the standard deviation of RR intervals, both corrected for heart rate (RMSSDc and SDNNc), with longitudinal changes of LV ejection fraction (LVEF), E/A ratio, left atrial (LA) diameter, and E/e’ ratio. We then used Cox regressions to examine their association with new-onset HF. Results Mean (SD) age was 65.0 (9.95) years in men and 65.7 (10.2) years in women. Every unit increase in log RMSSDc was accompanied by 0.75% (95% CI: −1.11%; −0.39%) and 0.31% (−0.60%; −0.01%) lower LVEF per year among men and women, respectively. Higher log RMSSDc was linked to 0.03 (−0.04; −0.01) and 0.02 (−0.03; −0.003) lower E/A ratio and 1.76 (−2.77; −0.75) and 1.18 (−1.99; −0.38) lower left ventricular mass (LVM) index in men and women, respectively, and to a 0.72 mm (95% CI: −1.20; −0.25) smaller LA diameter in women. The associations with LVEF in women diminished after excluding HF cases during the first 3 years of follow-up. During a median follow-up of 8.7 years, hazard ratios (95% CI) for incident HF per unit log RMSSDc were 1.34 (1.08; 1.65) in men and 1.15 (0.93; 1.42) in women. SDNNc showed similar associations. Conclusions Indices of HRV were associated with worse systolic function in men, but mainly with a favorable (smaller) LA size in women. Higher HRV was associated with a higher risk of new-onset HF in men. Our findings highlight potential sex differences in autonomic function underlying cardiac dysfunction and heart failure in the general population.
Annotated Chemical Patent Corpus: A Gold Standard for Text Mining
Exploring the chemical and biological space covered by patent applications is crucial in early-stage medicinal chemistry activities. Patent analysis can provide understanding of compound prior art, novelty checking, validation of biological assays, and identification of new starting points for chemical exploration. Manual extraction of chemical and biological entities from patents by expert curators can take a substantial amount of time and resources. Text mining methods can help to ease this process. To validate the performance of such methods, a manually annotated patent corpus is essential. In this study, we produced a large gold-standard chemical patent corpus. We developed annotation guidelines and selected 200 full patents from the World Intellectual Property Organization, United States Patent and Trademark Office, and European Patent Office. The patents were pre-annotated automatically and made available to four independent annotator groups, each consisting of two to ten annotators. The annotators marked chemicals in different subclasses, diseases, targets, and modes of action. Spelling mistakes and spurious line breaks due to optical character recognition errors were also annotated. A subset of 47 patents was annotated by at least three annotator groups, from which harmonized annotations and inter-annotator agreement scores were derived. One group annotated the full set. The patent corpus includes 400,125 annotations for the full set and 36,537 annotations for the harmonized set. All patents and annotated entities are publicly available at www.biosemantics.org.
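Inter-annotator agreement on entity annotations is commonly summarized with a pairwise F1-style score. The sketch below is a generic illustration under the assumption of exact span-and-type matching, not the corpus's actual harmonization procedure:

```python
def agreement_f1(annotations_a, annotations_b):
    """F1-style agreement between two annotators' entity sets.

    Entities are (start, end, entity_type) tuples; only exact matches
    count as agreement. Hypothetical simplification for illustration;
    real agreement scoring often also credits partial span overlaps.
    """
    a, b = set(annotations_a), set(annotations_b)
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))
```

Averaging such scores over all annotator-group pairs on the 47 shared patents gives a corpus-level agreement estimate.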
Identifying genes targeted by disease-associated non-coding SNPs with a protein knowledge graph
Genome-wide association studies (GWAS) have identified many single nucleotide polymorphisms (SNPs) that play important roles in the genetic heritability of traits and diseases. With most of these SNPs located in the non-coding part of the genome, it is currently assumed that these SNPs influence the expression of nearby genes. However, identifying which genes are targeted by these disease-associated SNPs remains challenging. In the past, protein knowledge graphs have often been used to identify genes that are associated with disease, also referred to as “disease genes”. Here, we explore whether protein knowledge graphs can be used to identify genes that are targeted by disease-associated non-coding SNPs by testing and comparing the performance of six existing methods for a protein knowledge graph, four of which were developed for disease gene identification. We compare our performance against two baselines: (1) an existing state-of-the-art method that is based on guilt-by-association, and (2) the leading assumption that SNPs target the nearest gene on the genome. We test these methods with four reference sets, three of which were obtained by different means. Furthermore, we combine methods to investigate whether their combination improves performance. We find that protein knowledge graphs that include predicate information perform comparably to the current state of the art, achieving an area under the receiver operating characteristic curve (AUC) of 79.6% on average across all four reference sets. Protein knowledge graphs that lack predicate information perform comparably to our other baseline (genetic distance), which achieved an AUC of 75.7% across all four reference sets. Combining multiple methods improved performance to 84.9% AUC. We conclude that methods for a protein knowledge graph can be used to identify which genes are targeted by disease-associated non-coding SNPs.
Development and validation of a patient-level model to predict dementia across a network of observational databases
Background A prediction model can be a useful tool to quantify the risk that a patient develops dementia in the coming years and to guide risk-factor-targeted interventions. Numerous dementia prediction models have been developed, but few have been externally validated, likely limiting their clinical uptake. In our previous work, we had limited success in externally validating some of these existing models due to inadequate reporting. We therefore develop and externally validate novel models to predict dementia in the general population across a network of observational databases, and assess regularization methods to obtain parsimonious models that are of lower complexity and easier to implement. Methods Logistic regression models were developed across a network of five observational databases with electronic health records (EHRs) and claims data to predict 5-year dementia risk in persons aged 55–84. We assessed two regularization methods, L1 and Broken Adaptive Ridge (BAR), as well as three candidate predictor sets to optimize prediction performance. The predictor sets include a baseline set using only age and sex, a full set including all available candidate predictors, and a phenotype set with a limited number of clinically relevant predictors. Results BAR can be used for variable selection, outperforming L1 when a parsimonious model is desired. Adding candidate predictors for disease diagnoses and drug exposures generally improves the performance of baseline models using only age and sex. While a model trained on German EHR data saw an increase in AUROC from 0.74 to 0.83 with additional predictors, a model trained on US EHR data showed only a minimal improvement, from 0.79 to 0.81 AUROC. Nevertheless, the latter model, developed using BAR regularization on the clinically relevant predictor set, was ultimately chosen as the best-performing model, as it demonstrated more consistent external validation performance and improved calibration. Conclusions We developed and externally validated patient-level models to predict dementia. Our results show that although dementia prediction is driven largely by age, adding predictors based on condition diagnoses and drug exposures further improves prediction performance. BAR regularization outperforms L1 regularization in yielding the most parsimonious yet still well-performing prediction model for dementia.
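For readers unfamiliar with the two penalties, the objectives can be written schematically as follows (standard formulations from the regularization literature, not equations quoted from the abstract). L1 adds an absolute-value penalty to the negative log-likelihood, while BAR repeatedly solves a reweighted ridge problem whose weights push small coefficients to exactly zero, approximating best-subset (L0) selection:

```latex
% L1 (lasso) penalized logistic log-likelihood:
\hat{\beta}_{L1} = \arg\min_{\beta} \Big\{ -\ell(\beta)
    + \lambda \sum_{j} |\beta_j| \Big\}
% Broken Adaptive Ridge: iteratively reweighted ridge, with
% weights taken from the previous iterate \beta^{(k)}:
\beta^{(k+1)} = \arg\min_{\beta} \Big\{ -\ell(\beta)
    + \lambda \sum_{j} \frac{\beta_j^2}{\big(\beta_j^{(k)}\big)^2} \Big\}
```

The reweighting explains the abstract's finding: coefficients that stay small across iterations are penalized ever more heavily and drop out, giving a sparser model than L1 at comparable performance.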
Mapping between clinical and preclinical terminologies: eTRANSAFE’s Rosetta stone approach
Background The eTRANSAFE project developed tools that support translational research. One of the challenges in this project was to combine preclinical and clinical data, which are coded with different terminologies and granularities, and are expressed as single pre-coordinated, clinical concepts and as combinations of preclinical concepts from different terminologies. This study develops and evaluates the Rosetta Stone approach, which maps combinations of preclinical concepts to clinical, pre-coordinated concepts, allowing for different levels of exactness of mappings. Methods Concepts from preclinical and clinical terminologies used in eTRANSAFE have been mapped to the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT). SNOMED CT acts as an intermediary terminology that provides the semantics to bridge between pre-coordinated clinical concepts and combinations of preclinical concepts with different levels of granularity. The mappings from clinical terminologies to SNOMED CT were taken from existing resources, while mappings from the preclinical terminologies to SNOMED CT were manually created. A coordination template defines the relation types that can be explored for a mapping and assigns a penalty score that reflects the inexactness of the mapping. A subset of 60 pre-coordinated concepts was mapped both with the Rosetta Stone semantic approach and with a lexical term matching approach. Both results were manually evaluated. Results A total of 34,308 concepts from preclinical terminologies (Histopathology terminology, Standard for Exchange of Nonclinical Data (SEND) code lists, Mouse Adult Gross Anatomy Ontology) and a clinical terminology (MedDRA) were mapped to SNOMED CT as the intermediary bridging terminology. A terminology service has been developed that returns dynamically the exact and inexact mappings between preclinical and clinical concepts. 
On the evaluation set, the precision of the mappings from the terminology service was high (95%), much higher than for lexical term matching (22%). Conclusion The Rosetta Stone approach uses a semantically rich intermediate terminology to map between pre-coordinated clinical concepts and combinations of preclinical concepts with different levels of exactness. The ability to generate not only exact but also inexact mappings makes it possible to relate larger amounts of preclinical and clinical data, which can be helpful in translational use cases.
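The penalty idea can be illustrated with a toy lookup: each relation used in a mapping contributes a penalty, exact matches score zero, and candidates are ranked by total penalty. All names and penalty values below are hypothetical; eTRANSAFE's coordination template defines the actual relation types and scores:

```python
# Hypothetical penalty per relation type used in a mapping
# (0 = exact; larger = less exact). Illustrative values only.
PENALTY = {"exact": 0, "parent": 1, "sibling": 2}

def rank_mappings(candidates):
    """Rank candidate preclinical concept combinations for one clinical
    concept by total penalty (lowest, i.e. most exact, first).

    `candidates` maps a tuple of preclinical concepts to the tuple of
    relation types used to reach each of them. Sketch only."""
    scored = [(sum(PENALTY[r] for r in relations), combo)
              for combo, relations in candidates.items()]
    return sorted(scored)
```

Returning the inexact (penalty > 0) candidates alongside the exact ones is what lets the service relate clinical findings to preclinical observations recorded at a coarser or finer granularity.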
The meaning of the Tp-Te interval and its diagnostic value
The interval between T peak (Tp) and T end (Te) has been proposed as a measure of transmural dispersion of repolarization, but experimental and clinical studies to validate Tp-Te have given conflicting results. We have investigated the meaning of Tp-Te and its diagnostic potential. We used a digital model of the left ventricular wall to simulate the effect of varying action potential durations on the timing of Tp and Te. Furthermore, we used the vectorcardiogram to explain the relationships between Tp locations in the precordial electrocardiogram leads. Prolongation or ischemic shortening of action potentials in our model did not result in substantial Tp shifts. The phase relationships revealed by the vectorcardiogram showed that Tp-Te in the precordial leads is a derivative of T loop morphology. Tp-Te is the resultant of the global distribution of the repolarization process and is a surrogate diagnostic parameter.
Parasitic infections related to anti-type 2 immunity monoclonal antibodies: a disproportionality analysis in the food and drug administration’s adverse event reporting system (FAERS)
Introduction: Monoclonal antibodies (mAbs) targeting immunoglobulin E (IgE) [omalizumab], the type 2 (T2) cytokine interleukin (IL) 5 [mepolizumab, reslizumab], IL-4 receptor (R) α [dupilumab], and IL-5R [benralizumab] improve quality of life in patients with T2-driven inflammatory diseases. However, there is a concern for an increased risk of helminth infections. The aim was to explore safety signals of parasitic infections for omalizumab, mepolizumab, reslizumab, dupilumab, and benralizumab. Methods: Spontaneous reports were used from the Food and Drug Administration’s Adverse Event Reporting System (FAERS) database from 2004 to 2021. Parasitic infections were defined as any type of parasitic infection term obtained from the Standardised Medical Dictionary for Regulatory Activities (MedDRA®). Safety signal strength was assessed by the Reporting Odds Ratio (ROR). Results: 15,502,908 reports were eligible for analysis. Amongst 175,888 reports for omalizumab, mepolizumab, reslizumab, dupilumab, and benralizumab, there were 79 reports on parasitic infections. Median age was 55 years (interquartile range 24–63 years) and 59.5% were female. Indications were known in 26 (32.9%) reports: 14 (53.8%) biologicals were reportedly prescribed for asthma, 8 (30.7%) for various types of dermatitis, and 2 (7.6%) for urticaria. A safety signal was observed for each biological except reslizumab (due to lack of power), with the strongest signal attributed to benralizumab (ROR = 15.7, 95% confidence interval: 8.4–29.3). Conclusion: Parasitic infections were disproportionately reported for mAbs targeting IgE, T2 cytokines, or T2 cytokine receptors. While the number of adverse event reports on parasitic infections in the database was relatively low, the resulting safety signals were disproportionate and warrant further investigation.
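Disproportionality via the Reporting Odds Ratio is a standard 2×2 computation. The sketch below shows the usual formula with a 95% confidence interval on the log scale; the counts in the example are made up for illustration, not FAERS figures:

```python
import math

def reporting_odds_ratio(a, b, c, d):
    """ROR for a drug-event pair in a spontaneous-reporting database.

    2x2 counts:
      a: reports with the drug and the event of interest
      b: reports with the drug, other events
      c: reports with other drugs and the event
      d: reports with other drugs, other events
    Returns (ROR, lower, upper) for a 95% CI; a signal is commonly
    flagged when the lower bound exceeds 1. Illustrative sketch only.
    """
    ror = (a / b) / (c / d)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ror) - 1.96 * se)
    upper = math.exp(math.log(ror) + 1.96 * se)
    return ror, lower, upper
```

With benralizumab's reported ROR of 15.7 (95% CI 8.4–29.3), the lower bound is well above 1, which is why it counts as the strongest signal in the study.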