Catalogue Search | MBRL

Large Language Model Symptom Identification From Clinical Text: Multicenter Study

by Phelan, Dylan , Miller, Timothy , Dixon, Brian E in AI Language Models in Health Care , Clinical Informatics , Clinical Information and Decision Making

2025

Recognizing patient symptoms is fundamental to medicine, research, and public health. However, symptoms are often underreported in coded formats even though they are routinely documented in physician notes. Large language models (LLMs), noted for their generalizability, could help bridge this gap by mimicking the role of human expert chart reviewers for symptom identification. The primary objective of this multisite study was to measure the accurate identification of infectious respiratory disease symptoms using LLMs instructed to follow chart review guidelines. The secondary objective was to evaluate LLM generalizability in multisite settings without the need for site-specific training, fine-tuning, or customization. Four LLMs were evaluated: GPT-4, GPT-3.5, Llama2 70B, and Mixtral 8×7B. LLM prompts were instructed to take on the role of chart reviewers and follow symptom annotation guidelines when assessing physician notes. Ground truth labels for each note were annotated by subject matter experts. Optimal LLM prompting strategies were selected using a development corpus of 103 notes from the emergency department at Boston Children's Hospital. The performance of each LLM was measured using a test corpus with 202 notes from Boston Children's Hospital. The performance of an International Classification of Diseases, Tenth Revision (ICD-10)-based method was also measured as a baseline. Generalizability of the most performant LLM was then measured in a validation corpus of 308 notes from 21 emergency departments in the Indiana Health Information Exchange. Symptom identification accuracy was superior for every LLM tested for each infectious disease symptom compared to an ICD-10-based method (F1-score=45.1%). GPT-4 was the highest scoring (F1-score=91.4%; P<.001) and was significantly better than the ICD-10-based method, followed by GPT-3.5 (F1-score=90.0%; P<.001), Llama2 (F1-score=81.7%; P<.001), and Mixtral (F1-score=83.5%; P<.001). 
For the validation corpus, performance of the ICD-10-based method decreased (F1-score=26.9%), while GPT-4 increased (F1-score=94.0%), demonstrating better generalizability using GPT-4 (P<.001). LLMs significantly outperformed an ICD-10-based method for respiratory symptom identification in emergency department electronic health records. GPT-4 demonstrated the highest accuracy and generalizability, suggesting that LLMs may augment or replace traditional approaches. LLMs can be instructed to mimic human chart reviewers with high accuracy. Future work should assess broader symptom types and health care settings.

Journal Article

Emergency department visits and boarding for pediatric patients with suicidality before and during the COVID-19 pandemic

by Bode, Louisa , Zipursky, Amy R. , McMurry, Andrew in Care and treatment , Child , Children

2023

To quantify the increase in pediatric patients presenting to the emergency department with suicidality before and during the COVID-19 pandemic, and the subsequent impact on emergency department length of stay and boarding. This retrospective cohort study from June 1, 2016, to October 31, 2022, identified patients ages 6 to 21 presenting to the emergency department at a pediatric academic medical center with suicidality using ICD-10 codes. Number of emergency department encounters for suicidality, demographic characteristics of patients with suicidality, and emergency department length of stay were compared before and during the COVID-19 pandemic. Unobserved components models were used to describe monthly counts of emergency department encounters for suicidality. There were 179,736 patient encounters to the emergency department during the study period, 6,215 (3.5%) for suicidality. There were, on average, more encounters for suicidality each month during the COVID-19 pandemic than before the COVID-19 pandemic. A time series unobserved components model demonstrated a temporary drop of 32.7 encounters for suicidality in April and May of 2020 (p<0.001), followed by a sustained increase of 31.2 encounters starting in July 2020 (p = 0.003). The average length of stay for patients that boarded in the emergency department with a diagnosis of suicidality was 37.4 hours longer during the COVID-19 pandemic compared to before the COVID-19 pandemic (p<0.001). The number of encounters for suicidality among pediatric patients and the emergency department length of stay for psychiatry boarders has increased during the COVID-19 pandemic. There is a need for acute care mental health services and solutions to emergency department capacity issues.

Journal Article

Moving Biosurveillance Beyond Coded Data Using AI for Symptom Detection From Physician Notes: Retrospective Cohort Study

by Mandl, Kenneth D , Zipursky, Amy R , Ignatov, Vladimir in Accuracy , Artificial Intelligence , Biosecurity

2024

Real-time surveillance of emerging infectious diseases necessitates a dynamically evolving, computable case definition, which frequently incorporates symptom-related criteria. For symptom detection, both population health monitoring platforms and research initiatives primarily depend on structured data extracted from electronic health records. This study sought to validate and test an artificial intelligence (AI)-based natural language processing (NLP) pipeline for detecting COVID-19 symptoms from physician notes in pediatric patients. We specifically study patients presenting to the emergency department (ED) who can be sentinel cases in an outbreak. Subjects in this retrospective cohort study are patients who are 21 years of age and younger, who presented to a pediatric ED at a large academic children's hospital between March 1, 2020, and May 31, 2022. The ED notes for all patients were processed with an NLP pipeline tuned to detect the mention of 11 COVID-19 symptoms based on Centers for Disease Control and Prevention (CDC) criteria. For a gold standard, 3 subject matter experts labeled 226 ED notes and had strong agreement (F1-score=0.986; positive predictive value [PPV]=0.972; and sensitivity=1.0). F1-score, PPV, and sensitivity were used to compare the performance of both NLP and the International Classification of Diseases, 10th Revision (ICD-10) coding to the gold standard chart review. As a formative use case, variations in symptom patterns were measured across SARS-CoV-2 variant eras. There were 85,678 ED encounters during the study period, including 4% (n=3420) with patients with COVID-19. NLP was more accurate at identifying encounters with patients that had any of the COVID-19 symptoms (F1-score=0.796) than ICD-10 codes (F1-score=0.451). NLP accuracy was higher for positive symptoms (sensitivity=0.930) than ICD-10 (sensitivity=0.300). However, ICD-10 accuracy was higher for negative symptoms (specificity=0.994) than NLP (specificity=0.917).
Congestion or runny nose showed the highest accuracy difference (NLP: F1-score=0.828 and ICD-10: F1-score=0.042). For encounters with patients with COVID-19, prevalence estimates of each NLP symptom differed across variant eras. Patients with COVID-19 were more likely to have each NLP symptom detected than patients without this disease. Effect sizes (odds ratios) varied across pandemic eras. This study establishes the value of AI-based NLP as a highly effective tool for real-time COVID-19 symptom detection in pediatric patients, outperforming traditional ICD-10 methods. It also reveals the evolving nature of symptom prevalence across different virus variants, underscoring the need for dynamic, technology-driven approaches in infectious disease surveillance.

Journal Article

Multisource representation learning for pediatric knowledge extraction from electronic health records

by Pan, Kevin , Li, Mengyan , Mandl, Kenneth in 631/114/1305 , 692/308/3187 , Automation

2024

Electronic Health Record (EHR) systems are particularly valuable in pediatrics due to high barriers in clinical studies, but pediatric EHR data often suffer from low content density. Existing EHR code embeddings tailored for the general patient population fail to address the unique needs of pediatric patients. To bridge this gap, we introduce a transfer learning approach, MUltisource Graph Synthesis (MUGS), aimed at accurate knowledge extraction and relation detection in pediatric contexts. MUGS integrates graphical data from both pediatric and general EHR systems, along with hierarchical medical ontologies, to create embeddings that adaptively capture both the homogeneity and heterogeneity between hospital systems. These embeddings enable refined EHR feature engineering and nuanced patient profiling, proving particularly effective in identifying pediatric patients similar to specific profiles, with a focus on pulmonary hypertension (PH). MUGS embeddings, resistant to negative transfer, outperform other benchmark methods in multiple applications, advancing evidence-based pediatric research.

Journal Article

A weakly supervised transformer for rare disease diagnosis and subphenotyping from EHRs with pulmonary case studies

by Li, Mengyan , Greco, Kimberly F. , Cai, Tianxi in 631/114 , 639/705 , 692/308

2026

Rare diseases affect an estimated 300–400 million people worldwide, yet individual conditions remain underdiagnosed and poorly characterized due to low prevalence and limited clinician familiarity. Computational phenotyping offers a scalable approach to improving rare disease detection, but algorithm development is constrained by scarce high-quality labeled data. Expert-labeled datasets from chart reviews and registries are highly accurate but limited in scope, whereas labels derived from electronic health records (EHRs) provide broader coverage but are often noisy or incomplete. To efficiently leverage both sources, we propose WEST (WEakly Supervised Transformer) for rare disease diagnosis and subphenotyping from EHRs. At its core, WEST employs a weakly supervised transformer trained on a limited set of expert-validated labels and extensive probabilistic silver-standard labels—derived from structured and unstructured EHR features—that are iteratively refined across training rounds to improve model calibration. We evaluate WEST on two rare pulmonary conditions using EHR data from Boston Children’s Hospital and show that it outperforms existing methods in phenotype classification, identification of clinically relevant subphenotypes, and prediction of disease progression. By reducing reliance on manual annotation, WEST enables label-efficient representation learning that supports accurate rare disease diagnosis and reveals deeper clinical insights from routine EHR data.

Journal Article

Therapeutic Hypothermia in Children

by Riess, Matthias L , Aufderheide, Tom P , Yannopoulos, Demetris in Bayesian analysis , Children , Coma

2015

To the Editor: Moler et al. (May 14 issue) 1 report that in comatose children who survived out-of-hospital cardiac arrest, therapeutic hypothermia did not confer a significant benefit in survival with a good functional outcome at 1 year. Two issues concern us about rejecting such therapy. First, accounting for the presence of pupillary responses at presentation may influence the analysis. Moler et al. previously found that bilateral reactive pupils up to 12 hours after the return of circulation were independently associated with lower mortality. 2 The current study did not present this information. Second, a Bayesian approach may show a significant reduction . . .

Journal Article

Artificial intelligence-enabled electrocardiogram guidance for pulmonary valve replacement timing in repaired tetralogy of Fallot

by Triedman, John K. , Geva, Tal , Wald, Rachel M. in Adolescent , Adult , Artificial Intelligence

2026

• Pre-PVR AI-ECG was predictive of post-PVR survival in patients with rTOF.
• AI-ECG complements imaging biomarkers for PVR risk stratification.
• AI-ECG may help physicians to safely defer PVR based on the patient’s risk profile.

Optimal timing of pulmonary valve replacement (PVR) in repaired tetralogy of Fallot (rTOF) remains challenging. We hypothesized that pre-PVR artificial intelligence-enabled electrocardiogram (AI-ECG) may inform optimal PVR timing in rTOF. rTOF PVR patients at Boston Children’s Hospital (BCH) and Toronto General Hospital (TGH) with analyzable ECGs ≤3 months pre-PVR were included. Patients undergoing PVR were propensity score-matched 1:1 to non-PVR patients. Patients were partitioned into risk tertiles based on pre-PVR AI-ECG probabilities of 5-year mortality: low-, intermediate-, and high-risk. The PVR cohort included 605 patients (504 at BCH, 101 at TGH; median age 20.3 [IQR, 13.6-32.0] years; median follow-up 7.5 [IQR, 4.7-10.6] years; 3.6% mortality). Pre-PVR AI-ECG risk probability was predictive of post-PVR mortality (c-index 0.77), outperforming an established imaging-based model benchmark (c-index 0.70). AI-ECG remained an independent predictor when added to the benchmark model (P < .001) with a higher c-index of 0.84. Survival was similar between low- and intermediate-risk groups (97-98% 15-year survival; P = .6), with increased mortality for the high-risk group (83% 15-year survival; P = .009). The matched cohort demonstrated that PVR was associated with increased survival overall (HR 0.28 [95% CI, 0.13-0.60], P = .001). Exploratory analyses stratified by risk group tertiles showed survival benefit associated with PVR in the intermediate-risk (HR 0.10 [95% CI, 0.01-0.86]; P = .04) and high-risk (HR 0.3 [0.1-0.7]; P = .005) groups, but not in the low-risk group (P = .8).
AI-ECG predicts post-PVR survival in rTOF patients with a PVR survival benefit in intermediate- and high-risk, but not low-risk, groups. AI-ECG may complement imaging biomarkers to determine rTOF PVR timing.

Journal Article

What Every Reader Should Know About Studies Using Electronic Health Record Data but May Be Afraid to Ask

by Cimino, James J , Pedrera-Jiménez, Miguel , Murphy, Shawn N in Appraisal , Audiences , Best practice

2021

Coincident with the tsunami of COVID-19–related publications, there has been a surge of studies using real-world data, including those obtained from the electronic health record (EHR). Unfortunately, several of these high-profile publications were retracted because of concerns regarding the soundness and quality of the studies and the EHR data they purported to analyze. These retractions highlight that although a small community of EHR informatics experts can readily identify strengths and flaws in EHR-derived studies, many medical editorial teams and otherwise sophisticated medical readers lack the framework to fully critically appraise these studies. In addition, conventional statistical analyses cannot overcome the need for an understanding of the opportunities and limitations of EHR-derived studies. We distill here from the broader informatics literature six key considerations that are crucial for appraising studies utilizing EHR data: data completeness, data collection and handling (eg, transformation), data type (ie, codified, textual), robustness of methods against EHR variability (within and across institutions, countries, and time), transparency of data and analytic code, and the multidisciplinary approach. These considerations will inform researchers, clinicians, and other stakeholders as to the recommended best practices in reviewing manuscripts, grants, and other outputs from EHR-data derived studies, and thereby promote and foster rigor, quality, and reliability of this rapidly growing field.

Journal Article

Long-term outcomes and risk factors for aortic regurgitation after discrete subvalvular aortic stenosis resection in children

by Geva, Tal , Gauvreau, Kimberlee , Geva, Alon in Adolescent , Adult , Age Factors

2015

Objectives: To characterise long-term outcomes after discrete subaortic stenosis (DSS) resection and to identify risk factors for reoperation and aortic regurgitation (AR) requiring repair or replacement. Methods: All patients who underwent DSS resection between 1984 and 2009 at our institution with at least 36 months’ follow-up were included. Demographic, surgical and echocardiographic data were reviewed. Outcomes were reoperation for recurrent DSS, surgery for AR, death and morbidities, including heart transplant, endocarditis and complete heart block. Results: Median length of postoperative follow-up was 10.9 years (3–27.2 years). Reoperation occurred in 32 patients (21%) and plateaued 10 years after initial resection. Survival at 10 years and 20 years was 98.6% and 86.3%, respectively. Aortic valve (AoV) repair or replacement for predominant AR occurred in 31 patients (20%) during or after DSS resection. By multivariable analysis, prior aortic stenosis (AS) intervention (HR 22.4, p<0.001) was strongly associated with AoV repair or replacement. Risk factors for reoperation by multivariable analysis included younger age at resection (HR 1.24, p=0.003), preoperative gradient ≥60 mm Hg (HR 2.23, p=0.04), peeling of membrane off AoV or mitral valve (HR 2.52, p=0.01), distance of membrane to AoV <7.0 mm (HR 4.03, p=0.03) and AS (HR 2.58, p=0.01). Conclusions: In this cohort, the incidence of reoperations after initial DSS resection plateaued after 10 years. Despite a significant rate of reoperation, overall survival was good. Concomitant congenital AS and its associated interventions significantly increased the risk of AR requiring surgical intervention.

Journal Article

Clinical Characteristics and Outcomes of Children with Acute Catastrophic Brain Injury: A 13-Year Retrospective Cohort Study

by De Souza, Bradley J. , Szuch, Eliza , Tasker, Robert C. in Cardiac arrest , Chronic illnesses , Cohort analysis

2022

Background: The purpose of this study was to describe and analyze clinical characteristics and outcomes in children with acute catastrophic brain injury (CBI). Methods: This was a single-center, 13-year (2008–2020) retrospective cohort study of children in the pediatric and cardiac intensive care units with CBI, defined as (1) acute neurologic injury based on clinical and/or imaging findings, (2) the need for life-sustaining intensive care unit therapies, and (3) death or survival with a Glasgow Coma Scale score < 13 at discharge. Patients were excluded if they were discharged directly to home < 14 days from admission or had a chronic neurologic condition with a baseline Glasgow Coma Scale score < 13. The association between the primary outcome of death and clinical variables was analyzed by using Kaplan–Meier estimates and multivariable Cox proportional hazard models. Outcomes assessed after discharge were technology dependence, neurologic deficits, and Functional Status Score. Improved functional status was defined as a change in total Functional Status Score ≥ 2. Results: Of 106 patients (58% boys, median age 3.9 years) with CBI, 86 (81%) died. Withdrawal of life-sustaining therapies was the most common cause of death (60 of 86, 70%). In our multivariable analysis, each unit increase in admission pediatric sequential organ failure assessment score was associated with 10% greater hazard of death (hazard ratio 1.10, 95% confidence interval 1.04–1.17, p < .01). After controlling for admission pediatric sequential organ failure assessment scores, compared with those of patients with traumatic brain injury, all other etiologies of CBI were associated with a greater hazard of death (p = .02; hazard ratio 3.76–10). The median survival time for the cohort was 22 days (95% confidence interval 14–37 days).
Of 23 survivors to hospital discharge, 20 were still alive after a median of 2 years (interquartile range 1–3 years), 6 of 20 (30%) did not have any technology dependence, 12 of 20 (60%) regained normal levels of alertness and responsiveness, and 15 of 20 (75%) had improved functional status. Conclusions: Most children with acute CBI died within 1 month of hospitalization. Having traumatic brain injury as the etiology of CBI was associated with greater survival, whereas increased organ dysfunction score on admission was associated with a higher hazard of mortality. Of the survivors, some recovered consciousness and functional status and did not require permanent technology dependence. Larger prospective studies are needed to improve prediction of CBI among critically ill children, understand factors guiding clinician and family decisions on the continuation or withdrawal of life-sustaining treatments, and characterize the natural history and long-term outcomes among CBI survivors.

Journal Article
