Catalogue Search | MBRL

The shaky foundations of large language models and foundation models for electronic health records

by Xu, Yizhe; Steinberg, Ethan; Fries, Jason in Artificial intelligence, Chatbots, Electronic health records

2023

The success of foundation models such as ChatGPT and AlphaFold has spurred significant interest in building similar models for electronic medical records (EMRs) to improve patient care and hospital operations. However, recent hype has obscured critical gaps in our understanding of these models’ capabilities. In this narrative review, we examine 84 foundation models trained on non-imaging EMR data (i.e., clinical text and/or structured data) and create a taxonomy delineating their architectures, training data, and potential use cases. We find that most models are trained on small, narrowly-scoped clinical datasets (e.g., MIMIC-III) or broad, public biomedical corpora (e.g., PubMed) and are evaluated on tasks that do not provide meaningful insights on their usefulness to health systems. Considering these findings, we propose an improved evaluation framework for measuring the benefits of clinical foundation models that is more closely grounded to metrics that matter in healthcare.

Journal Article

SARS-CoV-2 Infection after Vaccination in Health Care Workers in California

by Abeles, Shira R; Torriani, Francesca J; Schooley, Robert T in Asymptomatic, Asymptomatic Diseases, California - epidemiology

2021

After more than 36,500 health care workers at the University of California received at least one dose of vaccine, 71% of 379 workers with positive SARS-CoV-2 tests had positive results within 2 weeks after the first dose. Of 37 workers with positive results after the second dose, 7 had positive results 15 or more days after the dose.

Journal Article

Clinical trials informed framework for real world clinical implementation and deployment of artificial intelligence applications

by Mishuris, Rebecca G.; Hernandez-Boussard, Tina; Pfeffer, Michael A. in 631/114/1305, 631/114/2397, 631/114/2413

2025

With rapidly evolving artificial intelligence solutions, healthcare organizations need an implementation roadmap. A “clinical trials” informed approach can promote safe and impactful implementation of artificial intelligence. This framework includes four phases: (1) Safety; (2) Efficacy; (3) Effectiveness and comparison to an existing standard; and (4) Monitoring. Combined with inter-institutional collaboration and national funding support, this approach will advance safe, usable, effective, and equitable deployments of artificial intelligence in healthcare.

Journal Article

Considerations in the reliability and fairness audits of predictive models for advance care planning

by Pfeffer, Michael A.; Pfohl, Stephen; Chobot, Sarah in advance care planning, Advance directives, Algorithms

2022

Multiple reporting guidelines for artificial intelligence (AI) models in healthcare recommend that models be audited for reliability and fairness. However, there is a gap of operational guidance for performing reliability and fairness audits in practice. Following guideline recommendations, we conducted a reliability audit of two models based on model performance and calibration as well as a fairness audit based on summary statistics, subgroup performance and subgroup calibration. We assessed the Epic End-of-Life (EOL) Index model and an internally developed Stanford Hospital Medicine (HM) Advance Care Planning (ACP) model in 3 practice settings: Primary Care, Inpatient Oncology and Hospital Medicine, using clinicians' answers to the surprise question (“Would you be surprised if [patient X] passed away in [Y years]?”) as a surrogate outcome. For performance, the models had positive predictive value (PPV) at or above 0.76 in all settings. In Hospital Medicine and Inpatient Oncology, the Stanford HM ACP model had higher sensitivity (0.69, 0.89 respectively) than the EOL model (0.20, 0.27), and better calibration (O/E 1.5, 1.7) than the EOL model (O/E 2.5, 3.0). The Epic EOL model flagged fewer patients (11%, 21% respectively) than the Stanford HM ACP model (38%, 75%). There were no differences in performance and calibration by sex. Both models had lower sensitivity in Hispanic/Latino male patients with Race listed as “Other.” 10 clinicians were surveyed after a presentation summarizing the audit. 10/10 reported that summary statistics, overall performance, and subgroup performance would affect their decision to use the model to guide care; 9/10 said the same for overall and subgroup calibration. The most commonly identified barriers for routinely conducting such reliability and fairness audits were poor demographic data quality and lack of data access. This audit required 115 person-hours across 8–10 months. 
Our recommendations for performing reliability and fairness audits include verifying data validity, analyzing model performance on intersectional subgroups, and collecting clinician-patient linkages as necessary for label generation by clinicians. Those responsible for AI models should require such audits before model deployment and mediate between model auditors and impacted stakeholders.

Journal Article

Optimizing large language models for detecting symptoms of depression/anxiety in chronic diseases patient communications

by Torous, John; Kim, Jiyeong; Galatzer-Levy, Isaac R. in 692/1807, 692/699/476/1300, 692/699/476/1414

2025

Patients with diabetes are at increased risk of comorbid depression or anxiety, complicating their management. This study evaluated the performance of large language models (LLMs) in detecting these symptoms from secure patient messages. We applied multiple approaches, including engineered prompts, systemic persona, temperature adjustments, and zero-shot and few-shot learning, to identify the best-performing model and enhance performance. Three out of five LLMs demonstrated excellent performance (over 90% of F-1 and accuracy), with Llama 3.1 405B achieving 93% in both F-1 and accuracy using a zero-shot approach. While LLMs showed promise in binary classification and handling complex metrics like Patient Health Questionnaire-4, inconsistencies in challenging cases warrant further real-life assessment. The findings highlight the potential of LLMs to assist in timely screening and referrals, providing valuable empirical knowledge for real-world triage systems that could improve mental health care for patients with chronic diseases.

Journal Article

Corticosteroid suppression of lipoxin A4 and leukotriene B4 from alveolar macrophages in severe asthma

by Bhavsar, Pankaj K; Levy, Bruce D; Pfeffer, Michael A in Asthma, Comparative analysis, Dexamethasone

2010

Background: An imbalance in the generation of pro-inflammatory leukotrienes, and counter-regulatory lipoxins is present in severe asthma. We measured leukotriene B4 (LTB4), and lipoxin A4 (LXA4) production by alveolar macrophages (AMs) and studied the impact of corticosteroids. Methods: AMs obtained by fiberoptic bronchoscopy from 14 non-asthmatics, 12 non-severe and 11 severe asthmatics were stimulated with lipopolysaccharide (LPS, 10 μg/ml) with or without dexamethasone (10⁻⁶ M). LTB4 and LXA4 were measured by enzyme immunoassay. Results: LXA4 biosynthesis was decreased from severe asthma AMs compared to non-severe (p < 0.05) and normal subjects (p < 0.001). LXA4 induced by LPS was highest in normal subjects and lowest in severe asthmatics (p < 0.01). Basal levels of LTB4 were decreased in severe asthmatics compared to normal subjects (p < 0.05), but not to non-severe asthma. LPS-induced LTB4 was increased in severe asthma compared to non-severe asthma (p < 0.05). Dexamethasone inhibited LPS-induced LTB4 and LXA4, with lesser suppression of LTB4 in severe asthma patients (p < 0.05). There was a significant correlation between LPS-induced LXA4 and FEV1 (% predicted) (r_s = 0.60; p < 0.01). Conclusions: Decreased LXA4 and increased LTB4 generation plus impaired corticosteroid sensitivity of LPS-induced LTB4 but not of LXA4 support a role for AMs in establishing a pro-inflammatory balance in severe asthma.

Journal Article

Assessment of Adherence to Reporting Guidelines by Commonly Used Clinical Prediction Models From a Single Vendor

by Dash, Dev; Shah, Nigam H.; Pfeffer, Michael A. in Data Collection, Documentation, Health Informatics

2022

Various model reporting guidelines have been proposed to ensure clinical prediction models are reliable and fair. However, no consensus exists about which model details are essential to report, and commonalities and differences among reporting guidelines have not been characterized. Furthermore, how well documentation of deployed models adheres to these guidelines has not been studied. The objective was to assess information requested by model reporting guidelines and whether the documentation for commonly used machine learning models developed by a single vendor provides the information requested. MEDLINE was queried using the terms "machine learning model card" and "reporting machine learning" from November 4 to December 6, 2020. References were reviewed to find additional publications, and publications without specific reporting recommendations were excluded. Similar elements requested for reporting were merged into representative items. Four independent reviewers and 1 adjudicator assessed how often documentation for the most commonly used models developed by a single vendor reported the items. From 15 model reporting guidelines, 220 unique items were identified that represented the collective reporting requirements. Although 12 items were commonly requested (requested by 10 or more guidelines), 77 items were requested by just 1 guideline. Documentation for 12 commonly used models from a single vendor reported a median of 39% (IQR, 37%-43%; range, 31%-47%) of items from the collective reporting requirements. Many of the commonly requested items had 100% reporting rates, including items concerning outcome definition, area under the receiver operating characteristics curve, internal validation, and intended clinical use. Several items reported half the time or less related to reliability, such as external validation, uncertainty measures, and strategy for handling missing data. 
Other frequently unreported items related to fairness (summary statistics and subgroup analyses, including for race and ethnicity or sex). These findings suggest that consistent reporting recommendations for clinical predictive models are needed for model developers to share necessary information for model deployment. The many published guidelines would, collectively, require reporting more than 200 items. Model documentation from 1 vendor reported the most commonly requested items from model reporting guidelines. However, areas for improvement were identified in reporting items related to model reliability and fairness. This analysis led to feedback to the vendor, which motivated updates to the documentation for future users.

Journal Article

A framework for using AI to drive care model transformation: building cars rather than faster horses

by Hofmann, Lawrence V.; Pfeffer, Michael A.; Sharp, Christopher in Artificial intelligence, Capitation, Editorial

2026

Despite advances in science and technology, persistent challenges in the delivery of healthcare call for care model transformations that have yet to be realized. Artificial intelligence could drive these transformations, but has yet to do so at scale. We present a four-layer framework for leveraging AI to design new care models: Knowledge (clinical content and institutional expertise), Intelligence (AI-powered synthesis and reasoning), Application (user interfaces), and Workflow (redesigned care processes). These layers are modular yet tightly interdependent, requiring cross-functional teams to design across the full stack. We illustrate this framework through an AI-enabled specialty consultation service deployed within Stanford Health Care, a quaternary academic medical center, that integrates all four layers to transform how expertise is delivered. This framework offers health system leaders a roadmap for moving beyond technology deployment toward systematic care model engineering—an organizational capability that will help shape the future of healthcare delivery.

Journal Article

The Stanford Medicine data science ecosystem for clinical and translational research

by Ashley, Euan; Pfeffer, Michael A; Halaas, Michael in Case studies, Data entry, Data science

2023

Objective: To describe the infrastructure, tools, and services developed at Stanford Medicine to maintain its data science ecosystem and research patient data repository for clinical and translational research. Materials and Methods: The data science ecosystem, dubbed the Stanford Data Science Resources (SDSR), includes infrastructure and tools to create, search, retrieve, and analyze patient data, as well as services for data deidentification, linkage, and processing to extract high-value information from healthcare IT systems. Data are made available via self-service and concierge access, on HIPAA compliant secure computing infrastructure supported by in-depth user training. Results: The Stanford Medicine Research Data Repository (STARR) functions as the SDSR data integration point, and includes electronic medical records, clinical images, text, bedside monitoring data and HL7 messages. SDSR tools include tools for electronic phenotyping, cohort building, and a search engine for patient timelines. The SDSR supports patient data collection, reproducible research, and teaching using healthcare data, and facilitates industry collaborations and large-scale observational studies. Discussion: Research patient data repositories and their underlying data science infrastructure are essential to realizing a learning health system and advancing the mission of academic medical centers. Challenges to maintaining the SDSR include ensuring sufficient financial support while providing researchers and clinicians with maximal access to data and digital infrastructure, balancing tool development with user training, and supporting the diverse needs of users. Conclusion: Our experience maintaining the SDSR offers a case study for academic medical centers developing data science and research informatics infrastructure. 
Lay Summary: Research patient data repositories are essential for health systems to learn from the experiences of their patients and for advancing the mission of academic medical centers. In this paper, we describe methods, tools, and practices at Stanford Medicine to maintain its research patient data repository and computing resources to support clinical and translational research, which together comprise the Stanford Medicine Data Science Resources (SDSR). The SDSR includes computing infrastructure and tools to create, search, retrieve, and analyze patient data. Data are made available via self-service and staff supported access, on secure computers. The Stanford Medicine Research Data Repository functions as the SDSR data integration point, and includes patient records such as clinical images, text, bedside monitoring data and administrative records. SDSR tools include a search engine for patient data and data analysis tools for identifying and retrieving data about groups of patients with shared characteristics, such as a diagnosis or treatment. The SDSR also supports patient data collection, reproducible research, and teaching using healthcare data, and facilitates industry collaborations and observational studies. Challenges to maintaining the SDSR include ensuring sufficient financial support while providing researchers and clinicians with maximal access to data and digital infrastructure, balancing tool development with user training, and supporting the diverse needs of users.

Journal Article

Excess Patient Visits for Cough and Pulmonary Disease at a Large US Health System in the Months Prior to the COVID-19 Pandemic: Time-Series Analysis

by Morrison, Douglas E; Elmore, Joann G; Kerr, Kathleen F in Acute Disease, Adult, Ambulatory Care Facilities

2020

Accurately assessing the regional activity of diseases such as COVID-19 is important in guiding public health interventions. Leveraging electronic health records (EHRs) to monitor outpatient clinical encounters may lead to the identification of emerging outbreaks. The aim of this study is to investigate whether excess visits where the word "cough" was present in the EHR reason for visit, and hospitalizations with acute respiratory failure were more frequent from December 2019 to February 2020 compared with the preceding 5 years. A retrospective observational cohort was identified from a large US health system with 3 hospitals, over 180 clinics, and 2.5 million patient encounters annually. Data from patient encounters from July 1, 2014, to February 29, 2020, were included. Seasonal autoregressive integrated moving average (SARIMA) time-series models were used to evaluate if the observed winter 2019/2020 rates were higher than the forecast 95% prediction intervals. The estimated excess number of visits and hospitalizations in winter 2019/2020 were calculated compared to previous seasons. The percentage of patients presenting with an EHR reason for visit containing the word "cough" to clinics exceeded the 95% prediction interval the week of December 22, 2019, and was consistently above the 95% prediction interval all 10 weeks through the end of February 2020. Similar trends were noted for emergency department visits and hospitalizations starting December 22, 2019, where observed data exceeded the 95% prediction interval in 6 and 7 of the 10 weeks, respectively. 
The estimated excess over the 3-month 2019/2020 winter season, obtained by either subtracting the maximum or subtracting the average of the five previous seasons from the current season, was 1.6 or 2.0 excess visits for cough per 1000 outpatient visits, 11.0 or 19.2 excess visits for cough per 1000 emergency department visits, and 21.4 or 39.1 excess visits per 1000 hospitalizations with acute respiratory failure, respectively. The total numbers of excess cases above the 95% predicted forecast interval were 168 cases in the outpatient clinics, 56 cases for the emergency department, and 18 hospitalized with acute respiratory failure. A significantly higher number of patients with respiratory complaints and diseases starting in late December 2019 and continuing through February 2020 suggests community spread of SARS-CoV-2 prior to established clinical awareness and testing capabilities. This provides a case example of how health system analytics combined with EHR data can provide powerful and agile tools for identifying when future trends in patient populations are outside of the expected ranges.

Journal Article
