Catalogue Search | MBRL

Large language models for structured reporting in radiology: past, present, and future

by Saba, Luca , dos Santos, Daniel Pinto , Bressem, Keno K. in Algorithms , Automation , Communication

2025

Structured reporting (SR) has long been a goal in radiology to standardize and improve the quality of radiology reports. Despite evidence that SR reduces errors, enhances comprehensiveness, and increases adherence to guidelines, its widespread adoption has been limited. Recently, large language models (LLMs) have emerged as a promising solution to automate and facilitate SR. Therefore, this narrative review aims to provide an overview of LLMs for SR in radiology and beyond. We found that the current literature on LLMs for SR is limited, comprising ten studies on the generative pre-trained transformer (GPT)-3.5 (n = 5) and/or GPT-4 (n = 8), while two studies additionally examined the performance of Perplexity and Bing Chat or IT5. All studies reported promising results and acknowledged the potential of LLMs for SR, with six out of ten studies demonstrating the feasibility of multilingual applications. Building upon these findings, we discuss limitations, regulatory challenges, and further applications of LLMs in radiology report processing, encompassing four main areas: documentation, translation and summarization, clinical evaluation, and data mining. In conclusion, this review underscores the transformative potential of LLMs to improve efficiency and accuracy in SR and radiology report processing.

Key Points
Question: How can LLMs help make SR in radiology more ubiquitous?
Findings: Current literature leveraging LLMs for SR is sparse but shows promising results, including the feasibility of multilingual applications.
Clinical relevance: LLMs have the potential to transform radiology report processing and enable the widespread adoption of SR. However, their future role in clinical practice depends on overcoming current limitations and regulatory challenges, including opaque algorithms and training data.

Journal Article

Evaluating large language model workflows in clinical decision support for triage and referral and diagnosis

by Allega, Fabio , Shaik, Maqsood , Gaber, Farieda in 631/114 , 692/1807 , 692/308

2025

Accurate medical decision-making is critical for both patients and clinicians. Patients often struggle to interpret their symptoms, determine their severity, and select the right specialist. Simultaneously, clinicians face challenges in integrating complex patient data to make timely, accurate diagnoses. Recent advances in large language models (LLMs) offer the potential to bridge this gap by supporting decision-making for both patients and healthcare providers. In this study, we benchmark multiple LLM versions and an LLM-based workflow incorporating retrieval-augmented generation (RAG) on a curated dataset of 2000 medical cases derived from the Medical Information Mart for Intensive Care database. Our findings show that these LLMs are capable of providing personalized insights into likely diagnoses, suggesting appropriate specialists, and assessing urgent care needs. These models may also support clinicians in refining diagnoses and decision-making, offering a promising approach to improving patient outcomes and streamlining healthcare delivery.

Journal Article

What Does DALL-E 2 Know About Radiology?

by Bressem, Keno K , Adams, Lisa C , Makowski, Marcus R in Ankle , Artificial Intelligence , Augmentation

2023

Generative models, such as DALL-E 2 (OpenAI), could represent promising future tools for image generation, augmentation, and manipulation for artificial intelligence research in radiology, provided that these models have sufficient medical domain knowledge. Herein, we show that DALL-E 2 has learned relevant representations of x-ray images, with promising capabilities in terms of zero-shot text-to-image generation of new images, the continuation of an image beyond its original boundaries, and the removal of elements; however, its capabilities for the generation of images with pathological abnormalities (eg, tumors, fractures, and inflammation) or computed tomography, magnetic resonance imaging, or ultrasound images are still limited. The use of generative models for augmenting and generating radiological data thus seems feasible, even if the further fine-tuning and adaptation of these models to their respective domains are required first.

Journal Article

Navigating the European Union Artificial Intelligence Act for Healthcare

by Adams, Lisa C. , Bressem, Keno K. , Johner, Christian in 692/700/1538 , 692/700/3934 , 706/703/253

2024

The European Union’s recently adopted Artificial Intelligence (AI) Act is the first comprehensive legal framework specifically on AI. This is particularly important for the healthcare domain, as other existing harmonisation legislation, such as the Medical Device Regulation, does not explicitly cover medical AI applications. Given the far-reaching impact of this regulation on the medical AI sector, this commentary provides an overview of the key elements of the AI Act, with easy-to-follow references to the relevant chapters.

Journal Article

Current applications and challenges in large language models for patient care: a systematic review

by Saba, Luca , Kader, Rawen , Ortiz-Prado, Esteban in 692/700/139 , 692/700/1750 , 692/700/228

2025

Background: The introduction of large language models (LLMs) into clinical practice promises to improve patient education and empowerment, thereby personalizing medical care and broadening access to medical knowledge. Despite the popularity of LLMs, there is a significant gap in systematized information on their use in patient care. Therefore, this systematic review aims to synthesize current applications and limitations of LLMs in patient care.

Methods: We systematically searched 5 databases for qualitative, quantitative, and mixed methods articles on LLMs in patient care published between 2022 and 2023. From 4349 initial records, 89 studies across 29 medical specialties were included. Quality assessment was performed using the Mixed Methods Appraisal Tool 2018. A data-driven convergent synthesis approach was applied for thematic syntheses of LLM applications and limitations using free line-by-line coding in Dedoose.

Results: We show that most studies investigate Generative Pre-trained Transformers (GPT)-3.5 (53.2%, n = 66 of 124 different LLMs examined) and GPT-4 (26.6%, n = 33/124) in answering medical questions, followed by patient information generation, including medical text summarization or translation, and clinical documentation. Our analysis delineates two primary domains of LLM limitations: design and output. Design limitations include 6 second-order and 12 third-order codes, such as lack of medical domain optimization, data transparency, and accessibility issues, while output limitations include 9 second-order and 32 third-order codes, for example, non-reproducibility, non-comprehensiveness, incorrectness, unsafety, and bias.

Conclusions: This review systematically maps LLM applications and limitations in patient care, providing a foundational framework and taxonomy for their implementation and evaluation in healthcare settings.

Plain Language Summary: Large language models (LLMs) are computer programs that can generate human-like text. They promise to improve patient education and expand access to medical information by helping patients better understand health conditions and treatment options. However, more information is needed about how these tools are used in patient care and the challenges they present. In this review, researchers analyzed 89 studies from 2022 to 2023 covering 29 medical specialties. These studies explored ways LLMs are used: for example, answering patient questions, summarizing or translating medical texts, and supporting clinical paperwork. While these tools show potential, the review highlights limitations. Many LLMs are not optimized for medical use, lack transparency about data use, and can be difficult for some users to access. Additionally, the text they generate may sometimes be inaccurate, incomplete, or biased, raising safety concerns.

Busch et al. discuss large language models in patient healthcare. This systematic review analyzes current literature for utilization of these models and limitations of use and implementation.

Journal Article

Generative Artificial Intelligence in Medical Education: Enhancing Critical Thinking or Undermining Cognitive Autonomy?

by Izquierdo-Condoy, Juan S , Busch, Felix , Ortiz-Prado, Esteban in Artificial intelligence , Cognition , Critical thinking

2025

Generative artificial intelligence (GenAI) enables the production of coherent and contextually relevant text by processing large-scale linguistic datasets. Tools such as ChatGPT, Gemini, Claude, and LLaMA are increasingly integrated into medical education, assisting students with a range of tasks, including clinical reasoning, literature review, scientific writing, and formative assessment. Although these tools offer significant advantages in terms of productivity, personalization, and cognitive support, their impact on critical thinking—a cornerstone of medical education—remains uncertain. The aim of this viewpoint paper is to critically assess the influence of GenAI on critical thinking within medical training, examining both its potential to enhance cognitive skills and the risks it poses to cognitive autonomy. Users have reported increased efficiency and improved linguistic output; however, concerns have also been raised regarding the risk of cognitive overreliance. Current evidence presents a mixed picture, indicating both improvements in learner engagement and potential drawbacks such as passivity or susceptibility to misinformation. Without curricular integration that prioritizes ethical use, prompt engineering, and critical evaluation, GenAI may compromise the cognitive autonomy of medical students. Conversely, when thoughtfully embedded into pedagogical frameworks, these tools can act as cognitive enhancers—supporting, rather than replacing, clinical reasoning. Medical education must adapt to ensure that future physicians engage with GenAI in a critical, ethical, and context-aware manner, especially in complex decision-making scenarios. This transformation demands not only technological fluency but also reflective practice and sustained oversight by faculty and academic institutions.

Journal Article

Correction: Integrating Text and Image Analysis: Exploring GPT-4V’s Capabilities in Advanced Radiological Applications Across Subspecialties

by Bressem, Keno K , Han, Tianyu , Makowski, Marcus R in and Addenda

2026

[This corrects the article DOI: 10.2196/54948.]

Journal Article

Integrating Text and Image Analysis: Exploring GPT-4V’s Capabilities in Advanced Radiological Applications Across Subspecialties

by Bressem, Keno K , Han, Tianyu , Makowski, Marcus R in Accuracy , Application programming interface , Clinical medicine

2024

Related Articles
This is a corrected version. See correction statement in: https://www.jmir.org/2024/1/e64411
This is a corrected version. See correction statement in: https://www.jmir.org/2026/1/e91415

This study demonstrates that GPT-4V outperforms GPT-4 across radiology subspecialties in analyzing 207 cases with 1312 images from the Radiological Society of North America Case Collection.

Journal Article

Gender Segregation, Occupational Sorting, and Growth of Wage Disparities Between Women

by Busch, Felix in Change agents , Demography , Desegregation

2020

Average female wages in traditionally male occupations have steeply risen over the past couple of decades in Germany. This trend led to a new and substantial pay gap between women working in male-typed occupations and other women. I dissect the emergence of these wage disparities between women, using data from the German Socio-Economic Panel (1992–2015). Compositional change with respect to education is the main driver for growing inequality. Other factors are less influential but still relevant: marginal returns for several wage-related personal characteristics have grown faster in male-typed occupations. Net of individual-level heterogeneity, traditionally male occupations have also become more attractive because of rising returns to task-specific skills. Discrimination of women in typically male lines of work seems to have declined, too, which erased part of the wage penalty these women had previously experienced. In sum, I document changes in the occupational sorting behavior of women as well as shifts in occupation-level reward mechanisms that have had a profound impact on the state of inequality between working women.

Journal Article

Correction: Integrating Text and Image Analysis: Exploring GPT-4V’s Capabilities in Advanced Radiological Applications Across Subspecialties

by Bressem, Keno K , Han, Tianyu , Makowski, Marcus R in and Addenda

2024

[This corrects the article DOI: 10.2196/54948.]

Journal Article
