Catalogue Search | MBRL

Large language models in healthcare: from a systematic review on medical examinations to a comparative analysis on fundamentals of robotic surgery online test

by Cerveri, Pietro , Mainardi, Luca , Moglia, Andrea in Artificial Intelligence , Chatbots , Comparative analysis

2024

Large language models (LLMs) have the intrinsic potential to acquire medical knowledge. Several studies assessing LLMs on medical examinations have been published. However, there is no reported evidence on tests related to robot-assisted surgery. The aims of this study were to perform the first systematic review of LLMs on medical examinations and to establish whether ChatGPT, GPT-4, and Bard can pass the Fundamentals of Robotic Surgery (FRS) didactic test. A literature search was performed on PubMed, Web of Science, Scopus, and arXiv following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) approach. A total of 45 studies were analyzed. GPT-4 passed several national qualifying examinations with questions in English, Chinese, and Japanese using zero-shot and few-shot learning. Med-PaLM 2 obtained similar scores on the United States Medical Licensing Examination with more refined prompt engineering techniques. Five different 2023 releases of ChatGPT, one of GPT-4, and one of Bard were tested on FRS. Seven attempts were performed with each release. The pass score was 79.5%. ChatGPT achieved a mean score of 64.6%, 65.6%, 75.0%, 78.9%, and 72.7% respectively from the first to the fifth tested release on FRS vs 91.5% of GPT-4 and 79.5% of Bard. GPT-4 outperformed ChatGPT and Bard in all corresponding attempts with a statistically significant difference for ChatGPT (p < 0.001), but not Bard (p = 0.002). Our findings agree with other studies included in this systematic review. We highlighted the potential and challenges of LLMs to transform the education of healthcare professionals in the different stages of learning, by assisting teachers in the preparation of teaching contents, and trainees in the acquisition of knowledge, up to becoming an assessment framework of learners.

Journal Article

How to optimize the systematic review process using AI tools

by Wong, Stanley , Fabiano, Nicholas , Gupta, Arnav in Application programming interface , Artificial intelligence , Chatbots

2024

Systematic reviews are a cornerstone for synthesizing the available evidence on a given topic. They simultaneously allow for gaps in the literature to be identified and provide direction for future research. However, due to the ever‐increasing volume and complexity of the available literature, traditional methods for conducting systematic reviews are less efficient and more time‐consuming. Numerous artificial intelligence (AI) tools are being released with the potential to optimize efficiency in academic writing and assist with various stages of the systematic review process including developing and refining search strategies, screening titles and abstracts for inclusion or exclusion criteria, extracting essential data from studies and summarizing findings. Therefore, in this article we provide an overview of the currently available tools and how they can be incorporated into the systematic review process to improve efficiency and quality of research synthesis. We emphasize that authors must report all AI tools that have been used at each stage to ensure replicability as part of reporting in methods.

Journal Article

Multimodal Large Language Models in Medical Imaging: Current State and Future Directions

by Kim, Namkug , Kyung, Sunggu , Seo, Jinyoung in Artificial Intelligence , Connectors , Diagnostic Imaging - methods

2025

Multimodal large language models (MLLMs) are emerging as powerful tools in medicine, particularly in radiology, with the potential to serve as trusted artificial intelligence (AI) partners for clinicians. In radiology, these models integrate large language models (LLMs) with diverse multimodal data sources by combining clinical information and text with radiologic images of various modalities, ranging from 2D chest X-rays to 3D CT/MRI. Methods for achieving this multimodal integration are rapidly evolving, and the high performance of freely available LLMs may further accelerate MLLM development. Current applications of MLLMs now span automatic generation of preliminary radiology reports, visual question answering, and interactive diagnostic support. Despite these promising capabilities, several significant challenges hinder widespread clinical adoption. MLLMs require access to large-scale, high-quality multimodal datasets, which are scarce in the medical domain. Risks of hallucinated findings, lack of transparency in decision-making processes, and high computational demands further complicate implementation. This review summarizes the current capabilities and limitations of MLLMs in medicine, particularly in radiology, and outlines key directions for future research. Critical areas include incorporating region-grounded reasoning to link model outputs to specific image regions, developing robust foundation models pre-trained on large-scale medical datasets, and establishing strategies for the safe and effective integration of MLLMs into clinical practice.

Journal Article

RS-LLaVA: A Large Vision-Language Model for Joint Captioning and Question Answering in Remote Sensing Imagery

by Ricci, Riccardo , Bazi, Yakoub , Al Rahhal, Mohamad Mahmoud in captioning , Data analysis , data collection

2024

In this paper, we delve into the innovative application of large language models (LLMs) and their extension, large vision-language models (LVLMs), in the field of remote sensing (RS) image analysis. We particularly emphasize their multi-tasking potential with a focus on image captioning and visual question answering (VQA). In particular, we introduce an improved version of the Large Language and Vision Assistant Model (LLaVA), specifically adapted for RS imagery through a low-rank adaptation approach. To evaluate the model performance, we create the RS-instructions dataset, a comprehensive benchmark dataset that integrates four diverse single-task datasets related to captioning and VQA. The experimental results confirm the model’s effectiveness, marking a step forward toward the development of efficient multi-task models for RS image analysis.

Journal Article

A Large Language Model Approach to Educational Survey Feedback Analysis

by Oh, YeaRim , Parker, Michael J. , Stone, Claire in Academic Achievement , Artificial Intelligence , Classification

2025

This paper assesses the potential for the large language models (LLMs) GPT-4 and GPT-3.5 to aid in deriving insight from education feedback surveys. Exploration of LLM use cases in education has focused on teaching and learning, with less exploration of capabilities in education feedback analysis. Survey analysis in education involves goals such as finding gaps in curricula or evaluating teachers, often requiring time-consuming manual processing of textual responses. LLMs have the potential to provide a flexible means of achieving these goals without specialized machine learning models or fine-tuning. We demonstrate a versatile approach to such goals by treating them as sequences of natural language processing (NLP) tasks including classification (multi-label, multi-class, and binary), extraction, thematic analysis, and sentiment analysis, each performed by an LLM. We apply these workflows to a real-world dataset of 2500 end-of-course survey comments from biomedical science courses, and evaluate a zero-shot approach (i.e., requiring no examples or labeled training data) across all tasks, reflecting education settings, where labeled data is often scarce. By applying effective prompting practices, we achieve human-level performance on multiple tasks with GPT-4, enabling workflows necessary to achieve typical goals. We also show the potential of inspecting LLMs’ chain-of-thought (CoT) reasoning for providing insight that may foster confidence in practice. Moreover, this study features development of a versatile set of classification categories, suitable for various course types (online, hybrid, or in-person) and amenable to customization. Our results suggest that LLMs can be used to derive a range of insights from survey text.

Journal Article

Multi-modal large language models in radiology: principles, applications, and potential

by Rui, Wushuang , Zhao, Chen , Shen, Yiqiu in Abdomen , Accuracy , Artificial Intelligence

2025

Large language models (LLMs) and multi-modal large language models (MLLMs) represent the cutting-edge in artificial intelligence. This review provides a comprehensive overview of their capabilities and potential impact on radiology. Unlike most existing literature reviews focusing solely on LLMs, this work examines both LLMs and MLLMs, highlighting their potential to support radiology workflows such as report generation, image interpretation, EHR summarization, differential diagnosis generation, and patient education. By streamlining these tasks, LLMs and MLLMs could reduce radiologist workload, improve diagnostic accuracy, support interdisciplinary collaboration, and ultimately enhance patient care. We also discuss key limitations, such as the limited capacity of current MLLMs to interpret 3D medical images and to integrate information from both image and text data, as well as the lack of effective evaluation methods. Ongoing efforts to address these challenges are introduced.

Journal Article

A Survey on Multimodal Large Language Models in Radiology for Report Generation and Visual Question Answering

by Yi, Ziruo , Albert, Mark V. , Xiao, Ting in Artificial intelligence , Automation , Chatbots

2025

Large language models (LLMs) and large vision models (LVMs) have driven significant advancements in natural language processing (NLP) and computer vision (CV), establishing a foundation for multimodal large language models (MLLMs) to integrate diverse data types in real-world applications. This survey explores the evolution of MLLMs in radiology, focusing on radiology report generation (RRG) and radiology visual question answering (RVQA), where MLLMs leverage the combined capabilities of LLMs and LVMs to improve clinical efficiency. We begin by tracing the history of radiology and the development of MLLMs, followed by an overview of MLLM applications in RRG and RVQA, detailing core datasets, evaluation metrics, and leading MLLMs that demonstrate their potential in generating radiology reports and answering image-based questions. We then discuss the challenges MLLMs face in radiology, including dataset scarcity, data privacy and security, and issues within MLLMs such as bias, toxicity, hallucinations, catastrophic forgetting, and limitations in traditional evaluation metrics. Finally, this paper proposes future research directions to address these challenges, aiming to help AI researchers and radiologists overcome these obstacles and advance the study of MLLMs in radiology.

Journal Article

Towards AI-Powered Applications: The Development of a Personalised LLM for HRI and HCI

by Zaraki, Abolfazl , Ghamati, Khashayar , Banitalebi Dehkordi, Maryam in Accuracy , Adaptation , adaptive AI systems

2025

In this work, we propose a novel Personalised Large Language Model (PLLM) agent, designed to advance the integration and adaptation of large language models within the field of human–robot interaction and human–computer interaction. While research in this field has primarily focused on the technical deployment of LLMs, critical academic challenges persist regarding their ability to adapt dynamically to user-specific contexts and evolving environments. To address this fundamental gap, we present a methodology for personalising LLMs using domain-specific data and tests using the NeuroSense EEG dataset. By enabling the personalised data interpretation, our approach promotes conventional implementation strategies, contributing to ongoing research on AI adaptability and user-centric application. Furthermore, this study engages with the broader ethical dimensions of PLLM, critically discussing issues of generalisability and data privacy concerns in AI research. Our findings demonstrate the usability of using the PLLM in a human–robot interaction scenario in real-world settings, highlighting its applicability across diverse domains, including healthcare, education, and assistive technologies. We believe the proposed system represents a significant step towards AI adaptability and personalisation, offering substantial benefits across a range of fields.

Journal Article

Large language models for clinical artificial intelligence in healthcare a systematic review

by Saleh, Amro , Ghnemat, Rawan in Artificial Intelligence , Bias , Clinical decision making

2026

Large Language Models (LLMs) have demonstrated the capacity to process, reason, and generate extensive volumes of data, providing a novel paradigm for integrating generative artificial intelligence (GenAI) into the medical field. Multimodal LLMs (MLLMs) extend these capabilities by incorporating diverse data modalities into unified representations, including genomics, medical imaging, and clinical text. This systematic review synthesizes advancements from 246 records identified between January 2020 and September 2025, of which 90 studies were included after full-text screening, to address critical gaps in understanding the clinical role of LLM and MLLM in healthcare. We trace the evolution from classical natural language processing (NLP) approaches to modern transformer-based architectures, summarize their technical foundations, and examine their construction, evaluation, and deployment in medical workflows. Key contributions include highlighting multimodal integration (e.g., imaging-genomics-text fusion), ethical governance frameworks, and validated domain-specific fine-tuning in clinical settings. We also highlight advances in Prompting, Retrieval-Augmented Generation (RAG), and Multi-Agent (agentic) workflows, providing a critical assessment of their benefits and limitations. In addition, we analyze challenges such as hallucinations, bias, and privacy risks, while providing actionable guidelines for clinicians, developers, and policymakers to improve regulatory compliance. By consolidating the nomenclature and systematically evaluating GenAI in medicine, this review offers evidence-based recommendations and directions for the safe and effective integration of generative AI into healthcare. The findings are intended as an authoritative guide for researchers and practitioners, bridging principles, clinical applications, and policy considerations for LLM and MLLM.

Journal Article

Structure-Aware Lightweight Document-Level Event Extraction via Code-Based Large Language Models

by Deng, Zhongchen , Zhao, Jianbin , Liu, Yaduo in Alignment , Bias , Bridges

2026

Document-level Event Extraction (DEE) requires identifying complex event records and arguments dispersed across unstructured texts. However, applying general Large Language Models (LLMs) to DEE is intrinsically hindered by their lack of inductive bias for rigid structural constraints, often leading to schema violations and suboptimal performance in complex structural prediction tasks. To address this, we propose the Structure-Aware Lightweight DEE, termed SALE, which leverages the structural reasoning potential of Code-Based LLMs (Code-LLMs) as a favorable inductive preference. We leverage the natural isomorphism between event schemas and programming object definitions, formulating event extraction as a Python 3.9 class instantiation task to bridge the gap between semantic understanding and structural adherence. Specifically, SALE employs a novel two-stage training paradigm: First, a Structure-Aware Fine-tuning stage injects general structural knowledge via diverse code-style instruction tasks derived from broad Information Extraction (IE) datasets; second, an Event Extraction Alignment stage utilizes a reward-based alignment loss, optimized via policy gradient, to adapt this capability to document-level intricacies. The effectiveness of SALE stems from the synergy between its structure-aware prompting and the specialized alignment stage built on a code-oriented backbone. Extensive experiments on established news-domain benchmarks (RAMS and WikiEvents) demonstrate that our approach significantly outperforms representative supervised and general LLM baselines in cross-task zero-shot and few-shot transfer settings (e.g., surpassing supervised baselines by over 7% in F1 score). Furthermore, SALE maintains a highly efficient inference profile and parameter-efficient footprint, offering a practical and scalable solution for vertical domain applications.

Journal Article
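
The abstract above describes casting event extraction as a class instantiation task. As a rough illustration of that general idea only, and not the authors' SALE implementation, the sketch below models an event schema as a Python dataclass. The `AttackEvent` schema, its role names, the `Entity` helper, `to_record`, and the example sentence are all hypothetical and were chosen for illustration.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Entity:
    """A text span mentioned in the document (hypothetical helper type)."""
    text: str


@dataclass
class AttackEvent:
    """Hypothetical event schema; the role names are illustrative only."""
    trigger: str
    attacker: List[Entity] = field(default_factory=list)
    target: List[Entity] = field(default_factory=list)
    place: List[Entity] = field(default_factory=list)


def to_record(event) -> dict:
    """Flatten an event instance into a role -> list-of-strings record."""
    record = {"event_type": type(event).__name__, "trigger": event.trigger}
    for role, value in asdict(event).items():
        if role != "trigger":
            record[role] = [entity["text"] for entity in value]
    return record


# What a model might be asked to emit for the made-up sentence
# "Rebels shelled the airport outside the city on Monday."
event = AttackEvent(
    trigger="shelled",
    attacker=[Entity("Rebels")],
    target=[Entity("the airport")],
    place=[Entity("the city")],
)
record = to_record(event)

# A role that is not part of the schema is rejected by the constructor,
# so a schema violation surfaces as an ordinary TypeError.
try:
    AttackEvent(trigger="shelled", weapon=[Entity("mortars")])
    schema_violation_caught = False
except TypeError:
    schema_violation_caught = True
```

One appeal of this framing is that the schema and its constraints live in the class definition itself, so an output with an unknown role fails at construction time rather than needing a separate validator.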
