Catalogue Search | MBRL
109,212 result(s) for "Large language model"
Multimodal Large Language Models in Medical Imaging: Current State and Future Directions
by Kim, Namkug; Kyung, Sunggu; Seo, Jinyoung
in Artificial Intelligence; Connectors; Diagnostic Imaging - methods
2025
Multimodal large language models (MLLMs) are emerging as powerful tools in medicine, particularly in radiology, with the potential to serve as trusted artificial intelligence (AI) partners for clinicians. In radiology, these models integrate large language models (LLMs) with diverse multimodal data sources by combining clinical information and text with radiologic images of various modalities, ranging from 2D chest X-rays to 3D CT/MRI. Methods for achieving this multimodal integration are rapidly evolving, and the high performance of freely available LLMs may further accelerate MLLM development. Current applications of MLLMs span automatic generation of preliminary radiology reports, visual question answering, and interactive diagnostic support. Despite these promising capabilities, several significant challenges hinder widespread clinical adoption. MLLMs require access to large-scale, high-quality multimodal datasets, which are scarce in the medical domain. Risks of hallucinated findings, lack of transparency in decision-making processes, and high computational demands further complicate implementation. This review summarizes the current capabilities and limitations of MLLMs in medicine, particularly in radiology, and outlines key directions for future research. Critical areas include incorporating region-grounded reasoning to link model outputs to specific image regions, developing robust foundation models pre-trained on large-scale medical datasets, and establishing strategies for the safe and effective integration of MLLMs into clinical practice.
Journal Article
Large language models in healthcare: from a systematic review on medical examinations to a comparative analysis on fundamentals of robotic surgery online test
by Cerveri, Pietro; Mainardi, Luca; Moglia, Andrea
in Artificial Intelligence; Chatbots; Comparative analysis
2024
Large language models (LLMs) have the intrinsic potential to acquire medical knowledge. Several studies assessing LLMs on medical examinations have been published. However, there is no reported evidence on tests related to robot-assisted surgery. The aims of this study were to perform the first systematic review of LLMs on medical examinations and to establish whether ChatGPT, GPT-4, and Bard can pass the Fundamentals of Robotic Surgery (FRS) didactic test. A literature search was performed on PubMed, Web of Science, Scopus, and arXiv following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) approach. A total of 45 studies were analyzed. GPT-4 passed several national qualifying examinations with questions in English, Chinese, and Japanese using zero-shot and few-shot learning. Med-PaLM 2 obtained similar scores on the United States Medical Licensing Examination with more refined prompt engineering techniques. Five different 2023 releases of ChatGPT, one of GPT-4, and one of Bard were tested on FRS. Seven attempts were performed with each release. The pass score was 79.5%. ChatGPT achieved mean scores of 64.6%, 65.6%, 75.0%, 78.9%, and 72.7%, respectively, from the first to the fifth tested release on FRS, vs 91.5% for GPT-4 and 79.5% for Bard. GPT-4 outperformed ChatGPT and Bard in all corresponding attempts, with a statistically significant difference for ChatGPT (p < 0.001), but not Bard (p = 0.002). Our findings agree with other studies included in this systematic review. We highlighted the potential and challenges of LLMs to transform the education of healthcare professionals in the different stages of learning, by assisting teachers in the preparation of teaching contents, and trainees in the acquisition of knowledge, up to becoming an assessment framework of learners.
Journal Article
Towards AI-Powered Applications: The Development of a Personalised LLM for HRI and HCI
by Zaraki, Abolfazl; Ghamati, Khashayar; Banitalebi Dehkordi, Maryam
in Accuracy; Adaptation; adaptive AI systems
2025
In this work, we propose a novel Personalised Large Language Model (PLLM) agent, designed to advance the integration and adaptation of large language models within the field of human–robot interaction and human–computer interaction. While research in this field has primarily focused on the technical deployment of LLMs, critical academic challenges persist regarding their ability to adapt dynamically to user-specific contexts and evolving environments. To address this fundamental gap, we present a methodology for personalising LLMs using domain-specific data, with tests using the NeuroSense EEG dataset. By enabling personalised data interpretation, our approach moves beyond conventional implementation strategies, contributing to ongoing research on AI adaptability and user-centric application. Furthermore, this study engages with the broader ethical dimensions of PLLM, critically discussing issues of generalisability and data privacy concerns in AI research. Our findings demonstrate the usability of the PLLM in a human–robot interaction scenario in real-world settings, highlighting its applicability across diverse domains, including healthcare, education, and assistive technologies. We believe the proposed system represents a significant step towards AI adaptability and personalisation, offering substantial benefits across a range of fields.
Journal Article
How to optimize the systematic review process using AI tools
by Wong, Stanley; Fabiano, Nicholas; Gupta, Arnav
in Application programming interface; Artificial intelligence; Chatbots
2024
Systematic reviews are a cornerstone for synthesizing the available evidence on a given topic. They simultaneously allow for gaps in the literature to be identified and provide direction for future research. However, due to the ever‐increasing volume and complexity of the available literature, traditional methods for conducting systematic reviews are less efficient and more time‐consuming. Numerous artificial intelligence (AI) tools are being released with the potential to optimize efficiency in academic writing and assist with various stages of the systematic review process, including developing and refining search strategies, screening titles and abstracts against inclusion or exclusion criteria, extracting essential data from studies, and summarizing findings. Therefore, in this article we provide an overview of the currently available tools and how they can be incorporated into the systematic review process to improve the efficiency and quality of research synthesis. We emphasize that authors must report all AI tools that have been used at each stage to ensure replicability as part of reporting in methods.
Journal Article
Enhancing the Accuracy of Human Phenotype Ontology Identification: Comparative Evaluation of Multimodal Large Language Models
by Zhong, Wei; Liu, Yan; Yan, YouSheng
in Accuracy; AI Language Models in Health Care; Artificial Intelligence
2025
Identifying Human Phenotype Ontology (HPO) terms is crucial for diagnosing and managing rare diseases. However, clinicians, especially junior physicians, often face challenges due to the complexity of describing patient phenotypes accurately. Traditional manual search methods using HPO databases are time-consuming and prone to errors.
The aim of the study is to investigate whether the use of multimodal large language models (MLLMs) can improve the accuracy of junior physicians in identifying HPO terms from patient images related to rare diseases.
In total, 20 junior physicians from 10 specialties participated. Each physician evaluated 27 patient images sourced from publicly available literature, with phenotypes relevant to rare diseases listed in the Chinese Rare Disease Catalogue. The study was divided into 2 groups: the manual search group relied on the Chinese Human Phenotype Ontology website, while the MLLM-assisted group used an electronic questionnaire that included HPO terms preidentified by ChatGPT-4o as prompts, followed by a search using the Chinese Human Phenotype Ontology. The primary outcome was the accuracy of HPO identification, defined as the proportion of correctly identified HPO terms compared to a standard set determined by an expert panel. Additionally, the accuracy of outputs from ChatGPT-4o and 2 open-source MLLMs (Llama3.2:11b and Llama3.2:90b) was evaluated using the same criteria, with hallucinations for each model documented separately. Furthermore, participating physicians completed an additional electronic questionnaire regarding their rare disease background to identify factors affecting their ability to accurately describe patient images using standardized HPO terms.
A total of 270 descriptions were evaluated per group. The MLLM-assisted group achieved a significantly higher accuracy rate of 67.4% (182/270) compared to 20.4% (55/270) in the manual group (relative risk 3.31, 95% CI 2.58-4.25; P<.001). The MLLM-assisted group demonstrated consistent performance across departments, whereas the manual group exhibited greater variability. Among standalone MLLMs, ChatGPT-4o achieved an accuracy of 48% (13/27), while the open-source models Llama3.2:11b and Llama3.2:90b achieved 15% (4/27) and 18% (5/27), respectively. However, MLLMs exhibited a high hallucination rate, frequently generating HPO terms with incorrect IDs or entirely fabricated content. Specifically, ChatGPT-4o, Llama3.2:11b, and Llama3.2:90b generated incorrect IDs in 57.3% (67/117), 98% (62/63), and 82% (46/56) of cases, respectively, and fabricated terms in 34.2% (40/117), 41% (26/63), and 32% (18/56) of cases, respectively. Additionally, a survey on the rare disease knowledge of junior physicians suggests that participation in rare disease and genetic disease training may enhance the performance of some physicians.
The integration of MLLMs into clinical workflows significantly enhances the accuracy of HPO identification by junior physicians, offering promising potential to improve the diagnosis of rare diseases and standardize phenotype descriptions in medical research. However, the notable hallucination rate observed in MLLMs underscores the necessity for further refinement and rigorous validation before widespread adoption in clinical practice.
Journal Article
RS-LLaVA: A Large Vision-Language Model for Joint Captioning and Question Answering in Remote Sensing Imagery
by Ricci, Riccardo; Bazi, Yakoub; Al Rahhal, Mohamad Mahmoud
in captioning; Data analysis; data collection
2024
In this paper, we delve into the innovative application of large language models (LLMs) and their extension, large vision-language models (LVLMs), in the field of remote sensing (RS) image analysis. We particularly emphasize their multi-tasking potential with a focus on image captioning and visual question answering (VQA). In particular, we introduce an improved version of the Large Language and Vision Assistant Model (LLaVA), specifically adapted for RS imagery through a low-rank adaptation approach. To evaluate the model performance, we create the RS-instructions dataset, a comprehensive benchmark dataset that integrates four diverse single-task datasets related to captioning and VQA. The experimental results confirm the model’s effectiveness, marking a step forward toward the development of efficient multi-task models for RS image analysis.
Journal Article
HSG-ON: Hierarchical Scene Graph-Based Object Navigation
by Kwon, Seokjoon; Jang, Hee-Deok; Chang, Dong Eui
in Cognition & reasoning; Communication; embodied ai
2026
For a robot to operate effectively in human-centric environments, finding objects based on natural language is essential. Zero-shot object goal navigation is a significant challenge where robots must find unseen objects in new environments without prior knowledge. Existing methods often struggle with strategic exploration, leading to inefficient searches. In this study, we propose a hierarchical scene graph-based navigation system to address this challenge. Our core innovations are twofold: dynamically constructing a three-layer “room–workspace–object” hierarchical scene graph without manually pre-tuned parameters, and introducing a novel workspace-based searching strategy. By evaluating semantic relevance at the workspace level rather than the object level, the robot infers probable containers for a target, enabling focused, human-like exploration. Simulation results demonstrate that our system significantly outperforms existing state-of-the-art methods. Quantitatively, our approach improves the Success Rate (SR) by 26.8% (SR 0.4859) under distance-constrained settings and by 20.2% (SR 0.7360) under unconstrained settings, compared to the best baselines. These results validate that our framework offers a robust solution for zero-shot object goal navigation.
Journal Article
KPLLM-STE: Knowledge-enhanced and prompt-aware large language models for short-text expansion
by Lin, Ronghua; Zhang, Qi; Li, Weisheng
in Computer Science; Database Management; Graph matching
2025
Short-text expansion plays a significant role in enhancing the quality, diversity, and practicality of short texts, helping users understand their content more comprehensively. In this paper, we aim to enhance the capabilities of large language models in short-text expansion through knowledge graphs, and we propose knowledge-enhanced and prompt-aware large language models. First, we construct a multi-dimensional knowledge graph that includes semantics, sentiment, and topics, built with large language models on domain-specific text. Second, we propose a method for mining prompts for short texts across the three dimensions of semantics, sentiment, and topics based on the constructed multi-dimensional knowledge graph. Finally, we match triplets in the constructed knowledge graph against the generated prompts in the three dimensions. The matched triplets are then integrated by the large language model to generate an expansion of the given short text. Experiments are conducted using three large language models on two public datasets, and the results indicate that our model shows improvements across multiple metrics for text similarity, readability, and coherence compared to the short-text expansions generated by the baseline large language models and existing methods.
Journal Article
Assessing the Utility of Multimodal Large Language Models (GPT-4 Vision and Large Language and Vision Assistant) in Identifying Melanoma Across Different Skin Tones
by Akrout, Mohamed; Oakley, Amanda; Abid, Latif
in Artificial intelligence; Asymmetry; Decision making
2024
The large language models GPT-4 Vision and Large Language and Vision Assistant are capable of understanding and accurately differentiating between benign lesions and melanoma, indicating potential incorporation into dermatologic care, medical research, and education.
Journal Article
Perceptions, Usage, and Educational Impact of ChatGPT Among Medical Students in Germany: Cross-Sectional Mixed Methods Survey
by Knitza, Johannes; Hirsch, Martin Christian; Fußhöller, Anna
in Adult; Artificial intelligence; Artificial Intelligence (AI) in Medical Education
2025
Large language models such as ChatGPT offer significant opportunities for medical education. However, empirical data on actual usage patterns, perceived benefits, and limitations among medical students remain limited.
This study aimed to assess how medical students in Germany use ChatGPT, their perceptions of its educational value, and the challenges and concerns associated with its use.
A cross-sectional 17-item online survey was conducted between May and August 2024 among medical students from Philipps University Marburg, Germany. A mixed methods approach was applied, combining descriptive and inferential statistical analysis with qualitative content analysis of open-ended responses.
A total of 84 fully completed surveys were included in the analysis (response rate: 26.7%; 315 surveys started). Overall, 76.2% (64/84) of the participants reported having used ChatGPT for medical education, with significantly higher usage during exam periods (P=.003). Preclinical students reported higher overall usage than clinical students (P=.02). ChatGPT was primarily used for summarizing information by 60.7% (51/84) of students, for literature research by 57.7% (49/84), and for clarifying concepts by 47.1% (40/84). A total of 70.2% (59/84) felt that it helped them save time, and 51.2% (43/84) reported an improved understanding of content. In contrast, only 31% (26/84) saw benefits for applying knowledge and 15.5% (13/84) for long-term knowledge retention. Qualitative responses highlighted clear benefits such as time savings and support in exam preparation, while also pointing to potential applications in clinical documentation and expressing concerns about misinformation and source transparency. However, 73.3% (55/75) expressed concerns about misinformation, and 72.6% (61/84) reported lacking confidence in their artificial intelligence (AI)-related skills. Only 41.7% (35/84) stated that they trust ChatGPT's outputs. Students who used the tool more frequently also reported higher levels of trust in ChatGPT's outputs (r=0.374, P<.001). Over 70% of respondents indicated a strong desire for increased integration of AI-related education and practical applications within the medical curriculum.
ChatGPT was already widely used among medical students, especially in exam preparation and the early stages of training. Students valued its efficiency and support for understanding complex material, but its perceived long-term influence on learning remained limited. Concerns about reliability, source transparency, and data privacy persist, and AI skills played a key role in shaping usage. These findings underscore the need to integrate structured, practice-oriented AI education into medical training to support critical, informed, and ethical use of large language models.
Journal Article