38,316 results for "Language models (Artificial intelligence)"
Multimodal Large Language Models in Health Care: Applications, Challenges, and Future Outlook
In the complex and multidimensional field of medicine, multimodal data are prevalent and crucial for informed clinical decisions. Multimodal data span a broad spectrum of data types, including medical images (eg, MRI and CT scans), time-series data (eg, sensor data from wearable devices and electronic health records), audio recordings (eg, heart and respiratory sounds and patient interviews), text (eg, clinical notes and research articles), videos (eg, surgical procedures), and omics data (eg, genomics and proteomics). While advancements in large language models (LLMs) have enabled new applications for knowledge retrieval and processing in the medical field, most LLMs remain limited to processing unimodal data, typically text-based content, and often overlook the importance of integrating the diverse data modalities encountered in clinical practice. This paper aims to present a detailed, practical, and solution-oriented perspective on the use of multimodal LLMs (M-LLMs) in the medical field. Our investigation spanned M-LLM foundational principles, current and potential applications, technical and ethical challenges, and future research directions. By connecting these elements, we aimed to provide a comprehensive framework that links diverse aspects of M-LLMs, offering a unified vision for their future in health care. This approach aims to guide both future research and practical implementations of M-LLMs in health care, positioning them as a paradigm shift toward integrated, multimodal data–driven medical practice. We anticipate that this work will spark further discussion and inspire the development of innovative approaches in the next generation of medical M-LLM systems.
Examining Science Education in ChatGPT: An Exploratory Study of Generative Artificial Intelligence
The advent of generative artificial intelligence (AI) offers transformative potential in the field of education. The study explores three main areas: (1) How did ChatGPT answer questions related to science education? (2) What are some ways educators could utilise ChatGPT in their science pedagogy? and (3) How has ChatGPT been utilised in this study, and what are my reflections about its use as a research tool? This exploratory research applies a self-study methodology to investigate the technology. Impressively, ChatGPT’s output often aligned with key themes in the research. However, as it currently stands, ChatGPT risks positioning itself as the ultimate epistemic authority, presenting a single truth without proper grounding in evidence or sufficient qualification. Key ethical concerns associated with AI include its potential environmental impact, issues related to content moderation, and the risk of copyright infringement. It is important for educators to model responsible use of ChatGPT, prioritise critical thinking, and be clear about expectations. ChatGPT is likely to be a useful tool for educators designing science units, rubrics, and quizzes. Educators should critically evaluate any AI-generated resource and adapt it to their specific teaching contexts. ChatGPT was used as a research tool for assistance with editing and to experiment with making the research narrative clearer. The intention of the paper is to act as a catalyst for a broader conversation about the use of generative AI in science education.
Explainable artificial intelligence: a comprehensive review
Thanks to the exponential growth in computing power and vast amounts of data, artificial intelligence (AI) has witnessed remarkable developments in recent years, enabling it to be ubiquitously adopted in our daily lives. Even though AI-powered systems have brought competitive advantages, their black-box nature makes them lack transparency and prevents them from explaining their decisions. This issue has motivated the introduction of explainable artificial intelligence (XAI), which promotes AI algorithms that can show their internal process and explain how they made decisions. The volume of XAI research has increased significantly in recent years, but a unified and comprehensive review of the latest XAI progress is still lacking. This review aims to bridge that gap by examining the critical perspectives of the rapidly growing body of research associated with XAI. After offering the reader a solid XAI background, we analyze and review various XAI methods, which are grouped into (i) pre-modeling explainability, (ii) interpretable models, and (iii) post-modeling explainability. We also pay attention to current methods dedicated to interpreting and analyzing deep learning models. In addition, we systematically discuss various XAI challenges, such as the trade-off between performance and explainability, evaluation methods, security, and policy. Finally, we show the standard approaches that are leveraged to deal with these challenges.
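To make the taxonomy concrete, the sketch below illustrates one widely used post-modeling (post-hoc) technique, permutation feature importance: a feature's importance is estimated as the drop in model performance when that feature's values are shuffled. This is a generic illustration, not a method drawn from the review itself, and the `model.predict` and higher-is-better `metric` interfaces are assumptions.

```python
# Illustrative post-modeling (post-hoc) explainability: permutation feature
# importance, a common model-agnostic technique (generic example, not from
# the review). Assumes `model.predict(X)` and a higher-is-better `metric`.
import numpy as np

def permutation_importance(model, X, y, metric, n_repeats=10, seed=0):
    rng = np.random.default_rng(seed)
    baseline = metric(y, model.predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])        # destroy feature j's information
            drops.append(baseline - metric(y, model.predict(Xp)))
        importances[j] = np.mean(drops)  # average performance drop
    return importances
```

For a fitted classifier, calling `permutation_importance(clf, X_val, y_val, accuracy_score)` would rank features by how much shuffling each one hurts validation accuracy.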
In Conversation with Artificial Intelligence: Aligning Language Models with Human Values
Large-scale language technologies are increasingly used in various forms of communication with humans across different contexts. One particular use case for these technologies is conversational agents, which output natural language text in response to prompts and queries. This mode of engagement raises a number of social and ethical questions. For example, what does it mean to align conversational agents with human norms or values? Which norms or values should they be aligned with? And how can this be accomplished? In this paper, we propose a number of steps that help answer these questions. We start by developing a philosophical analysis of the building blocks of linguistic communication between conversational agents and human interlocutors. We then use this analysis to identify and formulate ideal norms of conversation that can govern successful linguistic communication between humans and conversational agents. Furthermore, we explore how these norms can be used to align conversational agents with human values across a range of different discursive domains. We conclude by discussing the practical implications of our proposal for the design of conversational agents that are aligned with these norms and values.
Learning to Prompt for Vision-Language Models
Large pre-trained vision-language models like CLIP have shown great potential in learning representations that are transferable across a wide range of downstream tasks. Different from traditional representation learning, which is based mostly on discretized labels, vision-language pre-training aligns images and texts in a common feature space, which allows zero-shot transfer to a downstream task via prompting, i.e., classification weights are synthesized from natural language describing the classes of interest. In this work, we show that a major challenge for deploying such models in practice is prompt engineering, which requires domain expertise and is extremely time-consuming: one needs to spend a significant amount of time tuning words, since a slight change in wording can have a huge impact on performance. Inspired by recent advances in prompt learning research in natural language processing (NLP), we propose Context Optimization (CoOp), a simple approach specifically for adapting CLIP-like vision-language models for downstream image recognition. Concretely, CoOp models a prompt’s context words with learnable vectors while all pre-trained parameters are kept fixed. To handle different image recognition tasks, we provide two implementations of CoOp: unified context and class-specific context. Through extensive experiments on 11 datasets, we demonstrate that CoOp requires as few as one or two shots to beat hand-crafted prompts by a decent margin and is able to gain significant improvements over prompt engineering with more shots, e.g., with 16 shots the average gain is around 15% (with the highest reaching over 45%). Despite being a learning-based approach, CoOp achieves superb domain generalization performance compared with the zero-shot model using hand-crafted prompts.
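The mechanism CoOp describes is compact enough to sketch. Below is a minimal, hypothetical PyTorch-style illustration of the idea as summarized above: a handful of learnable context vectors are prepended to each class-name embedding and trained with an ordinary classification loss, while the pre-trained encoders stay frozen. The encoder callables, dimensions, and logit scale are assumptions for illustration, not the authors' actual interface.

```python
# Hypothetical sketch of CoOp-style context optimization (not the authors'
# code). Assumes CLIP-like encoders sharing an embedding space of size D.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoOpPrompts(nn.Module):
    def __init__(self, ctx_len=16, token_dim=512):
        super().__init__()
        # Learnable context vectors, shared across classes ("unified context").
        self.ctx = nn.Parameter(torch.randn(ctx_len, token_dim) * 0.02)

    def forward(self, class_token_embeds):
        # class_token_embeds: [num_classes, name_len, token_dim]
        n = class_token_embeds.shape[0]
        ctx = self.ctx.unsqueeze(0).expand(n, -1, -1)  # [n, ctx_len, token_dim]
        # Prepend the learned context to every class-name embedding sequence.
        return torch.cat([ctx, class_token_embeds], dim=1)

def coop_logits(image, class_token_embeds, image_encoder, text_encoder,
                prompts, scale=100.0):
    # Encoders are frozen; only `prompts.ctx` receives gradients.
    with torch.no_grad():
        img_feat = image_encoder(image)                    # [B, D]
    txt_feat = text_encoder(prompts(class_token_embeds))   # [num_classes, D]
    img_feat = F.normalize(img_feat, dim=-1)
    txt_feat = F.normalize(txt_feat, dim=-1)
    return scale * img_feat @ txt_feat.t()                 # cosine-similarity logits
```

Few-shot training then reduces to cross-entropy over these logits; the class-specific variant would simply give each class its own `ctx` parameter.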
CLIP-Adapter: Better Vision-Language Models with Feature Adapters
Large-scale contrastive vision-language pretraining has shown significant progress in visual representation learning. Unlike traditional visual systems trained on a fixed set of discrete labels, a new paradigm was introduced in Radford et al. (International conference on machine learning, PMLR, 2021) to directly learn to align images with raw texts in an open-vocabulary setting. On downstream tasks, a carefully chosen text prompt is employed to make zero-shot predictions. To avoid non-trivial prompt engineering, context optimization (Zhou et al. in Int J Comput Vis 130(9):2337–2348, 2022) has been proposed to learn continuous vectors as task-specific prompts from few-shot training examples. In this paper, we show that there is an alternative path to better vision-language models other than prompt tuning. While prompt tuning operates on the textual inputs, we propose CLIP-Adapter, which conducts fine-tuning with feature adapters on either the visual or the language branch. Specifically, CLIP-Adapter adopts an additional bottleneck layer to learn new features and performs residual-style feature blending with the original pretrained features. As a consequence, CLIP-Adapter is able to outperform context optimization while maintaining a simple design. Experiments and extensive ablation studies on various visual classification tasks demonstrate the effectiveness of our approach.
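As with CoOp, the adapter design lends itself to a short sketch. The hypothetical snippet below illustrates the described mechanism: a bottleneck layer learns new features on top of a frozen encoder's output, and a residual-style blend mixes them with the original pretrained features. The blend-ratio name (`alpha`) and the layer sizes are assumptions, not the authors' actual configuration.

```python
# Hypothetical sketch of a CLIP-Adapter-style feature adapter (not the
# authors' code). Applied to frozen features from either branch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureAdapter(nn.Module):
    def __init__(self, dim=512, bottleneck=128, alpha=0.2):
        super().__init__()
        # Bottleneck: down-project, nonlinearity, up-project.
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.alpha = alpha  # residual blending ratio (assumed name)

    def forward(self, feat):
        # feat: frozen pretrained features from the visual (or language) branch.
        adapted = self.up(F.relu(self.down(feat)))
        # Residual-style blending of new features with the original ones.
        blended = self.alpha * adapted + (1 - self.alpha) * feat
        return F.normalize(blended, dim=-1)
```

Only the adapter's two linear layers are trained, which keeps the tunable parameter count small relative to full fine-tuning.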
LLMFormer: Large Language Model for Open-Vocabulary Semantic Segmentation
Open-vocabulary (OV) semantic segmentation has attracted increasing attention in recent years; it aims to recognize objects from an open class set in real-world applications. While prior OV semantic segmentation approaches have relied on additional semantic knowledge derived from vision-language (VL) pre-training, such as the popular CLIP model, this paper introduces a novel paradigm by harnessing the unprecedented capabilities of large language models (LLMs). Inspired by recent breakthroughs in LLMs, which provide a richer knowledge base than traditional vision-language pre-training, our proposed methodology capitalizes on the vast knowledge embedded within LLMs for OV semantic segmentation. In particular, we partition LLM knowledge into object, attribute, and relation priors, and propose three novel attention modules (semantic, scaled visual, and relation attention) to utilize these LLM priors. Extensive experiments are conducted on common benchmarks including ADE20K (847 classes) and Pascal Context (459 classes). The results show that our model outperforms previous state-of-the-art (SoTA) methods by up to 7.2% absolute. Moreover, unlike previous VL-pre-training-based works, our method can even predict OV segmentation results without target candidate classes.
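The abstract does not detail the internals of these modules, so the snippet below is only a generic illustration of the underlying pattern: cross-attention from flattened pixel features to LLM-derived prior embeddings, used here as an assumed stand-in for the paper's semantic attention. All names and shapes are hypothetical.

```python
# Illustrative cross-attention from visual features to LLM-derived prior
# embeddings (an assumed stand-in for the paper's attention modules, not
# the actual LLMFormer architecture).
import torch
import torch.nn as nn

class PriorCrossAttention(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, pixel_feats, prior_embeds):
        # pixel_feats:  [B, H*W, dim] flattened visual features
        # prior_embeds: [B, K, dim]   object/attribute/relation priors from an LLM
        out, _ = self.attn(query=pixel_feats, key=prior_embeds, value=prior_embeds)
        return pixel_feats + out  # residual update of the pixel features
```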
AI-generated feedback on writing: insights into efficacy and ENL student preference
The question of how generative AI tools, such as large language models and chatbots, can be leveraged ethically and effectively in education is ongoing. Given the critical role that writing plays in learning and assessment within educational institutions, it is of growing importance for educators to make thoughtful and informed decisions as to how and in what capacity generative AI tools should be leveraged to assist in the development of students’ writing skills. This paper reports on two longitudinal studies. Study 1 examined the learning outcomes of 48 university English as a new language (ENL) learners in a six-week-long, repeated-measures, quasi-experimental design in which the experimental group received writing feedback generated by ChatGPT (GPT-4) and the control group received feedback from their human tutor. Study 2 analyzed the perceptions of a different group of 43 ENL learners who received feedback from both ChatGPT and their tutor. Results of Study 1 showed no difference in learning outcomes between the two groups. Study 2 results revealed a near-even split in preference for AI-generated or human-generated feedback, with clear advantages to both forms of feedback apparent from the data. The main implication of these studies is that AI-generated feedback can likely be incorporated into ENL essay evaluation without affecting learning outcomes, although we recommend a blended approach that utilizes the strengths of both forms of feedback. The main contribution of this paper is in addressing generative AI as an automatic essay evaluator while incorporating learner perspectives.
Large language models (LLMs): survey, technical frameworks, and future challenges
Artificial intelligence (AI) has significantly impacted various fields. Large language models (LLMs) such as GPT-4, BARD, PaLM, Megatron-Turing NLG, and Jurassic-1 Jumbo, together with natural language processing (NLP) techniques, have contributed to our understanding and application of AI across these fields. This work provides a comprehensive overview of LLMs in the context of language modeling, word embeddings, and deep learning. It examines the application of LLMs in diverse fields including text generation, vision-language models, personalized learning, biomedicine, and code generation. The paper offers a detailed introduction and background on LLMs, facilitating a clear understanding of their fundamental ideas and concepts. Key language modeling architectures are also discussed, alongside a survey of recent works employing LLM methods for various downstream tasks across different domains. Additionally, it assesses the limitations of current approaches and highlights the need for new methodologies and potential directions for significant advancements in this field.
The rise and potential of large language model based agents: a survey
For a long time, researchers have sought artificial intelligence (AI) that matches or exceeds human intelligence. AI agents, which are artificial entities capable of sensing the environment, making decisions, and taking actions, are seen as a means to achieve this goal. Extensive efforts have been made to develop AI agents, with a primary focus on refining algorithms or training strategies to enhance specific skills or performance on particular tasks. The field, however, lacks a sufficiently general and powerful model to serve as a foundation for building general agents adaptable to diverse scenarios. With their versatile capabilities, large language models (LLMs) pave a promising path toward the development of general AI agents, and substantial progress has been made in the realm of LLM-based agents. In this article, we conduct a comprehensive survey of LLM-based agents, covering their construction frameworks, application scenarios, and the exploration of societies built upon LLM-based agents. We also summarize potential future directions and open problems in this flourishing field.