Catalogue Search | MBRL

Ethical implications of using general-purpose LLMs in clinical settings: a comparative analysis of prompt engineering strategies and their impact on patient safety

by Esmaeilzadeh, Pouyan in Accountability , Accuracy , Artificial intelligence

2025

Background The rapid integration of large language models (LLMs) into healthcare raises critical ethical concerns regarding patient safety, reliability, transparency, and equitable care delivery. Although general-purpose LLMs are not trained explicitly on medical data, individuals increasingly use them to address medical questions and clinical scenarios. While prompt engineering can optimize LLM performance, its ethical implications for clinical decision-making remain underexplored. This study aimed to evaluate the ethical dimensions of prompt engineering strategies in the clinical applications of LLMs, focusing on safety, bias, transparency, and their implications for the responsible implementation of AI in healthcare. Methods We conducted an ethics-focused analysis of three advanced and reasoning-capable LLMs (OpenAI O3, Claude Sonnet 4, Google Gemini 2.5 Pro) across six prompt engineering strategies and five clinical scenarios of varying ethical complexity. Six expert clinicians evaluated 90 responses using domains that included diagnostic accuracy, safety assessment, communication, empathy, and ethical reasoning. We specifically analyzed safety incidents, bias patterns, and transparency of reasoning processes. Results Significant ethical concerns emerged across all models and scenarios. Critical safety issues occurred in 12.2% of responses, with concentration in complex ethical scenarios (Level 5: 23.1% vs. Level 1: 2.3%, p < 0.001). Meta-cognitive prompting demonstrated superior ethical reasoning (mean ethics score: 78.3 ± 9.1), while safety-first prompting reduced safety incidents by 45% compared to zero-shot approaches (8.9% vs. 16.2%). However, all models showed concerning deficits in communication and empathy (mean 54% of maximum) and exhibited potential bias in complex multi-cultural scenarios. Transparency varied significantly by prompt strategy, with meta-cognitive approaches providing the clearest reasoning pathways (4.2 vs. 1.8 explicit reasoning steps), which are essential for clinical accountability. The study highlighted critical gaps in ethical decision-making transparency, with meta-cognitive approaches providing 4.2 explicit reasoning steps compared to 1.8 in zero-shot methods (p < 0.001). Bias patterns disproportionately affected vulnerable populations, with systematic underestimation of treatment appropriateness in elderly patients and inadequate cultural considerations in end-of-life scenarios. Conclusions Current clinical applications of general-purpose LLMs present substantial ethical challenges requiring urgent attention. While structured prompt engineering demonstrated measurable improvements in some domains, with meta-cognitive approaches showing 13.0% performance gains and safety-first prompting reducing critical incidents by 45%, substantial limitations persist across all strategies. Even optimized approaches achieved inadequate performance in communication and empathy (≤ 54% of maximum), retained residual bias patterns (11.7% in safety-first conditions), and exhibited concerning safety deficits, indicating that current prompt engineering methods provide only marginal improvements, which are insufficient for reliable clinical deployment. These findings highlight significant ethical challenges that necessitate further investigation into the development of appropriate guidelines and regulatory frameworks for the clinical use of general-purpose AI models.

Journal Article

Enhancing responses from large language models with role-playing prompts: a comparative study on answering frequently asked questions about total knee arthroplasty

by Chen, Yi-Chen , Hu, Chih-Chien , Sheu, Huan in Acceptability , Accuracy , Arthroplasty (knee)

2025

Background The application of artificial intelligence (AI) in medical education and patient interaction is rapidly growing. Large language models (LLMs) such as GPT-3.5, GPT-4, Google Gemini, and Claude 3 Opus have shown potential in providing relevant medical information. This study aims to evaluate and compare the performance of these LLMs in answering frequently asked questions (FAQs) about Total Knee Arthroplasty (TKA), with a specific focus on the impact of role-playing prompts. Methods Four leading LLMs—GPT-3.5, GPT-4, Google Gemini, and Claude 3 Opus—were evaluated using ten standardized patient inquiries related to TKA. Each model produced two distinct responses per question: one generated under zero-shot prompting (question-only), and one under role-playing prompting (instructed to simulate an experienced orthopaedic surgeon). Four orthopaedic surgeons evaluated responses for accuracy and comprehensiveness on a 5-point Likert scale, along with a binary measure for acceptability. Statistical analyses (Wilcoxon rank sum and Chi-squared tests; P < 0.05) were conducted to compare model performance. Results ChatGPT-4 with role-playing prompts achieved the highest scores for accuracy (3.73), comprehensiveness (4.05), and acceptability (77.5%), followed closely by ChatGPT-3.5 with role-playing prompts (3.70, 3.85, 72.5%, respectively). Google Gemini and Claude 3 Opus demonstrated lower performance across all metrics. In between-model comparisons based on zero-shot prompting, ChatGPT-4 achieved significantly higher scores of both accuracy and comprehensiveness relative to Google Gemini ( P = 0.031 and P = 0.009, respectively) and Claude 3 Opus ( P = 0.019 and P = 0.002), and demonstrated higher acceptability than Claude 3 Opus ( P = 0.006). Within-model comparisons showed role-playing significantly improved all metrics for ChatGPT-3.5 ( P < 0.05) and acceptability for ChatGPT-4 ( P = 0.033). No significant prompting effects were observed for Gemini or Claude. 
Conclusions This study demonstrates that role-playing prompts significantly enhance the performance of LLMs, particularly for ChatGPT-3.5 and ChatGPT-4, in answering FAQs related to TKA. ChatGPT-4, with role-playing prompts, showed superior performance in terms of accuracy, comprehensiveness, and acceptability. Despite occasional inaccuracies, LLMs hold promise for improving patient education and clinical decision-making in orthopaedic practice. Clinical trial number Not applicable.

Journal Article

Exploring doctors’ perspectives on precision medicine and AI in colorectal cancer: opportunities and challenges for the doctor-patient relationship

by Flobak, Åsmund , Mascalzoni, Deborah , Grauman, Åsa in Adult , Artificial Intelligence , Attitude of Health Personnel

2025

Background Precision medicine and artificial intelligence (AI) are increasingly integrated into colorectal cancer (CRC) care, offering personalised treatment strategies and data-driven decision support. While these technologies promise improved outcomes, they also raise challenges concerning clinical decision-making, the doctor-patient relationship, and ethics. This study explores physicians’ perspectives on integrating precision medicine and AI in CRC care. Methods A qualitative study was conducted using semi-structured interviews with ten CRC physicians from six European countries. Participants were recruited through purposive and snowball sampling. Interviews were analysed using thematic analysis. Results Three key themes emerged from the analysis. First, physicians described precision medicine as a logical extension of existing tailoring practices, offering new opportunities while introducing complexity. Many expressed concerns about the blurred boundary between experimental and standard treatments, noting potential implications for equity and ethical decision-making. Second, AI was viewed as a future partner in care, with the potential to enhance efficiency and assist in synthesising complex data. However, participants voiced concerns about trust, clinical responsibility, and the lack of regulatory clarity, particularly due to AI’s “black box” nature. Finally, doctors reported challenges in communicating both precision medicine and AI-based recommendations to patients. They emphasised the importance of adapting communication strategies to individual patients and highlighted the need for structured approaches to ensure patient understanding and prevent miscommunication, especially when dealing with uncertain outcomes or emerging technologies. Conclusions The findings highlight both the opportunities and challenges of integrating precision medicine and AI in CRC care. 
Addressing concerns related to communication, ethics, and regulation requires clear guidance and improved support for clinicians. Precision medicine and AI enhance CRC care but demand robust communication, regulation, and ethical safeguards to ensure transparency, trust, and physician autonomy.

Journal Article

Expectations and attitudes towards medical artificial intelligence: A qualitative study in the field of stroke

by Vayena, Effy , Ormond, Kelly E. , Blasimme, Alessandro in Algorithms , Artificial Intelligence , Attitudes

2023

Artificial intelligence (AI) has the potential to transform clinical decision-making as we know it. Powered by sophisticated machine learning algorithms, clinical decision support systems (CDSS) can generate unprecedented amounts of predictive information about individuals' health. Yet, despite the potential of these systems to promote proactive decision-making and improve health outcomes, their utility and impact remain poorly understood due to their still rare application in clinical practice. Taking the example of AI-powered CDSS in stroke medicine as a case in point, this paper provides a nuanced account of stroke survivors', family members', and healthcare professionals' expectations and attitudes towards medical AI. We followed a qualitative research design informed by the sociology of expectations, which recognizes the generative role of individuals' expectations in shaping scientific and technological change. Semi-structured interviews were conducted with stroke survivors, family members, and healthcare professionals specialized in stroke based in Germany and Switzerland. Data was analyzed using a combination of inductive and deductive thematic analysis. Based on the participants' deliberations, we identified four presumed roles that medical AI could play in stroke medicine: an administrative, assistive, advisory, and autonomous role. While most participants held positive attitudes towards medical AI and its potential to increase accuracy, speed, and efficiency in medical decision making, they also cautioned that it is not a stand-alone solution and may even lead to new problems. Participants particularly emphasized the importance of relational aspects and raised questions regarding the impact of AI on roles and responsibilities and patients' rights to information and decision-making. These findings shed light on the potential impact of medical AI on professional identities, role perceptions, and the doctor-patient relationship. 
Our findings highlight the need for a more differentiated approach to identifying and tackling pertinent ethical and legal issues in the context of medical AI. We advocate for stakeholder and public involvement in the development of AI and AI governance to ensure that medical AI offers solutions to the most pressing challenges patients and clinicians face in clinical care.

Journal Article

AI and Ethics: A Systematic Review of the Ethical Considerations of Large Language Model Use in Surgery Research

by Pressman, Sophia M. , Haider, Clifton , Haider, Syed A. in Analysis , Artificial intelligence , Bone surgery

2024

Introduction: As large language models receive greater attention in medical research, the investigation of ethical considerations is warranted. This review aims to explore surgery literature to identify ethical concerns surrounding these artificial intelligence models and evaluate how autonomy, beneficence, nonmaleficence, and justice are represented within these ethical discussions to provide insights in order to guide further research and practice. Methods: A systematic review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. Five electronic databases were searched in October 2023. Eligible studies included surgery-related articles that focused on large language models and contained adequate ethical discussion. Study details, including specialty and ethical concerns, were collected. Results: The literature search yielded 1179 articles, with 53 meeting the inclusion criteria. Plastic surgery, orthopedic surgery, and neurosurgery were the most represented surgical specialties. Autonomy was the most explicitly cited ethical principle. The most frequently discussed ethical concern was accuracy (n = 45, 84.9%), followed by bias, patient confidentiality, and responsibility. Conclusion: The ethical implications of using large language models in surgery are complex and evolving. The integration of these models into surgery necessitates continuous ethical discourse to ensure responsible and ethical use, balancing technological advancement with human dignity and safety.

Journal Article

Leading with AI in critical care nursing: challenges, opportunities, and the human factor

by Hassan, Eman Arafa , El-Ashry, Ayman Mohamed in Artificial intelligence , Artificial intelligence and nursing practice , Critical care nursing

2024

Introduction The integration of artificial intelligence (AI) in intensive care units (ICUs) presents both opportunities and challenges for critical care nurses. This study delves into the human factor, exploring how nurses with leadership roles perceive the impact of AI on their professional practice. Objective To investigate how nurses perceive the impact of AI on their professional identity, ethical considerations surrounding its use, and the shared meanings they attribute to trust, collaboration, and communication when working with AI systems. Methods An interpretive phenomenological analysis was used to capture the lived experiences of critical care nurses leading with AI. Ten nurses with leadership roles in various ICU specializations were interviewed through purposive sampling. Semi-structured interviews explored nurses’ experiences with AI, challenges, and opportunities. Thematic analysis identified recurring themes related to the human factor in leading with AI. Findings Thematic analysis revealed two key themes: "leading with AI: making sense of challenges and opportunities" and "the human factor in leading with AI". The two themes comprised six subthemes, which revealed that AI offered benefits like task automation, but concerns existed about overreliance and the need for ongoing training. New challenges emerged, including adapting to new workflows and managing potential bias. Clear communication and collaboration were crucial for successful AI integration. Building trust in AI hinged on transparency, and collaboration allowed nurses to focus on human-centered care while AI supported data analysis. Ethical considerations included maintaining patient autonomy and ensuring accountability in AI-driven decisions. Conclusion While AI presents opportunities for automation and data analysis, successful integration hinges on addressing concerns about overreliance, workflow adaptation, and potential bias. 
Building trust and fostering collaboration are fundamental to AI integration. Transparency in AI systems allows nurses to confidently delegate tasks, while collaboration empowers them to focus on human-centered care with AI support. Ultimately, dealing with the ethical concerns of AI in ICU care requires prioritizing patient autonomy and ensuring accountability in AI-driven decisions.

Journal Article

What the radiologist should know about artificial intelligence – an ESR white paper

in Artificial intelligence , Medical ethics , Optimization

2019

This paper aims to provide a review of the basis for application of AI in radiology, to discuss the immediate ethical and professional impact in radiology, and to consider possible future evolution. Even if AI does add significant value to image interpretation, there are implications outside the traditional radiology activities of lesion detection and characterisation. In radiomics, AI can foster the analysis of the features and help in the correlation with other omics data. Imaging biobanks would become a necessary infrastructure to organise and share the image data from which AI models can be trained. AI can be used as an optimising tool to assist the technologist and radiologist in choosing a personalised protocol for the patient, tracking the patient’s dose parameters, and providing an estimate of the radiation risks. AI can also aid the reporting workflow and help the linking between words, images, and quantitative data. Finally, AI coupled with CDS can improve the decision process and thereby optimise clinical and radiological workflow.

Journal Article

Ethical implications related to processing of personal data and artificial intelligence in humanitarian crises: a scoping review

by Boone, Ella , Kreutzer, Tino , Orbinski, James in Altruism , Artificial intelligence , Artificial intelligence (AI)

2025

Background Humanitarian organizations are rapidly expanding their use of data in the pursuit of operational gains in effectiveness and efficiency. Ethical risks, particularly from artificial intelligence (AI) data processing, are increasingly recognized yet inadequately addressed by current humanitarian data protection guidelines. This study reports on a scoping review that maps the range of ethical issues that have been raised in the academic literature regarding data processing of people affected by humanitarian crises. Methods We systematically searched databases to identify peer-reviewed studies published since 2010. Data and findings were standardized, grouping ethical issues into the value categories of autonomy, beneficence, non-maleficence, and justice. The study protocol followed Arksey and O’Malley’s approach and PRISMA reporting guidelines. Results We identified 16,200 unique records and retained 218 relevant studies. Nearly one in three ( n = 66) discussed technologies related to AI. Seventeen studies included an author from a lower-middle income country while four included an author from a low-income country. We identified 22 ethical issues which were then grouped along the four ethical value categories of autonomy, beneficence, non-maleficence, and justice. Slightly over half of included studies ( n = 113) identified ethical issues based on real-world examples. The most-cited ethical issue ( n = 134) was a concern for privacy in cases where personal or sensitive data might be inadvertently shared with third parties. Aside from AI, the technologies most frequently discussed in these studies included social media, crowdsourcing, and mapping tools. Conclusions Studies highlight significant concerns that data processing in humanitarian contexts can cause additional harm, may not provide direct benefits, may limit affected populations’ autonomy, and can lead to the unfair distribution of scarce resources. 
The increase in AI tool deployment for humanitarian assistance amplifies these concerns. Urgent development of specific, comprehensive guidelines, training, and auditing methods is required to address these ethical challenges. Moreover, empirical research from low- and middle-income countries, which are disproportionately affected by humanitarian crises, is vital to ensure inclusive and diverse perspectives. This research should focus on the ethical implications of both emerging AI systems and established humanitarian data management practices. Trial registration Not applicable.

Journal Article

Physician Use of Large Language Models: A Quantitative Study Based on Large-Scale Query-Level Data

by Qiu, Lin , Bi, Xuan , Zhang, Heping in Adult , AI Language Models in Health Care , Artificial Intelligence

2025

Generative artificial intelligence (GenAI) has rapidly emerged as a promising tool in health care. Despite its growing adoption, how physicians make use of it in medical practice has not been quantitatively studied. Existing literature has largely focused on theoretical applications or experimental validations, with limited insight into real-world physician engagement with GenAI technologies. The aim of this study was to leverage a fine-grained dataset at the query level to quantitatively examine how physicians incorporate GenAI into their clinical and research workflows. The primary objective was to analyze usage patterns over time and across physician demographics. A secondary goal was to assess potential risks to patient privacy arising from physicians' interactions with GenAI platforms. This study collected 106,942 query-and-answer pairs by 989 physicians between August 29, 2023, and April 16, 2024. We performed topic classification to identify the most prevalent use cases, examining how these use cases evolved over time and across demographics. We also developed sensitivity classifiers to detect personally identifiable information in physicians' queries to explore the potential privacy breach risks around physicians' use of GenAI. Approximately 40% (396/989) of the enrolled physicians were female, 45.9% (454/989) were younger than 25 years, and 54.1% (535/989) were between 25 and 56 years of age. The majority of them worked in clinical departments (680/989, 68.8%) or medical technology departments (127/989, 12.8%). Our classification-based quantitative analyses suggest the following. First, physicians use GenAI predominantly for medical research (64,379/106,942, 60.2%) rather than clinical practice (13,100/106,942, 12.25%). Second, physicians focus more on health care-related questions (rising from 64,165/106,942, 60% to 83,415/106,942, 78%) within the first 15% (16,041/106,942) of their query sequence. 
Third, the use of GenAI differed across physician demographics and features. Specifically, female physicians asked a larger proportion of clinical questions (female: 0.154 vs male: 0.108; P<.001) and administration questions (female: 0.027 vs male: 0.018; P<.001) than male physicians; younger physicians posed more clinical questions (age ≤25: 0.146 vs age ∈ (25, 40]: 0.115 vs age >40: 0.103; P<.001) but fewer research questions (age ≤25: 0.580 vs age ∈ (25, 40]: 0.607 vs age >40: 0.664; P<.001) than senior physicians; and physicians accessing GenAI via computers asked more research questions (computer: 0.637 vs mobile: 0.296; P<.001), whereas physicians using mobile devices asked more clinical questions (computer: 0.107 vs mobile: 0.264; P<.001). Fourth, only 2.68% (2866/106,942) of physician queries contained sensitive information, most of which derived from writing and editing. Physicians are actively integrating GenAI into their professional routines, primarily leveraging it for research but also increasingly for clinical support. Usage patterns vary significantly across demographic lines, including gender, age, and device preference. Despite the presence of sensitive information in some queries, the risk of privacy breaches appears to be low.

Journal Article

Identification of Ethical Issues and Practice Recommendations Regarding the Use of Robotic Coaching Solutions for Older Adults: Narrative Review

by Ogawa, Toshimi , Barbarossa, Federico , Rigaud, Anne-Sophie in Accidents , Adults , Aged

2024

Technological advances in robotics, artificial intelligence, cognitive algorithms, and internet-based coaches have contributed to the development of devices capable of responding to some of the challenges resulting from demographic aging. Numerous studies have explored the use of robotic coaching solutions (RCSs) for supporting healthy behaviors in older adults and have shown their benefits regarding the quality of life and functional independence of older adults at home. However, the use of RCSs by individuals who are potentially vulnerable raises many ethical questions. Establishing an ethical framework to guide the development, use, and evaluation practices regarding RCSs for older adults seems highly pertinent. The objective of this paper was to highlight the ethical issues related to the use of RCSs for health care purposes among older adults and draft recommendations for researchers and health care professionals interested in using RCSs for older adults. We conducted a narrative review of the literature to identify publications including an analysis of the ethical dimension and recommendations regarding the use of RCSs for older adults. We used a qualitative analysis methodology inspired by a Health Technology Assessment model. We included all article types such as theoretical papers, research studies, and reviews dealing with ethical issues or recommendations for the implementation of these RCSs in a general population, particularly among older adults, in the health care sector and published after 2011 in either English or French. The review was performed between August and December 2021 using the PubMed, CINAHL, Embase, Scopus, Web of Science, IEEE Xplore, SpringerLink, and PsycINFO databases. Selected publications were analyzed using the European Network of Health Technology Assessment Core Model (version 3.0) around 5 ethical topics: benefit-harm balance, autonomy, privacy, justice and equity, and legislation. 
In the 25 publications analyzed, the most cited ethical concerns were the risk of accidents, lack of reliability, loss of control, risk of deception, risk of social isolation, data confidentiality, and liability in case of safety problems. Recommendations included collecting the opinion of target users, collecting their consent, and training professionals in the use of RCSs. Proper data management, anonymization, and encryption appeared to be essential to protect RCS users' personal data. Our analysis supports the interest in using RCSs for older adults because of their potential contribution to individuals' quality of life and well-being. This analysis highlights many ethical issues linked to the use of RCSs for health-related goals. Future studies should consider the organizational consequences of the implementation of RCSs and the influence of cultural and socioeconomic specificities of the context of experimentation. We suggest implementing a scalable ethical and regulatory framework to accompany the development and implementation of RCSs, covering technological, individual, and legal aspects.

Journal Article
