Catalogue Search | MBRL

Evaluation of a Medical Interview-Assistance System Using Artificial Intelligence for Resident Physicians Interviewing Simulated Patients: A Crossover, Randomized, Controlled Trial

by Yanagisawa, Naotake; Watanabe, Yu; Fujibayashi, Kazutoshi in Accuracy, Algorithms, Artificial Intelligence

2023

Medical interviews are expected to undergo a major transformation through the use of artificial intelligence. However, artificial intelligence-based systems that support medical interviews are not yet widespread in Japan, and their usefulness is unclear. A randomized, controlled trial was conducted to determine the usefulness of a commercial medical interview support system that uses a question flow chart-type application based on a Bayesian model. Ten resident physicians were allocated to two groups, with or without information from the artificial intelligence-based support system. The rate of correct diagnoses, the time taken to complete the interviews, and the number of questions asked were compared between the two groups. Two trials were conducted on different dates, with a total of 20 resident physicians participating. Data for 192 differential diagnoses were obtained. The rate of correct diagnosis differed significantly between the two groups for two of the cases and for all cases combined (0.561 vs. 0.393; p = 0.02). The time required also differed significantly between the two groups for all cases combined (370 s (352–387) vs. 390 s (373–406); p = 0.04). Artificial intelligence-assisted medical interviews helped resident physicians make more accurate diagnoses and reduced consultation time. The widespread use of artificial intelligence systems in clinical settings could contribute to improving the quality of medical care.

Journal Article

Interprofessional Approach Reducing Central Venous Catheters by Expanding Peripheral Access

by Ang, Georgina; Boros, Kata; McDonnell, Max in Access, catheter related infections, Catheterization

2025

A community hospital reduced central venous catheter (CVC) line days and central line-associated bloodstream infections (CLABSIs) through a collaborative vascular access initiative led by nurses and resident physicians. This was achieved by introducing midline and extended dwell catheters to the facility, training nurse practitioners (NPs) and resident physicians in device insertion, and establishing eligibility criteria for safe alternative vascular access. This approach offers valuable insights for hospitals aiming to decrease CVC utilization, reduce CLABSIs, expand NP and resident physician skill sets, foster interdisciplinary collaboration, and optimize vascular access practices.
• One hundred thirty-three midline and extended dwell catheters were placed (1,506 device days) in 1 year.
• Central venous catheter days were reduced by 22.6% (1,061 line days) in 1 year.
• Central line–associated bloodstream infections were reduced by 75% (from 4 to 1).
• Only 1 laboratory-confirmed bloodstream infection occurred during the study period.
• All devices were placed by a nurse practitioner or internal medicine resident physician.

Journal Article

ChatGPT With GPT-4 Outperforms Emergency Department Physicians in Diagnostic Accuracy: Retrospective Analysis

by Auer, Matthias K; Hoppe, John Michael; Stremmel, Christopher in Accuracy, Acute coronary syndromes, Adults

2024

OpenAI's ChatGPT is a pioneering artificial intelligence (AI) in the field of natural language processing, and it holds significant potential in medicine for providing treatment advice. Additionally, recent studies have demonstrated promising results using ChatGPT for emergency medicine triage. However, its diagnostic accuracy in the emergency department (ED) has not yet been evaluated. This study compares the diagnostic accuracy of ChatGPT (with GPT-3.5 and GPT-4) with that of the primary treating resident physicians in an ED setting. Among 100 adults admitted to our ED in January 2023 with internal medicine issues, diagnostic accuracy was assessed by comparing the diagnoses made by ED resident physicians and those made by ChatGPT with GPT-3.5 or GPT-4 against the final hospital discharge diagnosis, using a point system for grading accuracy. The study enrolled 100 patients with a median age of 72 (IQR 58.5-82.0) years who were admitted to our internal medicine ED primarily for cardiovascular, endocrine, gastrointestinal, or infectious diseases. GPT-4 outperformed both GPT-3.5 (P<.001) and ED resident physicians (P=.01) in diagnostic accuracy for internal medicine emergencies. Furthermore, across various disease subgroups, GPT-4 consistently outperformed GPT-3.5 and resident physicians. It demonstrated significant superiority in cardiovascular diseases (GPT-4 vs ED physicians: P=.03) and in endocrine or gastrointestinal diseases (GPT-4 vs GPT-3.5: P=.01). However, in other categories, the differences were not statistically significant. In this study, which compared the diagnostic accuracy of GPT-3.5, GPT-4, and ED resident physicians against a discharge diagnosis gold standard, GPT-4 outperformed both the resident physicians and its predecessor, GPT-3.5. Despite the retrospective design of the study and its limited sample size, the results underscore the potential of AI as a supportive diagnostic tool in ED settings.

Journal Article

Knowledge, Attitudes, and Ordering Patterns for Routine HIV Screening among Resident Physicians at an Urban Medical Center

by Kordik, Abbe; Farnan, Jeanne; Watson, Sydeaka in Adult, Attitudes, Barriers

2016

Background: We sought to measure resident physician knowledge of HIV epidemiology and screening guidelines, attitudes toward testing, testing practices, and barriers and facilitators to routine testing. Methods: Resident physicians in internal medicine, pediatrics, obstetrics and gynecology, and emergency medicine were surveyed. Results: The overall response rate was 63% (162 of 259). Half knew details of the HIV screening guidelines, but few followed these recommendations. Fewer than one-third reported always or usually performing routine testing. A significant proportion reported only sometimes or never screening patients with risk factors. This was despite a strong belief that HIV screening improves patient care and public health. The most common barriers to testing were competing priorities and forgetting to order the test. Elimination of written consent and electronic reminders were identified as facilitators to routine testing. Although an institutional policy assigns responsibility for test notification and linkage of HIV-positive patients to care to the HIV care program, only 29% were aware of this. Conclusions: Few resident physicians routinely screen for HIV infection, and some do not test patients with risk factors. While competing priorities remain a significant barrier, elimination of the written consent form and electronic reminders have facilitated testing. Increasing awareness of policies regarding test notification and linkage to care may improve screening.

Journal Article

Comparison of the Quality of Discharge Letters Written by Large Language Models and Junior Clinicians: Single-Blinded Study

by Lim, Daniel Yan Zheng; Tan, Ting Fang; Ong, Jasmine Chiat Ling in Archives & records, Artificial intelligence, Biopsy

2024

Discharge letters are a critical component in the continuity of care between specialists and primary care providers. However, these letters are time-consuming to write, underprioritized in comparison to direct clinical care, and are often tasked to junior doctors. Prior studies assessing the quality of discharge summaries written for inpatient hospital admissions show inadequacies in many domains. Large language models such as GPT have the ability to summarize large volumes of unstructured free text such as electronic medical records and have the potential to automate such tasks, providing time savings and consistency in quality. The aim of this study was to assess the performance of GPT-4 in generating discharge letters written from urology specialist outpatient clinics to primary care providers and to compare their quality against letters written by junior clinicians. Fictional electronic records were written by physicians simulating 5 common urology outpatient cases with long-term follow-up. Records comprised simulated consultation notes, referral letters and replies, and relevant discharge summaries from inpatient admissions. GPT-4 was tasked to write discharge letters for these cases with a specified target audience of primary care providers who would be continuing the patient's care. Prompts were written for safety, content, and style. Concurrently, junior clinicians were provided with the same case records and instructional prompts. GPT-4 output was assessed for instances of hallucination. A blinded panel of primary care physicians then evaluated the letters using a standardized questionnaire tool. GPT-4 outperformed human counterparts in information provision (mean 4.32, SD 0.95 vs 3.70, SD 1.27; P=.03) and had no instances of hallucination. 
There were no statistically significant differences in the mean clarity (4.16, SD 0.95 vs 3.68, SD 1.24; P=.12), collegiality (4.36, SD 1.00 vs 3.84, SD 1.22; P=.05), conciseness (3.60, SD 1.12 vs 3.64, SD 1.27; P=.71), follow-up recommendations (4.16, SD 1.03 vs 3.72, SD 1.13; P=.08), and overall satisfaction (3.96, SD 1.14 vs 3.62, SD 1.34; P=.36) between the letters generated by GPT-4 and humans, respectively. Discharge letters written by GPT-4 had equivalent quality to those written by junior clinicians, without any hallucinations. This study provides a proof of concept that large language models can be useful and safe tools in clinical documentation.

Journal Article

Consent-GPT: is it ethical to delegate procedural consent to conversational AI?

by Allen, Jemima Winifred; Wilkinson, Dominic; Earp, Brian D in Artificial Intelligence, Autonomy, Birth control

2024

Obtaining informed consent from patients prior to a medical or surgical procedure is a fundamental part of safe and ethical clinical practice. Currently, it is routine for a significant part of the consent process to be delegated to members of the clinical team not performing the procedure (eg, junior doctors). However, consent-taking delegates commonly lack sufficient time and clinical knowledge to adequately promote patient autonomy and informed decision-making. Such problems might be addressed in a number of ways. One possible solution to this clinical dilemma is the use of conversational artificial intelligence based on large language models (LLMs). There is considerable interest in the potential benefits of such models in medicine. For delegated procedural consent, LLMs could improve patients' access to relevant procedural information and thereby enhance informed decision-making. In this paper, we first outline a hypothetical example of delegating consent to an LLM prior to surgery. We then discuss existing clinical guidelines for consent delegation and some of the ways in which current practice may fail to meet the ethical purposes of informed consent. Finally, we outline and discuss the ethical implications of delegating consent to LLMs in medicine, concluding that, at least in certain clinical situations, the benefits of LLMs potentially far outweigh those of current practices.

Journal Article

Accuracy of Prospective Assessments of 4 Large Language Model Chatbot Responses to Patient Questions About Emergency Care: Experimental Comparative Study

by Tapia, Antonio; Saadat, Soheil; Roh, Jennifer S in Accuracy, Artificial Intelligence, Averages

2024

Recent surveys indicate that 48% of consumers actively use generative artificial intelligence (AI) for health-related inquiries. Despite widespread adoption and the potential to improve health care access, scant research examines the performance of AI chatbot responses regarding emergency care advice. We assessed the quality of AI chatbot responses to common emergency care questions. We sought to determine qualitative differences in responses from 4 free-access AI chatbots for 10 different serious and benign emergency conditions. We created 10 emergency care questions that we fed into the free-access versions of ChatGPT 3.5 (OpenAI), Google Bard, Bing AI Chat (Microsoft), and Claude AI (Anthropic) on November 26, 2023. Each response was graded by 5 board-certified emergency medicine (EM) faculty across 8 domains: percentage accuracy, presence of dangerous information, factual accuracy, clarity, completeness, understandability, source reliability, and source relevancy. We determined the correct, complete response to each of the 10 questions from reputable and scholarly emergency medical references; these were compiled by an EM resident physician. To assess readability, we used the Flesch-Kincaid Grade Level of each response, as reported by the readability statistics built into Microsoft Word. Differences between chatbots were determined by the chi-square test. The responses of each of the 4 chatbots to the 10 clinical questions were scored across 8 domains by 5 EM faculty, yielding 400 assessments per chatbot. Together, the 4 chatbots performed best in clarity and understandability (both 85%), intermediately in accuracy and completeness (both 50%), and poorly (10%) in source relevance and reliability (mostly unreported). Chatbots contained dangerous information in 5% to 35% of responses, with no statistical difference between chatbots on this metric (P=.24).
ChatGPT, Google Bard, and Claude AI had similar performance across 6 of the 8 domains. Only Bing AI performed better, with more identified or relevant sources (40%; the others had 0%-10%). The Flesch-Kincaid Grade Level was 7.7-8.9 for all chatbots except ChatGPT (10.8); all of these levels were too advanced for the average emergency patient. Responses included both dangerous advice (eg, starting cardiopulmonary resuscitation with no pulse check) and generally inappropriate advice (eg, loosening the collar to improve breathing without evidence of airway compromise). AI chatbots, though ubiquitous, have significant deficiencies in EM patient advice, despite relatively consistent performance. Information on when to seek urgent or emergent care is frequently incomplete and inaccurate, and patients may be unaware of misinformation. Sources are generally not provided. Patients who use AI to guide health care decisions assume potential risks. AI chatbots for health should be subject to further research, refinement, and regulation. We strongly recommend proper medical consultation to prevent potential adverse outcomes.

Journal Article

The Effect of Screen-to-Screen Versus Face-to-Face Consultation on Doctor-Patient Communication: An Experimental Study with Simulated Patients

by Kanters, Saskia; Tates, Kiek; Nieboer, Theodoor E in Attitudes, Behavior, Communication

2017

Despite the emergence of Web-based patient-provider contact, it is still unclear how the quality of Web-based doctor-patient interactions differs from that of face-to-face interactions. This study aimed to examine (1) the impact of the consultation medium on doctors' and patients' communicative behavior in terms of information exchange, interpersonal relationship building, and shared decision making and (2) the mediating role of doctors' and patients' communicative behavior on satisfaction with both types of consultation medium. Doctor-patient consultations on pelvic organ prolapse were simulated in both a face-to-face and a screen-to-screen (video) setting. Twelve medical interns and 6 simulated patients prepared 4 different written scenarios and were randomized to perform a total of 48 consultations. Effects of the consultations were measured by questionnaires that participants filled out directly after each consultation. With respect to patient-related outcomes, satisfaction, perceived information exchange, interpersonal relationship building, and perceived shared decision making showed no significant differences between face-to-face and screen-to-screen consultations. Patients' attitude toward Web-based communication (b=-.249, P=.02) and patients' perceived time and attention (b=.271, P=.03) significantly predicted patients' perceived interpersonal relationship building. Patients' perceived shared decision making was positively related to their satisfaction with the consultation (b=.254, P=.005). Overall, patients experienced significantly greater shared decision making with a female doctor (mean 4.21, SD 0.49) than with a male doctor (mean 3.66, SD 0.73; b=.401, P=.009). Doctor-related outcomes showed no significant differences in satisfaction, perceived information exchange, interpersonal relationship building, and perceived shared decision making between the conditions.
There was a positive relationship between perceived information exchange and doctors' satisfaction with the consultation (b=.533, P<.001). Furthermore, doctors' perceived interpersonal relationship building was positively related to doctors' satisfaction with the consultation (b=.331, P=.003). In this study, the quality of doctor-patient communication, as indicated by information exchange, interpersonal relationship building, and shared decision making, did not differ significantly between Web-based and face-to-face consultations. Doctors and simulated patients were equally satisfied with both types of consultation medium, and no differences were found in the manner in which participants perceived communicative behavior during these consultations. The findings suggest that worries about a negative impact of Web-based video consultation on the quality of patient-provider consultations seem unwarranted, as video consultations offer the same interaction quality and satisfaction level as regular face-to-face consultations.

Journal Article

Using machine learning with intensive longitudinal data to predict depression and suicidal ideation among medical interns over time

by Sen, Srijan; Czyz, Ewa K.; Cleary, Jennifer in Accuracy, Affect, Data collection

2023

Use of intensive longitudinal methods (e.g. ecological momentary assessment, passive sensing) and machine learning (ML) models to predict risk for depression and suicide has increased in recent years. However, these studies often vary considerably in length, ML methods used, and sources of data. The present study examined predictive accuracy for depression and suicidal ideation (SI) as a function of time, comparing different combinations of ML methods and data sources. Participants were 2459 first-year training physicians (55.1% female; 52.5% White) who were provided with Fitbit wearable devices and assessed daily for mood. Linear [elastic net regression (ENR)] and non-linear (random forest) ML algorithms were used to predict depression and SI at the first-quarter follow-up assessment, using two sets of variables (daily mood features only; daily mood features plus passive-sensing features). To assess accuracy over time, models were estimated iteratively for each of the first 92 days of internship, using the data available up to that point in time. ENRs using only the daily mood features generally had the best accuracy for predicting mental health outcomes, and predictive accuracy within 1 standard error of the full 92-day models was attained by weeks 7-8. Depression at 92 days could be predicted accurately (area under the curve >0.70) after only 14 days of data collection. Simpler ML methods may outperform more complex methods until passive-sensing features become better specified. For intensive longitudinal studies, there may be limited predictive value in collecting data for more than 2 months.

Journal Article

The Longer The Shifts For Hospital Nurses, The Higher The Levels Of Burnout And Patient Dissatisfaction

by Sloane, Douglas M.; Aiken, Linda H.; Stimpfel, Amy Witkoski in Burnout, Correlation analysis, Discontent

2012

Extended work shifts of twelve hours or longer are common and even popular with hospital staff nurses, but little is known about how such extended hours affect the care that patients receive or the well-being of nurses. Survey data from nurses in four states showed that more than 80 percent of the nurses were satisfied with scheduling practices at their hospital. However, as the proportion of hospital nurses working shifts of more than thirteen hours increased, patients' dissatisfaction with care increased. Furthermore, nurses working shifts of ten hours or longer were up to two and a half times more likely than nurses working shorter shifts to experience burnout and job dissatisfaction and to intend to leave the job. Extended shifts undermine nurses' well-being, may result in expensive job turnover, and can negatively affect patient care. Policies regulating work hours for nurses, similar to those set for resident physicians, may be warranted. Nursing leaders should also encourage workplace cultures that respect nurses' days off and vacation time, promote nurses' prompt departure at the end of a shift, and allow nurses to refuse to work overtime without retribution.

Journal Article
