Five advanced chatbots solving European Diploma in Radiology (EDiR) text-based questions: differences in performance and consistency
by Lambert, Lukas; Junquero, Vanesa; Oleaga, Laura; Ozbek, Suha Sureyya; Pristoupil, Jakub; Merino, Cristina
Subjects: Artificial intelligence / Chatbots / Confidence / Decision making / Diagnostic Radiology / Education (medical) / Educational measurement / Educational Measurement - methods / Europe / European Diploma in Radiology / Generative Artificial Intelligence / Humans / Imaging / Internal Medicine / Interventional Radiology / Language / Large language models / Medical education / Medicine / Medicine & Public Health / Neuroradiology / Original / Original Article / Radiology / Radiology - education / Self report / Statistical analysis / Ultrasound / Variance analysis
2025
Journal Article
Overview
Background
We compared the performance, confidence, and response consistency of five chatbots powered by large language models in solving European Diploma in Radiology (EDiR) text-based multiple-response questions.
Methods
ChatGPT-4o, ChatGPT-4o-mini, Copilot, Gemini, and Claude 3.5 Sonnet were each tested in two iterations using 52 text-based multiple-response questions from two previous EDiR sessions. The chatbots were prompted to evaluate each answer as correct or incorrect and to rate their confidence on a scale from 0 (not confident at all) to 10 (most confident). Scores per question were calculated using a weighted formula that accounted for correct and incorrect answers (range 0.0–1.0).
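The abstract does not state the weighted scoring formula. The sketch below illustrates one plausible scheme for a single multiple-response question, under an explicit assumption: each truly correct option that the chatbot marks as correct adds 1/k (k being the number of truly correct options), each incorrect option it marks as correct subtracts 1/k, and the total is clipped to the 0.0–1.0 range. The function name `question_score` and this weighting are hypothetical and are not the formula used by the authors or by EDiR.

```python
def question_score(key: list[bool], marked: list[bool]) -> float:
    """Hypothetical weighted score for one multiple-response question.

    key[i] is True if option i is actually correct; marked[i] is True if
    the chatbot judged option i to be correct.
    ASSUMPTION (not the published EDiR formula): each correctly identified
    true option adds 1/k, each false option marked as true subtracts 1/k,
    where k is the number of true options; the result is clipped to [0, 1].
    """
    if len(key) != len(marked):
        raise ValueError("key and marked must have the same length")
    k = sum(key)  # number of truly correct options
    if k == 0:
        raise ValueError("question must have at least one correct option")
    # Count true options the chatbot correctly marked as correct.
    hits = sum(1 for truth, mark in zip(key, marked) if truth and mark)
    # Count false options the chatbot wrongly marked as correct.
    false_picks = sum(1 for truth, mark in zip(key, marked) if not truth and mark)
    raw = (hits - false_picks) / k
    # Clip so the per-question score stays within 0.0-1.0.
    return min(1.0, max(0.0, raw))


key = [True, False, True, False, False]
print(question_score(key, [True, False, True, False, False]))  # all correct
print(question_score(key, [True, False, False, False, False]))  # one of two found
```

Under this assumed scheme, guessing every option as correct is penalized rather than rewarded, which is the usual purpose of weighting incorrect selections in multiple-response scoring.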
Results
Claude 3.5 Sonnet achieved the highest score per question (0.84 ± 0.26, mean ± standard deviation) compared with ChatGPT-4o (0.76 ± 0.31), ChatGPT-4o-mini (0.64 ± 0.35), Copilot (0.62 ± 0.37), and Gemini (0.54 ± 0.39) (p < 0.001). Self-reported confidence in answering the questions was 9.0 ± 0.9 for Claude 3.5 Sonnet, followed by ChatGPT-4o (8.7 ± 1.1), compared with ChatGPT-4o-mini (8.2 ± 1.3), Copilot (8.2 ± 2.2), and Gemini (8.2 ± 1.6) (p < 0.001). Claude 3.5 Sonnet demonstrated superior consistency, changing responses in 5.4% of cases between the two iterations, compared with ChatGPT-4o (6.5%), ChatGPT-4o-mini (8.8%), Copilot (13.8%), and Gemini (18.5%). All chatbots outperformed human candidates from previous EDiR sessions, achieving a passing grade in this part of the examination.
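The summary figures reported in the Results could be derived from raw responses along the following lines. This is a sketch under stated assumptions: per-question scores are summarized as mean ± sample standard deviation, and a "changed response" is counted at the level of individual answer judgements compared between the two iterations. The abstract specifies neither choice, and the helper names `summarize` and `change_rate` are hypothetical.

```python
from statistics import mean, stdev


def summarize(scores: list[float]) -> str:
    """Format per-question scores as 'mean ± SD' (sample SD; an assumption)."""
    return f"{mean(scores):.2f} ± {stdev(scores):.2f}"


def change_rate(run1: list[list[bool]], run2: list[list[bool]]) -> float:
    """Percentage of answer judgements that differ between two iterations.

    run1[q][i] / run2[q][i] hold the chatbot's correct/incorrect judgement
    for answer i of question q in the first / second iteration.
    ASSUMPTION: consistency is measured per answer judgement, not per question.
    """
    if len(run1) != len(run2):
        raise ValueError("both iterations must cover the same questions")
    total = changed = 0
    for q1, q2 in zip(run1, run2):
        if len(q1) != len(q2):
            raise ValueError("each question must have the same answers in both runs")
        for first, second in zip(q1, q2):
            total += 1
            changed += first != second  # bool counts as 0 or 1
    return 100.0 * changed / total


print(summarize([1.0, 0.5, 0.0]))
print(change_rate([[True, False, True], [False, True, False]],
                  [[True, True, True], [False, True, False]]))
```

For example, if one of six answer judgements flips between the two iterations, `change_rate` returns about 16.7%; a lower value indicates a more consistent chatbot.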
Conclusion
Claude 3.5 Sonnet exhibited superior accuracy, confidence, and consistency, with ChatGPT-4o performing nearly as well. The variation in performance among the evaluated models was substantial.
Relevance statement
Variation in performance, consistency, and confidence among chatbots in solving EDiR text-based questions highlights the need for cautious deployment, particularly in high-stakes clinical and educational settings.
Key Points
Claude 3.5 Sonnet outperformed other chatbots in accuracy and response consistency.
ChatGPT-4o ranked second, showing strong but slightly less reliable performance.
All chatbots surpassed EDiR candidates in text-based EDiR questions.
Publisher
Springer Vienna, Springer Nature B.V., SpringerOpen