Benchmarking large language models against clinicians across hospital levels in cardiovascular decision-making: a cross-sectional vignette-based study
by Zhang, Zixi; Dai, Yongguo; Liu, Qiming; Liu, Chan; Tu, Tao; Ma, Yingxu; Xiao, Yichao; Lin, Qiuzhen; Wang, Cancan
in
631/114
/ 692/308
/ 692/700
/ Accuracy
/ Adult
/ Artificial intelligence
/ Benchmarking
/ Cardiovascular Diseases - diagnosis
/ Chatbots
/ ChatGPT 4.0
/ China
/ Clinical Competence
/ Clinical decision-making
/ Clinical Decision-Making - methods
/ Cross-Sectional Studies
/ Decision making
/ DeepSeek-R1
/ Female
/ Hospitals
/ Humanities and Social Sciences
/ Humans
/ Language
/ Large Language Models
/ Male
/ Memory
/ Middle Aged
/ multidisciplinary
/ Multiple choice
/ Science
/ Science (multidisciplinary)
/ Sensitivity analysis
2025
Journal Article
Overview
Large language models (LLMs) have shown strong performance on standardized medical examinations, yet evidence comparing their clinical performance with that of human clinicians remains limited. This study benchmarked DeepSeek-R1 and ChatGPT 4.0 against cardiovascular clinicians from different hospital levels in China. We conducted a cross-sectional, vignette-based assessment consisting of 100 standardized cardiovascular multiple-choice questions covering four competency domains: clinical reasoning (CR), frontier updates (FU), basic memory (BM), and emergency decision (ED). Thirty clinicians from six hospitals (three primary and three tertiary) were compared with the two LLMs. Each question was posed five times to each model, and run-to-run consistency was evaluated. Mean differences (LLM − clinician) with 95% confidence intervals (CIs) were estimated using nonparametric bootstrap resampling (10,000 iterations). Clinicians achieved a mean total score of 69.7 ± 7.9, whereas DeepSeek-R1 and ChatGPT 4.0 scored 97 and 95, respectively. The mean total score differences were +27.3 points (95% CI 24.4–30.1) for DeepSeek-R1 and +25.3 points (95% CI 22.4–28.1) for ChatGPT 4.0. Both models outperformed clinicians in CR, FU, BM, and ED. Run-to-run agreement was high (DeepSeek-R1 κ = 0.73; ChatGPT 4.0 κ = 0.76). LLMs substantially outperformed clinicians in knowledge- and decision-based tasks while approaching clinician-level performance in CR. These findings suggest that LLMs may complement clinical expertise and enhance diagnostic consistency across hospital levels.
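The abstract reports mean differences (LLM − clinician) with 95% CIs from nonparametric bootstrap resampling with 10,000 iterations, but the record does not say what unit was resampled. The sketch below shows one plausible reading only. It treats each model's total score as a fixed value, resamples the clinicians' total scores with replacement, and takes a percentile interval. The function name `bootstrap_mean_difference` and the clinician scores in the usage example are hypothetical and are not study data.

```python
import random
import statistics


def bootstrap_mean_difference(model_score, clinician_scores,
                              n_iter=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for (model score - mean clinician score).

    Hypothetical helper. Assumption: the model's total score is treated as
    a fixed value and only the clinicians' total scores are resampled.
    """
    if not clinician_scores:
        raise ValueError("clinician_scores must not be empty")
    rng = random.Random(seed)  # seeded so the interval is reproducible
    n = len(clinician_scores)

    # Observed difference: model score minus the clinician mean.
    point = model_score - statistics.fmean(clinician_scores)

    # Resample clinicians with replacement and recompute the difference.
    diffs = sorted(
        model_score - statistics.fmean(rng.choices(clinician_scores, k=n))
        for _ in range(n_iter)
    )

    # Percentile interval: the alpha/2 and 1 - alpha/2 empirical quantiles.
    lo_idx = max(0, round(alpha / 2 * n_iter))
    hi_idx = min(n_iter - 1, round((1 - alpha / 2) * n_iter) - 1)
    return point, (diffs[lo_idx], diffs[hi_idx])


if __name__ == "__main__":
    # Hypothetical clinician total scores (out of 100), NOT the study data.
    clinicians = [62, 70, 75, 68, 81, 59, 73, 66, 77, 71]
    point, (lo, hi) = bootstrap_mean_difference(97, clinicians)
    print(f"difference = {point:+.1f} (95% CI {lo:.1f} to {hi:.1f})")
```

Resampling at the question level, or averaging each model's score over its five runs before differencing, would be equally defensible readings of the abstract. For production analyses, `scipy.stats.bootstrap` provides a maintained implementation that also supports BCa intervals.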
Publisher
Nature Publishing Group UK, Nature Publishing Group, Nature Portfolio
Subject
/ 692/308
/ 692/700
/ Accuracy
/ Adult
/ Cardiovascular Diseases - diagnosis
/ Chatbots
/ China
/ Clinical Decision-Making - methods
/ Female
/ Humanities and Social Sciences
/ Humans
/ Language
/ Male
/ Memory
/ Science