Preliminary evaluation of DeepSeek-R1 and GPT-5.3 in selected PET/CT clinical scenarios: patient preparation, report interpretation, and diagnostic reasoning

by Tianyue Li , Runze Duan , Lu Zheng , Yujing Hu , Yanzhu Bian , Jing Pang , Ziyu Guo

in [18F]FDG PET/CT / artificial intelligence / Chatbot / DeepSeek-R1 / GPT-5.3 / patient communication

2026

Journal Article

Overview

Objective

To evaluate the performance of DeepSeek-R1, an open-source large language model, in three core clinical scenarios (answering patients’ common questions, interpreting PET/CT reports with follow-up inquiries, and diagnosing complex cases) and to compare it with GPT-5.3, in order to verify the clinical applicability of DeepSeek-R1 as an alternative AI assistant.

Methods

A total of 39 standardized tasks were assigned to both models, including responding to 15 frequently asked questions about [18F]FDG PET/CT, interpreting 12 anonymized reports of lung cancer and lymphoma (with follow-up inquiries regarding tumor staging or treatment), and providing primary and differential diagnoses for 10 difficult cases. Both models were accessed via their official platforms with default parameters, and all prompts and evaluation criteria were kept identical for cross-model comparison. Two senior nuclear medicine physicians independently rated the model responses using a 4-point standardized scale (assessing appropriateness, helpfulness, inter-trial consistency, and reference validity) and a binary scale for empathy. Cohen’s kappa coefficient was used to evaluate inter-rater agreement. McNemar’s test was used to compare paired proportions of appropriateness, empathy, and response inconsistency between the two models.

Results

Across the 39 tasks, DeepSeek-R1 achieved 94.9% appropriateness and 100% helpfulness, and 91.7% of its responses to follow-up inquiries about tumor staging or treatment were rated empathetic. However, 7.7% of regenerated responses showed substantial inconsistencies, primarily in tumor staging, and only 37% of cited references were fully valid, with 11.1% being invalid.

GPT-5.3 matched DeepSeek-R1 on core performance (94.9% appropriateness and 100% helpfulness), with a slightly lower rate of substantial inconsistency (5.1%) and similar reference validity (33% fully valid, 7.4% invalid), but a notably lower empathy score (66.7%) for follow-up inquiries. McNemar’s tests showed identical appropriateness (p = 1.00) and no significant difference in inconsistency (p = 1.00, 95% CI 0.60–14.80) between the models. DeepSeek-R1 had higher empathy, but the difference was not significant (p = 0.25, 95% CI 0.09–0.66). For the 10 identical difficult cases, both models reached 10% primary diagnosis accuracy and 60% differential diagnosis accuracy.

Conclusion

DeepSeek-R1 and GPT-5.3 have complementary strengths but similar reference hallucination issues, and neither can replace clinicians. DeepSeek-R1 is a cost-effective auxiliary tool, with future optimization needed for consistency, diagnostic accuracy, and reference validity.
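As an illustration of the paired statistics named in the abstract, the following Python sketch computes Cohen's kappa for two raters and an exact (binomial) two-sided McNemar p-value from discordant-pair counts. It is a minimal sketch, not the authors' analysis code: the abstract does not say which McNemar variant or software was used, the function names are hypothetical, and the example counts are assumptions chosen only to be compatible with the reported percentages (11/12 versus 8/12 empathetic responses is compatible with 3 versus 0 discordant pairs), not data taken from the article.

```python
from math import comb


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical ratings."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    categories = set(rater_a) | set(rater_b)
    p_chance = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    if p_chance == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_observed - p_chance) / (1.0 - p_chance)


def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar p-value.

    b: pairs where only model A met the criterion
    c: pairs where only model B met the criterion
    Concordant pairs do not enter the test.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Under H0 each discordant pair falls on either side with probability 0.5.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


if __name__ == "__main__":
    # Assumed split: 3 follow-up tasks empathetic for one model only, 0 for the other.
    print(mcnemar_exact_p(3, 0))
    # Toy ratings (hypothetical) on a binary scale from two raters.
    print(cohen_kappa([1, 1, 0, 0], [1, 1, 0, 1]))
```

Under the assumed 3-versus-0 split, the exact test gives p = 0.25, which is compatible with the value reported in the abstract, though the true discordant counts are not stated there and should be taken from the full article.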

Publisher

Frontiers Media S.A.

Subject

[18F]FDG PET/CT / artificial intelligence / Chatbot / DeepSeek-R1 / GPT-5.3 / patient communication