Catalogue Search | MBRL
Explore the vast range of titles available.
60 result(s) for "Cacciamani, Giovanni E."
A framework for human evaluation of large language models in healthcare derived from literature review
by McCarthy, Karleigh R.; Osterhoudt, Hunter; Sivarajkumar, Sonish
in 692/308, 692/700, Biomedicine
2024
With generative artificial intelligence (GenAI), particularly large language models (LLMs), continuing to make inroads in healthcare, assessing LLMs with human evaluations is essential to ensuring safety and effectiveness. This study reviews existing literature on human evaluation methodologies for LLMs in healthcare across various medical specialties and addresses factors such as evaluation dimensions, sample types and sizes, selection and recruitment of evaluators, frameworks and metrics, evaluation process, and type of statistical analysis. Our literature review of 142 studies shows gaps in the reliability, generalizability, and applicability of current human evaluation practices. To overcome these significant obstacles to healthcare LLM development and deployment, we propose QUEST, a comprehensive and practical framework for human evaluation of LLMs covering three phases of workflow: Planning, Implementation and Adjudication, and Scoring and Review. QUEST is designed around five proposed evaluation principles: Quality of Information, Understanding and Reasoning, Expression Style and Persona, Safety and Harm, and Trust and Confidence.
Journal Article
Study protocol for the Intraoperative Complications Assessment and Reporting with Universal Standards (ICARUS) global cross-specialty surveys and consensus
by Desai, Mihir M.; Sholklapper, Tamir; Eppler, Michael B.
in Care and treatment, Consensus, Diagnosis
2024
Annually, about 300 million surgeries lead to significant intraoperative adverse events (iAEs), impacting patients and surgeons. Their full extent is underestimated due to flawed assessment and reporting methods. Inconsistent adoption of new grading systems and a lack of standardization, along with litigation concerns, contribute to underreporting. Only half of relevant journals provide guidelines on reporting these events, and standards in the surgical literature are lacking. To address these issues, the Intraoperative Complications Assessment and Reporting with Universal Standards (ICARUS) Global Surgical Collaboration was established in 2022. The initiative involves conducting global surveys and a Delphi consensus to understand the barriers behind poor reporting of iAEs, validate shared criteria for reporting, define iAEs according to surgical procedures, evaluate the reliability of existing grading systems, and identify strategies for enhancing the collection, reporting, and management of iAEs. Invitations to participate are extended to all surgical specialties, interventional cardiology, interventional radiology, operating room staff, and anesthesiology. This effort represents an essential step towards improved patient safety and the well-being of healthcare professionals in the surgical field.
Journal Article
GPT-4 generates accurate and readable patient education materials aligned with current oncological guidelines: A randomized assessment
by Baekelandt, Loïc; Veccia, Alessandro; De Backer, Pieter
in Accuracy, Artificial Intelligence, Automation
2025
Guideline-based patient education materials (PEMs) empower patients and reduce misinformation, but they require frequent updates and must be adapted to patients' readability levels. The aim of this study was to assess whether generative artificial intelligence (GenAI) can provide readable, accurate, and up-to-date PEMs that can subsequently be translated into multiple languages for broad dissemination.
The European Association of Urology (EAU) guidelines for prostate, bladder, kidney, and testicular cancer were used as the knowledge base for GPT-4 to generate PEMs. Additionally, the PEMs were translated into five commonly spoken languages within the European Union (EU). The study was conducted as a single-blinded, online randomized assessment survey. After an initial pilot assessment of the GenAI-generated PEMs, thirty-two members of the Young Academic Urologists (YAU) groups evaluated the accuracy, completeness, and clarity of the original versus GPT-generated PEMs. The translation assessment involved two native speakers from different YAU groups for each language: Dutch, French, German, Italian, and Spanish. The primary outcomes were readability, accuracy, completeness, faithfulness, and clarity. Readability was measured using the Flesch-Kincaid Reading Ease (FKRE), Flesch-Kincaid Grade Level (FKGL), and Gunning Fog (GFS) scores, and the SMOG (SI), Coleman-Liau (CLI), and Automated Readability (ARI) indexes. Accuracy, completeness, faithfulness, and clarity were rated on a 5-point Likert scale.
The mean time to create layperson PEMs based on the latest guideline by GPT-4 was 52.1 seconds. The readability scores for the 8 original PEMs were lower than for the 8 GPT-4-generated PEMs (Mean FKRE: 43.5 vs. 70.8; p < .001). The required reading education levels were higher for original PEMs compared to GPT-4 generated PEMs (Mean FKGL: 11.6 vs. 6.1; p < .001). For all urological localized cancers, the original PEMs were not significantly different from the GPT-4 generated PEMs in accuracy, completeness, and clarity. Similarly, no differences were observed for metastatic cancers. Translations of GPT-generated PEMs were rated as faithful in 77.5% of cases and clear in 67.5% of cases.
GPT-4 generated PEMs have better readability levels compared to original PEMs while maintaining similar accuracy, completeness, and clarity. The use of GenAI's information extraction and language capabilities, integrated with human oversight, can significantly reduce the workload and ensure up-to-date and accurate PEMs.
Some cancer facts made for patients can be hard to read or not in the right words for those with prostate, bladder, kidney, or testicular cancer. This study used AI to quickly make short and easy-to-read content from trusted facts. Doctors checked the AI content and found that it was just as accurate, complete, and clear as the original text made for patients. It also worked well in many languages. This AI tool can assist providers in making it easier for patients to understand their cancer and the best care they can get.
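As a rough illustration of how the readability scores reported above are defined, the sketch below computes the Flesch-Kincaid Reading Ease and Grade Level from word, sentence, and syllable counts. The two formulas are the standard published ones; the syllable counter is a simplified heuristic and the sample sentence is hypothetical, not material from the study.

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count groups of consecutive vowels (at least 1 per word)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> dict:
    """Compute Flesch-Kincaid Reading Ease (FKRE) and Grade Level (FKGL)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)   # mean words per sentence
    spw = syllables / len(words)        # mean syllables per word
    fkre = 206.835 - 1.015 * wps - 84.6 * spw  # higher score = easier to read
    fkgl = 0.39 * wps + 11.8 * spw - 15.59     # approximate US school grade level
    return {"FKRE": round(fkre, 1), "FKGL": round(fkgl, 1)}

# Hypothetical snippet of a patient education leaflet, for illustration only.
sample = ("Bladder cancer starts in the lining of the bladder. "
          "Your doctor may suggest a small camera test called a cystoscopy.")
print(readability(sample))
```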
Journal Article
Performance of Narrow Band Imaging (NBI) and Photodynamic Diagnosis (PDD) Fluorescence Imaging Compared to White Light Cystoscopy (WLC) in Detecting Non-Muscle Invasive Bladder Cancer: A Systematic Review and Lesion-Level Diagnostic Meta-Analysis
2021
Despite early detection and regular surveillance of non-muscle invasive bladder cancer (NMIBC), recurrence and progression rates remain exceedingly high for this highly prevalent malignancy. Limited visualization of malignant lesions with standard cystoscopy and the associated false-negative biopsy rates have been the driving force for investigating alternative and adjunctive technologies for improved cystoscopy. The aim of our systematic review and meta-analysis was to compare the sensitivity, specificity, and oncologic outcomes of photodynamic diagnosis (PDD) fluorescence, narrow band imaging (NBI), and conventional white light cystoscopy (WLC) in detecting NMIBC. Of 1,087 studies reviewed, 17 prospective non-randomized and randomized controlled trials met the inclusion criteria. We demonstrated that tumor resection with either PDD or NBI exhibited lower recurrence rates and greater diagnostic sensitivity compared to WLC alone. NBI demonstrated superior disease sensitivity and specificity compared to WLC and an overall greater hierarchical summary receiver operating characteristic. Our findings are consistent with emerging guidelines and underscore the value of integrating these enhanced technologies as part of standard care for patients with suspected or confirmed NMIBC.
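For readers unfamiliar with the lesion-level metrics pooled in this meta-analysis, the sketch below shows how sensitivity and specificity are derived from a 2x2 table of cystoscopy calls against biopsy results. The counts are invented purely for illustration and are not data from the review.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Lesion-level diagnostic accuracy from a 2x2 table
    (cystoscopy call vs. histologically confirmed lesion status)."""
    sensitivity = tp / (tp + fn)  # proportion of true lesions detected
    specificity = tn / (tn + fp)  # proportion of benign sites correctly cleared
    return {"sensitivity": round(sensitivity, 3),
            "specificity": round(specificity, 3)}

# Hypothetical per-modality counts (TP, FP, FN, TN), for illustration only.
for modality, counts in {"WLC": (120, 40, 35, 200),
                         "NBI": (145, 55, 10, 185)}.items():
    print(modality, diagnostic_metrics(*counts))
```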
Journal Article
Transperineal vs transrectal magnetic resonance and ultrasound image fusion prostate biopsy: a pair-matched comparison
2023
The objective of this study was to compare transperineal (TP) versus transrectal (TR) magnetic resonance imaging (MRI) and transrectal ultrasound (TRUS) fusion prostate biopsy (PBx). Consecutive men who underwent prostate MRI followed by a systematic biopsy were included. Additional target biopsies were performed from Prostate Imaging Reporting & Data System (PIRADS) 3–5 lesions. Men who underwent TP PBx were matched 1:2 with a synchronous cohort undergoing TR PBx by PSA, prostate volume (PV), and PIRADS score. The endpoint of the study was the detection of clinically significant prostate cancer (CSPCa; Grade Group ≥ 2). Univariate and multivariable analyses were performed. Results were considered statistically significant if p < 0.05. Overall, 504 patients met the inclusion criteria. A total of 168 TP PBx patients were pair-matched to 336 TR PBx patients. Baseline demographics and imaging characteristics were similar between the groups. Per patient, the CSPCa detection rate was 2.1% vs 6.3% (p = 0.4) for PIRADS 1–2, and 59% vs 60% (p = 0.9) for PIRADS 3–5, on TP vs TR PBx, respectively. Per lesion, the CSPCa detection rate for PIRADS 3 (21% vs 16%; p = 0.4), PIRADS 4 (51% vs 44%; p = 0.8), and PIRADS 5 (76% vs 84%; p = 0.3) was similar for TP vs TR PBx, respectively. However, TP PBx showed a longer maximum cancer core length (11 vs 9 mm; p = 0.02) and higher cancer core involvement (83% vs 65%; p < 0.001) than TR PBx. Independent predictors of CSPCa detection were age, PSA, PV, abnormal digital rectal examination findings, and PIRADS 3–5. Our study demonstrated that transperineal MRI/TRUS fusion PBx provides CSPCa detection similar to transrectal PBx, with a longer prostate cancer core length and a greater percentage of core involvement.
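The 1:2 pair-matching step described above can be approximated as in the sketch below: exact matching on PIRADS score with greedy nearest-neighbour selection on PSA and prostate volume. The abstract does not report the matching algorithm or variable names, so both the procedure and the column names here are illustrative assumptions, not the authors' implementation.

```python
import pandas as pd

def match_1_to_2(tp: pd.DataFrame, tr: pd.DataFrame) -> pd.DataFrame:
    """Greedy 1:2 matching of transperineal (TP) cases to transrectal (TR) controls.
    Assumed columns: 'psa', 'volume', 'pirads'; frames are indexed by patient ID."""
    available = tr.copy()
    matched = []
    for tp_id, case in tp.iterrows():
        # Exact match on PIRADS category.
        pool = available[available["pirads"] == case["pirads"]]
        if len(pool) < 2:
            continue  # skip TP cases that cannot be matched 1:2
        # Nearest neighbours on PSA and prostate volume (squared distance).
        dist = ((pool["psa"] - case["psa"]) ** 2
                + (pool["volume"] - case["volume"]) ** 2)
        picks = dist.nsmallest(2).index
        matched.append((tp_id, list(picks)))
        available = available.drop(picks)  # sample controls without replacement
    return pd.DataFrame(matched, columns=["tp_id", "tr_ids"])

# Example usage with hypothetical data frames indexed by patient ID:
# pairs = match_1_to_2(tp_df, tr_df)
```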
Journal Article
Reporting guidelines for studies involving generative artificial intelligence applications: what do I use, and when?
2025
With a growing number of studies applying generative artificial intelligence (GAI) models for health purposes, reporting standards are being developed to guide authors in this space. We describe the currently available reporting guidelines that apply to GAI models and provide an overview of upcoming reporting standards. Investigators must remain up-to-date with the most applicable tools to guide the comprehensive reporting of their research as we integrate GAI in healthcare.
Journal Article
Robotic versus open urological oncological surgery: study protocol of a systematic review and meta-analysis
by Gill, Inderbir S; Cacciamani, Giovanni E; Gill, Karanvir
in Biopsy, Bladder cancer, Blood transfusions
2020
Introduction: Minimally invasive surgery in urology has grown considerably in application since its initial description in the early 1990s. Herein, we present the protocol for a systematic review and meta-analysis comparing open versus robotic urological oncological surgery for various clinically relevant outcomes, as well as to assess their comparative penetrance over the past 20 years (2000–2020).
Methods and analysis: First, we will document the penetrance of robotic versus open surgery in the urological oncological field using a national database. Second, we will perform a systematic review and meta-analysis of all published full-text English and non-English language articles from the PubMed, Scopus and Web of Science search engines on surgical treatment of localised prostate, bladder, kidney and testis cancer published between 1 January 2000 and 10 January 2020. We will focus on the highest-volume urological oncological surgeries, namely radical prostatectomy, radical cystectomy, partial nephrectomy, radical nephrectomy and retroperitoneal lymph node dissection. Study inclusion criteria will comprise clinical trials and prospective and retrospective studies (cohort or case–control series) comparing robotic versus open surgery. Exclusion criteria will comprise meta-analyses, multiple papers with overlapping study periods, studies analysing national databases and case series describing only one approach (robotic or open). Risk of bias for included studies will be assessed with the appropriate Cochrane risk of bias tool. Principal outcomes assessed will include perioperative, functional, oncological survival and financial outcomes of open versus robotic uro-oncological surgery. Sensitivity analyses will be performed to correlate outcomes of interest with key baseline characteristics and surrogates of surgical expertise.
Ethics and dissemination: This comprehensive systematic review and meta-analysis will provide rigorous, consolidated information on contemporary outcomes and trends of open versus robotic urological oncological surgery based on all the available literature. These aggregate data will help physicians better advise patients seeking surgical care for urological cancers.
PROSPERO registration number: CRD42017064958.
Journal Article
ChatGPT: standard reporting guidelines for responsible use
by Collins, Gary S.; Cacciamani, Giovanni E.; Gill, Inderbir S.
in 631/114/1305, 706/689/112, Correspondence
2023
Letter to the Editor
Journal Article
The long but necessary road to responsible use of large language models in healthcare research
2024
Large language models (LLMs) have shown promise in reducing the time, costs, and errors associated with manual data extraction. A recent study demonstrated that LLMs outperformed natural language processing approaches in abstracting pathology report information. However, challenges include the risks of weakening critical thinking, propagating biases, and hallucinations, which may undermine the scientific method and disseminate inaccurate information. Incorporating suitable guidelines (e.g., CANGARU) should be encouraged to ensure responsible LLM use.
Journal Article
ChatGPT and large language models (LLMs) awareness and use. A prospective cross-sectional survey of U.S. medical students
by Ramacciotti, Lorenzo Storino; Yazdi, Bayan; O’Brien, Devon
in Artificial intelligence, Chatbots, Data analysis
2024
Generative AI (GAI) models like ChatGPT are becoming widely discussed and utilized tools in medical education. For example, they can be used to assist with studying for exams and have been shown capable of passing the USMLE board exams. However, concerns have been expressed regarding their fair and ethical use. We designed an electronic survey for students across North American medical colleges to gauge their views on and current use of ChatGPT and similar technologies in May 2023. Overall, 415 students from at least 28 medical schools completed the questionnaire; 96% of respondents had heard of ChatGPT and 52% had used it for medical school coursework. The most common uses in the pre-clerkship and clerkship phases were asking for explanations of medical concepts and assisting with diagnosis/treatment plans, respectively. The most common use in academic research was proofreading and grammar editing. Respondents recognized the potential limitations of ChatGPT, including inaccurate responses and concerns about patient privacy and plagiarism. Students recognized the importance of regulations to ensure proper use of this novel technology. Understanding the views of students is essential to crafting workable instructional courses, guidelines, and regulations that ensure the safe, productive use of generative AI in medical school.
Journal Article