Non-Moderated Poster Abstract
Eposter Presentation
https://storage.unitedwebnetwork.com/files/1237/a65d422a1979d786a4b5fdd47fa7c4dd.pdf
https://storage.unitedwebnetwork.com/files/1237/467fe96c67e58145bc235a7f9d8a995a.jpg
Submitted
Abstract
Evaluating Large Language Models for Clinical Documentation in Urology: A QNOTE-Based Comparison of GPT-4 and GPT-4o
Podium Abstract
Clinical Research
AI in Urology
Author's Information
5
Taiwan
Liang-Chen Huang, sam831009@gmail.com, National Taiwan University Hospital, Urology, Taipei City, Taiwan *
Yun-Sheng Wu, b07401082@ntu.edu.tw, National Taiwan University Hospital, Taipei City, Taiwan -
Jung-Yang Yu, ericyu29218218@gmail.com, National Taiwan University Hospital, Urology, Taipei City, Taiwan -
Chung-Cheng Wang, ericwcc@ms27.hinet.net, En Chu Kong Hospital, Urol, New Taipei City, Taiwan -
Jian-Hua Hong, cliffordhong622@gmail.com, National Taiwan University Hospital, Urology, Taipei City, Taiwan -
Abstract Content
The urology department frequently manages a high volume of inpatient elective procedures, contributing to a demanding clinical environment. The integration of Artificial Intelligence (AI) has the potential to streamline repetitive tasks, particularly in clinical documentation. This study aims to assess the performance and accuracy of generative AI and large language models (GAI/LLMs) in producing admission summaries based on outpatient clinic notes.
Patients scheduled for inpatient elective procedures through outpatient clinic visits between January and July 2024 were included. Two AI models, GPT-4 (Model 1) and GPT-4o (Model 2), were each prompted to generate an admission summary from a single original clinical input derived from each patient’s outpatient clinic note. The quality of the generated summaries was assessed with the QNOTE scoring system, a non-disease-specific rubric of 12 categories and 44 items that evaluates clinical documentation quality across multiple domains.
A total of 60 patients were included in the evaluation. Both AI models generated high-quality admission summaries: Model 1 (GPT-4) achieved an average QNOTE score of 86.17 out of 100, and Model 2 (GPT-4o) achieved 92.68. Both models received perfect scores in several categories. However, Model 2 consistently outperformed Model 1 in subjective assessments and across multiple QNOTE domains. Figure 1 shows the distribution of QNOTE scores for the admission notes that Model 1 and Model 2 generated from the original clinical input.
GAI/LLMs demonstrate the ability to generate high-quality admission summaries for inpatient elective urology procedures based on a single outpatient clinic note. GPT-4o outperformed GPT-4 in both objective and subjective evaluations. While these AI models show strong potential, clinicians should review the generated summaries for accuracy and consistency. Further research with larger sample sizes and continued development of AI models is necessary to validate these findings and refine their clinical application.
Electronic medical record, Large language model, Efficiency
https://storage.unitedwebnetwork.com/files/1237/7f73abab6ad0a760baab10251d5b9b51.jpg
Figure 1. Qualitative assessment of the clinical notes. The bars represent the percentages for the components of the 12 QNOTE elements, and the overall note score is located at the bottom of the chart.
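The abstract does not state how item ratings were combined into the per-element and overall QNOTE scores. The following is a minimal sketch of one plausible aggregation, assuming each item is rated fully acceptable, partially acceptable, or unacceptable (mapped to 100, 50, and 0), that an element score is the mean of its item scores, and that a note score is the mean of its element scores. The function names, element names, and example ratings are hypothetical and are not taken from the study data.

```python
from statistics import mean

# Assumed rating-to-points mapping; the abstract does not specify it.
RATING_POINTS = {"full": 100, "partial": 50, "unacceptable": 0}


def element_score(ratings):
    """Mean score (0-100) of the item ratings within one QNOTE element."""
    return mean(RATING_POINTS[r] for r in ratings)


def note_score(note):
    """Mean of the element scores for one note.

    `note` maps an element name to the list of its item ratings.
    """
    return mean(element_score(items) for items in note.values())


def model_average(notes):
    """Average note score across all notes produced by one model."""
    return mean(note_score(n) for n in notes)


# Hypothetical note with three elements (a real QNOTE note has 12).
example_note = {
    "chief_complaint": ["full", "full"],
    "history_of_present_illness": ["full", "partial"],
    "plan_of_care": ["partial", "unacceptable"],
}

print(round(note_score(example_note), 2))  # 66.67
```

Under these assumptions, calling `model_average` on the list of scored notes for each model would yield a per-model average on the same 0 to 100 scale as the values reported above.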
1551
 
Presentation Details