Home
Abstract
My Abstract(s)
Login
ePosters
Back
Final Presentation Format
Moderated Poster Abstract
Eposter Presentation
Eposter in PDF Format
Accept format: PDF. The file size should not be more than 5MB
Eposter in Image Format
Accept format: PNG/JPG/WEBP. The file size should not be more than 2MB
Presentation Date / Time
Submission Status
Submitted
Abstract
Abstract Title
Vulnerabilities to stealth prompt injection attacks in medical recommendation systems using large-scale language models: Focusing on urology
Presentation Type
Moderated Poster Abstract
Manuscript Type
Basic Research
Abstract Category *
AI in Urology
Author's Information
Number of Authors (including submitting/presenting author) *
1
No more than 10 authors can be listed (as per the Good Publication Practice (GPP) Guidelines).
Please ensure the authors are listed in the right order.
Country
Korea (Republic of)
Co-author 1
Jungyo Suh crazyslime@gmail.com Asan Medical Center Urology Seoul Korea (Republic of) *
Co-author 2
-
Co-author 3
-
Co-author 4
-
Co-author 5
-
Co-author 6
-
Co-author 7
-
Co-author 8
-
Co-author 9
-
Co-author 10
-
Co-author 11
Co-author 12
Co-author 13
Co-author 14
Co-author 15
Co-author 16
Co-author 17
Co-author 18
Co-author 19
Co-author 20
Abstract Content
Introduction
This study aims to evaluate the susceptibility of clinical recommendation systems using large language models (LLMs) to stealth prompt injection attacks in simulated urologic situations of dialogues, specifically assessing whether controlled prompt injections could covertly influence model-generated treatment recommendations.
Materials and Methods
A controlled, paired-design simulation study was conducted using Google's gemma-2-2b-it LLM. Dialogue samples were derived from urologic scenarios in the MedQA-USMLE dataset, explicitly excluding pediatric cases. Each simulated conversation consisted of eight alternating conversational turns between the user and LLM, covering urologic symptoms, preliminary diagnoses, treatment recommendations, and follow-up queries. Stealth prompt injections, promoting complementary therapies such as "red ginseng," were covertly inserted at critical conversational points (turns 4, 6, and 8), employing obfuscated Python scripts (PyArmor and Cython) to simulate realistic third-party attacks. Outcomes measured included recommendation strength, response time, coherence scores, medical term density, readability (Flesch Reading Ease), and inter-turn correlation.
Results
Prompt injections significantly increased the frequency and intensity of red ginseng recommendations by turn 6 (Injection: 78% vs. Control: 0%, p<0.001), with persistent effects at turn 8 (Injection: 63% vs. Control: 0%, p<0.001). Response times were slightly prolonged in the Injection group at later dialogue turns, although differences did not reach statistical significance. By turn 8, medical term density was significantly higher in injected dialogues (p=0.032), and coherence scores were significantly reduced compared to controls (Injection median: 0.40 vs. Control median: 0.42, p<0.001). Injection dialogues also demonstrated lower inter-turn correlations, indicating subtle disruptions in conversational consistency.
Conclusions
Urology-specific recommendations generated by clinical LLMs are demonstrably susceptible to covert prompt injection attacks, significantly affecting therapeutic recommendations and dialogue coherence. These findings highlight an urgent need for advanced detection methods, robust mitigation strategies, and regulatory oversight to ensure the safety and reliability of AI-supported clinical decision-making in urology.
Keywords
Stealth prompt injection, large language models, medical recommendation systems, urology, AI security, conversational AI, red ginseng, coherence analysis
Figure 1
https://storage.unitedwebnetwork.com/files/1237/ae38f144c6fa2df602fab0ffba703872.jpg
Figure 1 Caption
Comparison of recommendation strength for red ginseng across dialogue turns.
Figure 2
https://storage.unitedwebnetwork.com/files/1237/2e71d38836b67440f3d89007a032b68e.jpg
Figure 2 Caption
Comparison of response time, medical term density, coherence scores, and readability (Flesch Reading Ease) by dialogue turns between Injection and Control groups.
Figure 3
https://storage.unitedwebnetwork.com/files/1237/ddba2901e6d6cc738180af3e24679a39.jpg
Figure 3 Caption
Comparison of inter-turn correlations for response time, coherence scores, medical term density, and Flesch Reading Ease scores between Injection and Control groups.
Figure 4
Figure 4 Caption
Figure 5
Figure 5 Caption
Character Count
1905
Vimeo Link
Presentation Details
Session
Free Paper Moderated Poster(08): Transplantation & AI & Training/Education
Date
Aug. 16 (Sat.)
Time
13:40 - 13:44
Presentation Order
1