Moderated Poster Abstract
Eposter Presentation
 
Accept format: PDF. The file size should not be more than 5MB
 
Accept format: PNG/JPG/WEBP. The file size should not be more than 2MB
 
Submitted
Abstract
Vulnerabilities to stealth prompt injection attacks in medical recommendation systems using large-scale language models: Focusing on urology
Moderated Poster Abstract
Basic Research
AI in Urology
Author's Information
1
No more than 10 authors can be listed (as per the Good Publication Practice (GPP) Guidelines).
Please ensure the authors are listed in the right order.
Korea (Republic of)
Jungyo Suh crazyslime@gmail.com Asan Medical Center Urology Seoul Korea (Republic of) *
-
-
-
-
-
-
-
-
-
 
 
 
 
 
 
 
 
 
 
Abstract Content
This study aims to evaluate the susceptibility of clinical recommendation systems using large language models (LLMs) to stealth prompt injection attacks in simulated urologic situations of dialogues, specifically assessing whether controlled prompt injections could covertly influence model-generated treatment recommendations.
A controlled, paired-design simulation study was conducted using Google's gemma-2-2b-it LLM. Dialogue samples were derived from urologic scenarios in the MedQA-USMLE dataset, explicitly excluding pediatric cases. Each simulated conversation consisted of eight alternating conversational turns between the user and LLM, covering urologic symptoms, preliminary diagnoses, treatment recommendations, and follow-up queries. Stealth prompt injections, promoting complementary therapies such as "red ginseng," were covertly inserted at critical conversational points (turns 4, 6, and 8), employing obfuscated Python scripts (PyArmor and Cython) to simulate realistic third-party attacks. Outcomes measured included recommendation strength, response time, coherence scores, medical term density, readability (Flesch Reading Ease), and inter-turn correlation.
Prompt injections significantly increased the frequency and intensity of red ginseng recommendations by turn 6 (Injection: 78% vs. Control: 0%, p<0.001), with persistent effects at turn 8 (Injection: 63% vs. Control: 0%, p<0.001). Response times were slightly prolonged in the Injection group at later dialogue turns, although differences did not reach statistical significance. By turn 8, medical term density was significantly higher in injected dialogues (p=0.032), and coherence scores were significantly reduced compared to controls (Injection median: 0.40 vs. Control median: 0.42, p<0.001). Injection dialogues also demonstrated lower inter-turn correlations, indicating subtle disruptions in conversational consistency.
Urology-specific recommendations generated by clinical LLMs are demonstrably susceptible to covert prompt injection attacks, significantly affecting therapeutic recommendations and dialogue coherence. These findings highlight an urgent need for advanced detection methods, robust mitigation strategies, and regulatory oversight to ensure the safety and reliability of AI-supported clinical decision-making in urology.
Stealth prompt injection, large language models, medical recommendation systems, urology, AI security, conversational AI, red ginseng, coherence analysis
https://storage.unitedwebnetwork.com/files/1237/ae38f144c6fa2df602fab0ffba703872.jpg
Comparison of recommendation strength for red ginseng across dialogue turns.
https://storage.unitedwebnetwork.com/files/1237/2e71d38836b67440f3d89007a032b68e.jpg
Comparison of response time, medical term density, coherence scores, and readability (Flesch Reading Ease) by dialogue turns between Injection and Control groups.
https://storage.unitedwebnetwork.com/files/1237/ddba2901e6d6cc738180af3e24679a39.jpg
Comparison of inter-turn correlations for response time, coherence scores, medical term density, and Flesch Reading Ease scores between Injection and Control groups.
 
 
 
 
1905
 
Presentation Details
Free Paper Moderated Poster(08): Transplantation & AI & Training/Education
Aug. 16 (Sat.)
13:40 - 13:44
1