- UAA Congress 2025

Final Presentation Format

Moderated Poster Abstract

Eposter Presentation

Eposter in PDF Format

Accept format: PDF. The file size should not be more than 5MB

Eposter in Image Format

Accept format: PNG/JPG/WEBP. The file size should not be more than 2MB

Presentation Date / Time

Submission Status

Submitted

Abstract

Abstract Title

Vulnerabilities to stealth prompt injection attacks in medical recommendation systems using large-scale language models: Focusing on urology

Presentation Type

Moderated Poster Abstract

Manuscript Type

Basic Research

Abstract Category *

AI in Urology

Author's Information

Number of Authors (including submitting/presenting author) *

1

No more than 10 authors can be listed (as per the Good Publication Practice (GPP) Guidelines).
Please ensure the authors are listed in the right order.

Country

Korea (Republic of)

Co-author 1

Jungyo Suh crazyslime@gmail.com Asan Medical Center Urology Seoul Korea (Republic of) *

Co-author 2

-

Co-author 3

-

Co-author 4

-

Co-author 5

-

Co-author 6

-

Co-author 7

-

Co-author 8

-

Co-author 9

-

Co-author 10

-

Co-author 11

Co-author 12

Co-author 13

Co-author 14

Co-author 15

Co-author 16

Co-author 17

Co-author 18

Co-author 19

Co-author 20

Abstract Content

Introduction

This study aims to evaluate the susceptibility of clinical recommendation systems using large language models (LLMs) to stealth prompt injection attacks in simulated urologic situations of dialogues, specifically assessing whether controlled prompt injections could covertly influence model-generated treatment recommendations.

Materials and Methods

A controlled, paired-design simulation study was conducted using Google's gemma-2-2b-it LLM. Dialogue samples were derived from urologic scenarios in the MedQA-USMLE dataset, explicitly excluding pediatric cases. Each simulated conversation consisted of eight alternating conversational turns between the user and LLM, covering urologic symptoms, preliminary diagnoses, treatment recommendations, and follow-up queries. Stealth prompt injections, promoting complementary therapies such as "red ginseng," were covertly inserted at critical conversational points (turns 4, 6, and 8), employing obfuscated Python scripts (PyArmor and Cython) to simulate realistic third-party attacks. Outcomes measured included recommendation strength, response time, coherence scores, medical term density, readability (Flesch Reading Ease), and inter-turn correlation.

Results

Prompt injections significantly increased the frequency and intensity of red ginseng recommendations by turn 6 (Injection: 78% vs. Control: 0%, p<0.001), with persistent effects at turn 8 (Injection: 63% vs. Control: 0%, p<0.001). Response times were slightly prolonged in the Injection group at later dialogue turns, although differences did not reach statistical significance. By turn 8, medical term density was significantly higher in injected dialogues (p=0.032), and coherence scores were significantly reduced compared to controls (Injection median: 0.40 vs. Control median: 0.42, p<0.001). Injection dialogues also demonstrated lower inter-turn correlations, indicating subtle disruptions in conversational consistency.

Conclusions

Urology-specific recommendations generated by clinical LLMs are demonstrably susceptible to covert prompt injection attacks, significantly affecting therapeutic recommendations and dialogue coherence. These findings highlight an urgent need for advanced detection methods, robust mitigation strategies, and regulatory oversight to ensure the safety and reliability of AI-supported clinical decision-making in urology.

Keywords

Stealth prompt injection, large language models, medical recommendation systems, urology, AI security, conversational AI, red ginseng, coherence analysis

Figure 1

https://storage.unitedwebnetwork.com/files/1237/ae38f144c6fa2df602fab0ffba703872.jpg

Figure 1 Caption

Comparison of recommendation strength for red ginseng across dialogue turns.

Figure 2

https://storage.unitedwebnetwork.com/files/1237/2e71d38836b67440f3d89007a032b68e.jpg

Figure 2 Caption

Comparison of response time, medical term density, coherence scores, and readability (Flesch Reading Ease) by dialogue turns between Injection and Control groups.

Figure 3

https://storage.unitedwebnetwork.com/files/1237/ddba2901e6d6cc738180af3e24679a39.jpg

Figure 3 Caption

Comparison of inter-turn correlations for response time, coherence scores, medical term density, and Flesch Reading Ease scores between Injection and Control groups.

Figure 4

Figure 4 Caption

Figure 5

Figure 5 Caption

Character Count

1905

Vimeo Link

Presentation Details

Session

Free Paper Moderated Poster(08): Transplantation & AI & Training/Education

Date

Aug. 16 (Sat.)

Time

13:40 - 13:44

Presentation Order

1