Can AI Guide Us? Comparing ChatGPT-4 and Deepseek in the Management of Postprostatectomy Incontinence

Pinto V1, Gaspar C1, Nascimento L1, Ataides R1, Alves P1, Pereira M2, Macedo Filho M3, Bessa Junior J4, Gomes C1

Research Type

Pure and Applied Science / Translational

Abstract Category

Urotechnology

Best in Category Prize: Urotechnology
Abstract 47
Urology 2 - Male Stress Urinary Incontinence
Scientific Podium Short Oral Session 4
Thursday 18th September 2025
12:15 - 12:22
Parallel Hall 2
Stress Urinary Incontinence, Male, Outcomes Research Methods
1. University of São Paulo School of Medicine, 2. Hospital do Servidor Público Estadual, 3. UNDB University Center, 4. State University of Feira de Santana

Abstract

Hypothesis / aims of study
Recent advances in artificial intelligence (AI) have led to increasingly sophisticated large language models, such as ChatGPT-4 and Deepseek, which are gaining popularity among healthcare professionals as potential clinical decision support tools. We aimed to compare the accuracy and clinical relevance of recommendations provided by ChatGPT-4 and Deepseek regarding the assessment and management of postprostatectomy urinary incontinence (PPUI).
Study design, materials and methods
A total of 20 questions were prepared by urologists with expertise in PPUI. All questions had uncontroversial answers based on the Incontinence after Prostate Treatment: AUA/SUFU Guideline. Ten were conceptual questions and ten were based on clinical cases, designed to evaluate the models’ ability to apply knowledge and critical thinking. All questions were submitted in English, anonymously (without IP identification), and separately to ChatGPT-4o and Deepseek. Each model was prompted to be specific and to limit its answers to 200 words for greater objectivity, and was not prompted to follow any specific guideline. Each question was entered as a separate, independent prompt using the “New Chat” function. AI-generated answers were independently analyzed by the experts who prepared the questions. The accuracy of each response was graded as (A) correct (1 point), (B) partially correct (0.5 points), or (C) incorrect (0 points).
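As a minimal sketch of how this point-based grading maps to the reported accuracies, the computation can be expressed as follows (illustrative only; the grade lists are hypothetical and not the study’s actual data):

    # Illustrative sketch of the grading scheme described above
    # (hypothetical grades; not the study's actual data).
    POINTS = {"A": 1.0, "B": 0.5, "C": 0.0}  # correct / partially correct / incorrect

    def accuracy(grades):
        """Percent accuracy: total points divided by number of questions."""
        return 100 * sum(POINTS[g] for g in grades) / len(grades)

    # Example grade lists consistent with the totals reported in Results:
    chatgpt_conceptual = ["A"] * 9 + ["C"]              # 9 of 10 points -> 90%
    deepseek_clinical  = ["A"] * 6 + ["B"] + ["C"] * 3  # 6.5 of 10 points -> 65%

    print(accuracy(chatgpt_conceptual))  # 90.0
    print(accuracy(deepseek_clinical))   # 65.0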
Results
ChatGPT achieved a global accuracy of 95% (19 of 20 points), with 90% accuracy on conceptual questions (9 correct answers) and 100% on clinical cases. Deepseek reached a global accuracy of 72.5% (14.5 of 20 points), with 80% accuracy on conceptual questions (8 correct answers) and 65% on clinical cases (6.5 points). Deepseek gave more partially correct answers and incorrect interpretations on questions addressing treatment options, complications, and special clinical situations. The Table shows examples of performance differences between the two AI models across various domains.
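A worked check of these figures, using the point totals stated above (N is the number of questions):

\[
\text{accuracy} = \frac{\text{total points}}{N} \times 100\%, \qquad
\text{ChatGPT: } \frac{9 + 10}{20} = 95\%, \qquad
\text{Deepseek: } \frac{8 + 6.5}{20} = 72.5\%
\]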
Interpretation of results
ChatGPT answered correctly across both question types, whereas Deepseek’s errors clustered in the clinical case questions, particularly those addressing treatment options, complications, and special clinical situations. This pattern suggests that ChatGPT applied guideline-based knowledge more consistently when questions required clinical judgment rather than factual recall, while Deepseek was more prone to partially correct or incorrect recommendations in complex scenarios.
Concluding message
Both AI tools demonstrated potential to support clinical reasoning in the management of PPUI. However, ChatGPT outperformed Deepseek in both accuracy and consistency, especially in complex clinical scenarios. Despite these promising results, careful human validation remains essential before AI-generated recommendations are incorporated into clinical practice.
Figure 1
Figure 2
Disclosures
Funding: None. Clinical Trial: No. Subjects: None.