Hypothesis / aims of study
Recent advances in artificial intelligence (AI) have produced increasingly sophisticated language models, such as ChatGPT-4 and Deepseek, which are gaining popularity among healthcare professionals as potential tools for clinical decision support. We aimed to compare the accuracy and clinical relevance of the recommendations provided by ChatGPT-4 and Deepseek on the assessment and management of postprostatectomy urinary incontinence (PPUI).
Study design, materials and methods
Urologists with expertise in PPUI prepared a total of 20 questions with uncontroversial answers based on the Incontinence after Prostate Treatment: AUA/SUFU Guideline. Ten were conceptual questions and ten were based on clinical cases, designed to evaluate the models’ ability to apply knowledge and critical thinking. All questions were submitted in English, anonymously (without IP identification) and separately, to ChatGPT 4o and Deepseek. Each model was prompted to be specific and to limit its answers to 200 words for greater objectivity, and was not prompted to incorporate any specific guideline. Each question was entered as a separate, independent prompt using the “New Chat” function. The AI-generated answers were independently analyzed by the experts who had provided the questions. The accuracy of each response was graded as (A) correct (1 point), (B) partially correct (0.5 points), or (C) incorrect (0 points).
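The grading scheme above can be turned into a percentage accuracy as in the following minimal sketch. The function name `accuracy_percent` and the example grades are hypothetical and purely illustrative; they are not study data, and the abstract does not state that the authors used any code for scoring.

```python
# Points per grade, as stated in the methods:
# A = correct, B = partially correct, C = incorrect.
GRADE_POINTS = {"A": 1.0, "B": 0.5, "C": 0.0}


def accuracy_percent(grades):
    """Return the points obtained as a percentage of the maximum attainable."""
    if not grades:
        raise ValueError("at least one graded answer is required")
    total_points = sum(GRADE_POINTS[g] for g in grades)
    # Each question is worth at most 1 point, so the maximum equals len(grades).
    return 100 * total_points / len(grades)


# Hypothetical grades for a 10-question block (illustrative only).
example_grades = ["A", "B", "A", "A", "C", "B", "A", "A", "B", "A"]
print(accuracy_percent(example_grades))
```

With this scheme a partially correct answer contributes half a point, so a block can reach a fractional point total.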
Interpretation of results
ChatGPT had a global accuracy of 95% (19 out of 20 questions), with 90% accuracy in conceptual questions (9 correct answers) and 100% in clinical cases. Deepseek reached a global accuracy of 72.5%, with 80% accuracy in conceptual questions (8 correct answers) and 65% in clinical cases (6.5 of 10 points, counting partially correct answers as 0.5). Compared with ChatGPT, Deepseek gave more partially correct answers and incorrect interpretations in questions addressing treatment options, complications, and special clinical situations. The Table shows examples of performance differences between the two AI models across various domains.
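As a consistency check, the global accuracies follow from the per-block points reported above. This is a minimal sketch: the helper `global_accuracy` is a hypothetical name, and the 10 points for ChatGPT's clinical cases are inferred from the reported 100% on ten questions.

```python
def global_accuracy(points_by_block, questions_per_block=10):
    """Combine per-block points into an overall percentage accuracy."""
    total_points = sum(points_by_block)
    total_questions = questions_per_block * len(points_by_block)
    return 100 * total_points / total_questions


# Points per block, in the order (conceptual, clinical cases).
# ChatGPT's 10 clinical points are inferred from the reported 100%.
chatgpt = global_accuracy([9, 10])
deepseek = global_accuracy([8, 6.5])

print(f"ChatGPT: {chatgpt}%  Deepseek: {deepseek}%")
```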