Bladder diary analysis using artificial intelligence

Zamudio Martínez A¹, Bouchard B², Hashim H²

Research Type

Clinical

Abstract Category

Overactive Bladder

Abstract 85
Urology 3 - Overactive Bladder
Scientific Podium Short Oral Session 8
Thursday 18th September 2025
14:00 - 14:07
Parallel Hall 3
Detrusor Overactivity Incontinence Overactive Bladder Voiding Diary
1. Instituto Tecnológico y de Estudios Superiores de Monterrey
2. Functional and Reconstructive Urology and Urodynamics Unit, Bristol Urological Institute, Southmead Hospital, North Bristol NHS Trust

Abstract

Hypothesis / aims of study
Introduction and Objectives 

Artificial intelligence (AI), designed to perform tasks that normally require human cognition, is transforming many sectors of modern life, and health care is no exception. Rapid advances highlight AI's potential to improve medical practice for both physicians and patients. However, it is essential to ask whether these advantages are being directed appropriately, particularly when AI is used as a support tool in overcrowded health systems.

Bladder diary analysis is a key diagnostic tool for lower urinary tract dysfunction but is often time-consuming. This study compares bladder diary analyses performed by different AI models with those conducted by a clinician to assess the potential for AI as a routine tool in urological practice.
Study design, materials and methods
Randomly selected 3-day ICIQ bladder diaries, completed by hand by patients, were scanned and analyzed with four AI models: Chat Generative Pre-trained Transformer (ChatGPT), Microsoft Copilot, Gemini, and Jasper. A single clinician also analyzed each bladder diary, recording the results in an Excel spreadsheet. The time taken for each analysis was recorded to estimate the mean review duration.

Descriptive statistics (mean and standard deviation, SD) were calculated for each parameter for each AI model and for the clinician. Agreement between these sources was assessed with the intraclass correlation coefficient (ICC) and kappa statistics, and p < 0.05 was taken to indicate a statistically significant difference between methods.
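The abstract does not state which ICC form or kappa variant was used. As a hedged illustration of the agreement statistics, the sketch below assumes ICC(2,1) (two-way random effects, absolute agreement, single rater) for continuous parameters and unweighted Cohen's kappa for categorical ones. The function names `icc_2_1` and `cohen_kappa` are hypothetical, the example values are invented, and p-values are omitted.

```python
from collections import Counter
from statistics import mean


def icc_2_1(ratings: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is one parameter for diary i as read by rater j
    (e.g. j=0 clinician, j=1 an AI model). Needs >= 2 diaries and >= 2 raters.
    """
    n = len(ratings)      # number of diaries (subjects)
    k = len(ratings[0])   # number of raters
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Two-way ANOVA decomposition of the total sum of squares
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    # Rater (column) variance is penalized, so systematic offsets lower agreement
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def cohen_kappa(a: list[str], b: list[str]) -> float:
    """Unweighted Cohen's kappa for two raters' categorical labels."""
    if len(a) != len(b):
        raise ValueError("both raters must label the same items")
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    # Chance agreement from each rater's marginal label frequencies
    p_exp = sum(ca[c] * cb[c] for c in set(ca) | set(cb)) / n**2
    if p_exp == 1:
        return 1.0  # both raters used one identical label throughout (convention)
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical maximum voided volumes (mL): column 0 clinician, column 1 AI
print(icc_2_1([[400, 410], [320, 300], [550, 560]]))
print(cohen_kappa(["yes", "yes", "no"], ["yes", "no", "no"]))
```

Absolute agreement is assumed here because an AI model that consistently over- or under-reads volumes should count as disagreeing with the clinician, even if its values rise and fall in step with the clinician's.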
Results
Twenty-five bladder diaries were analyzed. Two of the four AI models could not analyze the data: Jasper AI was not designed for image analysis, and Gemini AI provided steps for calculating the bladder diary parameters manually rather than performing the analysis itself. The comparison of bladder diary parameters between ChatGPT, Microsoft Copilot, and the clinician reference values showed varying levels of agreement. Parameters such as maximum voided volume and pad usage showed high reliability between AI and clinician (ICC > 0.80). However, other metrics, including nocturnal urine volume, micturition frequency, and nocturnal polyuria index, showed statistically significant differences (p < 0.05), with mean discrepancies of up to 1000 mL in urine volume and of 4 to 5 voids in voiding frequency.
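The parameters compared above are simple aggregates of the transcribed entries. The sketch below, which is not the procedure used by the clinician or by either AI model, derives several of them from a list of voids. It assumes each void has already been flagged as nocturnal or not (in the ICS definition, nocturnal urine volume runs from going to bed with the intention of sleeping to rising, and includes the first morning void), that the diary covers whole 24-hour periods, and that the nocturnal polyuria index is nocturnal urine volume divided by 24-hour voided volume. `Void` and `diary_summary` are hypothetical names and the diary values are invented.

```python
from dataclasses import dataclass


@dataclass
class Void:
    volume_ml: float  # voided volume as transcribed from the chart
    nocturnal: bool   # flagged by the reader as part of nocturnal urine volume


def diary_summary(voids: list[Void], days: int) -> dict[str, float]:
    """Summarize a bladder diary covering `days` whole 24-hour periods."""
    total_ml = sum(v.volume_ml for v in voids)
    night_ml = sum(v.volume_ml for v in voids if v.nocturnal)
    return {
        "max_voided_volume_ml": max(v.volume_ml for v in voids),
        "voids_per_24h": len(voids) / days,
        "mean_24h_volume_ml": total_ml / days,
        "mean_nocturnal_volume_ml": night_ml / days,
        # Nocturnal polyuria index from totals across all days (an assumption;
        # averaging a per-night index is an alternative)
        "npi": night_ml / total_ml,
    }


# Hypothetical two-day diary (values invented for illustration only)
diary = [
    Void(300, False), Void(250, False), Void(200, True),
    Void(350, False), Void(150, False), Void(250, True),
]
print(diary_summary(diary, days=2))
```

Because every value is a direct sum, maximum, count, or ratio, a 5 misread as an 8 in a single 250 mL entry (280 mL) raises the total voided volume by 30 mL, and flagging a void as day instead of night moves its whole volume out of the nocturnal total, changing the nocturnal polyuria index.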
Interpretation of results
Incorporating AI into bladder diary analysis offers valuable enhancements, but it requires standardizing how patients complete these charts to ensure compatibility. The differences in the results can be linked directly to patients' handwriting and to how numbers were interpreted: variations in handwriting affected how the AI read the entries, for example causing it to confuse similar-looking digits such as 5 and 8. If the diaries were completed digitally and integrated into software with built-in AI capabilities, the process would be simpler for clinicians. However, this could impose an additional burden on patients and potentially discourage them from completing the diaries consistently. The time required to analyze each bladder diary varied from case to case, depending on how well the diary had been filled in and on the legibility of the handwriting. The mean analysis time was 5 min 08 s, with the shortest analysis taking 3 min 47 s and the longest 8 min 29 s. Implementing AI in bladder diary evaluation can automate data extraction and classification, significantly reducing the time required for analysis. Such automation enhances efficiency, ensures consistent and accurate assessments, and alleviates clinician workload.
Concluding message
AI models, particularly those designed for pattern analysis, are not yet fully comparable to human analysis for critical aspects of clinical practice. The inconsistencies found in this study suggest that AI software may classify handwritten data differently from the clinician, resulting in significant variations in the metrics. While some parameters align well, differences in classification methods suggest that certain AI-generated values require closer calibration to meet clinician standards. Improper use of these tools could negatively affect patient treatment. Further development and refinement are therefore necessary before these tools are integrated into daily urological practice.
Figure 1: Differences in handwriting can influence how AI interprets numbers. Notably, the similar-looking digits 2, 8, 3, and 5 were among those most commonly confused by AI, leading to inaccuracies in the data.
Figure 2
Disclosures
Funding: This abstract was created during the observership awarded by Laborie as part of the ICS Observership Award.
Clinical Trial: No
Subjects: None
06/07/2025 02:14:51