Hypothesis / aims of study
Patients increasingly turn to the internet for health information, traditionally using Google and YouTube (1-3).
The emergence of Large Language Models (LLMs) has transformed this landscape, shifting it from traditional link-based search results to direct, synthesized answers (4-7).
However, as health care providers, we lack an understanding of how Artificial Intelligence (AI)-generated health information compares with traditional sources in terms of quality, readability, and tone. We hypothesize that a patient's mindset, shaped by knowledge acquired from either traditional online searches or AI-generated results, may affect shared decision-making and case management.
Our objective was to compare the tone, readability, and credibility of urogynecology content between traditional online sources (Google, YouTube) and AI-generated responses (ChatGPT, Claude, Gemini).
Study design, materials and methods
Study Design: A cross-sectional comparative analysis was conducted using 10 high-frequency urogynecologic search terms identified via Google Trends data (e.g., "Pelvic Organ Prolapse Surgery"). Each term was queried on traditional platforms (Google Search, top 5 results; YouTube, top 5 videos) and on AI models (ChatGPT, Claude, Gemini), followed by a systematic evaluation of the resulting content (Figure 1).
Platforms Compared:
• Traditional sources: Google Search (top 5 results) and YouTube (top 5 videos)
• AI sources: Three publicly available chatbots (ChatGPT, Claude, Gemini)
Analysis Tools:
• Emotional tone: VADER (lexicon-based) and BERT (AI-based) sentiment analysis
• Readability: Flesch-Kincaid Grade Level and SMOG Index (lower scores indicate higher readability)
• Information quality: DISCERN tool (measures reliability of treatment information; 16 items each rated 1-5, total score 16-80)
• Source credibility: JAMA benchmarks (checks for authorship and attribution criteria)
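For readers unfamiliar with the two readability indices listed above, the following is a minimal Python sketch of the standard published formulas. It is an illustration only and not the tooling used in this study: the function names, the regex-based sentence and word splitting, and the vowel-group syllable counter are all assumptions, and the syllable heuristic is only approximate. Note also that SMOG was designed for samples of about 30 sentences, so values on short passages are rough.

```python
import math
import re


def count_syllables(word):
    """Rough heuristic (assumption): count vowel groups, discount a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def split_text(text):
    """Naive sentence and word segmentation (assumption: plain English prose)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    return sentences, words


def flesch_kincaid_grade(text):
    """Flesch-Kincaid Grade Level: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences, words = split_text(text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


def smog_index(text):
    """SMOG: 1.0430*sqrt(polysyllables * 30/sentences) + 3.1291 (polysyllable = 3+ syllables)."""
    sentences, words = split_text(text)
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    return 1.0430 * math.sqrt(polysyllables * 30 / len(sentences)) + 3.1291


if __name__ == "__main__":
    plain = "The cat sat on the mat. The dog ran to the park."
    dense = "Pelvic organ prolapse surgery requires individualized preoperative evaluation."
    for label, text in (("plain", plain), ("dense", dense)):
        print(label, round(flesch_kincaid_grade(text), 2), round(smog_index(text), 2))
```

Both indices approximate a U.S. school grade level, so the denser sentence should score higher (harder to read) than the plain one, in line with the interpretation that lower scores indicate higher readability.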
Statistical analyses were conducted using SPSS version 28, and statistical significance was defined as p < 0.05.
Results
AI-generated responses demonstrated distinct performance profiles compared to traditional sources, with significant differences observed in readability and structural credibility (Table 1). Regarding emotional tone, AI-generated responses leaned toward a neutral-to-positive profile, with higher mean lexicon-based sentiment scores than traditional platforms, though the difference was not statistically significant (p=0.15).
AI responses outperformed traditional platforms in readability, providing significantly simpler and more accessible information for laypersons (p < 0.05).
AI-generated responses achieved higher informational quality, reflected by significantly higher mean DISCERN scores, particularly when compared to video-based platforms (p=0.02). Conversely, traditional online sources more frequently met formal authorship and attribution criteria, as reflected in higher adherence to structural credibility benchmarks.
Interpretation of results
AI models significantly improve access to urogynecologic information by offering superior readability compared to traditional sources. While this empowers patients with more digestible information that shapes their clinical expectations, the lack of formal source attribution remains a limitation for informed decision-making.