Hypothesis / aims of study
The success rates of transobturator male slings have been reported as >70%. Clinical characteristics identified as predictors of success in traditional regression analyses include concomitant urge incontinence symptoms and preoperative stress urinary incontinence (SUI) severity.5 Machine learning (ML) has been used in clinical medicine, especially in the area of personalized medicine, to predict and classify patients based on diagnosis or likelihood of treatment success.6 Despite its potential benefits, ML has not been applied to build a model that predicts which male SUI patients may benefit from transobturator male slings. Hence, this investigation aims to develop an ML algorithm to predict which male SUI patients will have success following transobturator male sling insertion.
Study design, materials and methods
All transobturator male sling cases performed by a single surgeon from August 2006 to June 2012 were reviewed. The outcome of interest was ‘cure’, defined as complete dryness with 0 pads used and no need for additional procedures. The ML classifier models assessed were K-nearest neighbor (KNN), Naïve Bayes, Decision Tree, and Random Forest. The clinical variables included in the models were number of pads used daily, age, height, weight, race, incontinence type, etiology of incontinence, concomitant procedures, history of radiation, smoking, bladder neck contracture, and prostatectomy. Model performance was assessed using the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC), and the F1-score.
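As an illustration of the three performance measures named above, the following is a minimal sketch in plain Python. The function names, the toy labels and scores, and the 0.5 decision threshold are assumptions made for this example only; they are not the study's code or data. AUPRC is approximated here by average precision, and tied scores are not given special treatment in that function.

```python
def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN) for binary labels (1 = cure)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """AUPRC estimate: mean of the precision at each rank where a positive is found."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("average precision needs at least one positive case")
    return total / hits


# Toy example (hypothetical values, not study data).
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

f1 = f1_score(y_true, y_pred)
roc = auroc(y_true, scores)
ap = average_precision(y_true, scores)
print(f"F1={f1:.3f} AUROC={roc:.3f} AUPRC~{ap:.3f}")
```

AUROC and AUPRC are computed from the continuous scores, whereas the F1-score depends on the threshold used to turn scores into predicted classes.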
Python 3.8.2 (Python Software Foundation, http://python.org) was used for model development. After confirming the absence of significant outliers, the continuous variables were standardized feature by feature, so that variables measured on different scales would not contribute unequally to the models and bias them. The models were built using an 85:15 train-test split (85% of the data used for model training, 15% for model performance evaluation). Grid search was performed to tune the hyperparameters of the KNN and Random Forest models. An ensemble learning method was then applied to the highest-performing algorithm. For this study, bagging was chosen: random samples are drawn from the training set with replacement, so that an individual observation can be used more than once. A model is trained on each of these samples independently and in parallel, and the majority prediction across these independently trained models is taken as the final prediction to produce a more accurate outcome. The KNN algorithm was selected to undergo bagging based on its grid search values.
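The workflow described above (standardization, an 85:15 split, KNN, and bagging with a majority vote) can be sketched in plain Python as follows. This is an illustrative sketch only, not the study's implementation: the data are synthetic with no clinical meaning, all function names are hypothetical, and the values k=3 and 25 bootstrap models are arbitrary choices for the example. Computing the scaling statistics from the training portion only is also an assumption; the abstract does not state how this was done.

```python
import math
import random
from collections import Counter


def standardize(train, test):
    """Z-score each feature using the training set's mean and SD (assumed choice)."""
    n_features = len(train[0])
    means = [sum(r[j] for r in train) / len(train) for j in range(n_features)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in train) / len(train)) or 1.0
           for j in range(n_features)]  # "or 1.0" guards against constant features

    def scale(rows):
        return [[(r[j] - means[j]) / sds[j] for j in range(n_features)] for r in rows]

    return scale(train), scale(test)


def knn_predict(train_x, train_y, query, k):
    """Uniform-weight KNN: majority label among the k nearest training points."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], query))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def bagged_knn_predict(train_x, train_y, query, k, n_models=25, seed=0):
    """Bagging: fit KNN on bootstrap resamples and take the majority vote."""
    rng = random.Random(seed)
    n = len(train_x)
    votes = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        votes.append(knn_predict([train_x[i] for i in idx],
                                 [train_y[i] for i in idx], query, k))
    return Counter(votes).most_common(1)[0][0]


# Synthetic, deterministic two-feature data on very different scales.
data = ([([0.1 * i, 100.0 + i], 0) for i in range(20)]
        + [([10.0 + 0.1 * i, 200.0 + i], 1) for i in range(20)])
random.Random(0).shuffle(data)

n_train = round(0.85 * len(data))  # 85:15 train-test split
train, test = data[:n_train], data[n_train:]
train_x, test_x = standardize([x for x, _ in train], [x for x, _ in test])
train_y = [y for _, y in train]
test_y = [y for _, y in test]

preds = [bagged_knn_predict(train_x, train_y, q, k=3) for q in test_x]
accuracy = sum(p == t for p, t in zip(preds, test_y)) / len(test_y)
print(f"train={len(train_x)} test={len(test_x)} accuracy={accuracy:.2f}")
```

Because each bootstrap resample omits some observations and repeats others, the individual KNN models differ slightly, and averaging their votes is what dampens the variance of any single model.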
Results
A total of 215 patients were identified and included in our analysis. The mean follow-up was 56.4 months (SD 41.6). Procedural success was achieved in 96 of 215 patients (44.7%).
The four classifier models were developed using our data. Grid search identified an n-estimator of 57 as the best hyperparameter for the Random Forest classifier, and an n-neighbor of 23 with uniform weight as the best hyperparameters for KNN. AUROC and AUPRC curves were generated for all models (Figure 1 and Figure 2, respectively).
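To make the grid search step concrete, the sketch below scores every combination of neighbor count and weighting scheme and keeps the best one. It is a hedged illustration, not the study's procedure: the use of 5-fold cross-validated accuracy as the selection criterion, the candidate grid, the function names, and the synthetic data are all assumptions, since the abstract does not report how candidates were scored.

```python
import math
from collections import defaultdict
from itertools import product


def knn_predict(train_x, train_y, query, k, weights="uniform"):
    """KNN vote; 'uniform' counts neighbours equally, 'distance' weights by 1/d."""
    nearest = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))[:k]
    votes = defaultdict(float)
    for d, label in nearest:
        votes[label] += 1.0 if weights == "uniform" else 1.0 / (d + 1e-9)
    return max(votes, key=votes.get)


def cv_accuracy(xs, ys, k, weights, n_folds=5):
    """Mean accuracy over n_folds folds; fold f holds out every n_folds-th row."""
    scores = []
    for f in range(n_folds):
        held = set(range(f, len(xs), n_folds))
        tr_x = [x for i, x in enumerate(xs) if i not in held]
        tr_y = [y for i, y in enumerate(ys) if i not in held]
        correct = sum(knn_predict(tr_x, tr_y, xs[i], k, weights) == ys[i]
                      for i in held)
        scores.append(correct / len(held))
    return sum(scores) / n_folds


def grid_search(xs, ys, grid):
    """Try every (n_neighbors, weights) pair; the first best score wins ties."""
    best_params, best_score = None, -1.0
    for k, weights in product(grid["n_neighbors"], grid["weights"]):
        score = cv_accuracy(xs, ys, k, weights)
        if score > best_score:
            best_params, best_score = {"n_neighbors": k, "weights": weights}, score
    return best_params, best_score


# Synthetic data with no clinical meaning; classes alternate so folds stay balanced.
xs, ys = [], []
for i in range(20):
    xs += [[0.1 * i, 100.0 + i], [10.0 + 0.1 * i, 200.0 + i]]
    ys += [0, 1]

grid = {"n_neighbors": [1, 3, 5, 7], "weights": ["uniform", "distance"]}  # hypothetical grid
best_params, best_score = grid_search(xs, ys, grid)
print(best_params, best_score)
```

On real data the candidates would rarely tie, and the held-out test set should stay out of this selection loop so that the reported performance is not inflated by the tuning itself.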
The KNN model had the highest performance among the four models, with an AUROC of 0.759, AUPRC of 0.882, and F1-score of 0.833. The KNN model was developed further using the bagging method, which improved its performance to an AUROC of 0.850, AUPRC of 0.907, and F1-score of 0.882.
Interpretation of results
There are several limitations to this study. The first is the sample size: a limited number of patients were involved in developing this model, which may make it noisy with a high degree of variance. Nonetheless, as KNN performed best among the four models initially assessed, we attempted to decrease the noise and variance affecting its predictions, and were able to do so with ensemble learning. While the high predictive potential of our current model suggests there may be a role for machine learning models in the clinical practice of male stress urinary incontinence, its immediate use and generalizability are limited by the small sample and the single-surgeon series. As the model was developed using retrospectively collected data, it may also be predisposed to selection bias. Moreover, the outcome measures did not use validated questionnaires such as the ICIQ-SF to assess patient-reported outcomes. However, as our outcome measure was complete dryness with zero pads used, there may be less subjectivity in patient reporting.