Comparison of machine learning techniques to predict unplanned readmission following total shoulder arthroplasty.
Academic Article
Abstract
BACKGROUND: Machine learning (ML) techniques have been shown to successfully predict postoperative complications for high-volume orthopedic procedures such as hip and knee arthroplasty and to stratify patients for risk-adjusted bundled payments. Such risk stratification has not been done for more heterogeneous, lower-volume procedures such as total shoulder arthroplasty (TSA), and there has been equally limited discussion of strategies to optimize the predictive ability of ML algorithms. The purpose of this study was to (1) assess which of 5 ML algorithms best predicts 30-day readmission, (2) test select ML strategies to optimize the algorithms, and (3) report on which patient variables contribute most to risk prediction in TSA across algorithms. METHODS: We identified 9043 patients in the American College of Surgeons National Surgical Quality Improvement Program database who underwent primary TSA between 2011 and 2015. Predictors included demographics, comorbidities, laboratory data, and intraoperative variables. The outcome of interest was 30-day unplanned readmission. Five ML algorithms (support-vector machine [SVM], logistic regression, random forest [RF], an adaptive boosting algorithm, and a neural network) were trained on the derivation cohort (2011-2014 TSA patients) to predict 30-day unplanned readmission. After training, the weights of each model were fixed and the classifiers were evaluated on the 2015 TSA cohort to simulate a prospective evaluation. The c-statistic and F1 score were used to assess the performance of each classifier. After evaluation, features were removed individually to assess which features most affected classifier performance. RESULTS: The derivation and validation cohorts comprised 5857 and 3186 primary TSA patients, respectively, with similar demographics, comorbidities, and 30-day unplanned readmission rates (2.9% vs. 2.7%).
Of the ML algorithms, SVM performed the worst, with a c-statistic of 0.54 and an F1 score of 0.07, whereas the RF classifier performed the best, with a c-statistic of 0.74 and an F1 score of 0.18. In addition, SVM was the most sensitive to the loss of single features, whereas the performance of RF did not decrease dramatically after the loss of any single feature. Within the trained RF classifier, 5 variables achieved weights >0.5; in descending order, these were high bilirubin (>1.9 mg/dL), age >65, race, chronic obstructive pulmonary disease, and American Society of Anesthesiologists score ≥3. In the validation cohort, which had an overall readmission rate of 2.7%, the RF classifier identified 436 high-risk patients with a predicted risk score >0.6, of whom 36 were readmitted (readmission rate of 8.2%). CONCLUSION: Predictive analytics algorithms can achieve acceptable prediction of unplanned readmission for TSA, with the RF classifier outperforming other common algorithms.
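For readers unfamiliar with the evaluation metrics named above, the following is a minimal illustrative sketch, not the authors' code, of how a c-statistic, an F1 score, and the readmission rate within a flagged high-risk subgroup can be computed from predicted risk scores. The function names, the toy labels and scores, and the use of a single cutoff for both the F1 score and the high-risk flag are assumptions made for illustration only; the abstract does not state which cutoff was used for the F1 score.

```python
from typing import Sequence, Tuple


def c_statistic(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (readmitted, not readmitted) pairs in which the readmitted
    patient has the higher score; tied scores count as half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one readmitted and one non-readmitted case")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


def f1_at_threshold(y_true: Sequence[int], scores: Sequence[float],
                    threshold: float) -> float:
    """F1 score when patients with score > threshold are predicted readmitted."""
    tp = fp = fn = 0
    for y, s in zip(y_true, scores):
        predicted = s > threshold
        if predicted and y == 1:
            tp += 1
        elif predicted and y == 0:
            fp += 1
        elif not predicted and y == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def high_risk_rate(y_true: Sequence[int], scores: Sequence[float],
                   threshold: float) -> Tuple[int, float]:
    """Number of patients flagged (score > threshold) and their readmission rate."""
    flagged = [y for y, s in zip(y_true, scores) if s > threshold]
    rate = sum(flagged) / len(flagged) if flagged else 0.0
    return len(flagged), rate


# Hypothetical toy data (1 = readmitted); NOT taken from the study.
toy_labels = [0, 0, 0, 0, 1, 0, 1, 1]
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.65, 0.7, 0.8, 0.9]

toy_c = c_statistic(toy_labels, toy_scores)
toy_f1 = f1_at_threshold(toy_labels, toy_scores, threshold=0.6)
toy_n_flagged, toy_rate = high_risk_rate(toy_labels, toy_scores, threshold=0.6)
```

The same subgroup arithmetic applies to the reported figures: 36 readmissions among 436 flagged patients is roughly 8%, about three times the 2.7% rate in the full validation cohort. With an outcome this rare, a classifier can show a reasonable c-statistic while its F1 score stays low, which is consistent with the values reported for the RF classifier.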