Developing nonlinear k-nearest neighbors classification algorithms to identify patients at high risk of increased length of hospital stay following spine surgery.
Academic Article
abstract
OBJECTIVE: Spondylolisthesis is a common operative disease in the United States, but robust predictive models for patient outcomes remain limited. The development of models that accurately predict postoperative outcomes would be useful to help identify patients at risk of complicated postoperative courses and determine appropriate healthcare and resource utilization for patients. As such, the purpose of this study was to develop k-nearest neighbors (KNN) classification algorithms to identify patients at increased risk for extended hospital length of stay (LOS) following neurosurgical intervention for spondylolisthesis.
METHODS: The Quality Outcomes Database (QOD) spondylolisthesis data set was queried for patients receiving either decompression alone or decompression plus fusion for degenerative spondylolisthesis. Preoperative and perioperative variables were queried, and Mann-Whitney U-tests were performed to identify which variables would be included in the machine learning models. Two KNN models were implemented (k = 25) with a standard training set of 60%, validation set of 20%, and testing set of 20%, one with arthrodesis status (model 1) and the other without (model 2). Feature scaling was implemented during the preprocessing stage to standardize the independent features.
RESULTS: Of 608 enrolled patients, 544 met prespecified inclusion criteria. The mean age of all patients was 61.9 ± 12.1 years (± SD), and 309 (56.8%) patients were female. The model 1 KNN had an overall accuracy of 98.1%, sensitivity of 100%, specificity of 84.6%, positive predictive value (PPV) of 97.9%, and negative predictive value (NPV) of 100%. Additionally, a receiver operating characteristic (ROC) curve was plotted for model 1, showing an overall area under the curve (AUC) of 0.998. Model 2 had an overall accuracy of 99.1%, sensitivity of 100%, specificity of 92.3%, PPV of 99.0%, and NPV of 100%, with the same ROC AUC of 0.998.
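As a rough illustration of the workflow described in METHODS and the metric types reported in RESULTS, the sketch below standardizes features, splits the data 60/20/20, classifies with k = 25 nearest neighbors, and derives accuracy, sensitivity, specificity, PPV, and NPV from a confusion matrix. It is not the authors' code and does not use QOD data: the data are synthetic, and the two features, the class balance, the choice of Euclidean distance, the coding of label 1 as extended LOS, and all function names are assumptions made purely for illustration.

```python
import math
import random
from collections import Counter


def split_60_20_20(rows, seed=0):
    """Shuffle, then split into 60% training, 20% validation, 20% testing."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    n_train = int(0.6 * len(rows))
    n_val = int(0.2 * len(rows))
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]


def fit_scaler(features):
    """Per-column mean and standard deviation, estimated on training data only."""
    cols = list(zip(*features))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def scale(x, means, stds):
    """Standardize one feature vector to zero mean and unit variance."""
    return [(v - m) / s for v, m, s in zip(x, means, stds)]


def knn_predict(train_x, train_y, x, k=25):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: math.dist(p[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn), treating label 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def diagnostic_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, PPV, and NPV from confusion counts."""
    def ratio(num, den):
        return num / den if den else float("nan")
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Synthetic stand-in data (assumption): 544 rows only to mirror the cohort
# size; the two features and the 85% positive rate are invented.
rng = random.Random(42)
rows = []
for _ in range(544):
    label = 1 if rng.random() < 0.85 else 0
    features = [rng.gauss(60 + 8 * label, 12), rng.gauss(120 + 80 * label, 40)]
    rows.append((features, label))

train, val, test = split_60_20_20(rows)
means, stds = fit_scaler([x for x, _ in train])
train_x = [scale(x, means, stds) for x, _ in train]
train_y = [y for _, y in train]


def evaluate(subset):
    """Score a held-out subset using the scaler and neighbors from training."""
    y_true = [y for _, y in subset]
    y_pred = [knn_predict(train_x, train_y, scale(x, means, stds)) for x, _ in subset]
    return diagnostic_metrics(*confusion_counts(y_true, y_pred))


val_metrics = evaluate(val)    # would guide tuning choices such as k
test_metrics = evaluate(test)  # reported once, after tuning is finished
print(len(train), len(val), len(test))
print(test_metrics)
```

The scaler is fitted on the training portion only and then reused for the validation and testing portions, which keeps information from held-out patients out of the preprocessing step. Scaling matters for KNN in particular because unscaled features with large numeric ranges would otherwise dominate the distance calculation.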
CONCLUSIONS: Overall, these findings indicate that nonlinear KNN machine learning models can predict extended LOS with very high accuracy in this cohort. Important predictor variables include diabetes, osteoporosis, socioeconomic quartile, duration of surgery, estimated blood loss during surgery, patient educational status, American Society of Anesthesiologists grade, BMI, insurance status, smoking status, sex, and age. These models may be considered for external validation by spine surgeons to aid in patient selection and management, resource utilization, and preoperative surgical planning.