Comparisons of Machine Learning Models to Logistic Regression in Orthopedic Sports Medicine are Confounded by Methodological Heterogeneity: A Systematic Review and Meta-Analysis.
Review
Overview
Abstract
IMPORTANCE: The modeling methodology and reporting of performance metrics in the development of machine learning models in orthopedic sports medicine have not yet been systematically assessed.
OBJECTIVE: The purpose of this study was to systematically review this literature for clinical prediction models utilizing machine learning, to evaluate the methodological quality of modeling and performance reporting, and to compare the performance of machine learning with logistic regression predictions where applicable.
EVIDENCE REVIEW: A systematic search of the MEDLINE, Scopus, and Embase databases was conducted in September 2025 for articles pertaining to clinical prediction models using machine learning in orthopedic sports medicine. Study demographics, outcomes, modeling workflow, and risk of bias information were collected. A random effects meta-regression controlling for article and sample size was performed to compare, where applicable, the performance benefit, measured by area under the curve (AUC), of utilizing machine learning models over logistic regression.
FINDINGS: A total of 1033 articles were screened, and 52 articles were included in the final analysis. The most frequently utilized machine learning algorithm was random forest, followed by boosted trees and support vector machines. The most noteworthy sources of potential bias were outcome imbalance and management of continuous predictors. A total of 25 studies performed 168 pairwise comparisons between machine learning and logistic regression. For 43 comparisons at high risk of bias, the logit-transformed AUC (logit(AUC)) in the regression was 0.08 (-0.22 to 0.48) higher for machine learning; for 125 comparisons at low risk of bias, logit(AUC) was 0.00 (-0.18 to 0.18) lower for machine learning. Overall, random forest models demonstrated superior performance, with a logit(AUC) difference of 0.11 (0.00 to 0.21).
CONCLUSION AND RELEVANCE: While random forest algorithms were associated with higher performance relative to traditional methods in well-constructed prediction problems (i.e., adequately powered datasets with appropriate feature handling, class balance considerations, and rigorous validation), no conclusive recommendations can be made regarding the superiority of machine learning over logistic regression given the current evidence. Improvements in methodology and standardized reporting of performance metrics are required for useful interpretation of future comparisons.
LEVEL OF EVIDENCE: IV, Systematic Review and Meta-Analysis with more than 2 negative criteria.
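Because the effect sizes above are reported on the logit(AUC) scale, their practical size is easier to judge after converting back to the AUC scale. The following is a minimal sketch of that conversion, not the authors' analysis code. The function names and the baseline AUC of 0.70 are hypothetical choices made for illustration; only the 0.11 difference for random forest comes from the abstract.

```python
import math


def logit(p: float) -> float:
    """Logit transform of an AUC value strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("AUC must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    """Inverse logit: maps a logit-scale value back to the AUC scale."""
    return 1.0 / (1.0 + math.exp(-x))


def shift_auc(baseline_auc: float, delta_logit: float) -> float:
    """AUC implied by adding a logit-scale difference to a baseline AUC."""
    return inv_logit(logit(baseline_auc) + delta_logit)


# Hypothetical logistic regression AUC, assumed purely for illustration.
BASELINE_AUC = 0.70
# logit(AUC) difference reported for random forest in FINDINGS.
RF_DELTA_LOGIT = 0.11

if __name__ == "__main__":
    implied = shift_auc(BASELINE_AUC, RF_DELTA_LOGIT)
    print(f"Implied random forest AUC: {implied:.3f}")
```

Under this assumed baseline, a logit(AUC) difference of 0.11 corresponds to an AUC of roughly 0.72. The same logit-scale difference implies a smaller absolute AUC change as the baseline approaches 1, which is one reason to pool comparisons on the logit scale rather than on raw AUC.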