Machine learning to predict venous thrombosis in acutely ill medical patients.
Academic Article
Overview
Abstract
BACKGROUND: Acutely ill patients at high risk for venous thromboembolism (VTE) may be identified clinically or with integer-based scoring systems. These scores have demonstrated only modest performance in external data sets.

OBJECTIVES: To evaluate the performance of machine learning models compared with the IMPROVE score.

METHODS: The APEX trial randomized 7513 acutely ill medical patients to extended-duration betrixaban vs. enoxaparin. A super learner model (ML) incorporating 68 variables was built to predict VTE by combining estimates from 5 families of candidate models. A "reduced" model (rML) was also developed using 16 variables that were thought, a priori, to be associated with VTE. The IMPROVE score was calculated for each patient. Model performance in predicting a composite VTE end point was assessed by discrimination and calibration. The frequency distribution of predicted VTE risk was plotted and divided into tertiles, and VTE risks were compared across tertiles.

RESULTS: The ML and rML algorithms outperformed the IMPROVE score in predicting VTE (c-statistic: 0.69, 0.68, and 0.59, respectively). The Hosmer-Lemeshow goodness-of-fit P-value was 0.06 for ML, 0.44 for rML, and <0.001 for the IMPROVE score. The observed event rate was 2.5% in the lowest tertile, 4.8% in the middle tertile, and 11.4% in the highest tertile. Patients in the highest tertile of VTE risk had a 5-fold increase in the odds of VTE compared with those in the lowest tertile.

CONCLUSION: The super learner algorithms improved discrimination and calibration compared with the IMPROVE score for predicting VTE in acutely ill medical patients.
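The 5-fold increase in odds can be roughly checked against the reported tertile event rates. The Python sketch below is illustrative only and is not the authors' analysis: it computes unadjusted odds ratios from the rounded percentages in the abstract, and the function and variable names are hypothetical. The published estimate may come from a fitted model, so small differences are expected.

```python
def odds(p: float) -> float:
    """Convert an event rate (probability) into odds."""
    return p / (1.0 - p)


def odds_ratio(p_group: float, p_reference: float) -> float:
    """Unadjusted odds ratio of one group relative to a reference group."""
    return odds(p_group) / odds(p_reference)


# Observed VTE event rates by tertile of predicted risk, as reported in
# the abstract (rounded values, so the results are approximate).
event_rates = {"lowest": 0.025, "middle": 0.048, "highest": 0.114}

or_high_vs_low = odds_ratio(event_rates["highest"], event_rates["lowest"])
or_mid_vs_low = odds_ratio(event_rates["middle"], event_rates["lowest"])

print(f"Highest vs. lowest tertile: OR = {or_high_vs_low:.2f}")
print(f"Middle vs. lowest tertile:  OR = {or_mid_vs_low:.2f}")
```

With these rounded rates, the highest-versus-lowest odds ratio comes out at about 5.0, in line with the reported 5-fold increase.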