Machine Learning for Dynamic and Short-Term Prediction of Preeclampsia Using Routine Clinical Data.
Academic Article
Overview
abstract
IMPORTANCE: Preeclampsia is a leading cause of maternal and perinatal morbidity and mortality, yet its unpredictable onset and rapid progression hinder timely management. Existing prediction tools often rely on specialized biomarkers, static assessments, or limited study cohorts, impeding clinical utility and generalizability.
OBJECTIVE: To develop and validate machine learning models for dynamic, short-term prediction of preeclampsia onset using longitudinal electronic health record (EHR) data.
DESIGN, SETTING, AND PARTICIPANTS: This retrospective, multisite cohort study included pregnancies delivered between October 1, 2020, and May 31, 2025, at 3 NewYork-Presbyterian hospitals: Weill Cornell Medical College (WCMC), Lower Manhattan Hospital (LMH), and Brooklyn Methodist Hospital (BMH). Extreme gradient boosting models were developed to predict preeclampsia onset within 1, 2, and 4 weeks. Performance was assessed using nested cross-validation at the training site and external validation via direct transfer, fine-tuning, and retraining. The study included pregnancies among individuals 18 years or older (35 895 at WCMC, 8664 at LMH, and 14 280 at BMH).
EXPOSURE: Routine information captured within the EHR, including blood pressure, maternal characteristics and routine laboratory test results.
MAIN OUTCOMES AND MEASURES: The main outcome was development of preeclampsia within specified prediction windows. Model performance was evaluated using area under the receiver operating characteristic curve, specificity and positive predictive value at 90% sensitivity.
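The reporting of specificity and positive predictive value at 90% sensitivity can be illustrated with a minimal sketch. It assumes the operating point is the highest score threshold at which at least 90% of true cases are flagged; the abstract does not state how the threshold was selected, and the function name `metrics_at_sensitivity` and the toy labels and scores are hypothetical, not taken from the study.

```python
import numpy as np


def metrics_at_sensitivity(y_true, y_score, target_sensitivity=0.90):
    """Return specificity and PPV at the highest threshold reaching the target sensitivity.

    Hypothetical helper for illustration only; not the study's code.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    n_pos = int(y_true.sum())
    n_neg = int((~y_true).sum())
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    if not 0.0 < target_sensitivity <= 1.0:
        raise ValueError("target_sensitivity must be in (0, 1]")

    # Walk the distinct scores from highest to lowest; lowering the
    # threshold can only increase sensitivity, so the first hit is the
    # strictest threshold that still meets the target.
    for threshold in np.unique(y_score)[::-1]:
        predicted = y_score >= threshold
        tp = int((predicted & y_true).sum())
        if tp / n_pos >= target_sensitivity:
            fp = int((predicted & ~y_true).sum())
            tn = n_neg - fp
            return {
                "threshold": float(threshold),
                "sensitivity": tp / n_pos,
                "specificity": tn / n_neg,
                "ppv": tp / (tp + fp),
            }
    # The lowest threshold flags every case, so sensitivity reaches 1.0
    # and the loop always returns before this point.
    raise RuntimeError("no threshold met the target sensitivity")


# Toy example with made-up labels and scores.
toy = metrics_at_sensitivity(
    y_true=[0, 0, 0, 0, 1, 1],
    y_score=[0.1, 0.2, 0.3, 0.8, 0.7, 0.9],
)
print(toy)
```

Fixing sensitivity first and then reading off specificity and positive predictive value reflects a screening-style priority: few missed cases, with the false-positive burden reported as the cost.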
RESULTS: Among 58 839 pregnancies (mean [SD] maternal age, 33.3 [5.3] years; 10 196 [17.3%] Asian, 6525 [11.1%] Black or African American, 32 675 [55.5%] White, and 9443 [16.0%] other [ie, races other than Asian, Black, or White] or unknown race), individuals who developed preeclampsia were older (median [IQR] age, 35.0 [31.0-38.0] vs 34.0 [31.0-37.0] years in the WCMC group [P < .001], 35.0 [31.0-38.0] vs 34.0 [32.0-37.0] years in the LMH group [P = .003], and 33.0 [28.0-36.0] vs 31.0 [26.0-35.0] years in the BMH group [P < .001]) and more frequently Black (335 of 2227 [15.0%] vs 2178 of 33 668 [6.5%] in the WCMC group [P < .001], 117 of 792 [14.8%] vs 566 of 7872 [7.2%] in the LMH group [P < .001], and 455 of 1088 [41.8%] vs 2874 of 13 192 [21.8%] in the BMH group [P < .001]). Predictive performance increased from 28 weeks' gestation and peaked at 34 weeks' gestation (areas under the receiver operating characteristic curve, 0.863 at training and 0.808-0.834 at validation). The positive predictive values increased from approximately 0.001 to 0.002 at 28 weeks to peak values at 36 weeks (mean [SD], 0.057 [0.012] at LMH and 0.046 [0.007] at BMH), whereas the negative predictive values were greater than 0.993. Blood pressure was the most informative predictor, whereas laboratory measures (including albumin, alkaline phosphatase, and hematologic indexes) contributed more at earlier gestational ages, with demographic and obstetric factors increasing in importance later.
CONCLUSIONS AND RELEVANCE: In this retrospective, multisite cohort study of pregnancies in late gestation, dynamic short-term prediction of preeclampsia was feasible using routinely available clinical and laboratory data. These results suggest that this approach may provide opportunities for earlier intervention and may be adaptable across diverse health care settings.
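The combination of low positive predictive values and negative predictive values above 0.993 is what Bayes' rule yields when the outcome is rare within a short prediction window. The sketch below is illustrative only: the function name `predictive_values` and the sensitivity, specificity, and prevalence inputs are assumptions chosen for demonstration, not figures from the study.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Compute (PPV, NPV) from sensitivity, specificity, and prevalence via Bayes' rule.

    Hypothetical helper for illustration only.
    """
    for name, value in (("sensitivity", sensitivity), ("specificity", specificity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1]")
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")

    # Expected fraction of the population in each confusion-matrix cell.
    tp = sensitivity * prevalence
    fn = (1.0 - sensitivity) * prevalence
    fp = (1.0 - specificity) * (1.0 - prevalence)
    tn = specificity * (1.0 - prevalence)

    ppv = tp / (tp + fp) if (tp + fp) > 0 else float("nan")
    npv = tn / (tn + fn) if (tn + fn) > 0 else float("nan")
    return ppv, npv


# Assumed inputs: same test characteristics, two made-up prevalences.
for assumed_prevalence in (0.5, 0.01):
    ppv, npv = predictive_values(0.90, 0.80, assumed_prevalence)
    print(f"prevalence={assumed_prevalence}: PPV={ppv:.3f}, NPV={npv:.3f}")
```

With sensitivity and specificity held fixed, lowering the assumed prevalence pushes the positive predictive value down and the negative predictive value up, which is the pattern the reported values follow.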