Development and validation of a machine learning algorithm for predicting the risk of postpartum depression among pregnant women.
OBJECTIVE: Tools for predicting postpartum depression (PPD) are scarce. We propose a machine learning framework for PPD risk prediction using data extracted from electronic health records (EHRs). METHODS: Two EHR datasets, one containing data on 15,197 women from 2015 to 2018 at a single site and the other containing data on 53,972 women from 2004 to 2017 at multiple sites, were used as the development and validation sets, respectively, to construct and validate the PPD risk prediction model. The primary outcome was a diagnosis of PPD within 1 year following childbirth. A framework of data extraction, processing, and machine learning was implemented to select a minimal list of features from the EHR datasets that preserves model performance and enables future point-of-care risk prediction. RESULTS: The best-performing model uses clinical features related to mental health history, medical comorbidity, obstetric complications, medication prescription orders, and patient demographic characteristics. Model performance, measured by the area under the receiver operating characteristic curve (AUC), was 0.937 (95% CI 0.912-0.962) and 0.886 (95% CI 0.879-0.893) in the development and validation datasets, respectively. Performance was consistent when the model was tested using data ending at multiple time points during pregnancy and at childbirth. LIMITATIONS: The prevalence of PPD in the study data represents treatment prevalence and is likely lower than the illness prevalence. CONCLUSIONS: EHRs and machine learning offer the ability to identify women at risk for PPD early in their pregnancy. This may facilitate scalable and timely prevention and intervention, reducing negative outcomes and the associated burden.
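As an illustration of the reported metric, the following minimal sketch computes an AUC and a 95% confidence interval for a set of predicted risks. The abstract does not state how the intervals were obtained; the percentile bootstrap used here is an assumption, chosen only because it is a common approach. The function names `auc` and `bootstrap_auc_ci` and the example labels and scores are hypothetical and are not study data.

```python
import random


def auc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney) identity; tied scores get average ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate plus percentile-bootstrap CI (an assumed method, not the paper's)."""
    point = auc(labels, scores)  # also validates that both classes are present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return point, lo, hi


# Hypothetical labels (1 = PPD diagnosis) and predicted risks, for illustration only.
example_labels = [0, 0, 1, 1]
example_scores = [0.1, 0.4, 0.35, 0.8]
example_auc = auc(example_labels, example_scores)
```

With only four synthetic observations the interval is uninformative; the sketch is meant to show the calculation, not to reproduce the reported values.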