Dynamic Survival Analysis for EHR Data with Personalized Parametric Distributions.
Academic Article
Overview
abstract
The widespread availability of high-dimensional electronic healthcare record (EHR) datasets has led to significant interest in using such data to derive clinical insights and make risk predictions. More specifically, techniques from machine learning are being increasingly applied to the problem of dynamic survival analysis, where updated time-to-event risk predictions are learned as a function of the full covariate trajectory from EHR datasets. EHR data presents unique challenges in the context of dynamic survival analysis, involving a variety of decisions about data representation, modeling, interpretability, and clinically meaningful evaluation. In this paper we propose a new approach to dynamic survival analysis which addresses some of these challenges. Our modeling approach is based on learning a global parametric distribution to represent population characteristics and then dynamically locating individuals on the time-axis of this distribution conditioned on their histories. For evaluation we also propose a new version of the dynamic C-Index for clinically meaningful evaluation of dynamic survival models. To validate our approach we conduct dynamic risk prediction on three real-world datasets, involving COVID-19 severe outcomes, cardiovascular disease (CVD) onset, and primary biliary cirrhosis (PBC) time-to-transplant. We find that our proposed modeling approach is competitive with other well-known statistical and machine learning approaches for dynamic risk prediction, while offering potential advantages in terms of interepretability of predictions at the individual level.