Penalized landmark supermodels (penLM) for dynamic prediction for time-to-event outcomes in high-dimensional data. Academic Article

Overview

abstract

  • BACKGROUND: To effectively monitor long-term outcomes among cancer patients, it is critical to accurately assess patients' dynamic prognosis, which often involves using multiple data sources (e.g., tumor registries, treatment histories, and patient-reported outcomes). However, challenges arise in selecting features to predict patient outcomes from high-dimensional data, aligning longitudinal measurements from multiple sources, and evaluating dynamic model performance. METHODS: We provide a framework for dynamic risk prediction using the penalized landmark supermodel (penLM) and develop novel metrics ([Formula: see text] and [Formula: see text]) to evaluate and summarize model performance across different timepoints. Through simulations, we assess the coverage of the proposed metrics' confidence intervals under various scenarios. We applied penLM to predict the updated 5-year risk of lung cancer mortality at diagnosis and in subsequent years by combining data from SEER registries (2007-2018), Medicare claims (2007-2018), the Medicare Health Outcome Survey (2006-2018), and the U.S. Census (1990-2010). RESULTS: The simulations confirmed valid coverage (~95%) of the confidence intervals of the proposed summary metrics. Of 4,670 lung cancer patients, 41.5% died from lung cancer. Using penLM, the key features for predicting lung cancer mortality, beyond cancer staging and tumor characteristics, included long-term lung cancer treatments, minority race, regions with low educational attainment or racial segregation, and various patient-reported outcomes. When evaluated using the proposed metrics, the penLM model developed using multi-source data ([Formula: see text] of 0.77 [95% confidence interval: 0.74-0.79]) outperformed those developed using single-source data ([Formula: see text] range: 0.50-0.74). CONCLUSIONS: The proposed penLM framework with novel evaluation metrics offers effective dynamic risk prediction when leveraging high-dimensional multi-source longitudinal data.
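The abstract does not describe how the landmark data are assembled, so the following is only a minimal sketch of the generic landmarking step that landmark supermodels are usually built on, not the authors' implementation. For each landmark time, subjects still at risk are kept and follow-up is administratively censored at the landmark plus the prediction horizon; the stacked rows would then be passed to a penalized survival model, which is not shown. The function name `build_landmark_stack`, the field names, and the toy data are all hypothetical.

```python
from typing import Dict, List


def build_landmark_stack(
    subjects: List[Dict], landmarks: List[float], horizon: float
) -> List[Dict]:
    """Stack one row per (subject, landmark) for subjects still at risk.

    Assumed input fields (illustrative): "id", "time" (follow-up time),
    "event" (1 = event observed, 0 = censored).
    """
    rows = []
    for s in landmarks:
        for subj in subjects:
            if subj["time"] <= s:
                continue  # event or censoring at or before s: not at risk
            end = s + horizon
            if subj["time"] > end:
                # Still under follow-up past the horizon: censor at s + horizon
                stop, event = end, 0
            else:
                stop, event = subj["time"], subj["event"]
            rows.append(
                {"id": subj["id"], "landmark": s, "stop": stop, "event": event}
            )
    return rows


# Toy data, invented purely for illustration (times in years)
subjects = [
    {"id": 1, "time": 2.5, "event": 1},
    {"id": 2, "time": 7.0, "event": 0},
    {"id": 3, "time": 0.8, "event": 1},
]
stack = build_landmark_stack(subjects, landmarks=[0, 1, 2], horizon=5)
```

In a fuller version, each stacked row would also carry the covariate values known as of its landmark time, which is where measurements from multiple sources would need to be aligned.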

publication date

  • January 27, 2025

Research

keywords

  • Lung Neoplasms

Identity

PubMed Central ID

  • PMC11771018

Digital Object Identifier (DOI)

  • 10.1186/s12874-024-02418-9

PubMed ID

  • 39871161

Additional Document Info

volume

  • 25

issue

  • 1