Bias correction models for electronic health records data in the presence of non-random sampling. Academic Article

Overview

abstract

  • Electronic health records (EHRs) contain rich clinical information for millions of patients and are increasingly used for public health research. However, non-random inclusion of subjects in EHRs can result in selection bias, with factors such as demographics, socioeconomic status, healthcare referral patterns, and underlying health status playing a role. While this issue has been well documented, little work has been done to develop or apply bias-correction methods, often because most of these factors are unavailable in EHRs. To address this gap, we propose a series of Heckman-type bias-correction methods that incorporate social determinants of health as selection covariates to model the probability of non-random inclusion in the EHR sample. Through simulations under various settings, we demonstrate the effectiveness of the proposed methods in correcting bias in both the association coefficient and the outcome mean. Our methods augment the utility of EHRs for public health inference, as we show by estimating the prevalence of cardiovascular disease and its correlation with risk factors in the New York City network of EHRs.
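
For readers unfamiliar with the term, the following is a minimal sketch of the classical Heckman selection model that "Heckman-type" corrections build on. It is the textbook formulation, not necessarily the exact specification used in the article. All symbols are assumptions introduced here for illustration: S_i indicates inclusion in the EHR sample, Z_i are selection covariates (for example, social determinants of health), X_i are outcome covariates, and Y_i is the outcome.

```latex
\begin{align}
S_i^* &= Z_i^\top \gamma + u_i, \qquad S_i = \mathbf{1}\{S_i^* > 0\} \\
Y_i &= X_i^\top \beta + \varepsilon_i \quad \text{(observed only when } S_i = 1\text{)} \\
(u_i, \varepsilon_i) &\sim \mathcal{N}\!\left(0,\;
  \begin{pmatrix} 1 & \rho\sigma \\ \rho\sigma & \sigma^2 \end{pmatrix}\right) \\
E[\,Y_i \mid X_i, Z_i, S_i = 1\,] &= X_i^\top \beta + \rho\sigma\,\lambda(Z_i^\top \gamma),
  \qquad \lambda(t) = \frac{\phi(t)}{\Phi(t)}
\end{align}
```

Here φ and Φ are the standard normal density and distribution function, and λ is the inverse Mills ratio. In the usual two-step estimator, γ is first estimated with a probit model for S_i, and Y_i is then regressed on X_i together with the estimated λ term among the selected subjects. When ρ = 0 the extra term vanishes and the unadjusted regression is not biased by this selection mechanism.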

publication date

  • January 29, 2024

Research

keywords

  • Electronic Health Records
  • Health Status

Identity

PubMed Central ID

  • PMC10941326

Scopus Document Identifier

  • 85187969057

Digital Object Identifier (DOI)

  • 10.1093/biomtc/ujae014

PubMed ID

  • 38488466

Additional Document Info

volume

  • 80

issue

  • 1