Validating smoking data from the Veteran's Affairs Health Factors dataset, an electronic data source.
Academic Article
abstract
INTRODUCTION: We assessed smoking data from the Veterans Health Administration (VHA) electronic medical record (EMR) Health Factors dataset.
METHODS: To assess the validity of the EMR Health Factors smoking data, we first created an algorithm to convert text entries into a 3-category smoking variable (never, former, and current). We then compared this EMR smoking variable with 2 sources of patient self-reported smoking survey data: (a) 6,816 HIV-infected and -uninfected participants in the 8-site Veterans Aging Cohort Study (VACS-8) and (b) a subset of 13,689 participants from the national VACS Virtual Cohort (VACS-VC) who also completed the 1999 Large Health Study (LHS) survey. Sensitivity, specificity, and kappa statistics were used to evaluate agreement of the EMR Health Factors smoking data with the self-reported smoking data.
RESULTS: For the comparison of EMR Health Factors with VACS-8 across the current, former, and never smoking categories, the kappa statistic was .66. For the comparison of EMR Health Factors with VACS-VC/LHS, the kappa statistic was .61.
CONCLUSIONS: Based on the kappa statistics, agreement between the EMR Health Factors data and the survey sources is substantial. Identification of current smokers nationally within the VHA can be used in future studies to track smoking status over time, to evaluate smoking interventions, and to adjust for smoking status in research. Our methodology may provide insights for other organizations seeking to use EMR data for accurate determination of smoking status.
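The agreement measures named in the abstract (sensitivity, specificity, and the kappa statistic) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the choice of "current" as the positive class, the use of the survey response as the reference standard, and the ten toy records are all assumptions made for illustration, and the toy values are not study data.

```python
from collections import Counter

# The three smoking categories described in the abstract.
CATEGORIES = ("never", "former", "current")


def cohen_kappa(emr, survey, categories=CATEGORIES):
    """Unweighted Cohen's kappa for two paired lists of category labels."""
    if len(emr) != len(survey) or not emr:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(emr)
    # Observed agreement: share of patients with identical labels.
    observed = sum(e == s for e, s in zip(emr, survey)) / n
    # Chance agreement: product of the marginal proportions per category.
    emr_counts, survey_counts = Counter(emr), Counter(survey)
    expected = sum(emr_counts[c] * survey_counts[c] for c in categories) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def sensitivity_specificity(emr, survey, positive="current"):
    """Sensitivity and specificity of the EMR label for one category,
    treating the survey response as the reference (an assumption)."""
    tp = fn = tn = fp = 0
    for e, s in zip(emr, survey):
        if s == positive:
            if e == positive:
                tp += 1
            else:
                fn += 1
        else:
            if e == positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy records (NOT study data): one entry per patient.
survey = ["never", "never", "never", "former", "former",
          "former", "current", "current", "current", "current"]
emr = ["never", "never", "former", "former", "former",
       "current", "current", "current", "current", "former"]

kappa = cohen_kappa(emr, survey)
sens, spec = sensitivity_specificity(emr, survey, positive="current")
print(f"kappa={kappa:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is reported alongside sensitivity and specificity rather than simple percent agreement.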