Augmenting community-level social determinants of health data with individual-level survey data.

Overview

Social determinants of health (SDH), such as education and socioeconomic status, are strongly associated with health and health outcomes. Incorporating SDH variables into clinical data sets could therefore improve the accuracy of predictive analytics, but individual-level SDH are rarely available and must be inferred from community-level data. We propose a method for doing so that leverages the joint probability distribution of the basic demographics available from the patient's clinical record and known community-level SDH. We demonstrate the method using two data sets, the New York City (NYC) subset of the US census data and the NYC Health and Nutrition Examination Survey (NYCHANES), and provide sample results for two census tracts in NYC. The advantage of this approach is that it does not simplistically assume that all residents within a census tract share the same average or median socioeconomic status, but instead recognizes and leverages the strong known associations between demographics and SDH within localities. The results could explain some of the discrepancies appearing in the literature on SDH and big data. Future studies are needed to determine whether the augmented SDH improve clinically relevant use cases, such as predictive analytics.
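
To make the idea of combining demographics with community-level SDH concrete, the following is a minimal sketch and not the procedure used in this work. It assumes a simple Bayes-rule formulation: the distribution of demographics given an SDH level is estimated from survey data, the tract-level SDH distribution comes from census data, and the survey-derived relationship is assumed to hold within each tract. The function name, the two demographic groups, the two SDH levels, and all numbers are hypothetical and chosen only for illustration.

```python
import numpy as np


def individual_sdh_posterior(p_demo_given_sdh, tract_sdh_marginal):
    """Return P(SDH level | demographic group, tract) as a 2-D array.

    p_demo_given_sdh: shape (n_sdh, n_demo); row s holds P(demo | SDH = s),
        assumed to be estimated from individual-level survey data.
    tract_sdh_marginal: shape (n_sdh,); P(SDH = s) within one census tract,
        assumed to come from community-level census data.
    """
    likelihood = np.asarray(p_demo_given_sdh, dtype=float)
    prior = np.asarray(tract_sdh_marginal, dtype=float)
    # Joint distribution P(demo, SDH = s | tract) under the stated assumption.
    joint = likelihood * prior[:, None]
    # Normalize over SDH levels separately for each demographic group.
    evidence = joint.sum(axis=0, keepdims=True)
    # Transpose so rows index demographic groups and columns index SDH levels.
    return (joint / evidence).T


# Hypothetical inputs: SDH levels are (low education, high education),
# demographic groups are (group A, group B).
survey_p_demo_given_sdh = [[0.7, 0.3],
                           [0.4, 0.6]]
tract_marginal = [0.5, 0.5]

posterior = individual_sdh_posterior(survey_p_demo_given_sdh, tract_marginal)
print(posterior)
```

Each row of the result is a probability distribution over SDH levels for one demographic group in the tract, so two residents of the same tract with different demographics receive different SDH estimates instead of a shared tract average.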