Develop and validate a computable phenotype for the identification of Alzheimer's disease patients using electronic health record data.
Academic Article
abstract
INTRODUCTION: Alzheimer's disease (AD) is often misclassified in electronic health records (EHRs) when identification relies solely on diagnosis codes. This study aimed to develop a more accurate computable phenotype (CP) for identifying AD patients using both structured and unstructured EHR data.
METHODS: We used EHRs from the University of Florida Health (UFHealth) system to develop rule-based CPs iteratively, refining them through manual chart review. The CPs were then validated using data from the University of Texas Health Science Center at Houston (UTHealth) and the University of Minnesota (UMN).
RESULTS: Our best-performing CP was "patient has at least 2 AD diagnoses and AD-related keywords in AD encounters," with F1-scores of 0.817 at UFHealth, 0.961 at UTHealth, and 0.623 at UMN.
DISCUSSION: We developed and validated rule-based CPs for AD identification, with F1-scores ranging from 0.623 to 0.961 across the three sites. Such CPs will be crucial for studies that aim to use real-world data such as EHRs.
HIGHLIGHTS: Developed a computable phenotype (CP) to identify Alzheimer's disease (AD) patients using EHR data. Utilized both structured and unstructured EHR data to enhance CP accuracy. Achieved F1-scores of 0.817 at UFHealth, 0.961 at UTHealth, and 0.623 at UMN. Validated the CP across different demographics, ensuring robustness and fairness.
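As an illustration of how a rule such as "at least 2 AD diagnoses and AD-related keywords in AD encounters" might be evaluated, the following is a minimal Python sketch, not the study's implementation. Several things here are assumptions that the abstract does not specify: the `Encounter` record layout, the use of ICD-10-CM G30.x and ICD-9-CM 331.0 as the AD diagnosis codes, the single-entry keyword list, the reading of "AD encounter" as an encounter carrying an AD diagnosis code, and the counting of one diagnosis per AD-coded encounter. The `f1_score` helper shows how the reported metric is computed from chart-review labels.

```python
from dataclasses import dataclass

# Assumption: these code prefixes stand in for "AD diagnosis"; the study's
# actual code list is not given in the abstract.
AD_CODE_PREFIXES = ("G30", "331.0")
# Hypothetical keyword list, for illustration only.
AD_KEYWORDS = ("alzheimer",)


@dataclass
class Encounter:
    """Hypothetical record: structured codes plus free-text note."""
    diagnosis_codes: list
    note_text: str = ""


def is_ad_encounter(enc):
    """Assumption: an 'AD encounter' is one carrying an AD diagnosis code."""
    return any(code.startswith(AD_CODE_PREFIXES) for code in enc.diagnosis_codes)


def meets_cp(encounters, min_diagnoses=2):
    """Apply the rule: >= min_diagnoses AD-coded encounters AND at least one
    AD keyword in the note of an AD encounter."""
    ad_encounters = [e for e in encounters if is_ad_encounter(e)]
    if len(ad_encounters) < min_diagnoses:
        return False
    return any(
        kw in e.note_text.lower() for e in ad_encounters for kw in AD_KEYWORDS
    )


def f1_score(predicted, actual):
    """F1 = 2*TP / (2*TP + FP + FN), with chart review as the reference."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

Under these assumptions, a patient with two G30.9-coded encounters, one of whose notes mentions "Alzheimer's disease", satisfies the rule, whereas a patient with a single coded encounter, or with coded encounters but no keyword in any of their notes, does not.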