Natural Language Processing Can Automate Extraction of Barrett's Esophagus Endoscopy Quality Metrics.
Overview
Abstract
OBJECTIVES: To develop an automated natural language processing (NLP) method for extracting high-fidelity Barrett's esophagus (BE) endoscopic surveillance and treatment data from the electronic health record (EHR).
METHODS: Patients who underwent BE-related endoscopies between 2016 and 2020 at a single medical center were randomly assigned to a development or validation set. Patients outside the 40-to-80 age range and those without confirmed BE were excluded. For each patient, free-text pathology reports and structured procedure data were obtained. Gastroenterologists assigned ground truth labels. An NLP method leveraging MetaMap Lite generated endoscopy-level diagnosis and treatment data, and performance metrics were assessed for these data. The NLP methodology was then adapted to label key endoscopic eradication therapy (EET)-related endoscopy events and thereby facilitate calculation of patient-level pre-EET diagnosis, endotherapy time, and time to complete eradication of intestinal metaplasia (CE-IM).
RESULTS: The development set included 99 patients (377 endoscopies) and the validation set included 115 patients (399 endoscopies). When assigning high-fidelity labels to the validation set, NLP achieved high performance (recall: 0.976, precision: 0.970, accuracy: 0.985, and F1-score: 0.972). Seventy-seven patients initiated EET and underwent 554 endoscopies. Key EET-related clinical event labels had high accuracy (EET start: 0.974, complete eradication of dysplasia [CE-D]: 1.00, and CE-IM: 1.00), facilitating extraction of pre-treatment diagnosis, endotherapy time, and time to CE-IM.
CONCLUSIONS: High-fidelity BE endoscopic surveillance and treatment data can be extracted from routine EHR data using our automated, transparent NLP method. This method produces high-level clinical datasets for clinical research and quality metric assessment.
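The four performance metrics reported above (recall, precision, accuracy, and F1-score) all derive from comparing NLP-assigned labels against the gastroenterologists' ground truth labels. The following is a minimal Python sketch of that comparison for a single binary label. It uses the standard textbook definitions and is not the authors' evaluation code; the function name `score_labels`, the `Metrics` container, and the example label vectors are hypothetical, and the abstract does not state how the study aggregated metrics across label types.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Metrics:
    recall: float
    precision: float
    accuracy: float
    f1: float


def score_labels(truth: Sequence[bool], predicted: Sequence[bool]) -> Metrics:
    """Compare predicted binary labels with ground truth labels.

    Hypothetical helper: one entry per endoscopy, True if the label applies.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    if not truth:
        raise ValueError("at least one labeled endoscopy is required")

    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives
    tn = sum(1 for t, p in pairs if not t and not p)  # true negatives

    # Ratios with an empty denominator are reported as 0.0 (an assumption
    # made for this sketch, not a convention taken from the study).
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = (tp + tn) / len(pairs)
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return Metrics(recall=recall, precision=precision, accuracy=accuracy, f1=f1)


# Illustrative, made-up labels for eight endoscopies (not study data).
example_truth = [True, True, True, False, False, False, False, True]
example_pred = [True, True, False, False, False, True, False, True]
example_metrics = score_labels(example_truth, example_pred)
```

In this made-up example there are three true positives, one false positive, one false negative, and three true negatives, so all four metrics equal 0.75.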