A Framework for Data Quality Assessment in Clinical Research Datasets



  • Multi-institutional clinical research that draws on widely available electronic health record (EHR) data depends on accurately defined patient cohorts to ensure validity, especially when EHR data are used in conjunction with open-access research data. There is a growing need for a consensus-driven approach to assessing data quality. To this end, we modified an existing data quality assessment (DQA) framework by re-operationalizing its dimensions of quality for a clinical domain of interest: heart failure. We then created an inventory of common phenotype data elements (CPDEs) derived from open-access datasets and evaluated it against the modified DQA framework, measuring the CPDEs for Conformance, Completeness, and Plausibility. DQA scores were high for Completeness, Value Conformance, and Atemporal and Temporal Plausibility. Our work demonstrates a generalizable approach to DQA for clinical research. Future work will (1) map datasets to standard terminologies and (2) create a quantitative DQA tool for research datasets.
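
  • Illustrative sketch (not from the article): the abstract names four measured dimensions but does not say how they were scored. The Python below shows one simple way such scores could be computed as fractions over a toy dataset. The records, field names (`sex`, `ejection_fraction`, `birth_date`, `diagnosis_date`), allowed values, and numeric range are all hypothetical assumptions chosen for illustration, and this is not the authors' scoring method.

```python
from datetime import date

# Hypothetical toy records; every field name and value here is an assumption.
records = [
    {"patient_id": "p1", "sex": "F", "ejection_fraction": 35.0,
     "birth_date": date(1950, 3, 1), "diagnosis_date": date(2015, 6, 10)},
    {"patient_id": "p2", "sex": "M", "ejection_fraction": None,
     "birth_date": date(1962, 7, 4), "diagnosis_date": date(2016, 1, 20)},
    {"patient_id": "p3", "sex": "X", "ejection_fraction": 140.0,
     "birth_date": date(1971, 11, 9), "diagnosis_date": date(1960, 5, 5)},
]


def fraction(flags):
    """Share of True values in an iterable of booleans (0.0 if empty)."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0


def completeness(rows, field):
    """Share of rows in which the field is present (not None)."""
    return fraction(r.get(field) is not None for r in rows)


def value_conformance(rows, field, allowed):
    """Share of non-missing values that belong to an allowed value set."""
    present = [r[field] for r in rows if r.get(field) is not None]
    return fraction(v in allowed for v in present)


def atemporal_plausibility(rows, field, low, high):
    """Share of non-missing values that fall inside a believable range."""
    present = [r[field] for r in rows if r.get(field) is not None]
    return fraction(low <= v <= high for v in present)


def temporal_plausibility(rows, earlier, later):
    """Share of rows whose two dates occur in the expected order."""
    pairs = [(r[earlier], r[later]) for r in rows
             if r.get(earlier) is not None and r.get(later) is not None]
    return fraction(a <= b for a, b in pairs)


scores = {
    "completeness": completeness(records, "ejection_fraction"),
    "value_conformance": value_conformance(records, "sex", {"F", "M"}),
    "atemporal_plausibility": atemporal_plausibility(
        records, "ejection_fraction", 0.0, 100.0),
    "temporal_plausibility": temporal_plausibility(
        records, "birth_date", "diagnosis_date"),
}

for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

    Each score is a plain proportion, so results from different datasets can be compared on the same 0 to 1 scale. Missing values are excluded from the conformance and plausibility denominators so that they are counted only once, under completeness.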

publication date

  • April 16, 2018

keywords



  • Data Accuracy
  • Datasets as Topic
  • Electronic Health Records


PubMed Central ID

  • PMC5977591

Scopus Document Identifier

  • 85058740651

PubMed ID

  • 29854176

Additional Document Info


  • 2017