Standardized Representation of Clinical Study Data Dictionaries with CIMI Archetypes. Academic Article



  • Researchers commonly use a tabular format to describe and represent clinical study data. The lack of standardization of data dictionary metadata elements makes it difficult to harmonize dictionaries across similar studies and impedes interoperability outside the local context. We propose that representing data dictionaries as standardized archetypes can help overcome this problem. The Archetype Modeling Language (AML), developed by the Clinical Information Modeling Initiative (CIMI), can serve as a common format for representing data dictionary models. We mapped three data dictionaries (identified from dbGaP, PheKB and TCGA) onto AML archetypes by aligning dictionary variable definitions with AML archetype elements. The near-complete alignment allowed the dictionaries to be mapped into valid AML models that captured all data dictionary model metadata. This work could help subject matter experts harmonize data models for quality, semantic interoperability and better downstream data integration.
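The alignment step described in the abstract, matching each data dictionary variable definition to an archetype element, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the column names (`variable`, `description`, `type`, `units`, `values`), the `ArchetypeElement` class, and the type labels in `TYPE_MAP` are hypothetical and do not reproduce actual AML syntax or the authors' mapping.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ArchetypeElement:
    """Hypothetical, simplified stand-in for an archetype element (not real AML)."""
    name: str
    description: str
    data_type: str
    units: Optional[str] = None
    permitted_values: List[str] = field(default_factory=list)


# Assumed mapping from dictionary type labels to illustrative element types.
TYPE_MAP: Dict[str, str] = {
    "integer": "COUNT",
    "decimal": "QUANTITY",
    "string": "TEXT",
    "encoded value": "CODED_TEXT",
}


def row_to_element(row: Dict[str, str]) -> ArchetypeElement:
    """Align one tabular data dictionary row with an archetype-like element."""
    raw_type = (row.get("type") or "").strip().lower()
    raw_values = row.get("values") or ""
    # Permitted values are assumed to be separated by semicolons.
    values = [v.strip() for v in raw_values.split(";") if v.strip()]
    return ArchetypeElement(
        name=row["variable"],
        description=row.get("description") or "",
        data_type=TYPE_MAP.get(raw_type, "TEXT"),  # fall back to free text
        units=(row.get("units") or None),
        permitted_values=values,
    )


# Two made-up rows, standing in for entries of a study data dictionary.
age_row = {"variable": "AGE", "description": "Age at enrollment",
           "type": "integer", "units": "years", "values": ""}
sex_row = {"variable": "SEX", "description": "Sex of participant",
           "type": "encoded value", "units": "", "values": "M; F"}

age_element = row_to_element(age_row)
sex_element = row_to_element(sex_row)
print(age_element)
print(sex_element)
```

In this sketch, a dictionary column with no counterpart in `ArchetypeElement` would be dropped silently; a fuller version would report such columns so that the degree of alignment can be checked.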

publication date

  • February 10, 2017



  • Biomedical Research
  • Databases, Factual
  • Metadata


PubMed Central ID

  • PMC5333261

Scopus Document Identifier

  • 85027524088

PubMed ID

  • 28269909

Additional Document Info


  • 2016