A scalable method for supporting multiple patient cohort discovery projects using i2b2. Academic Article

Overview

abstract

  • Although i2b2, a popular platform for patient cohort discovery using electronic health record (EHR) data, can support multiple projects specific to individual disease areas or research interests, the standard approach for doing so duplicates data across projects. The duplication requires additional disk space and processing time, which limits scalability. To address this deficiency, we developed a novel approach that stores data in a single i2b2 fact table and uses structured query language (SQL) views to access data for specific projects. Compared to the standard approach, the view-based approach reduced required disk space by 59% and extract-transform-load (ETL) time by 46%, without substantially impacting query performance. The view-based approach has enabled multiple i2b2 projects to scale and has generalized to another data model at our institution. Other institutions may benefit from this approach, the code for which is available on GitHub (https://github.com/wcmc-research-informatics/super-i2b2).
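The view-based idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the GitHub repository for that); it only assumes that facts live once in a single table and that each project reads them through its own SQL view. The table name `observation_fact` follows i2b2's fact table, but the reduced column set, the `project_patient` membership table, the view names, and the sample rows are all hypothetical, and SQLite is used only so the example runs anywhere.

```python
import sqlite3

# Hypothetical schema: one shared fact table, one membership table,
# and one view per project. None of this is taken from the paper's code.
SCHEMA = """
CREATE TABLE observation_fact (
    patient_num INTEGER,
    concept_cd  TEXT,
    start_date  TEXT
);
CREATE TABLE project_patient (
    project_id  TEXT,
    patient_num INTEGER
);
-- Each project sees only the facts of its own patients, without copying rows.
CREATE VIEW diabetes_observation_fact AS
    SELECT f.* FROM observation_fact f
    WHERE f.patient_num IN (
        SELECT patient_num FROM project_patient WHERE project_id = 'diabetes'
    );
CREATE VIEW cardiology_observation_fact AS
    SELECT f.* FROM observation_fact f
    WHERE f.patient_num IN (
        SELECT patient_num FROM project_patient WHERE project_id = 'cardiology'
    );
"""

# Made-up sample data for illustration only.
FACTS = [
    (1, "ICD10:E11.9", "2018-01-05"),
    (1, "LOINC:4548-4", "2018-01-06"),
    (2, "ICD10:I10", "2018-02-01"),
    (3, "ICD10:E11.9", "2018-03-01"),
]
MEMBERSHIP = [
    ("diabetes", 1),
    ("diabetes", 3),
    ("cardiology", 1),  # patient 1 belongs to both projects
    ("cardiology", 2),
]


def build_demo():
    """Create an in-memory database with the shared table and project views."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO observation_fact VALUES (?, ?, ?)", FACTS)
    conn.executemany("INSERT INTO project_patient VALUES (?, ?)", MEMBERSHIP)
    conn.commit()
    return conn


def count_rows(conn, relation):
    """Count rows in a table or view (relation names here are fixed, trusted strings)."""
    return conn.execute(f"SELECT COUNT(*) FROM {relation}").fetchone()[0]


if __name__ == "__main__":
    conn = build_demo()
    for name in ("observation_fact",
                 "diabetes_observation_fact",
                 "cardiology_observation_fact"):
        print(name, count_rows(conn, name))
```

In this sketch, patient 1 belongs to both projects, yet that patient's facts are stored only once. Under a copy-per-project layout the same rows would be loaded into each project's own fact table, which is the duplication the abstract describes.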

publication date

  • July 19, 2018

Research

keywords

  • Electronic Health Records
  • Medical Informatics

Identity

Scopus Document Identifier

  • 85050125132

Digital Object Identifier (DOI)

  • 10.1016/j.jbi.2018.07.010

PubMed ID

  • 30009991

Additional Document Info

volume

  • 84