Containers in Bioinformatics: Applications, Practical Considerations, and Best Practices in Molecular Pathology.

Overview

abstract

Systematic implementation of bioinformatics resources for next generation sequencing (NGS)-based clinical testing is an arduous undertaking. One of the key challenges involves developing an ecosystem of IT infrastructure for enabling scalable and reproducible bioinformatics services that is resilient and secure for handling genetic and protected health information (PHI), often embedded in an existing non-bioinformatics-oriented infrastructure. Container technology provides an ideal and infrastructure-agnostic solution for molecular laboratories developing and using bioinformatics pipelines, whether on-premise or using the cloud. A container is a technology that provides a consistent computational environment and enables reproducibility, scalability, and security when developing NGS bioinformatics analysis pipelines. Containers can increase the bioinformatics team's productivity by automating and simplifying the maintenance of complex bioinformatics resources, and can facilitate the validation, version control, and documentation necessary for clinical laboratory regulatory compliance. While containers are increasingly popular for developing NGS bioinformatics pipelines, there is wide variability and inconsistency in their usage that may result in suboptimal performance and potentially compromise the security and privacy of PHI. In this article, we highlight the current state and provide best or recommended practices for building and using containers in NGS bioinformatics solutions in a clinical setting, with a focus on scalability, optimization, maintainability, and data security.

authors

Kadri, Sabah
Sboner, Andrea
Sigaras, Alexandros
Roy, Somak

publication date

February 18, 2022

published in

The Journal of Molecular Diagnostics: JMD

Research

keywords

Computational Biology
Pathology, Molecular

Identity

Digital Object Identifier (DOI)

10.1016/j.jmoldx.2022.01.006

PubMed ID

35189355

VIVO Weill Cornell Medical College

publication type

Review
