A Bayesian approach to accurate and robust signature detection on LINCS L1000 data. Academic Article

Overview

abstract

  • MOTIVATION: The LINCS L1000 dataset contains a large number of cellular gene expression profiles induced by a large set of perturbagens. Although it is an invaluable resource for drug discovery and for understanding disease mechanisms, existing peak deconvolution algorithms fail in many cases to recover accurate gene expression levels, which introduces severe noise into the dataset and limits its applications in biomedical studies. RESULTS: Here, we present a novel Bayesian peak deconvolution algorithm that gives unbiased likelihood estimates for peak locations and characterizes the peaks with probability-based z-scores. Based on this algorithm, we build a pipeline that processes raw L1000 assay data into signatures representing the features of each perturbagen. We evaluate the pipeline by the similarity between signatures of biological replicates and between signatures of drugs with shared targets; the results show that signatures derived from our pipeline give a substantially more reliable and informative representation of perturbagens than existing methods. The new pipeline may therefore significantly boost the performance of L1000 data in downstream applications such as drug repurposing, disease modeling and gene function prediction. AVAILABILITY AND IMPLEMENTATION: The code and the precomputed data for LINCS L1000 Phase II (GSE70138) are available at https://github.com/njpipeorgan/L1000-bayesian. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
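The abstract does not spell out the deconvolution model. As background: in the L1000 assay two genes are read out on the same bead color at roughly a 2:1 ratio, so each bead's intensity distribution tends to show two peaks that must be assigned to the two genes. The sketch below is a plain maximum-likelihood stand-in, not the authors' Bayesian algorithm: it assumes log-intensities follow a two-component Gaussian mixture with weights fixed at 2/3 and 1/3, fits it by EM, tries both peak-to-gene assignments, and keeps the one with the higher log-likelihood. The names `deconvolve_bead` and `_fit_fixed_weight_mixture`, the Gaussian peak shape, and the default parameters are all hypothetical and do not come from the L1000-bayesian repository.

```python
import math
import statistics


def _log_normal(x, mu, sigma):
    """Log-density of N(mu, sigma^2) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2 * math.pi)


def _fit_fixed_weight_mixture(xs, mu_h, mu_l, sigma, w_high, n_iter, tol, min_sigma):
    """EM for a two-component 1-D Gaussian mixture whose weights stay fixed at
    (w_high, 1 - w_high). Only means and standard deviations are updated.
    Returns (mu_high, mu_low, loglik)."""
    s_h = s_l = sigma
    log_wh, log_wl = math.log(w_high), math.log(1.0 - w_high)
    prev = -math.inf
    loglik = -math.inf
    for _ in range(n_iter):
        # E-step: responsibility of the high-weight component, via log-sum-exp.
        resp, loglik = [], 0.0
        for x in xs:
            la = log_wh + _log_normal(x, mu_h, s_h)
            lb = log_wl + _log_normal(x, mu_l, s_l)
            m = max(la, lb)
            lse = m + math.log(math.exp(la - m) + math.exp(lb - m))
            resp.append(math.exp(la - lse))
            loglik += lse
        if loglik - prev < tol:  # stop once the log-likelihood stops improving
            break
        prev = loglik
        # M-step: weighted means and standard deviations (floored at min_sigma).
        n_h = max(sum(resp), 1e-12)
        n_l = max(len(xs) - sum(resp), 1e-12)
        mu_h = sum(r * x for r, x in zip(resp, xs)) / n_h
        mu_l = sum((1 - r) * x for r, x in zip(resp, xs)) / n_l
        var_h = sum(r * (x - mu_h) ** 2 for r, x in zip(resp, xs)) / n_h
        var_l = sum((1 - r) * (x - mu_l) ** 2 for r, x in zip(resp, xs)) / n_l
        s_h, s_l = max(math.sqrt(var_h), min_sigma), max(math.sqrt(var_l), min_sigma)
    return mu_h, mu_l, loglik


def deconvolve_bead(values, w_high=2 / 3, n_iter=500, tol=1e-10, min_sigma=0.05):
    """Hypothetical helper: assign the two intensity peaks of one bead to its genes.

    Returns (peak of the gene assumed to carry weight w_high,
             peak of the other gene, log-likelihood of the chosen fit).
    """
    xs = sorted(values)
    half = len(xs) // 2
    lower, upper = statistics.fmean(xs[:half]), statistics.fmean(xs[half:])
    sigma = max(statistics.pstdev(xs), min_sigma)
    # Try both peak-to-gene assignments; keep the one with the higher log-likelihood.
    fits = [
        _fit_fixed_weight_mixture(xs, upper, lower, sigma, w_high, n_iter, tol, min_sigma),
        _fit_fixed_weight_mixture(xs, lower, upper, sigma, w_high, n_iter, tol, min_sigma),
    ]
    return max(fits, key=lambda fit: fit[2])


# Example: 20 synthetic readings near 10 and 10 synthetic readings near 6.
high = [10 + 0.1 * ((i % 5) - 2) for i in range(20)]
low = [6 + 0.1 * ((i % 5) - 2) for i in range(10)]
mu_high, mu_low, _ = deconvolve_bead(high + low)
print(round(mu_high, 2), round(mu_low, 2))  # → 10.0 6.0
```

Fixing the weights is what makes the assignment identifiable in this sketch: with free weights, swapping the two components would leave the likelihood unchanged, so the fit alone could not say which peak belongs to which gene.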

publication date

  • May 1, 2020

Research

keywords

  • Algorithms
  • Drug Discovery

Identity

PubMed Central ID

  • PMC7203754

Scopus Document Identifier

  • 85084379894

Digital Object Identifier (DOI)

  • 10.1093/bioinformatics/btaa064

PubMed ID

  • 32003771

Additional Document Info

volume

  • 36

issue

  • 9