RRmix: A method for simultaneous batch effect correction and analysis of metabolomics data in the absence of internal standards.
Academic Article
Overview
abstract
With the surge of interest in metabolism and the appreciation of its diverse roles in numerous biomedical contexts, the number of metabolomics studies using liquid chromatography coupled to mass spectrometry (LC-MS) approaches has increased dramatically in recent years. However, variation that occurs independently of biological signal and noise (i.e. batch effects) in metabolomics data can be substantial. Standard protocols for data normalization that allow for cross-study comparisons are lacking. Here, we investigate a number of algorithms for batch effect correction and differential abundance analysis, and compare their performance. We show that linear mixed effects models, which account for latent (i.e. not directly measurable) factors, produce satisfactory results in the presence of batch effects without the need for internal controls or prior knowledge about the nature and sources of unwanted variation in metabolomics data. We further introduce an algorithm-RRmix-within the family of latent factor models and illustrate its suitability for differential abundance analysis in the presence of strong batch effects. Together this analysis provides a framework for systematically standardizing metabolomics data.