Characterization of missing values in untargeted MS-based metabolomics data and evaluation of missing data handling strategies.

Overview

abstract

BACKGROUND: Untargeted mass spectrometry (MS)-based metabolomics data often contain missing values that reduce statistical power and can introduce bias in biomedical studies. However, a systematic assessment of the various sources of missing values and strategies to handle these data has received little attention. Missing data can occur systematically, e.g. from run day-dependent effects due to limits of detection (LOD); or it can be random as, for instance, a consequence of sample preparation.

METHODS: We investigated patterns of missing data in an MS-based metabolomics experiment of serum samples from the German KORA F4 cohort (n = 1750). We then evaluated 31 imputation methods in a simulation framework and biologically validated the results by applying all imputation approaches to real metabolomics data. We examined the ability of each method to reconstruct biochemical pathways from data-driven correlation networks, and the ability of the method to increase statistical power while preserving the strength of established metabolic quantitative trait loci.

RESULTS: Run day-dependent LOD-based missing data accounts for most missing values in the metabolomics dataset. Although multiple imputation by chained equations performed well in many scenarios, it is computationally and statistically challenging. K-nearest neighbors (KNN) imputation on observations with variable pre-selection showed robust performance across all evaluation schemes and is computationally more tractable.

CONCLUSION: Missing data in untargeted MS-based metabolomics data occur for various reasons. Based on our results, we recommend that KNN-based imputation is performed on observations with variable pre-selection since it showed robust results in all evaluation schemes.
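The recommended approach, KNN imputation on observations with variable pre-selection, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`pairwise_corr`, `knn_impute_obs`), the correlation cutoff `min_abs_corr`, the unweighted neighbor mean, and the root-mean-square distance over shared variables are all assumptions chosen for illustration. The input is assumed to be a samples-by-metabolites matrix that has already been log-transformed and scaled, with missing values encoded as NaN.

```python
import numpy as np


def pairwise_corr(X):
    """Pearson correlation between columns, using pairwise-complete samples."""
    p = X.shape[1]
    corr = np.full((p, p), np.nan)
    for a in range(p):
        for b in range(a + 1, p):
            ok = ~np.isnan(X[:, a]) & ~np.isnan(X[:, b])
            if ok.sum() >= 3:
                xa, xb = X[ok, a], X[ok, b]
                if xa.std() > 0 and xb.std() > 0:
                    corr[a, b] = corr[b, a] = np.corrcoef(xa, xb)[0, 1]
    return corr


def knn_impute_obs(X, k=3, min_abs_corr=0.2):
    """Illustrative KNN imputation on observations (samples).

    For each variable with missing values, only variables whose absolute
    correlation with it reaches `min_abs_corr` (an assumed cutoff) are used
    to measure distances between samples.
    """
    X = np.asarray(X, dtype=float)
    out = X.copy()
    p = X.shape[1]
    corr = pairwise_corr(X)

    for j in range(p):
        miss = np.isnan(X[:, j])
        if not miss.any():
            continue
        donors = np.flatnonzero(~miss)
        if donors.size == 0:
            continue  # nothing observed for this variable; leave as NaN

        # Variable pre-selection; NaN correlations compare as False.
        sel = [v for v in range(p) if v != j and abs(corr[j, v]) >= min_abs_corr]
        if not sel:
            sel = [v for v in range(p) if v != j]  # fall back to all others

        for i in np.flatnonzero(miss):
            diff = X[donors][:, sel] - X[i, sel]
            shared = ~np.isnan(diff)
            counts = shared.sum(axis=1)
            sq = np.where(shared, diff ** 2, 0.0).sum(axis=1)

            # Root-mean-square distance over variables observed in both samples.
            dist = np.full(donors.size, np.inf)
            has = counts > 0
            dist[has] = np.sqrt(sq[has] / counts[has])

            nearest = np.argsort(dist)[:k]
            nearest = nearest[np.isfinite(dist[nearest])]
            if nearest.size == 0:
                out[i, j] = X[donors, j].mean()  # no comparable donor
            else:
                out[i, j] = X[donors[nearest], j].mean()
    return out


# Toy usage: the missing entry is filled from the two most similar samples.
toy = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, np.nan]])
imputed = knn_impute_obs(toy, k=2)
```

Restricting the distance calculation to correlated variables is meant to keep unrelated metabolites from dominating the neighbor search. The cutoff and `k` would need tuning on real data.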

authors

Suhre, Karsten
Strauch, Konstantin
Peters, Annette
Gieger, Christian
Langenberg, Claudia
Stewart, Isobel D
Theis, Fabian J
Grallert, Harald
Kastenmüller, Gabi
Krumsiek, Jan

publication date

September 20, 2018

published in

Metabolomics: Official journal of the Metabolomic Society

Research

keywords

Mass Spectrometry
Metabolomics

Identity

PubMed Central ID

PMC6153696

Scopus Document Identifier

85053638868

Digital Object Identifier (DOI)

10.1007/s11306-018-1420-2

PubMed ID

30830398

Additional Document Info

has global citation frequency

158

volume

14

issue

10
