Correcting common errors in identifying cancer-specific serum peptide signatures.
Abstract
"Molecular signatures" are the qualitative and quantitative patterns of groups of biomolecules (e.g., mRNA, proteins, peptides, or metabolites) in a cell, tissue, biological fluid, or an entire organism. To apply this concept to biomarker discovery, the measurements should ideally be noninvasive and performed in a single read-out. We have therefore developed a peptidomics platform that couples magnetics-based, automated solid-phase extraction of small peptides with a high-resolution MALDI-TOF mass spectrometric readout (Villanueva, J.; Philip, J.; Entenberg, D.; Chaparro, C. A.; Tanwar, M. K.; Holland, E. C.; Tempst, P. Anal. Chem. 2004, 76, 1560-1570). Since hundreds of peptides can be detected in microliter volumes of serum, it allows to search for disease signatures, for instance in the presence of cancer. We have now evaluated, optimized, and standardized a number of clinical and analytical chemistry variables that are major sources of bias; ranging from blood collection and clotting, to serum storage and handling, automated peptide extraction, crystallization, spectral acquisition, and signal processing. In addition, proper alignment of spectra and user-friendly visualization tools are essential for meaningful, certifiable data mining. We introduce a minimal entropy algorithm, "Entropycal", that simplifies alignment and subsequent statistical analysis and increases the percentage of the highly distinguishing spectral information being retained after feature selection of the datasets. Using the improved analytical platform and tools, and a commercial statistics program, we found that sera from thyroid cancer patients can be distinguished from healthy controls based on an array of 98 discriminant peptides. With adequate technological and computational methods in place, and using rigorously standardized conditions, potential sources of patient related bias (e.g., gender, age, genetics, environmental, dietary, and other factors) may now be addressed.