A comparison of statistical methods for the study of etiologic heterogeneity.
Academic Article
Abstract
Cancer epidemiologic research has traditionally been guided by the premise that certain diseases share an underlying etiology, or cause. However, with the rise of molecular and genomic profiling, attention has increasingly focused on identifying subtypes of disease. As subtypes are identified, it is natural to ask whether they share a common etiology or instead arise from distinct sets of risk factors. In this context, epidemiologic questions of interest include (1) whether a risk factor has the same effect across all subtypes of disease and (2) whether risk factor effects differ across the levels of each individual tumor marker that defines the subtypes. A number of statistical models have been proposed to address these questions. To determine the similarities and differences among the proposed methods, and to identify their respective advantages and disadvantages, we use a simplified data example to clarify the interpretation of model parameters and the available hypothesis tests, and we perform a simulation study to assess bias in effect size estimates, type I error, and power. The results show that when the number of tumor markers is small enough for their cross-classification to be analyzed within the traditional polytomous logistic regression framework, that approach has statistical properties at least as good as those of the more complex modeling approaches that have been proposed. The potential advantage of the more complex methods lies in their ability to accommodate multiple tumor markers in a model of reduced parametric dimension.
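The simplest setting described above, a single binary risk factor with subtypes formed by cross-classifying two binary tumor markers, can be sketched directly. This is a minimal illustration, not the authors' code. It assumes no adjustment covariates, so the polytomous logistic regression is saturated and its maximum-likelihood log odds ratios reduce to per-subtype 2x2 log odds ratios against the common control group. The counts are made up. Question (1) becomes a Wald test that all subtype-specific log odds ratios are equal (K - 1 degrees of freedom). Question (2) is approximated by a hypothetical contrast, `marker_contrast`, that compares the mean log odds ratio in marker-positive versus marker-negative subtypes. The more complex models compared in the paper are not implemented here.

```python
import math

# Hypothetical case-control counts, invented for illustration.
# Each entry is (unexposed, exposed). Subtypes are keyed by the
# cross-classification (marker A, marker B) of two binary tumor markers.
CONTROLS = (400, 200)
CASES = {
    (0, 0): (100, 50),
    (0, 1): (80, 48),
    (1, 0): (60, 60),
    (1, 1): (50, 60),
}


def log_or(cases, controls):
    """Exposure log odds ratio for one subtype versus the shared controls.

    With one binary exposure and no covariates, the polytomous logistic
    regression is saturated, so its MLE equals this 2x2 log odds ratio.
    """
    (a0, a1), (c0, c1) = cases, controls
    return math.log((a1 * c0) / (a0 * c1))


def case_var(cases):
    """Subtype-specific part of Var(log OR). The control part, 1/c0 + 1/c1,
    is common to every subtype and cancels from between-subtype contrasts."""
    a0, a1 = cases
    return 1 / a0 + 1 / a1


def chi2_sf(x, df):
    """Upper tail P(chi2_df > x) using closed forms for integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) e^{-x/2} sum x^{j-1} / (1*3*...*(2j-1))
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def heterogeneity_test(cases_by_subtype, controls):
    """Wald test of H0: equal exposure log OR across all K subtypes (df = K - 1)."""
    betas = [log_or(c, controls) for c in cases_by_subtype.values()]
    weights = [1 / case_var(c) for c in cases_by_subtype.values()]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    stat = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    return stat, df, chi2_sf(stat, df)


def marker_contrast(cases_by_subtype, controls, marker):
    """Mean log OR in marker-positive minus marker-negative subtypes
    (marker = 0 for A, 1 for B), with a two-sided Wald p-value."""
    n_pos = sum(1 for key in cases_by_subtype if key[marker] == 1)
    n_neg = len(cases_by_subtype) - n_pos
    est, var = 0.0, 0.0
    for key, cases in cases_by_subtype.items():
        coef = 1 / n_pos if key[marker] == 1 else -1 / n_neg
        est += coef * log_or(cases, controls)
        # Coefficients sum to zero, so the shared control variance drops out.
        var += coef ** 2 * case_var(cases)
    z = est / math.sqrt(var)
    return est, z, math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    stat, df, p = heterogeneity_test(CASES, CONTROLS)
    print(f"heterogeneity: chi2 = {stat:.2f} on {df} df, p = {p:.4f}")
    for name, idx in (("A", 0), ("B", 1)):
        est, z, p = marker_contrast(CASES, CONTROLS, idx)
        print(f"marker {name}: contrast = {est:.3f}, z = {z:.2f}, p = {p:.4f}")
```

Because every subtype is compared with the same controls, the control-cell variance is common to all subtype-specific estimates and cancels from any contrast whose coefficients sum to zero; that is why both tests use only case counts. The invented counts were chosen to be additive on the log-odds-ratio scale, so the marker A contrast equals log 2 and the marker B contrast equals log 1.2.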