Hierarchical modeling for estimating relative risks of rare genetic variants: properties of the pseudo-likelihood method.

Overview

abstract

Many major genes have been identified that strongly influence the risk of cancer. However, many different mutations can typically occur in such a gene, and each may or may not confer increased risk. It is critical to identify which specific mutations are harmful and which are harmless, so that individuals who learn from genetic testing that they carry a mutation can be appropriately counseled. This is a challenging task, because new mutations are continually being identified and there is typically little evidence available about each individual mutation. In an earlier article (Capanu et al., 2008, Statistics in Medicine 27, 1973-1992), we employed hierarchical modeling with the pseudo-likelihood and Gibbs sampling methods to estimate the relative risks of individual rare variants from case-control data. We showed that one can draw strength from the aggregating power of hierarchical models to distinguish the variants that contribute to cancer risk. However, further research is needed to validate the application of asymptotic methods to such sparse data. In this article, we use simulations to study in detail the properties of the pseudo-likelihood method for this purpose. We also explore two alternative approaches: pseudo-likelihood with a correction to the variance component estimate, as proposed by Lin and Breslow (1996, Journal of the American Statistical Association 91, 1007-1016), and a hybrid pseudo-likelihood approach with Bayesian estimation of the variance component. We investigate the validity of these hierarchical modeling techniques by examining the bias and coverage properties of the estimators, as well as the efficiency of the hierarchical modeling estimates relative to the maximum likelihood estimates.
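The evaluation strategy described above can be illustrated with a deliberately simplified sketch: simulate data with known relative risks, then check the bias and coverage of the resulting estimates. This is not the pseudo-likelihood method studied in the article. Instead, it assumes a toy normal-normal two-stage model in which each variant's first-stage log relative risk estimate has a known sampling variance. The sketch estimates the second-stage mean and residual variance by the method of moments and then forms plug-in shrinkage intervals. The function name `simulate`, the parameter values, and the variance pattern (rarer variants given larger variances) are hypothetical choices made only for illustration.

```python
import math
import random
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96, for two-sided 95% intervals


def simulate(n_sims=2000, n_variants=20, mu=0.3, tau=0.5, seed=1):
    """Bias, coverage and mean CI width for raw (MLE-like) and shrunken estimates.

    Assumed toy model (NOT the article's pseudo-likelihood fit):
      true log relative risk  beta_j ~ N(mu, tau^2)
      first-stage estimate    b_j    ~ N(beta_j, v_j), with v_j treated as known
    """
    rng = random.Random(seed)
    # Hypothetical sampling variances: higher-index variants are "rarer" and noisier.
    v = [0.02 + 0.02 * j for j in range(n_variants)]
    # Per estimator: [sum of (estimate - truth), number of covering intervals, sum of widths]
    stats = {"mle": [0.0, 0, 0.0], "hm": [0.0, 0, 0.0]}
    for _ in range(n_sims):
        beta = [rng.gauss(mu, tau) for _ in range(n_variants)]
        b = [rng.gauss(bj, math.sqrt(vj)) for bj, vj in zip(beta, v)]
        # Method-of-moments hyperparameters (a stand-in for pseudo-likelihood estimation).
        mu_hat = sum(b) / n_variants
        s2 = sum((bj - mu_hat) ** 2 for bj in b) / (n_variants - 1)
        tau2_hat = max(0.0, s2 - sum(v) / n_variants)
        for bj, vj, truth in zip(b, v, beta):
            # Shrinkage weight in [0, 1); if tau2_hat == 0 the interval collapses to a point.
            w = tau2_hat / (tau2_hat + vj)
            for key, est, se in (
                ("mle", bj, math.sqrt(vj)),
                ("hm", mu_hat + w * (bj - mu_hat), math.sqrt(w * vj)),
            ):
                s = stats[key]
                s[0] += est - truth
                s[1] += abs(est - truth) <= Z95 * se
                s[2] += 2 * Z95 * se
    n = n_sims * n_variants
    return {k: {"bias": s[0] / n, "coverage": s[1] / n, "width": s[2] / n}
            for k, s in stats.items()}


if __name__ == "__main__":
    print(simulate())
```

Calling `simulate()` returns the average bias, empirical coverage, and mean interval width for each estimator. Because these plug-in intervals ignore the uncertainty in the estimated hyperparameters, their coverage can fall below the nominal 95%, even though they are always narrower than the unshrunken intervals.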
The results indicate that the estimates of the relative risks of very sparse variants have small bias, and that the estimated 95% confidence intervals are typically anti-conservative, although the actual coverage rates are generally above 90%. The confidence intervals narrow as the residual variance in the second-stage model is reduced. The results also show that the hierarchical modeling estimates have narrower confidence intervals than those obtained from conventional logistic regression, and that this relative improvement increases as the variants become rarer.
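The direction of the two width findings above can be seen in a textbook normal-normal calculation. This is a simplification and not the article's model. It treats the second-stage residual variance tau2 as known and ignores the uncertainty from estimating it. A variant whose first-stage log relative risk estimate has sampling variance v (larger for rarer variants) then has a hierarchical posterior variance of tau2 * v / (tau2 + v). The ratio of hierarchical to conventional interval widths is therefore sqrt(tau2 / (tau2 + v)). The function name below is hypothetical.

```python
import math


def hm_to_mle_width_ratio(v: float, tau2: float) -> float:
    """Ratio of hierarchical to conventional 95% CI widths for one variant.

    Toy normal-normal second stage with known residual variance tau2 (an assumption):
      conventional half-width  1.96 * sqrt(v)
      hierarchical half-width  1.96 * sqrt(tau2 * v / (tau2 + v))
    The 1.96 factor cancels, leaving sqrt(tau2 / (tau2 + v)).
    """
    if v <= 0 or tau2 < 0:
        raise ValueError("need v > 0 and tau2 >= 0")
    return math.sqrt(tau2 / (tau2 + v))


if __name__ == "__main__":
    # Larger v stands in for a rarer variant; smaller tau2 for a better-fitting second stage.
    for tau2 in (0.5, 0.1):
        for v in (0.1, 1.0, 4.0):
            print(f"tau2={tau2:<4} v={v:<4} ratio={hm_to_mle_width_ratio(v, tau2):.3f}")
```

In this toy setting the ratio falls as v grows and as tau2 shrinks, which matches the qualitative pattern reported above. The actual gains under pseudo-likelihood estimation depend on how well the second-stage model and the variance component are estimated.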

authors

Capanu, Marinela

Begg, Colin B.

publication date

August 5, 2010

published in

Biometrics Journal

Research

keywords

Genetic Variation
Likelihood Functions
Mutation
Neoplasms

Identity

PubMed Central ID

PMC3015025

Scopus Document Identifier

79958085139

Digital Object Identifier (DOI)

10.1111/j.1541-0420.2010.01469.x

PubMed ID

20707869

Additional Document Info

has global citation frequency

19

volume

67

issue

2
