Some Contributions to Bayesian Regularization Methods with Applications to Genetics and Clinical Trials

abstract

Variable selection refers to the class of problems in which one seeks an optimal subset of relevant variables that can be used to accurately predict a response variable. Typically, a large number of variables are collected, but only a few of them are relevant for predicting the outcome, so the underlying representation is sparse. Variable selection is therefore fundamental to high-dimensional data analysis, plays a crucial role in scientific discovery and decision-making, and has received enormous attention in the literature. Regularization is one attractive approach that has proven successful for high-dimensional data. Over the last two decades, a large amount of effort has gone into developing regularization methods for high-dimensional variable selection problems arising in diverse scientific disciplines. These methods perform automatic variable selection by setting certain coefficients to zero and shrinking the remainder, and they provide useful estimates even when the model includes a large number of highly correlated variables. Despite this progress, further improvements are still possible. With the dramatic increase in computing power and the emergence of efficient algorithms, Bayesian regularization approaches to variable selection have become increasingly popular. Bayesian regularization methods often outperform their frequentist analogs by yielding smaller prediction errors while selecting the most relevant variables in a parsimonious way. In this dissertation, we develop a set of novel Bayesian regularization methods for linear models. Extensive simulation studies and real data analyses are carried out, and they demonstrate superior performance of the proposed methods compared with their frequentist counterparts.
Some of these methods are applied to select subgroups of patients with differential treatment effects from historical clinical trial data. We also apply our methods to the problem of detecting rare variants in a genetic association study. In addition, some associated theoretical properties are investigated, which give deeper insight into the asymptotic behavior of these approaches. Finally, we discuss possible extensions of our approach and present a unified framework for variable selection in general models beyond linear regression. This dissertation thus introduces new and efficient tools that practitioners and researchers can use to conduct Bayesian variable selection in a variety of real-life problems.
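
As a rough illustration of what "setting certain coefficients to zero and shrinking the remainder" means, the sketch below fits the classical (frequentist) lasso by cyclic coordinate descent on simulated data. It is not one of the methods proposed in the dissertation, which are not specified in this abstract. The function names, the penalty value `lam=0.3`, and the simulated design are all assumptions chosen for illustration. The lasso is used here because its estimate is commonly interpreted as a posterior mode under independent Laplace priors on the coefficients, which is one standard bridge to Bayesian regularization.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t; entries with |z| <= t become exactly zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n      # (1/n) * x_j' x_j for each column
    residual = y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the partial residual that excludes j.
            rho = X[:, j] @ residual / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            # Keep the residual in sync with the updated coefficient.
            residual += X[:, j] * (beta[j] - new_bj)
            beta[j] = new_bj
    return beta

def demo(seed=0):
    """Simulate a sparse linear model (hypothetical settings) and fit the lasso."""
    rng = np.random.default_rng(seed)
    n, p = 200, 10
    X = rng.standard_normal((n, p))
    true_beta = np.zeros(p)
    true_beta[:3] = [3.0, -2.0, 1.5]       # only three variables are relevant
    y = X @ true_beta + 0.5 * rng.standard_normal(n)
    return true_beta, lasso_coordinate_descent(X, y, lam=0.3)
```

Calling `demo()` returns the true and estimated coefficient vectors. In a setup like this one, the irrelevant coefficients would be expected to come out as exact zeros and the relevant ones to be pulled toward zero, which is the selection-plus-shrinkage behavior the abstract describes.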

VIVO Weill Cornell Medical College
