Abstract

Despite many recent methodological developments, variable selection in high-dimensional settings, where the number of covariates (p) is larger than the sample size (n), remains a difficult problem. This is especially true when covariates with zero coefficients are correlated with some covariates with nonzero coefficients. One such example is genome-wide multiple loci mapping with dense genetic markers. There, the number of covariates (i.e., the number of genetic markers) is often larger than the sample size, and nearby markers often share similar genotype profiles due to linkage or linkage disequilibrium. The adaptive Lasso (Zou, 2006) is a state-of-the-art method for simultaneous variable selection and estimation in linear regression. However, it requires consistent initial estimates of the regression coefficients, which are generally not available in the aforementioned high-dimensional settings. In this paper, we propose two variable selection methods: the Bayesian adaptive Lasso and the iterative adaptive Lasso. Both extend the adaptive Lasso in that they do not require any informative initial estimates of the regression coefficients. We systematically evaluate the variable selection performance of the proposed methods, as well as several existing methods, within the framework of genome-wide multiple loci mapping. We show that the proposed methods achieve better variable selection performance than most existing methods, and that the iterative adaptive Lasso also has superior computational efficiency.
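To make the dependence on initial estimates concrete: the adaptive Lasso minimizes (1/2n)·||y − Xβ||² + λ·Σ_j w_j·|β_j|, with weights w_j built from an initial estimate of β_j, so a poor initial estimate gives poor weights. A minimal sketch of one generic way to avoid supplying an initial estimate is shown below. It assumes a plain reweighted-L1 scheme: start with equal weights (an ordinary Lasso fit), then repeatedly set w_j = 1/(|β_j| + eps) from the previous fit and refit. This is an illustrative stand-in, not the paper's iterative or Bayesian adaptive Lasso; the function names, `lam`, `eps`, `n_iter`, and the toy data are all hypothetical choices for illustration.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso(X, y, lam, weights, max_sweeps=500, tol=1e-10):
    """Cyclic coordinate descent for
    (1/2n) * ||y - X b||^2 + lam * sum_j weights[j] * |b_j|.
    X is a list of n rows, each a list of p covariate values."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, with beta = 0 initially
    # (1/n) * squared norm of each column
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # all-zero column: coefficient stays 0
            # correlation of column j with the partial residual (excluding j)
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam * weights[j]) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def iterative_adaptive_lasso(X, y, lam, n_iter=5, eps=1e-3):
    """Hypothetical reweighted-L1 sketch (not the paper's algorithm).
    No initial estimate is supplied: pass 1 uses equal weights (ordinary
    Lasso); each later pass uses w_j = 1 / (|b_j| + eps) from the last fit."""
    p = len(X[0])
    weights = [1.0] * p
    beta = [0.0] * p
    for _ in range(n_iter):
        beta = weighted_lasso(X, y, lam, weights)
        weights = [1.0 / (abs(b) + eps) for b in beta]
    return beta


# Toy data (hypothetical): orthogonal columns, true coefficients (3, 0, -2), no noise
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [1, 5, -1, -5]
print(weighted_lasso(X, y, 0.1, [1.0, 1.0, 1.0]))
print(iterative_adaptive_lasso(X, y, 0.1))
```

On this toy design, the ordinary Lasso shrinks the two nonzero coefficients to about 2.9 and -1.9. The reweighted fit moves them closer to 3 and -2 and keeps the middle coefficient at exactly zero, because its weight becomes 1/eps. With correlated covariates, such as nearby genetic markers, the behavior of any such scheme is harder to predict; that is the setting the abstract targets.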

Disciplines

Bioinformatics | Computational Biology | Genetics | Numerical Analysis and Computation | Statistical Methodology | Statistical Models | Statistical Theory