COBRA Preprint Series

Minimum Description Length and Empirical Bayes Methods of Identifying SNPs Associated with Disease

Ye Yang, Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology, and Immunology
David R. Bickel, Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology, and Immunology, Department of Mathematics and Statistics

Abstract

Determining which of hundreds of thousands of single nucleotide polymorphisms (SNPs) are associated with disease poses one of the most challenging multiple testing problems. Within the empirical Bayes approach, the local false discovery rate (LFDR) estimated using popular semiparametric models has enjoyed success in simultaneous inference. However, the estimated LFDR can be biased because the semiparametric approach tends to overestimate the proportion of non-associated SNPs. One negative consequence is that, like conventional p-values, such LFDR estimates cannot quantify the amount of information in the data that favors the null hypothesis of no disease association.

We address this problem of the semiparametric approach by proposing two simple parametric methods under the minimum description length (MDL) and empirical Bayes frameworks. The performances of the estimators corresponding to the two proposed parametric models and of the popular semiparametric model are compared by simulation to select a method for analyzing genome-wide association data.

The application to coronary artery disease data indicates that the semiparametric method sometimes leads to overfitting due to nonparametric density estimation. Unlike semiparametric methods, the analyses based on the two parametric models can measure the amount of information in the data that favors one hypothesis over another. In multiple simulation studies, the estimators associated with the parametric mixture model consistently perform better than those of the other two models.
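
As a reading aid, the LFDR mentioned above can be illustrated with the standard two-group mixture model, in which the LFDR at a test statistic z is the posterior probability that the SNP is not associated with disease. The sketch below is not the authors' estimator: it assumes a standard normal null density, a normal alternative density, and known parameters, and the names `local_fdr`, `pi0`, `alt_mean`, and `alt_sd` are hypothetical. In practice the proportion of non-associated SNPs and the alternative distribution would have to be estimated from the data.

```python
import math


def normal_pdf(z, mean=0.0, sd=1.0):
    """Density of a normal distribution evaluated at z."""
    u = (z - mean) / sd
    return math.exp(-0.5 * u * u) / (sd * math.sqrt(2.0 * math.pi))


def local_fdr(z, pi0, alt_mean, alt_sd=1.0):
    """Posterior probability that a SNP with statistic z is not associated.

    Illustrative two-group model (an assumption, not the paper's model):
        f(z) = pi0 * N(z; 0, 1) + (1 - pi0) * N(z; alt_mean, alt_sd)
        LFDR(z) = pi0 * N(z; 0, 1) / f(z)
    """
    null_part = pi0 * normal_pdf(z)
    alt_part = (1.0 - pi0) * normal_pdf(z, alt_mean, alt_sd)
    return null_part / (null_part + alt_part)


if __name__ == "__main__":
    # Hypothetical inputs: 90% of SNPs non-associated, alternative centered at 3.
    for z in (0.0, 2.0, 4.0):
        print(z, local_fdr(z, pi0=0.9, alt_mean=3.0))
```

Under these assumptions, raising `pi0` while holding everything else fixed raises the LFDR at every z, which is one way to see why overestimating the proportion of non-associated SNPs biases the estimated LFDR upward.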

Disciplines

Epidemiology | Genetics | Statistical Methodology | Statistical Models | Statistical Theory

Suggested Citation

Yang, Ye and Bickel, David R., "Minimum Description Length and Empirical Bayes Methods of Identifying SNPs Associated with Disease" (November 2010). COBRA Preprint Series. Working Paper 74.
https://biostats.bepress.com/cobra/art74

Included in

Epidemiology Commons, Genetics Commons, Statistical Methodology Commons, Statistical Models Commons, Statistical Theory Commons

Collection of Biostatistics Research Archive
