Whole-genome studies are becoming a mainstay of biomedical research. Examples include expression array experiments, comparative genomic hybridization analyses and large case-control studies for detecting polymorphism/disease associations. The tactic of applying a regression model to every locus to obtain test statistics is useful in such studies. However, this approach ignores potential correlation structure in the data that could be used to gain power, particularly when a Bonferroni correction is applied to adjust for multiple testing. In this article, we propose using regression techniques for misspecified multivariate outcomes to increase statistical power over independence-based modeling at each locus. Even when the outcome is not ordinarily regarded as multivariate, it is mathematically valid to view the outcome as a set of (identical) repeated measurements, one associated with each genetic locus. Rather than jointly modeling all observations, we propose applying joint modeling to subgroups of the data. The primary example in this article uses generalized estimating equations (GEE) software to apply the method. We describe conditions under which the proposed method provides more power than independence-based methods. In simulation studies of plausible and interesting scenarios, power gains are as large as 35% compared to modeling the outcome univariately with a single genetic covariate. In contrast, modeling the outcome as univariate with multiple genetic covariates performs very poorly when data are correlated. The proposed method is easy to apply, allows adjustment for confounding and can be combined with other methods for increasing power in multiple testing situations.
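
As a rough illustration of viewing a single outcome as a set of identical repeated measurements, the sketch below stacks one subgroup of loci into a long format in which each subject contributes one row per locus. The function name `stack_repeated_outcome`, the row fields and the design layout (one intercept and one slope per locus) are assumptions made for this example and are not taken from the article; the toy data are invented.

```python
def stack_repeated_outcome(y, genotypes):
    """Stack one subgroup of loci into long format (hypothetical helper).

    y         : list of n outcomes, one per subject.
    genotypes : list of n lists, each with m genotype codes (one per locus).

    Returns one row per (subject, locus) pair. The subject's outcome is
    repeated unchanged in every row of that subject's cluster.
    """
    n = len(y)
    if len(genotypes) != n:
        raise ValueError("y and genotypes must have the same number of subjects")
    m = len(genotypes[0]) if n else 0

    rows = []
    for i in range(n):
        if len(genotypes[i]) != m:
            raise ValueError("every subject needs a genotype for every locus")
        for j in range(m):
            # Assumed design: columns 0..m-1 are locus-specific intercepts,
            # columns m..2m-1 are locus-specific genotype slopes.
            x = [0.0] * (2 * m)
            x[j] = 1.0
            x[m + j] = float(genotypes[i][j])
            rows.append({"cluster": i, "locus": j, "y": y[i], "x": x})
    return rows


# Toy data: 3 subjects, 2 loci in the subgroup.
y = [1.2, 0.7, 2.5]
genotypes = [[0, 1], [2, 0], [1, 1]]
rows = stack_repeated_outcome(y, genotypes)
```

Under these assumptions, a GEE routine would treat `cluster` as the grouping variable and estimate one slope per locus, so each locus still receives its own test statistic while the within-subject correlation across loci is available to the fitting procedure.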


Bioinformatics | Computational Biology | Epidemiology | Genetics | Microarrays | Multivariate Analysis