Identifying differentially expressed (DE) genes associated with a sample characteristic is the primary objective of many microarray studies. As more and more studies are carried out with observational rather than well controlled experimental samples, it becomes important to evaluate and properly control the impact of sample heterogeneity on DE gene finding. Typical methods for identifying DE genes require ranking all the genes according to a pre-selected statistic based on a single model for two or more group comparisons, with or without adjustment for other covariates. Such single model approaches unavoidably result in model misspecification, which can lead to increased error due to bias for some genes and reduced efficiency for the others. We evaluated the impact of model misspecification from such approaches on detecting DE genes and identified parameters that affect the magnitude of impact. To properly control for sample heterogeneity and to provide a flexible and coherent framework for identifying simultaneously DE genes associated with a single or multiple sample characteristics and/or their interactions, we proposed a Bayesian model averaging approach which corrects the model misspecification by averaging over model space formed by all relevant covariates. An empirical approach is suggested for specifying prior model probabilities. We demonstrated through simulated microarray data that this approach resulted in improved performance in DE gene identification compared to the single model approaches. The flexibility of this approach is demonstrated through our analysis of data from two observational microarray studies.


Bioinformatics | Computational Biology | Microarrays