In a gene expression array study, the expression levels of thousands of genes are monitored simultaneously across various biological conditions on a small set of subjects. One goal of such studies is to explore a large pool of genes and select, for further investigation, a subset that appears to be differentially expressed. Of particular interest here is how to select the top k genes once the genes have been ranked by their evidence for differential expression between two tissue types. We consider statistical methods that make the choice of k more rigorous and more intuitively appealing. We propose to choose genes based on adjusted p-values (AP values). The AP values are calculated with a resampling-based algorithm under the assumption that no genes are truly differentially expressed, and they take into account the multiplicity and dependence encountered in microarray data. Using both simulated data and real microarray data, we assess the performance of our new method and compare it with existing methods. The intuitive basis for the AP values, together with the fact that our procedure has operating characteristics at least as good as those of existing procedures, makes it attractive for practical application.
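
As a rough illustration of the kind of resampling-based calculation described above, the following Python sketch ranks genes by an absolute Welch t-statistic and compares the k-th largest observed statistic with the k-th largest statistic obtained after permuting the tissue labels. This is an assumption made for illustration, not the authors' algorithm: the abstract does not specify the test statistic, the resampling scheme, or the exact definition of the AP values, and the names `abs_t_stats` and `rank_adjusted_pvalues` are hypothetical. Permuting sample labels mimics the setting in which no gene is differentially expressed while keeping each subject's genes together, so the dependence among genes is preserved.

```python
import numpy as np


def abs_t_stats(X, labels):
    """Absolute Welch t-statistic per gene (rows = genes, columns = samples)."""
    a = X[:, labels == 0]
    b = X[:, labels == 1]
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1]
                 + b.var(axis=1, ddof=1) / b.shape[1])
    return np.abs(a.mean(axis=1) - b.mean(axis=1)) / se


def rank_adjusted_pvalues(X, labels, n_perm=200, seed=0):
    """Illustrative rank-wise permutation p-values (an assumed definition).

    For each rank k, estimate the probability, under label permutation,
    that the k-th largest statistic is at least as large as the observed one.
    """
    rng = np.random.default_rng(seed)
    obs = abs_t_stats(X, labels)
    order = np.argsort(-obs)          # gene indices, strongest evidence first
    obs_sorted = obs[order]
    exceed = np.zeros(X.shape[0])
    for _ in range(n_perm):
        # Shuffle tissue labels across samples; whole columns stay intact.
        perm_stats = abs_t_stats(X, rng.permutation(labels))
        perm_sorted = np.sort(perm_stats)[::-1]
        exceed += perm_sorted >= obs_sorted
    # Add-one correction keeps every estimate strictly positive.
    return order, (exceed + 1) / (n_perm + 1)


# Simulated example: 200 genes, 10 samples per tissue type,
# with the first 10 genes shifted in the second tissue type.
rng = np.random.default_rng(1)
labels = np.array([0] * 10 + [1] * 10)
X = rng.normal(size=(200, 20))
X[:10, labels == 1] += 3.0

order, ap = rank_adjusted_pvalues(X, labels)
```

In this sketch, `order` lists the gene indices from strongest to weakest evidence and `ap[k - 1]` is the value attached to rank k. One possible rule, again only an assumption, is to keep the leading genes for which this value stays below a chosen threshold.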


Genetics | Microarrays | Multivariate Analysis | Statistical Methodology | Statistical Theory