Gene-set analysis evaluates the expression of biological pathways, or a priori defined gene sets, rather than that of individual genes, in association with a binary phenotype, and is of great biological interest in many DNA microarray studies. Gene Set Enrichment Analysis (GSEA) has been widely applied as a tool for gene-set analysis. We describe some critical problems with GSEA and propose an alternative method, SAM-GS, which extends the single-gene method Significance Analysis of Microarrays (SAM) to gene-set analysis. Specifically, we show in a simulation study that GSEA assigns statistical significance to gene sets in which no gene is associated with the phenotype (null gene sets) and has very low power to detect gene sets in which half the genes are highly associated with the phenotype (truly associated gene sets). SAM-GS, by contrast, performs perfectly in the simulation study: none of the null gene sets is identified as statistically significant, while all of the truly associated gene sets are. We also compare the two methods on three real microarray datasets and their relevant pathways; the diverging results show the advantages of SAM-GS over GSEA, both statistically and biologically.
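To make the idea of extending a single-gene statistic to a gene set concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not spell out the test statistic, so the sketch assumes a set-level score equal to the sum of squared SAM-like d statistics over the genes in the set, with significance assessed by permuting the phenotype labels. The function names (`_diff_and_se`, `sam_gs_pvalue`) are hypothetical, and the fudge constant `s0` is simplified to the median gene-wise standard error, whereas SAM proper selects it by a more involved procedure.

```python
import numpy as np

def _diff_and_se(expr, labels):
    """Per-gene mean difference and pooled standard error for two groups.

    expr:   array of shape (genes, samples)
    labels: array of 0/1 phenotype labels, one per sample
    """
    a, b = expr[:, labels == 0], expr[:, labels == 1]
    n1, n2 = a.shape[1], b.shape[1]
    diff = b.mean(axis=1) - a.mean(axis=1)
    # Pooled within-group sum of squares (var with ddof=0 times n).
    ss = a.var(axis=1) * n1 + b.var(axis=1) * n2
    se = np.sqrt((1.0 / n1 + 1.0 / n2) * ss / (n1 + n2 - 2))
    return diff, se

def sam_gs_pvalue(expr, labels, gene_idx, n_perm=1000, seed=0):
    """Set-level score and permutation p-value (illustrative sketch).

    Assumption: score = sum of squared SAM-like d statistics in the set.
    Assumption: s0 = median standard error over all genes, held fixed
    across permutations for simplicity.
    """
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)

    _, se_all = _diff_and_se(expr, labels)
    s0 = np.median(se_all)
    subset = expr[gene_idx]

    def score(lab):
        diff, se = _diff_and_se(subset, lab)
        return float(np.sum((diff / (se + s0)) ** 2))

    observed = score(labels)
    # Null distribution: recompute the score under shuffled labels.
    perm = np.array([score(rng.permutation(labels)) for _ in range(n_perm)])
    # Add-one correction keeps the p-value strictly positive.
    p_value = (1 + np.sum(perm >= observed)) / (1 + n_perm)
    return observed, float(p_value)
```

Because the score sums squared per-gene statistics, up-regulated and down-regulated genes in the same set both contribute to it instead of cancelling, which is one plausible reason a set-level test of this form can retain power when only part of a gene set is associated with the phenotype.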


Bioinformatics | Computational Biology