To interpret differentially expressed genes or other discovered features functionally, researchers look for enrichment: the overrepresentation of discovered features associated with a biological process. Most enrichment methods treat the p-value from a statistical test, such as the binomial test, Fisher's exact test or the hypergeometric test, as the measure of evidence. However, the p-value cannot be interpreted as a measure of evidence unless it is adjusted for the sample size. As a measure of evidence supporting one hypothesis over another, the Bayes factor (BF) overcomes this drawback of the p-value but lacks the minimax optimality of the normalized maximum likelihood (NML) of recent minimum description length methodology.
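To make the conventional p-value approach concrete, the following is a minimal sketch of a one-sided hypergeometric enrichment test, which gives the same upper-tail probability as the one-sided Fisher's exact test. The function name and the argument names (`population`, `annotated`, `selected`, `overlap`) are illustrative assumptions, not notation from this work.

```python
from math import comb


def enrichment_p_value(population, annotated, selected, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    population: total number of genes considered (hypothetical name)
    annotated:  genes in the population annotated to the biological process
    selected:   genes declared differentially expressed
    overlap:    selected genes that are also annotated
    """
    largest = min(annotated, selected)  # largest overlap that is possible
    total = comb(population, selected)  # number of ways to pick the selected set
    tail = sum(
        comb(annotated, i) * comb(population - annotated, selected - i)
        for i in range(overlap, largest + 1)
    )
    return tail / total
```

A small p-value here indicates overrepresentation, but, as noted above, its interpretation as evidence depends on the sample size.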

On the basis of either of two NMLs, the strength of evidence for enrichment may be measured by the discrimination information (DI) in the data that favors the alternative hypothesis over the null hypothesis. One of the NMLs, the normalized maximum conditional likelihood (NMCL), is supported by the conditionality principle.
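As a rough illustration of the idea, the sketch below computes an NML and the corresponding DI for a simple one-sample binomial model. This is a toy assumption chosen for brevity: it is not the enrichment model or the NMCL of this work, and the function names and the use of base-2 logarithms are hypothetical choices.

```python
from math import comb, log2


def max_likelihood(x, n):
    """Binomial likelihood of x successes in n trials at the MLE p = x / n."""
    p = x / n
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def nml(x, n):
    """Normalized maximum likelihood: the maximized likelihood of x divided
    by the sum of maximized likelihoods over all possible outcomes."""
    normalizer = sum(max_likelihood(y, n) for y in range(n + 1))
    return max_likelihood(x, n) / normalizer


def discrimination_information(x, n, theta0):
    """Log ratio (in bits) of the NML to the likelihood under a simple null
    hypothesis theta0 (toy definition assumed for this sketch)."""
    null_likelihood = comb(n, x) * theta0**x * (1 - theta0) ** (n - x)
    return log2(nml(x, n) / null_likelihood)
```

Positive values favor the alternative over the null hypothesis, and negative values favor the null. The normalizer is what distinguishes the NML from a plain maximized likelihood ratio.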

We assessed measures of evidence derived from the two NMLs, two BFs and the p-value for one-sided and two-sided hypothesis comparisons using a gene expression data set from an experiment on a breast cancer cell line. For most Gene Ontology (GO) terms, these measures give the same results for the two-sided hypothesis comparison. They agree less well for the one-sided hypothesis comparison, in which case the DI based on the NMCL cannot be closely approximated by any of the faster methods.


Bioinformatics | Biostatistics | Computational Biology | Genetics | Microarrays | Statistical Methodology | Statistical Theory