Sustained research on the problem of determining which genes are differentially expressed on the basis of microarray data has yielded a plethora of statistical algorithms, each justified by theory, simulation, or ad hoc validation and yet differing in practical results from equally justified algorithms. The widespread confusion about which method to use in practice has been exacerbated by the finding that simply ranking genes by their fold changes sometimes outperforms popular statistical tests.

Algorithms may be compared by quantifying each method's error in predicting expression ratios, whether such ratios are defined across microarray channels or between two independent groups. For the data sets considered, estimating prediction error by cross validation demonstrates that empirical Bayes methods based on the lognormality assumption tend to outperform both a nonparametric method and algorithms based on selecting genes by their fold changes. The general comparison methodology is applicable to both single-channel and dual-channel microarrays.
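
The comparison described above can be illustrated with a minimal sketch. It is not the implementation used in this work. It assumes that the data are a genes-by-replicates matrix of log expression ratios, that prediction error is measured as squared error under leave-one-out cross validation, and that at least three replicates are available. The function names, the fold-change threshold, the simple normal shrinkage rule standing in for an empirical Bayes method, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np


def loo_prediction_error(log_ratios, estimator):
    """Mean squared error of predicting each held-out replicate.

    log_ratios: array of shape (n_genes, n_replicates), n_replicates >= 3.
    estimator: maps a training array (n_genes, n_train) to n_genes predictions.
    """
    n_reps = log_ratios.shape[1]
    errors = np.empty(n_reps)
    for j in range(n_reps):
        train = np.delete(log_ratios, j, axis=1)  # leave replicate j out
        predicted = estimator(train)
        errors[j] = np.mean((log_ratios[:, j] - predicted) ** 2)
    return errors.mean()


def mean_estimator(train):
    """Predict each gene's log ratio by its sample mean (no gene selection)."""
    return train.mean(axis=1)


def fold_change_estimator(train, threshold=1.0):
    """Keep the sample mean only for genes with a large mean log ratio.

    Genes below the (assumed) threshold are predicted to be unchanged.
    """
    means = train.mean(axis=1)
    return np.where(np.abs(means) >= threshold, means, 0.0)


def shrinkage_estimator(train):
    """Shrink every sample mean toward zero by a common data-driven weight.

    A simple normal-normal rule used here only as a stand-in for an
    empirical Bayes method.
    """
    n_train = train.shape[1]
    means = train.mean(axis=1)
    noise_var = train.var(axis=1, ddof=1).mean() / n_train
    signal_var = max(np.mean(means ** 2) - noise_var, 0.0)
    total = signal_var + noise_var
    weight = signal_var / total if total > 0 else 0.0
    return weight * means


# Simulated data (an assumption): 1000 genes, 4 replicates, 50 genes changed.
rng = np.random.default_rng(0)
true_log_ratio = np.zeros(1000)
true_log_ratio[:50] = 2.0
data = true_log_ratio[:, None] + rng.normal(0.0, 1.0, size=(1000, 4))

errors_by_method = {
    "sample mean": loo_prediction_error(data, mean_estimator),
    "fold-change threshold": loo_prediction_error(data, fold_change_estimator),
    "shrinkage": loo_prediction_error(data, shrinkage_estimator),
}
for name, err in errors_by_method.items():
    print(f"{name}: {err:.3f}")
```

Under this scheme, a method with lower estimated prediction error is preferred. Because every method is scored on the same held-out observations, the comparison does not require knowing which genes are truly differentially expressed.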

As a theoretically sound method of estimating prediction error from observed expression levels, cross validation provides an empirical approach to assessing methods for detecting differential gene expression.


Microarrays | Statistical Methodology | Statistical Theory