Abstract
Learning curves are used to model improvements in task performance with accumulated experience. This concept has been adapted to characterize improvements in classifier performance with increasing numbers of training samples. Recently, learning curves fit with power-law models have been used effectively with microarray data. We fit learning curves to performance data from 12 published microarray datasets using a variety of classifiers.
Our results confirm that the power-law model captures trends in classification test performance as a function of training sample size. We find that: (1) a reasonable prediction of performance is achievable across a variety of binary classifiers and microarray datasets with at least 75 samples, (2) learning curves sometimes cross, indicating that the preferred classifier may depend on the number of samples available for training, (3) outcome studies are generally harder to classify than other kinds of microarray data, and (4) bootstrap resampling can yield approximate prediction intervals.
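To make the approach concrete, the following is a minimal sketch of fitting a power-law learning curve and bootstrapping an approximate interval for the predicted error at a larger sample size. It is not the authors' procedure. The abstract does not state the functional form, so the sketch assumes one common choice, error(n) = a * n^(-alpha) + b, where b is the asymptotic error rate. The function names, the grid search over b, the replicate-resampling scheme, and the numbers in the usage example are all illustrative assumptions.

```python
import math
import random


def fit_power_law(sizes, errors, b_grid_steps=200):
    """Fit error(n) ~ a * n**(-alpha) + b (assumed form; all errors must be > 0).

    For each candidate asymptote b on a grid in [0, min(errors)), a and alpha
    are found by least squares on log(error - b) versus log(n); the candidate
    with the smallest squared error on the original scale is returned.
    """
    xs = [math.log(n) for n in sizes]
    x_bar = sum(xs) / len(xs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    e_min = min(errors)
    best_sse, best_params = None, None
    for k in range(b_grid_steps):
        b = e_min * k / b_grid_steps  # strictly below min(errors)
        ys = [math.log(e - b) for e in errors]
        y_bar = sum(ys) / len(ys)
        slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
        a, alpha = math.exp(y_bar - slope * x_bar), -slope
        sse = sum((a * n ** (-alpha) + b - e) ** 2 for n, e in zip(sizes, errors))
        if best_sse is None or sse < best_sse:
            best_sse, best_params = sse, (a, alpha, b)
    return best_params


def predict(params, n):
    """Evaluate the fitted curve at training sample size n."""
    a, alpha, b = params
    return a * n ** (-alpha) + b


def bootstrap_interval(sizes, replicate_errors, target_n,
                       n_boot=200, level=0.95, seed=0):
    """Approximate percentile interval for the predicted error at target_n.

    replicate_errors[i] holds test error rates from repeated random splits at
    training size sizes[i]. Each bootstrap round resamples those replicates
    with replacement, refits the curve to the resampled means, and records
    the prediction. This scheme is an assumption made for illustration.
    """
    rng = random.Random(seed)
    preds = []
    for _ in range(n_boot):
        means = []
        for reps in replicate_errors:
            sample = [rng.choice(reps) for _ in reps]
            means.append(sum(sample) / len(sample))
        preds.append(predict(fit_power_law(sizes, means), target_n))
    preds.sort()
    lo = preds[int((1 - level) / 2 * (n_boot - 1))]
    hi = preds[int((1 + level) / 2 * (n_boot - 1))]
    return lo, hi


# Usage with synthetic numbers (made up for illustration, not from the paper).
sizes = [10, 20, 40, 80, 160]
truth = [2.0 * n ** -0.5 + 0.1 for n in sizes]
replicates = [[t - 0.01, t, t + 0.01] for t in truth]
params = fit_power_law(sizes, truth)
print("predicted error at n=320:", predict(params, 320))
print("approximate 95% interval:", bootstrap_interval(sizes, replicates, 320))
```

Fitting the same model to two classifiers and comparing the fitted curves over a range of n is one way to check whether they cross, as in finding (2). The interval here reflects only the variability among the supplied replicates, so it should be read as approximate.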
Disciplines
Microarrays | Multivariate Analysis
Suggested Citation
Hess, Kenneth R. and Gold, David L., "Learning Curves in Classification with Microarray Data" (November 2004). UT MD Anderson Cancer Center Department of Biostatistics Working Paper Series. Working Paper 4.
http://biostats.bepress.com/mdandersonbiostat/paper4
