Robust Multi-array Analysis (RMA) is the most widely used preprocessing algorithm for Affymetrix and NimbleGen gene-expression microarrays. RMA performs background correction, normalization, and summarization in a modular way. The last two steps require multiple arrays to be analyzed simultaneously. The ability to borrow information across samples gives RMA several advantages. For example, the summarization step fits a parametric model that accounts for probe effects, assumed to be fixed across arrays, and improves outlier detection. Residuals obtained from the fitted model permit the creation of useful quality metrics. However, the dependence on multiple arrays has two drawbacks: (1) RMA cannot be used in clinical settings where samples must be processed individually or in small batches, and (2) data sets preprocessed separately are not comparable. We propose a preprocessing algorithm, frozen RMA (fRMA), which allows one to analyze microarrays individually or in small batches and then combine the data for analysis. This is accomplished by utilizing information from large publicly available microarray databases. In particular, estimates of probe-specific effects and variances are precomputed and frozen. Then, with new data sets, these are used in concert with information from the new array(s) to normalize and summarize the data. We find that fRMA is comparable to RMA when the data are analyzed as a single batch and outperforms RMA when analyzing multiple batches. The methods described here are implemented in the R package frma and are currently available for download from the software section of http://rafalab.jhsph.edu.
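
To make the idea of "frozen" parameters concrete, the following is a minimal Python sketch of how a single array might be normalized and summarized against precomputed values. It is an illustration only and not the frma implementation: the function names, the toy reference distribution, and the probe-effect values are all hypothetical, ties in intensities are ignored, and a plain median stands in for the variance-weighted robust summarization that the abstract alludes to.

```python
from statistics import median


def normalize_to_frozen_reference(log2_intensities, reference_quantiles):
    """Quantile-normalize ONE array against a precomputed (frozen) reference.

    The smallest observed value is replaced by the smallest reference value,
    the second smallest by the second smallest, and so on. No other arrays
    are needed at this point. Assumption: no ties, equal lengths.
    """
    n = len(log2_intensities)
    if n != len(reference_quantiles):
        raise ValueError("reference must have one quantile per probe")
    # Indices of the probes ordered from lowest to highest intensity.
    order = sorted(range(n), key=lambda i: log2_intensities[i])
    reference = sorted(reference_quantiles)
    normalized = [0.0] * n
    for rank, probe_index in enumerate(order):
        normalized[probe_index] = reference[rank]
    return normalized


def summarize_probeset(normalized_log2, frozen_probe_effects):
    """Summarize one probe set using frozen probe effects.

    Each probe's frozen effect is subtracted and the corrected values are
    combined with a median (a simplification chosen for this sketch).
    """
    corrected = [y - phi for y, phi in zip(normalized_log2, frozen_probe_effects)]
    return median(corrected)


# Toy example: four probes from a single probe set (all values hypothetical).
observed = [6.9, 4.9, 9.0, 5.9]            # log2 intensities on the new array
frozen_reference = [6.0, 7.0, 8.0, 9.0]    # precomputed reference distribution
frozen_effects = [0.5, -1.0, 1.5, -0.5]    # precomputed probe effects

normalized = normalize_to_frozen_reference(observed, frozen_reference)
expression = summarize_probeset(normalized, frozen_effects)
```

Because both the reference distribution and the probe effects are fixed in advance, two arrays processed on different days pass through exactly the same transformation, which is what makes their summarized values comparable in this sketch.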


Bioinformatics | Computational Biology