DNA microarrays are now widely used in many scientific and medical disciplines, including cancer research. A common goal of these studies is to determine which genes are differentially expressed between cancer and healthy tissue or, more generally, between two experimental conditions. A major complication in the molecular profiling of tumors using gene expression data is that each sample contains a mixture of tumor and normal cells, so the measured expression reflects both. Much of the methodology developed for assessing differential expression with microarray data assumes that tissue samples are homogeneous. In this article, we outline a general framework for determining differential expression in the presence of mixed cell populations. We consider study designs with paired tissues and designs with unpaired tissues. The data are modeled with a hierarchical mixture model, and the model parameters are estimated using a combination of method-of-moments procedures and the expectation-maximization (EM) algorithm. Links with the false discovery rate are discussed. The methods are applied to two microarray datasets from cancer studies as well as to simulated data.
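
To make the EM step and its link to the false discovery rate concrete, the following is a minimal sketch and not the hierarchical model of this article. It assumes a much simpler setting: each gene is summarized by a single log-ratio score drawn from a two-component normal mixture, with a null component centered at zero and a shifted component for differentially expressed genes. It does not model mixed cell populations or the method-of-moments step. The function names (`simulate_scores`, `fit_two_component_em`, `estimated_fdr`), the simulated data, and the 0.9 posterior cutoff are all hypothetical choices made for illustration.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def simulate_scores(n_null=800, n_alt=200, shift=2.0, sd=0.5, seed=0):
    """Hypothetical per-gene log-ratios: most genes unchanged, some shifted."""
    rng = random.Random(seed)
    null = [rng.gauss(0.0, sd) for _ in range(n_null)]
    alt = [rng.gauss(shift, sd) for _ in range(n_alt)]
    return null + alt


def fit_two_component_em(scores, n_iter=200, min_sd=1e-3):
    """EM for the assumed mixture (1 - p) * N(0, s0^2) + p * N(mu1, s1^2)."""
    n = len(scores)
    mean_all = sum(scores) / n
    sd_all = max(min_sd, math.sqrt(sum((x - mean_all) ** 2 for x in scores) / n))
    # Crude starting values: the top tenth of the scores seeds the shifted component.
    top = sorted(scores)[-max(1, n // 10):]
    p, mu1, s0, s1 = 0.1, sum(top) / len(top), sd_all, sd_all
    post = [0.5] * n
    for _ in range(n_iter):
        # E-step: posterior probability that each gene belongs to the shifted component.
        post = []
        for x in scores:
            a = p * normal_pdf(x, mu1, s1)
            b = (1.0 - p) * normal_pdf(x, 0.0, s0)
            post.append(a / (a + b) if a + b > 0.0 else 0.5)
        # M-step: weighted updates; the weights are clamped to avoid division by zero.
        w1 = min(max(sum(post), 1e-9), n - 1e-9)
        w0 = n - w1
        p = w1 / n
        mu1 = sum(r * x for r, x in zip(post, scores)) / w1
        var1 = sum(r * (x - mu1) ** 2 for r, x in zip(post, scores)) / w1
        var0 = sum((1.0 - r) * x * x for r, x in zip(post, scores)) / w0
        s1 = max(min_sd, math.sqrt(var1))
        s0 = max(min_sd, math.sqrt(var0))
    # The returned posteriors come from the final E-step.
    return {"p": p, "mu1": mu1, "s0": s0, "s1": s1, "posterior": post}


def estimated_fdr(posterior, threshold=0.9):
    """Select genes by posterior probability and average their null probabilities."""
    selected = [i for i, r in enumerate(posterior) if r >= threshold]
    if not selected:
        return selected, 0.0  # nothing selected, so no false discoveries to estimate
    fdr = sum(1.0 - posterior[i] for i in selected) / len(selected)
    return selected, fdr


if __name__ == "__main__":
    fit = fit_two_component_em(simulate_scores())
    genes, fdr = estimated_fdr(fit["posterior"], threshold=0.9)
    print(f"estimated proportion differentially expressed: {fit['p']:.3f}")
    print(f"genes selected: {len(genes)}, estimated FDR: {fdr:.3f}")
```

Under this simplified model, the average posterior null probability over the selected genes serves as a model-based estimate of the false discovery rate for that list, which is one common way a mixture model connects to the FDR. The null mean is fixed at zero by assumption; a fuller treatment would also account for tissue heterogeneity and the paired or unpaired design.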


Bioinformatics | Computational Biology | Microarrays | Statistical Models