A central problem in genomic research is the identification of genes and pathways involved in diseases and other biological processes. The genes identified or the univariate test statistics are often linked to known biological pathways through gene set enrichment analysis in order to identify the pathways involved. However, most of the procedures for identifying differentially expressed genes do not utilize the known pathway information in the phase of identifying such genes. In this paper, we develop a Markov random field (MRF)-based method for identifying genes and subnetworks that are related to diseases. Such a procedure models the dependency of the differential expression patterns of genes on the networks using a local discrete MRF model. Simulation studies indicated that the method is quite effective in identifying genes and subnetworks that are related to disease and has higher sensitivity and lower false discovery rates than the commonly used procedures that do not use the pathway structure information. Applications to two breast cancer microarray gene expression datasets identified several subnetworks on several of the KEGG transcriptional pathways that are related to breast cancer recurrence or survival due to breast cancer. The proposed MRF-based model efficiently utilizes the known pathway structures in identifying the differentially expressed genes and the subnetworks that might be related to phenotype. As more biological networks are identified and documented in databases, the proposed method should find more applications in identifying the subnetworks that are related to diseases and other biological processes.


Bioinformatics | Computational Biology