Abstract
Graphs or networks are common ways of depicting information. In biology in particular, many different biological processes are represented by graphs, such as regulatory networks or metabolic pathways. This kind of {\it a priori} information gathered over many years of biomedical research is a useful supplement to the standard numerical genomic data such as microarray gene expression data. How to incorporate information encoded by the known biological networks or graphs into analysis of numerical data raises interesting statistical challenges. In this paper, we introduce a network-constrained regularization procedure for linear regression analysis in order to incorporate the information from these graphs into an analysis of the numerical data, where the network is represented as a graph and its corresponding Laplacian matrix. We define a network-constrained penalty function that penalizes the $L_1$-norm of the coefficients but encourages smoothness of the coefficients on the network. An efficient algorithm is also proposed for computing the network-constrained regularization paths, much like the Lars algorithm does for the lasso. We illustrate the methods using simulated data and analysis of a microarray gene expression data set of glioblastoma.
Disciplines
Bioinformatics | Computational Biology
Suggested Citation
Li, Caiyan and Li, Hongzhe, "Network-constrained Regularization and Variable Selection for Analysis of Genomic Data" (December 2007). UPenn Biostatistics Working Papers. Working Paper 23.
https://biostats.bepress.com/upennbiostat/art23