"Statistical Inference for Simultaneous Clustering of Gene Expression D" by Katherine S. Pollard and Mark J. van der Laan

U.C. Berkeley Division of Biostatistics Working Paper Series

Title

Statistical Inference for Simultaneous Clustering of Gene Expression Data

Authors

Katherine S. Pollard, Division of Biostatistics, School of Public Health, University of California, Berkeley
Mark J. van der Laan, Division of Biostatistics, School of Public Health, University of California, Berkeley

Comments

Published 2002 in Mathematical Biosciences, 176(1): 99-121.

Abstract

Current methods for analysis of gene expression data are mostly based on clustering and classification of either genes or samples. We offer support for the idea that more complex patterns can be identified in the data if genes and samples are considered simultaneously. We formalize the approach and propose a statistical framework for two-way clustering. A simultaneous clustering parameter is defined as a function of the true data generating distribution, and an estimate is obtained by applying this function to the empirical distribution. We illustrate that a wide range of clustering procedures, including generalized hierarchical methods, can be defined as parameters which are compositions of individual mappings for clustering patients and genes. This framework allows one to assess classical properties of clustering methods, such as consistency, and to formally study statistical inference regarding the clustering parameter. We present results of simulations designed to assess the asymptotic validity of different bootstrap methods for estimating the distributions of estimated simultaneous clustering parameters. The method is illustrated on a publicly available data set.
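The plug-in and bootstrap ideas described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the deterministic 2-means routine, the function names (`two_means`, `simultaneous_clustering`, `bootstrap_gene_stability`), the choice of a nonparametric bootstrap over samples, and the synthetic data are all assumptions made for illustration. The sketch treats the two-way clustering as a composition of one mapping that clusters samples and one that clusters genes, applies it to the empirical data, and then resamples the samples to see how often each gene keeps its cluster assignment.

```python
import numpy as np

def two_means(points, n_iter=50):
    """Deterministic 2-means (illustrative stand-in for any clustering mapping).

    Starts from the two points that are farthest apart.
    """
    points = np.asarray(points, dtype=float)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    i, j = np.unravel_index(np.argmax(d), d.shape)
    centers = points[[i, j]].copy()
    for _ in range(n_iter):
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            points[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(2)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def simultaneous_clustering(X):
    """Plug-in estimate: compose a sample mapping and a gene mapping.

    X has one row per sample and one column per gene (assumed layout).
    """
    sample_labels = two_means(X)    # cluster the rows (samples)
    gene_labels = two_means(X.T)    # cluster the columns (genes)
    return sample_labels, gene_labels

def bootstrap_gene_stability(X, n_boot=200, seed=0):
    """Fraction of bootstrap replicates in which each gene keeps its cluster."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    _, ref = simultaneous_clustering(X)
    agree = np.zeros(X.shape[1])
    for _ in range(n_boot):
        Xb = X[rng.integers(0, n, size=n)]   # resample samples with replacement
        _, lab = simultaneous_clustering(Xb)
        if np.mean(lab == ref) < 0.5:        # labels are arbitrary, so align them
            lab = 1 - lab
        agree += (lab == ref)
    return agree / n_boot

# Synthetic example (not data from the paper): 20 samples, 6 genes,
# where the first 10 samples over-express the first 3 genes.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 6))
X[:10, :3] += 8.0
sample_labels, gene_labels = simultaneous_clustering(X)
stability = bootstrap_gene_stability(X, n_boot=100)
```

Here `stability[g]` is a rough bootstrap estimate of how reliably gene `g` is assigned to its cluster. Any other clustering mapping, such as a hierarchical method, could be swapped in for `two_means` without changing the resampling logic.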

Suggested Citation

Pollard, Katherine S. and van der Laan, Mark J., "Statistical Inference for Simultaneous Clustering of Gene Expression Data" (July 2001). U.C. Berkeley Division of Biostatistics Working Paper Series. Working Paper 96.
https://biostats.bepress.com/ucbbiostat/paper96

Included in

Bioinformatics Commons, Computational Biology Commons, Microarrays Commons, Multivariate Analysis Commons, Statistical Methodology Commons, Statistical Theory Commons
