"Hybrid Clustering of Gene Expression Data with Visualization and the B" by Mark J. van der Laan and Katherine S. Pollard

U.C. Berkeley Division of Biostatistics Working Paper Series

Title

Hybrid Clustering of Gene Expression Data with Visualization and the Bootstrap

Authors

Mark J. van der Laan, Division of Biostatistics, School of Public Health, University of California, Berkeley
Katherine S. Pollard, Division of Biostatistics, School of Public Health, University of California, Berkeley

Comments

Paper copy available from biostat@berkeley.edu. Include a surface mail address with your request.

Abstract

Large-scale gene expression studies are becoming increasingly common as new technologies make it possible to capture expression profiles for thousands of genes at once. One important goal with these high-dimensional data structures is to find biologically important subsets and clusters of genes. In this paper, we propose a hybrid clustering method, Hierarchical Ordered Partitioning And Collapsing Hybrid (HOPACH), which builds a hierarchical tree of clusters. The methodology combines the strengths of both partitioning (or divisive) and agglomerative clustering methods. At each node, a cluster is split into two or more smaller clusters with an enforced ordering of the clusters. We propose to visualize the clusters at any level of the tree by plotting the distance matrix with its rows and columns arranged according to the ordering of the clusters and an ordering of genes within the clusters. A collapsing step that unites the two closest clusters into one cluster can be used to correct for errors in the number of clusters. A final ordered list of genes is obtained by running down the tree completely, possibly interleaved with collapsing steps. Visual comparison of the distance matrix for different levels of the tree with the final distance matrix typically identifies the main clustering structure. Once the clusters are identified, the bootstrap can be used to establish the reproducibility of these clusters and the overall variability of the procedure. The power of the methodology is illustrated with simulated data and with publicly available data sets consisting of cell lines from a variety of tumors.
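
The collapsing and ordering steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' HOPACH implementation. The Euclidean distance, the use of medoids to measure how close two clusters are, and the within-cluster ordering by distance to the medoid are all assumptions made for illustration, and every function name below is hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two expression profiles (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(profiles):
    """Full pairwise distance matrix as a list of lists."""
    return [[euclidean(p, q) for q in profiles] for p in profiles]


def medoid(members, dist):
    """Member with the smallest total distance to the rest of its cluster."""
    return min(members, key=lambda i: sum(dist[i][j] for j in members))


def collapse_closest(clusters, dist):
    """One collapsing step: merge the two clusters whose medoids are closest.

    The merged cluster takes the position of the earlier of the two,
    so the ordering of the remaining clusters is preserved.
    """
    meds = [medoid(c, dist) for c in clusters]
    pairs = [
        (dist[meds[a]][meds[b]], a, b)
        for a in range(len(clusters))
        for b in range(a + 1, len(clusters))
    ]
    _, a, b = min(pairs)
    merged = clusters[a] + clusters[b]
    rest = [c for k, c in enumerate(clusters) if k not in (a, b)]
    return rest[:a] + [merged] + rest[a:]


def ordered_genes(clusters, dist):
    """Concatenate clusters in order; within each, sort by distance to the medoid."""
    order = []
    for members in clusters:
        m = medoid(members, dist)
        order.extend(sorted(members, key=lambda i: dist[m][i]))
    return order


def reorder(dist, order):
    """Permute rows and columns of the distance matrix to follow `order`."""
    return [[dist[i][j] for j in order] for i in order]
```

Plotting the matrix returned by `reorder` as a heat map would give the kind of ordered distance-matrix view the abstract describes, with clusters appearing as blocks of small distances along the diagonal.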

Disciplines

Bioinformatics | Computational Biology | Multivariate Analysis

Suggested Citation

van der Laan, Mark J. and Pollard, Katherine S., "Hybrid Clustering of Gene Expression Data with Visualization and the Bootstrap" (May 2001). U.C. Berkeley Division of Biostatistics Working Paper Series. Working Paper 93.
https://biostats.bepress.com/ucbbiostat/paper93

This document is currently not available here.

Collection of Biostatistics Research Archive
