Abstract

A potential benefit of profiling of tissue samples using microarrays is the generation of molecular fingerprints that will define subtypes of disease. Hierarchical clustering has been the primary analytical tool used to define disease subtypes from microarray experiments in cancer settings. Assessing cluster reliability poses a major complication in analyzing output from these procedures. While much work has been done on assessing the global question of number of clusters in a dataset, relatively little research exists on assessing stability of individual clusters. A potential benefit of profiling of tissue samples using microarrays is the generation of molecular fingerprints that will define subtypes of disease. Hierarchical clustering has been the primary analytical tool used to define disease subtypes from microarray experiments in cancer settings. Assessing cluster reliability poses a major complication in analyzing output from these procedures. While much work has been done on assessing the global question of number of clusters in a dataset, relatively little research exists on assessing stability of individual clusters. We address this problem by developing cluster stability scores using subsampling techniques. These scores exploit the redundancy in biologically discriminatory information on the chip. Our approach is generic and can be used with any clustering algorithm. We propose procedures for calculating cluster stability scores for situations involving both known and unknown numbers of clusters. The methods are illustrated on data from a childhood cancer study (Khan et al., 2001).

Disciplines

Bioinformatics | Computational Biology | Microarrays | Multivariate Analysis

Share

COinS