Published 2003 in Journal of Statistical Computation and Simulation 73, No 8, pp. 575-584.


Kaufman & Rousseeuw (1990) proposed a clustering algorithm, Partitioning Around Medoids (PAM), which maps a distance matrix into a specified number of clusters. A particularly nice property is that PAM allows clustering with respect to any specified distance metric. In addition, the medoids are robust representations of the cluster centers, which is particularly important in the common situation where many elements do not belong well to any cluster. Based on our experience in clustering gene expression data, we have noticed that PAM has problems recognizing relatively small clusters in situations where good partitions around medoids clearly exist. In this note, we propose to partition around medoids by maximizing the criterion "Average Silhouette" defined by Kaufman & Rousseeuw. We also propose a fast-to-compute approximation of "Average Silhouette". We implement these two new partitioning around medoids algorithms and illustrate their performance relative to existing partitioning methods in simulations.
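
For readers unfamiliar with the criterion, the following is a minimal sketch of how the average silhouette of a partition around medoids can be evaluated from a distance matrix. It uses the usual Kaufman & Rousseeuw definition, s(i) = (b(i) - a(i)) / max(a(i), b(i)), with s(i) = 0 for singleton clusters. The function names `assign_to_medoids` and `average_silhouette` are hypothetical and chosen for illustration; this is not the implementation from the paper, and it does not cover the proposed search over medoids or the fast approximation.

```python
def assign_to_medoids(dist, medoids):
    """Label each element with the position (in `medoids`) of its nearest medoid.

    dist    -- symmetric n x n distance matrix as a list of lists
    medoids -- list of element indices serving as cluster centers
    """
    return [
        min(range(len(medoids)), key=lambda k: dist[i][medoids[k]])
        for i in range(len(dist))
    ]


def average_silhouette(dist, labels):
    """Mean silhouette width over all elements for a given labeling."""
    n = len(dist)
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("average silhouette needs at least two clusters")

    total = 0.0
    for i in range(n):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # singleton cluster: s(i) = 0 by convention
        # a(i): average distance to the other members of i's own cluster
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b(i): smallest average distance to the members of any other cluster
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / n


# Toy example: four points on a line, two well-separated pairs.
points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in points] for p in points]
labels = assign_to_medoids(dist, medoids=[0, 2])
print(labels, average_silhouette(dist, labels))
```

Under this sketch, a candidate set of medoids would be scored by assigning every element to its nearest medoid and evaluating `average_silhouette` on the resulting labels; how the candidates are searched is left open here.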


Genetics | Multivariate Analysis | Numerical Analysis and Computation | Statistical Methodology | Statistical Theory