Fitting of Mixtures with Unspecified Number of Components Using Cross Validation Distance Estimate
Estimation of the number of mixture components (k) is an unsolved problem. Available methods for estimation of k include bootstrapping the likelihood ratio test statistics and optimizing a variety of validity functionals such as AIC, BIC/MDL, and ICOMP. We investigate the minimization of distance between fitted mixture model and the true density as a method for estimating k. The distances considered are Kullback-Leibler (KL) and “L sub 2”. We estimate these distances using cross validation. A reliable estimate of k is obtained by voting of B estimates of k corresponding to B cross validation estimates of distance. This estimation methods with KL distance is very similar to Monte Carlo cross validated likelihood methods discussed by Smyth (2000). With focus on univariate normal mixtures, we present simulation studies that compare the cross validated distance method with AIC, BIC/MDL, and ICOMP. We also apply the cross validation estimate of distance approach along with AIC, BIC/MDL and ICOMP approach, to data from an osteoporosis drug trial in order to find groups that differentially respond to treatment.
Miloslavsky, Maja and van der Laan, Mark J., "Fitting of Mixtures with Unspecified Number of Components Using Cross Validation Distance Estimate" (February 2001). U.C. Berkeley Division of Biostatistics Working Paper Series. Working Paper 89.