In health services and outcome research, count outcomes are frequently encountered and often have a large proportion of zeros. The zero-inflated negative binomial (ZINB) regression model has important applications for this type of data. With many possible candidate risk factors, this paper proposes new variable selection methods for the ZINB model. We consider penalized maximum likelihood estimation, with penalties including the least absolute shrinkage and selection operator (LASSO), smoothly clipped absolute deviation (SCAD), and minimax concave penalty (MCP). An EM (expectation-maximization) algorithm is proposed for estimating the model parameters and conducting variable selection simultaneously. This algorithm consists of estimating penalized weighted negative binomial models and penalized logistic models via the coordinate descent algorithm. Furthermore, statistical properties including the standard error formula are provided. A simulation study shows that the new algorithm not only yields more accurate, or at least comparable, estimates but is also more robust than traditional stepwise variable selection. The application is illustrated with a data set on health care demand in Germany. The proposed techniques have been implemented in an open-source R package, mpath.
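For readers unfamiliar with the three penalties named in the abstract, the following is a minimal Python sketch of their standard textbook forms, evaluated on a single coefficient. It is an illustration only and is not taken from the paper or from the mpath package: the function names, the argument names, and the default concavity values (a = 3.7 for SCAD, gamma = 3.0 for MCP) are assumptions chosen for this example.

```python
# Illustrative sketch: standard single-coefficient forms of the LASSO, SCAD,
# and MCP penalties. Names and default tuning values are assumptions made for
# this example, not the mpath API.

def lasso_penalty(beta, lam):
    """LASSO: lam * |beta|, which grows linearly without bound."""
    return lam * abs(beta)


def scad_penalty(beta, lam, a=3.7):
    """SCAD: linear near zero, quadratic transition, then constant (a > 2)."""
    t = abs(beta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t ** 2 - lam ** 2) / (2 * (a - 1))
    return lam ** 2 * (a + 1) / 2


def mcp_penalty(beta, lam, gamma=3.0):
    """MCP: concave quadratic up to gamma * lam, then constant (gamma > 1)."""
    t = abs(beta)
    if t <= gamma * lam:
        return lam * t - t ** 2 / (2 * gamma)
    return gamma * lam ** 2 / 2


if __name__ == "__main__":
    # Compare the penalties on small, moderate, and large coefficients.
    for beta in (0.3, 1.0, 5.0):
        print(beta,
              lasso_penalty(beta, 0.5),
              scad_penalty(beta, 0.5),
              mcp_penalty(beta, 0.5))
```

Unlike LASSO, both SCAD and MCP level off for large coefficients, so large effects are penalized no more heavily than moderate ones.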
Biostatistics | Statistical Methodology
Wang, Zhu; Ma, Shuangge; and Wang, Ching-Yun, "Variable selection for zero-inflated and overdispersed data with application to health care demand in Germany" (May 2014). COBRA Preprint Series. Working Paper 110.