This paper considers the problem of constructing confidence intervals for the mean of a Negative Binomial random variable from sampled data. When the sample size is large, such intervals are traditionally based on a Normal approximation. However, we demonstrate that, for highly dispersed Negative Binomials, the distribution of the sample mean converges slowly to the Normal as the sample size grows. As a result, standard techniques for constructing confidence intervals for the mean (such as the Normal approximation and the bootstrap) typically produce intervals that are too narrow and substantially undercover when the dispersion is high. To address this problem, we use confidence intervals constructed from Bernstein's inequality as an alternative to standard methods when the sample size is small and the dispersion is high. We also propose, and provide empirical evidence for, a Chi Square model as an approximate distribution for the sample mean of highly dispersed Negative Binomial random variables when the mean and sample size are small. This Chi Square model leads directly to an alternative method for constructing confidence intervals in this setting. We then prove a limit theorem showing that the sample mean converges in distribution to a Gamma random variable, of which the Chi Square distribution is a special case. Next, we carry out a variety of simulation experiments comparing the proposed methods with standard techniques in terms of empirical coverage, and we give concrete recommendations for the settings in which particular intervals are preferred. Finally, we conduct a sensitivity analysis of the choice of the upper bound in Bernstein confidence intervals, which may offer a way to improve the coverage of this method at extreme degrees of dispersion and very small sample sizes.
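The undercoverage claim and the Bernstein alternative can be illustrated with a small Monte Carlo sketch. This is not the authors' implementation; it rests on several stated assumptions. The Normal interval is the usual Wald form x̄ ± z·s/√n. The Bernstein interval uses the standard two-sided bound P(|X̄ − μ| ≥ t) ≤ 2·exp(−n t² / (2σ² + 2Mt/3)), which holds when |X_i − μ| ≤ M, with the sample variance plugged in for σ². Because a Negative Binomial is unbounded, the bound `bound` (M) is a hypothetical user-chosen value, which is exactly why its choice matters. The helper names and the parameters (μ = 2, dispersion k = 0.1, n = 10, M = 100) are illustrative only.

```python
import numpy as np


def normal_ci(x, z=1.959964):
    """Wald-type Normal-approximation interval: mean +/- z * s / sqrt(n)."""
    n = len(x)
    m = x.mean()
    half = z * x.std(ddof=1) / np.sqrt(n)
    return m - half, m + half


def bernstein_ci(x, bound, alpha=0.05):
    """Two-sided interval obtained by inverting Bernstein's inequality.

    Assumes |X_i - mu| <= bound (a hypothetical, user-chosen upper bound)
    and plugs in the sample variance for sigma^2.
    """
    n = len(x)
    m = x.mean()
    var = x.var(ddof=1)
    log_term = np.log(2.0 / alpha)
    # Set 2 * exp(-n t^2 / (2 var + 2 bound t / 3)) = alpha and solve the
    # quadratic n t^2 - b t - 2 var log_term = 0 for its positive root.
    b = 2.0 * bound * log_term / 3.0
    t = (b + np.sqrt(b**2 + 8.0 * n * var * log_term)) / (2.0 * n)
    return m - t, m + t


def coverage(ci_fn, mu, k, n, reps=20000, seed=0, **kw):
    """Empirical coverage of ci_fn for Negative Binomial data.

    Uses NumPy's parametrisation with size k and p = k / (k + mu), so each
    draw has mean mu and variance mu + mu^2 / k (small k = high dispersion).
    """
    rng = np.random.default_rng(seed)
    p = k / (k + mu)
    hits = 0
    for _ in range(reps):
        x = rng.negative_binomial(k, p, size=n).astype(float)
        lo, hi = ci_fn(x, **kw)
        if lo <= mu <= hi:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    mu, k, n = 2.0, 0.1, 10
    print("Normal coverage:   ", coverage(normal_ci, mu, k, n))
    print("Bernstein coverage:", coverage(bernstein_ci, mu, k, n, bound=100.0))
```

With a small k, many samples are entirely zero, so the Wald interval collapses to a single point and misses μ; the Bernstein interval keeps a width of at least b/n in that case, at the price of being much wider overall.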
We also apply the proposed methods to examples from the serial analysis of gene expression and from traffic flow in a communications network, illustrating the strengths and weaknesses of these procedures as well as those of standard techniques.


Biostatistics | Statistical Methodology | Statistical Models | Statistical Theory