In this paper we introduce a semi-parametric regression model for estimating the difference in the expected value of two positive and highly skewed random variables as a function of covariates. Our method extends Smooth Quantile Ratio Estimation (SQUARE), a novel estimator of the mean difference of two positive random variables, to a regression model.

The methodological development of this paper is motivated by a common problem in econometrics: estimating the difference in average expenditures between two populations, say with and without a disease, while taking covariates into account. Let Y1 and Y2 be two positive random variables denoting the health expenditures for cases and controls, respectively. SQUARE estimates Delta = E[Y1] - E[Y2] by smoothing the log-transformed ratio of the two quantile functions across percentiles.
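To make the idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a cubic polynomial as the smoother (a stand-in for whatever smoother SQUARE actually uses), a fixed percentile grid, and the identity E[Y1] - E[Y2] = integral over p of Q2(p) * (Q1(p)/Q2(p) - 1). The function name `square_mean_difference` and its parameters `degree` and `grid_size` are hypothetical.

```python
import numpy as np


def square_mean_difference(y1, y2, degree=3, grid_size=200):
    """Sketch of a smoothed quantile-ratio estimate of E[Y1] - E[Y2].

    Assumptions (not from the paper): a polynomial of the given degree
    smooths the log quantile ratio, and the integral over percentiles is
    approximated by an average over an evenly spaced grid.
    """
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    if np.any(y1 <= 0) or np.any(y2 <= 0):
        raise ValueError("both samples must be strictly positive")

    # Midpoint percentile grid on (0, 1).
    p = (np.arange(grid_size) + 0.5) / grid_size

    # Empirical quantile functions of the two samples.
    q1 = np.quantile(y1, p)
    q2 = np.quantile(y2, p)

    # Log-transformed ratio of the quantile functions, smoothed across p.
    log_ratio = np.log(q1) - np.log(q2)
    coeffs = np.polyfit(p, log_ratio, degree)
    smooth = np.polyval(coeffs, p)

    # Average of Q2(p) * (exp(s(p)) - 1) approximates E[Y1] - E[Y2].
    return float(np.mean(q2 * (np.exp(smooth) - 1.0)))
```

The smoothing step is where efficiency can be gained for skewed data: the raw quantile ratio is noisy in the upper tail, and borrowing strength across percentiles stabilizes it. The choice of smoother and its flexibility are left open here.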

Dominici et al. (2003) have shown that SQUARE defines a large class of estimators of Delta, is more efficient than common parametric and non-parametric estimators of Delta, and is consistent and asymptotically normal.

In applications it is often desirable to estimate Delta(x) = E[Y1|x] - E[Y2|x], that is, the difference in means as a function of covariates x. In this paper we introduce a two-part regression SQUARE for estimating Delta(x). We use the first part of the model to estimate the probability of incurring any costs, and the second part to estimate the mean difference in health expenditures, given that a non-zero cost is observed. In the second part of the model, we apply the basic definition of SQUARE for positive costs to compare expenditures for cases and controls having "similar" covariate profiles. We determine strata of cases and controls with "similar" covariate profiles using propensity score matching.
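One standard way the two parts fit together, written out here as an assumption rather than as the paper's exact formulation, follows from the law of total expectation for non-negative costs. The symbols \pi_j and \mu_j are introduced only for this illustration.

```latex
\begin{align*}
E[Y_j \mid x] &= \pi_j(x)\,\mu_j(x), \qquad
  \pi_j(x) = P(Y_j > 0 \mid x), \quad
  \mu_j(x) = E[Y_j \mid Y_j > 0,\, x], \quad j = 1, 2, \\
\Delta(x) &= \pi_1(x)\,\mu_1(x) - \pi_2(x)\,\mu_2(x).
\end{align*}
```

Under this reading, the first part of the model supplies the probabilities \pi_j(x), and SQUARE applied within propensity score strata supplies the comparison of the positive-cost components.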

We then apply two-part regression SQUARE to the 1987 National Medical Expenditure Survey to estimate the difference in mean expenditures, Delta(x), between persons suffering from smoking-attributable diseases and persons without these diseases. Using a simulation study, we compare the frequentist properties of two-part regression SQUARE with those of approaches based on ordinary least squares estimates for the log-transformed expenditures.


Health Services Research | Statistical Methodology | Statistical Models | Statistical Theory