In this paper, we illustrate that, when a linear model is appropriate, combining ecological data with subsample data provides three main benefits. First, including the individual-level subsample data can eliminate the biases associated with linear ecological inference. Second, supplementing the subsample data with ecological data increases the information about the parameters. Third, readily available ecological data can be used to design optimal subsampling schemes that further increase the information about the parameters. We apply this methodology to the classic problem of estimating the effect of a college degree on wages. We show that combining ecological data with subsample data provides precise estimates of this effect, and that optimal subsampling schemes (conditional on the ecological data) can provide good precision with only a fraction of the observations.
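The first benefit can be illustrated with a small simulation. The sketch below is not the estimator developed in the paper. It assumes a hypothetical data-generating process in which each area's baseline outcome rises with the area's share of individuals with `x = 1`, so a regression of area means on area means confounds the individual-level effect with this contextual effect. A small individual-level subsample from each area, analysed with a simple within-area (area-demeaned) regression, recovers the individual-level slope. All names and parameter values (`beta`, `n_sub`, the contextual coefficient of 2.0) are assumptions chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 50 areas, 200 individuals each, subsample of 20 per area.
n_areas, n_per_area, n_sub = 50, 200, 20
beta = 0.5  # assumed individual-level effect of x on y

# Area-specific probability that x = 1 (e.g. share holding a degree).
p = rng.uniform(0.1, 0.6, size=n_areas)
x = rng.binomial(1, p[:, None], size=(n_areas, n_per_area)).astype(float)

# Assumed contextual effect: the area baseline increases with p.
alpha = 1.0 + 2.0 * p
y = alpha[:, None] + beta * x + rng.normal(0.0, 1.0, size=x.shape)

# Ecological regression: only the area means of x and y are used.
x_bar, y_bar = x.mean(axis=1), y.mean(axis=1)
eco_slope = np.polyfit(x_bar, y_bar, 1)[0]

# Subsample regression: individuals within an area are exchangeable here,
# so the first n_sub columns serve as a simple random subsample.
xs, ys = x[:, :n_sub], y[:, :n_sub]
xd = xs - xs.mean(axis=1, keepdims=True)  # remove area means of x
yd = ys - ys.mean(axis=1, keepdims=True)  # remove area means of y
within_slope = (xd * yd).sum() / (xd ** 2).sum()

print(f"true beta:            {beta:.2f}")
print(f"ecological slope:     {eco_slope:.2f}")
print(f"within-area subsample: {within_slope:.2f}")
```

Under these assumptions the ecological slope absorbs the contextual effect and lands far from `beta`, while the subsample estimate is close to it despite using only a tenth of the individuals. The sketch does not address the second and third benefits (pooling information from both data sources and choosing the subsample optimally given the ecological data).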
Statistical Methodology | Statistical Theory
Glynn, Adam; Wakefield, Jon; Handcock, Mark; and Richardson, Thomas, "Alleviating Linear Ecological Bias and Optimal Design with Subsample Data" (December 2005). UW Biostatistics Working Paper Series. Working Paper 275.