Subset Selection in Explanatory Regression Analyses


Paper copy may be requested from biostat@berkeley.edu. Include a surface mail address with your request.


We present a new, data-driven method for automatically choosing a good subset of potential confounding variables to include in an explanatory linear regression model. The same model selection scheme can also be used in a less focused analysis simply to identify those variables that are jointly related to the response. Our procedure differs from most existing subset selection procedures in that it is aimed directly at finding a model family that gives rise to good estimates of specified coefficients rather than good predicted values. We demonstrate the performance of our method in a simulation study where interest is focused on estimating a particular conditional association. We also investigate the relative impact of the selection bias that arises from using the same data both to select the subset and to estimate the regression parameters.


Statistical Models
