Subset Selection in Explanatory Regression Analyses
We present a new, data-driven method for automatically choosing a good subset of potential confounding variables to include in an explanatory linear regression model. The same model selection scheme can also be used in a less focused analysis simply to identify those variables that are jointly related to the response. Our procedure differs from most existing subset selection procedures in that it aims directly at finding a model family that gives rise to good estimates of specified coefficients, rather than good predicted values. The performance of our method is demonstrated in a simulation study where interest is focused on estimating a particular conditional association. We also investigate the relative impact of the selection bias that arises from using the same data both to select the subset and to estimate the regression parameters.
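To make the distinction between estimating a coefficient and predicting a response concrete, the following is a minimal sketch and not the authors' selection procedure, which the abstract does not specify. It assumes a simulated data set with one exposure `x` and three candidate covariates, only the first of which is a true confounder, and it reports the ordinary least squares coefficient on `x` for every subset of candidates. The function name `exposure_coef_by_subset`, the effect sizes, and the sample size are all hypothetical choices made for illustration.

```python
import itertools
import numpy as np


def exposure_coef_by_subset(y, x, Z):
    """Return {subset of column indices of Z: OLS coefficient on x}.

    Every model contains an intercept and the exposure x; only the
    set of adjustment covariates taken from Z varies.
    """
    n, p = Z.shape
    results = {}
    for k in range(p + 1):
        for subset in itertools.combinations(range(p), k):
            columns = [np.ones(n), x] + [Z[:, j] for j in subset]
            design = np.column_stack(columns)
            coef, *_ = np.linalg.lstsq(design, y, rcond=None)
            results[subset] = coef[1]  # position 1 is the exposure
    return results


# Simulated data (assumed values, for illustration only).
rng = np.random.default_rng(0)
n = 2000
Z = rng.normal(size=(n, 3))
x = 0.8 * Z[:, 0] + rng.normal(size=n)            # column 0 drives the exposure
y = 1.0 * x + 1.5 * Z[:, 0] + rng.normal(size=n)  # and the response

estimates = exposure_coef_by_subset(y, x, Z)
for subset, b in estimates.items():
    print(subset, round(b, 3))
```

In this setup the true coefficient on `x` is 1.0. Subsets that include column 0 should give estimates near that value, while subsets that omit it should be biased upward, regardless of whether the pure-noise columns 1 and 2 are included. The sketch does not address the selection bias mentioned above, since it only enumerates models and does not choose one from the data.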
Peterson, Derick R. and Brand, Richard J., "Subset Selection in Explanatory Regression Analyses" (April 1998). U.C. Berkeley Division of Biostatistics Working Paper Series. Working Paper 72.