"Nonparametric variable importance assessment using machine learning te" by Brian D. Williamson, Peter B. Gilbert et al.

UW Biostatistics Working Paper Series

Title

Nonparametric variable importance assessment using machine learning techniques

Authors

Brian D. Williamson, Department of Biostatistics, University of Washington
Peter B. Gilbert, Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center
Noah Simon, Department of Biostatistics, University of Washington
Marco Carone, Department of Biostatistics, University of Washington

Abstract

In a regression setting, it is often of interest to quantify the importance of various features in predicting the response. Commonly, the variable importance measure used is determined by the regression technique employed. For this reason, practitioners often resort to only one of a few regression techniques for which a variable importance measure is naturally defined. Unfortunately, these regression techniques are often suboptimal for predicting the response. Additionally, because the variable importance measures native to different regression techniques generally have different interpretations, comparisons across techniques can be difficult. In this work, we study a novel variable importance measure that can be used with any regression technique and whose interpretation is agnostic to the technique used. Specifically, we propose a generalization of the ANOVA variable importance measure and discuss how it facilitates the use of possibly complex machine learning techniques to flexibly estimate the variable importance of a single feature or group of features. Using the tools of targeted learning, we also describe how to construct an efficient estimator of this measure, as well as a valid confidence interval. Through simulations, we show that our proposal has good practical operating characteristics, and we illustrate its use with data from a study of the median house price in the Boston area and a study of risk factors for cardiovascular disease in South Africa.
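
To make the idea of an ANOVA-type importance measure concrete, the sketch below computes a naive plug-in quantity: the mean squared difference between predictions from a regression on all features and a regression that omits a feature group, divided by the variance of the response. This form is an assumption for illustration; the abstract does not state the formula. The code is not the efficient estimator described above and produces no confidence interval. Ordinary least squares stands in for "any regression technique", and the names `fit_predict_linear` and `naive_anova_importance` are hypothetical.

```python
import numpy as np


def fit_predict_linear(X, y):
    """Stand-in regression: ordinary least squares with an intercept.

    Any regression technique could be substituted here (assumption).
    Returns in-sample fitted values.
    """
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ coef


def naive_anova_importance(X, y, group):
    """Naive plug-in estimate of mean((mu(X) - mu_s(X))^2) / var(y).

    mu is fitted on all features; mu_s is fitted without the columns
    listed in `group`. Illustrative only: no efficiency correction.
    """
    dropped = set(group)
    keep = [j for j in range(X.shape[1]) if j not in dropped]
    mu_full = fit_predict_linear(X, y)
    mu_reduced = fit_predict_linear(X[:, keep], y)
    return float(np.mean((mu_full - mu_reduced) ** 2) / np.var(y))


# Simulated data (fabricated for illustration only): the first feature
# has a strong effect, the second a weak effect, the third none.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)

importance = {j: naive_anova_importance(X, y, [j]) for j in range(3)}
```

In this simulated setting the ordering of `importance` should follow the strength of each feature's effect on the response. With a flexible learner in place of least squares, the same quantity would typically be computed on held-out data rather than in-sample, which this sketch does not do.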

Disciplines

Biostatistics

Suggested Citation

Williamson, Brian D.; Gilbert, Peter B.; Simon, Noah; and Carone, Marco, "Nonparametric variable importance assessment using machine learning techniques" (August 2017). UW Biostatistics Working Paper Series. Working Paper 422.
https://biostats.bepress.com/uwbiostat/paper422

Included in

Biostatistics Commons

Collection of Biostatistics Research Archive
