Individual-level data are often not publicly available because of confidentiality concerns. Instead, masked data are released for public use. However, analyses performed on masked data may produce invalid statistical results, such as biased parameter estimates or incorrect standard errors. In this paper, we propose a data masking method based on spatial smoothing, and we investigate the bias of parameter estimates obtained when generalized linear models (GLMs) are fitted to the masked data. The method allows both the form and the degree of masking to be varied through a smoothing weight function and a smoothness parameter. We show that masking with a smoothing weight function that accounts for prior knowledge of the spatial pattern of exposure may lead to less biased parameter estimates when the masked data are used for analysis. Under our method, the first-order bias of the estimated association between regressors and outcome, when estimated from the masked data, has a closed-form expression.
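
To make the roles of the smoothing weight function and the smoothness parameter concrete, the following is a minimal sketch of masking by spatial smoothing. It is an illustration under stated assumptions, not the paper's implementation: the Gaussian kernel, the row normalization, and the names `gaussian_weights`, `mask_by_smoothing`, and `bandwidth` are all hypothetical choices made here. The kernel plays the part of the form of masking and `bandwidth` the part of the degree of masking.

```python
import numpy as np


def gaussian_weights(coords, bandwidth):
    """Row-normalized Gaussian kernel weights between all pairs of locations.

    Assumption: a Gaussian kernel stands in for the smoothing weight
    function; `bandwidth` stands in for the smoothness parameter.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)
    w = np.exp(-dist2 / (2.0 * bandwidth ** 2))
    # Each row sums to 1, so a masked value is a weighted average.
    return w / w.sum(axis=1, keepdims=True)


def mask_by_smoothing(values, coords, bandwidth):
    """Replace each value with a spatially weighted average of all values."""
    return gaussian_weights(coords, bandwidth) @ values


# Synthetic data for illustration only (not the Medicare data).
rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 1.0, size=(50, 2))
exposure = rng.normal(size=50)

masked_light = mask_by_smoothing(exposure, coords, bandwidth=0.05)
masked_heavy = mask_by_smoothing(exposure, coords, bandwidth=0.5)
```

In this sketch, a very small `bandwidth` leaves the values essentially unchanged, while a very large one pulls every masked value toward the overall mean, which is one way to picture how the degree of masking can be varied.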

We apply the method to a study of racial disparities in mortality rates, using data on more than 4 million Medicare enrollees residing in 2,095 ZIP codes in the Northeast region of the United States. We find that the bias of the estimated association between race and mortality rates obtained from the masked data is highly sensitive to both the form and the degree of masking.


