December 2016

IZA DP No. 10402: Missing Data, Imputation, and Endogeneity

published in: Journal of Econometrics, 2017, 199 (2), 141-155

Basmann (Basmann, R.L., 1957, A generalized classical method of linear estimation of coefficients in a structural equation. Econometrica 25, 77-83; Basmann, R.L., 1959, The computation of generalized classical estimates of coefficients in a structural equation. Econometrica 27, 72-81) introduced two-stage least squares (2SLS). In subsequent work, Basmann and coauthors (Basmann, R.L., F.L. Brown, W.S. Dawes and G.K. Schoepfle, 1971, Exact finite sample density functions of GCL estimators of structural coefficients in a leading exactly identifiable case. Journal of the American Statistical Association 66, 122-126) investigated its finite sample performance. Here, we build on this tradition by focusing on 2SLS estimation of a structural model when data on the endogenous covariate are missing for some observations. Many imputation techniques for handling such missing data have been proposed in the literature. However, there is little guidance available for choosing among existing techniques, particularly when the covariate being imputed is endogenous. Moreover, because the finite sample bias of 2SLS is not monotonically decreasing in the degree of measurement accuracy, the most accurate imputation method is not necessarily the method that minimizes the bias of 2SLS. Instead, we explore imputation methods designed to increase the first-stage strength of the instrument(s), even if such methods entail lower imputation accuracy. We do so via simulations as well as an application related to the medium-run effects of birth weight.