Seminar by Anqi Zhao

Friday, January 13, 2023 12:00 pm - 12:00 pm EST (GMT -05:00)

Please Note: This seminar will be given virtually.

Department Seminar

Anqi Zhao
National University of Singapore

To Adjust or not to Adjust? Estimating the Average Treatment Effect in Randomized Experiments with Missing Covariates

Randomized experiments allow for consistent estimation of the average treatment effect based on the difference in mean outcomes without strong modeling assumptions. Appropriate use of pretreatment covariates can further improve the estimation efficiency. Missingness in covariates is nevertheless common in practice and raises an important question: should we adjust for covariates subject to missingness, and if so, how? The unadjusted difference in means is always unbiased. The complete-covariate analysis adjusts for all completely observed covariates and is asymptotically more efficient than the difference in means if at least one completely observed covariate is predictive of the outcome. Then what is the additional gain of adjusting for covariates subject to missingness? To reconcile the conflicting recommendations in the literature, we analyze and compare five strategies for handling missing covariates in randomized experiments under the design-based framework, and recommend the missingness-indicator method, as a known but not so popular strategy in the literature, due to its multiple advantages. First, it removes the dependence of the regression-adjusted estimators on the imputed values for the missing covariates. Second, it does not require modeling the missingness mechanism, and yields consistent estimators even when the missingness mechanism is related to the missing covariates and unobservable potential outcomes. Third, it ensures large-sample efficiency over the complete-covariate analysis and the analysis based on only the imputed covariates. Lastly, it is easy to implement via least squares. We also propose modifications to it based on asymptotic and finite sample considerations. Importantly, our theory views randomization as the basis for inference, and does not impose any modeling assumptions on the data generating process or missingness mechanism.
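
For readers who want a concrete picture of the missingness-indicator idea, the following is a minimal sketch and not the speaker's implementation. The abstract says only that the method is implemented "via least squares"; the fully interacted regression with centered covariates, the function name `missingness_indicator_ate`, and the `impute_value` parameter are all assumptions made for illustration. Missing entries are filled with a constant, one 0/1 missingness indicator is added for each covariate that has missing values, and the coefficient on the treatment indicator is taken as the estimate.

```python
import numpy as np


def missingness_indicator_ate(y, z, x, impute_value=0.0):
    """Illustrative regression-adjusted estimate using missingness indicators.

    y: (n,) observed outcomes
    z: (n,) treatment indicators coded 0/1
    x: (n, p) covariates, with np.nan marking missing entries
    impute_value: constant used to fill missing entries (hypothetical parameter)
    """
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    x = np.asarray(x, dtype=float)

    miss = np.isnan(x)
    x_filled = np.where(miss, impute_value, x)

    # One indicator column per covariate that has at least one missing entry.
    indicators = miss[:, miss.any(axis=0)].astype(float)

    # Augmented covariates: filled covariates plus missingness indicators,
    # centered so the coefficient on z can be read as the effect estimate.
    w = np.hstack([x_filled, indicators])
    w = w - w.mean(axis=0)

    # Assumed specification: intercept, treatment, covariates, and
    # treatment-by-covariate interactions, fitted by ordinary least squares.
    design = np.column_stack([np.ones(len(y)), z, w, z[:, None] * w])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]
```

Under this sketch, changing `impute_value` only re-expresses the filled covariate as a linear combination of columns already in the regression, so the estimate should not move, which mirrors the first advantage listed in the abstract.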