Research

Publications

Robustness to missing data: breakdown point analysis
Journal of Econometrics, Vol. 253 (2026)
[arXiv] [Slides]

Missing data is pervasive in econometric applications, and rarely is it plausible that the data are missing (completely) at random. This paper proposes a methodology for studying the robustness of results drawn from incomplete datasets. Selection is measured as the divergence from the distribution of complete observations to the distribution of incomplete observations. The breakdown point is defined as the minimal amount of selection needed to overturn a given result. Reporting point estimates and lower confidence intervals of the breakdown point is a simple, concise way to communicate the robustness of a result. An estimator of the breakdown point is proposed and shown to be root-n consistent and asymptotically normal. This estimator can be applied directly to conclusions drawn from any model identified with the generalized method of moments (GMM) that satisfies mild assumptions. Simulations demonstrate the finite sample performance of the breakdown point estimator on averages, linear regression, and logistic regression. The methodology is illustrated by estimating the breakdown point of conclusions drawn from several randomized controlled trials suffering from missing data due to attrition.

Working Papers

Estimating Functionals of the Joint Distribution of Potential Outcomes with Optimal Transport
[Draft] [Slides]

Many causal parameters depend on a moment of the joint distribution of potential outcomes. Such parameters are especially relevant in policy evaluation settings, where noncompliance is common and accommodated through the model of Imbens & Angrist (1994). This paper shows that the sharp identified set for these parameters is an interval with endpoints characterized by the value of optimal transport problems. Sample analogue estimators are proposed based on the dual problem of optimal transport. These estimators are root-n consistent and converge in distribution under mild assumptions. Inference procedures based on the bootstrap are straightforward and computationally convenient. The ideas and estimators are demonstrated in an application revisiting the National Supported Work Demonstration job training program. I find suggestive evidence that workers who would see below-average earnings without treatment tend to see above-average benefits from treatment.