⚡ TL;DR
Modern treatment of triple-differences (DDD) estimators, DiD's natural extension when a third dimension of variation is available. Shows that common DDD implementations are invalid when conditioning on covariates is required, and that in staggered settings, pooling not-yet-treated units as controls introduces additional bias. Proposes regression-adjustment, IPW, and doubly-robust DDD estimators that remain valid. Companion R package: triplediff.
🧩 Setup & motivation
Triple-differences (DDD) designs are widely used to relax parallel-trends assumptions in DiD. A typical setup: units are observed in two groups (treated industry vs control industry) × two regions (high-exposure state vs low-exposure state) × two periods. The DDD estimator subtracts the DiD in the control industry from the DiD in the treated industry.
The paper shows that the usual DDD implementations have hidden problems: (i) the "difference of two DiDs" approach is biased when identification requires covariate conditioning; (ii) the three-way fixed-effects regression is also biased in the same case; (iii) in staggered adoption settings, the common practice of pooling all not-yet-treated units introduces bias even without covariates.
📊 Main results
The problem with common implementations
Standard DDD takes the form \(\hat\tau_{\text{DDD}} = [(\bar Y_{1,T,A} - \bar Y_{1,C,A}) - (\bar Y_{0,T,A} - \bar Y_{0,C,A})] - [(\bar Y_{1,T,B} - \bar Y_{1,C,B}) - (\bar Y_{0,T,B} - \bar Y_{0,C,B})]\) where indices are period × group × stratum. This identifies the DDD-ATT only under a strong parallel-trends assumption that is generally not equivalent to the implicit identifying assumption in covariate-adjusted versions.
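To make the formula concrete, here is a minimal Python sketch that computes the unconditional DDD from the eight cell means. The function name `ddd_from_cell_means`, the coding of periods as 0 (pre) and 1 (post), and the cell means themselves are assumptions made for illustration; none of them come from the paper.

```python
from itertools import product


def ddd_from_cell_means(ybar):
    """Difference of two DiDs from cell means.

    ybar maps (period, group, stratum) -> mean outcome in that cell.
    Assumed coding: period 0 = pre, 1 = post; group "T" or "C";
    stratum "A" or "B".
    """
    def did(stratum):
        post_gap = ybar[(1, "T", stratum)] - ybar[(1, "C", stratum)]
        pre_gap = ybar[(0, "T", stratum)] - ybar[(0, "C", stratum)]
        return post_gap - pre_gap

    return did("A") - did("B")


# Hypothetical cell means: every main effect and two-way interaction
# cancels in the triple difference, leaving only the three-way term (3.0).
ybar = {}
for p, g, s in product((0, 1), ("C", "T"), ("B", "A")):
    gi, si = int(g == "T"), int(s == "A")
    ybar[(p, g, s)] = (1.0 + 0.5 * p + 0.25 * gi + 0.125 * si
                       + 0.75 * p * gi + 1.5 * p * si + 2.0 * gi * si
                       + 3.0 * p * gi * si)

tau_hat = ddd_from_cell_means(ybar)
print(tau_hat)  # 3.0
```

The example contains no covariates, which is exactly the case where this estimator is unproblematic.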
Three valid estimators
The paper proposes:
- Regression-adjustment DDD: model the conditional expectation of the outcome difference and impute counterfactuals.
- Inverse-probability-weighted DDD: weight observations by the propensity of being in each cell.
- Doubly-robust DDD: combines RA and IPW so the estimator remains consistent if either the outcome model or the propensity-score model is correctly specified.
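As an illustration of the first item only, the following Python sketch implements one simple form of regression-adjustment DDD for two-period panel data: fit a linear model of the outcome change in each of the three comparison cells, predict for the treated units, and average. The names `ra_ddd`, `s`, and `q`, the linear outcome model, and the simulated data are all assumptions for this sketch; it is not the paper's estimator or the triplediff implementation.

```python
import numpy as np


def ra_ddd(dy, x, s, q):
    """Regression-adjustment DDD sketch for two-period panel data.

    dy: post-minus-pre outcome change per unit
    x:  covariate(s), shape (n,) or (n, k)
    s:  1 if the unit is in the group where treatment is enabled, else 0
    q:  1 if the unit is in the eligible partition, else 0
    Units with s == 1 and q == 1 are treated (assumed convention).
    """
    X = np.column_stack([np.ones(len(dy)), x])
    treated = (s == 1) & (q == 1)
    # Sign with which each comparison cell's predicted trend enters.
    signs = {(1, 0): -1.0, (0, 1): -1.0, (0, 0): 1.0}
    adjustment = np.zeros(treated.sum())
    for (sv, qv), sign in signs.items():
        cell = (s == sv) & (q == qv)
        beta, *_ = np.linalg.lstsq(X[cell], dy[cell], rcond=None)
        adjustment += sign * (X[treated] @ beta)
    return float(np.mean(dy[treated] + adjustment))


# Simulated data (hypothetical): untreated trends are additive in s and q
# given x, and the covariate is shifted upward in the treated cell.
rng = np.random.default_rng(0)
n = 4000
s = rng.integers(0, 2, n)
q = rng.integers(0, 2, n)
x = rng.normal(loc=1.0 * s * q, scale=1.0)
dy = (1 + x + 0.3 * s + 0.2 * q + 0.5 * s * x + 0.8 * q * x
      + rng.normal(scale=0.1, size=n)
      + 2.0 * s * q)  # true effect on the treated is 2.0


def cell_mean(sv, qv):
    return dy[(s == sv) & (q == qv)].mean()


naive = cell_mean(1, 1) - cell_mean(1, 0) - cell_mean(0, 1) + cell_mean(0, 0)
adjusted = ra_ddd(dy, x, s, q)
print(f"naive DDD: {naive:.2f}, regression-adjusted DDD: {adjusted:.2f}")
```

In this simulation the covariate distribution differs across cells, so the unadjusted difference of cell means drifts well away from the true effect while the regression-adjusted version stays close to it. The sketch relies entirely on the linear outcome model being right, which is the fragility the doubly-robust variant is meant to reduce.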
Staggered DDD
In staggered settings, the paper shows that pooling all not-yet-treated units biases the DDD-ATT due to differential composition. Solution: cohort-by-cohort DDD estimation followed by aggregation, similar to the Callaway-Sant'Anna logic for DiD.
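The aggregation step can be sketched as a weighted average of cohort-specific estimates. The cohort-size weights, the function name `aggregate_cohort_atts`, and the numbers below are assumptions for illustration; the paper may use a different weighting scheme.

```python
def aggregate_cohort_atts(att_by_cohort, n_by_cohort):
    """Cohort-size-weighted average of cohort-specific DDD estimates."""
    total = sum(n_by_cohort[g] for g in att_by_cohort)
    weighted = sum(att_by_cohort[g] * n_by_cohort[g] for g in att_by_cohort)
    return weighted / total


# Hypothetical cohort-specific estimates, keyed by first treatment year,
# each assumed to come from a separate DDD using only valid comparison units.
att_by_cohort = {2018: 1.0, 2019: 2.0, 2020: 4.0}
n_by_cohort = {2018: 100, 2019: 200, 2020: 100}

overall = aggregate_cohort_atts(att_by_cohort, n_by_cohort)
print(overall)  # 2.25
```

Estimating each cohort separately keeps the comparison group fixed within a cohort, so the differential-composition problem does not enter through pooling.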
🛠️ Implications for practice
- If you're running DDD, replace the "difference of two DiDs" with the doubly-robust estimator from this paper.
- For staggered DDD, do cohort-specific estimation before aggregation.
- The R package triplediff implements all three estimators with a single call.
🧭 Where this sits in the broader DiD literature
Direct extension of the Sant'Anna-Zhao (2020, J Econometrics) doubly-robust DiD framework to triple differences. Cited as the modern DDD reference in the BCCGS 2026 JEL guide. Related to Olden-Møen (2022) on triple-differences identification.
📥 Read the paper
- Local PDF (1.3 MB): instant, no external request
- arXiv 2505.09942