Better Understanding Triple Differences Estimators

⚡ TL;DR

Modern treatment of triple-differences (DDD) estimators — DiD's natural extension when a third dimension of variation is available. Shows that common DDD implementations are invalid when conditioning on covariates is required, and that in staggered settings, pooling not-yet-treated units as controls introduces additional bias. Proposes regression-adjustment, IPW, and doubly-robust DDD estimators that remain valid. Companion R package: triplediff.

🧩 Setup & motivation

Triple-differences (DDD) designs are widely used to relax parallel-trends assumptions in DiD. A typical setup: units are observed in two groups (treated industry vs control industry) × two regions (high-exposure state vs low-exposure state) × two periods. The DDD estimand takes the DiD in the treated industry and subtracts the DiD in the control industry.

The paper shows that the usual DDD implementations have hidden problems: (i) the "difference of two DiDs" approach is biased when identification requires covariate conditioning; (ii) the three-way fixed-effects regression is also biased in the same case; (iii) in staggered adoption settings, the common practice of pooling all not-yet-treated units introduces bias even without covariates.

📐 Main results

The problem with common implementations

Standard DDD takes the form \(\hat\tau_{\text{DDD}} = (\bar Y_{1,T,A} - \bar Y_{1,C,A}) - (\bar Y_{0,T,A} - \bar Y_{0,C,A}) - [(\bar Y_{1,T,B} - \bar Y_{1,C,B}) - (\bar Y_{0,T,B} - \bar Y_{0,C,B})]\), where the subscripts index period (0 or 1) × group (T or C) × stratum (A or B). In words, it subtracts the DiD within stratum B from the DiD within stratum A. This identifies the DDD-ATT only under a strong, unconditional parallel-trends assumption, which is generally not equivalent to the conditional assumption that covariate-adjusted versions implicitly rely on.
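To make the formula concrete, here is a minimal Python sketch that computes the difference of two DiDs from eight cell means. The numbers are hypothetical, and the names `cell_means`, `did`, and `ddd` are illustrative only; they do not come from the paper or from triplediff.

```python
# Hypothetical cell means, keyed by (period, group, stratum),
# matching the subscript order used in the formula above.
cell_means = {
    (0, "T", "A"): 10.0, (1, "T", "A"): 15.0,
    (0, "C", "A"): 9.0,  (1, "C", "A"): 11.0,
    (0, "T", "B"): 8.0,  (1, "T", "B"): 9.5,
    (0, "C", "B"): 7.0,  (1, "C", "B"): 8.0,
}


def did(means, stratum):
    """DiD within one stratum: post-period T-C gap minus pre-period T-C gap."""
    post_gap = means[(1, "T", stratum)] - means[(1, "C", stratum)]
    pre_gap = means[(0, "T", stratum)] - means[(0, "C", stratum)]
    return post_gap - pre_gap


def ddd(means):
    """Difference of two DiDs: stratum A minus stratum B."""
    return did(means, "A") - did(means, "B")


ddd_estimate = ddd(cell_means)
print(ddd_estimate)
```

With these made-up means, the DiD in stratum A is 3.0 and the DiD in stratum B is 0.5, so the DDD estimate is 2.5. Note that this calculation uses no covariates at all, which is exactly the case where the unconditional assumption has to carry the identification.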

Three valid estimators

The paper proposes:

Regression-adjustment DDD: model the conditional expectation of the outcome difference and impute counterfactuals.
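Inverse-probability-weighted DDD: reweight comparison observations using the estimated propensity of belonging to each cell, given covariates.
Doubly-robust DDD: combines RA and IPW so the estimator remains consistent if either the outcome model or the propensity model is correctly specified.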
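The regression-adjustment idea can be illustrated with a small, noise-free simulation. This is a sketch under stated assumptions, not the paper's or triplediff's implementation: the data are hypothetical, the outcome model is a simple linear fit on one covariate, the treated cell is taken to be (T, A), and conditional parallel trends is imposed by construction as m_TA(x) = m_TB(x) + m_CA(x) - m_CB(x). The IPW and doubly-robust variants are not shown.

```python
# Noise-free simulation of outcome changes (delta_y = Y_post - Y_pre).
# All numbers are hypothetical; the true effect is set to 2.0.
TRUE_EFFECT = 2.0
TA, TB, CA, CB = ("T", "A"), ("T", "B"), ("C", "A"), ("C", "B")

# Covariate values per cell; the distributions differ across cells on purpose.
x_by_cell = {
    TA: [2.0, 3.0, 4.0, 5.0],  # treated cell (assumption of this sketch)
    TB: [0.0, 1.0, 2.0, 3.0],
    CA: [1.0, 2.0, 3.0, 4.0],
    CB: [0.0, 1.0, 1.0, 2.0],
}

# Untreated trend in each comparison cell: delta_y = intercept + slope * x.
trend = {TB: (1.0, 2.0), CA: (0.5, 1.0), CB: (0.2, 0.5)}


def line(coef, x):
    intercept, slope = coef
    return intercept + slope * x


def mean(values):
    return sum(values) / len(values)


def fit_line(xs, ys):
    """Ordinary least squares with a single covariate."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def combine(models, x):
    """Untreated trend implied for the treated cell at covariate value x."""
    return line(models[TB], x) + line(models[CA], x) - line(models[CB], x)


# Generate outcome changes; the treated cell follows the implied trend plus the effect.
delta_y = {cell: [line(trend[cell], x) for x in x_by_cell[cell]] for cell in trend}
delta_y[TA] = [combine(trend, x) + TRUE_EFFECT for x in x_by_cell[TA]]

# Unadjusted DDD: plain difference of cell-mean changes.
unadjusted = mean(delta_y[TA]) - mean(delta_y[CA]) - mean(delta_y[TB]) + mean(delta_y[CB])

# Regression adjustment: fit each comparison cell, then average the predictions
# over the covariate values of the treated cell only.
fitted = {cell: fit_line(x_by_cell[cell], delta_y[cell]) for cell in trend}
counterfactual = mean([combine(fitted, x) for x in x_by_cell[TA]])
ra_ddd = mean(delta_y[TA]) - counterfactual

print(unadjusted, ra_ddd)
```

In this constructed example the unadjusted difference of means gives 5.75, while the regression-adjusted estimate recovers the true effect of 2.0. The gap arises only because the covariate distribution differs across cells; the key design choice is that every fitted model is evaluated on the treated cell's covariates.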

Staggered DDD

In staggered settings, the paper shows that pooling all not-yet-treated units biases the DDD-ATT due to differential composition. Solution: cohort-by-cohort DDD estimation followed by aggregation, similar to the Callaway-Sant'Anna logic for DiD.
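The aggregation step can be sketched as follows, assuming cohort-specific DDD estimates have already been computed. The cohort labels, estimates, and sizes are hypothetical, and weighting by the number of treated units in each cohort is one simple choice made for this illustration; the paper's aggregation scheme may differ.

```python
# Hypothetical cohort-specific DDD estimates and numbers of treated units.
cohort_results = [
    {"cohort": 2019, "ddd_att": 1.0, "n_treated": 100},
    {"cohort": 2020, "ddd_att": 2.0, "n_treated": 300},
]


def aggregate_by_cohort_size(results):
    """Average cohort-level estimates, weighting each by its share of treated units."""
    total = sum(r["n_treated"] for r in results)
    return sum(r["ddd_att"] * r["n_treated"] / total for r in results)


overall_att = aggregate_by_cohort_size(cohort_results)
print(overall_att)
```

With these made-up inputs the weights are 0.25 and 0.75, so the aggregate is 1.75. Each cohort's estimate is computed separately before any averaging, so no comparison group is pooled across cohorts.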

🛠️ Implications for practice

If you're running DDD and identification requires covariates, replace the "difference of two DiDs" with the doubly-robust estimator from this paper.
For staggered DDD, do cohort-specific estimation before aggregation.
The R package triplediff implements all three estimators with a single call.

🧭 Where this sits in the broader DiD literature

Direct extension of the Sant'Anna-Zhao (2020, J Econometrics) doubly-robust DiD framework to triple differences. Cited as the modern DDD reference in the BCCGS 2026 JEL guide. Related to Olden-Møen (2022) on triple-differences identification.

📥 Read the paper

arXiv 2505.09942
