Interpreting Event-Studies from Recent Difference-in-Differences Methods

⚡ TL;DR

A short, sharp practitioner warning: the default event-study plots produced by software for three popular recent DiD methods (de Chaisemartin-D'Haultfœuille, Callaway-Sant'Anna, Borusyak-Jaravel-Spiess) do NOT match traditional TWFE event-study plots, even with non-staggered treatment timing. The new methods construct pre-treatment coefficients asymmetrically from post-treatment coefficients, so they can show a kink or jump at the time of treatment where the TWFE plot is a straight line. As a result, the visual heuristics developed for judging parallel-trends violations in TWFE event-studies do not carry over to the new plots.

🧩 Setup & motivation

For 30+ years, applied econometrics has built visual intuition around the TWFE event-study plot: pre-treatment coefficients close to zero are "good", a kink at \(t = 0\) indicates the treatment effect, a smooth line through pre and post periods indicates a pre-trend, and so on. These heuristics carry real weight, because they shape how referees and readers judge a paper's identification strategy.

Roth points out that the new heterogeneity-robust estimators produce event-study plots that look different from the TWFE plot even on the same data. Specifically, they construct the pre-treatment coefficients using a different baseline period or weighting scheme than the post-treatment coefficients. The result can be kinks or jumps at the time of treatment that do not appear in the TWFE plot, or, conversely, a smooth line where the TWFE plot has a kink.

📐 Main results

The asymmetric construction

In TWFE event-studies, pre-treatment coefficients and post-treatment coefficients are estimated symmetrically: both are deviations from a reference period (usually \(k = -1\)) using the same control comparisons.
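
To make the symmetry concrete: with one treatment cohort first treated in period \(g\), never-treated controls, and a balanced panel, the saturated TWFE event-study reduces to differences in group means. Writing \(\bar Y_{T,t}\) and \(\bar Y_{C,t}\) for the treated and control means in period \(t\) (notation introduced here for illustration), each coefficient is

\[
\hat\beta_k = \left(\bar Y_{T,g+k} - \bar Y_{T,g-1}\right) - \left(\bar Y_{C,g+k} - \bar Y_{C,g-1}\right), \qquad k \neq -1, \qquad \hat\beta_{-1} = 0.
\]

The baseline \(g-1\) is the same whether \(k\) is a lead (\(k < -1\)) or a lag (\(k \ge 0\)); that shared baseline is what makes the construction symmetric.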

In the new methods:

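de Chaisemartin-D'Haultfœuille: the default plot likewise constructs its placebo (pre-treatment) estimates asymmetrically from its post-treatment effect estimates, so it raises the same interpretive problem.
Callaway-Sant'Anna: with the default "varying" base period (the `base_period` argument of `att_gt` in the R `did` package), each pre-treatment coefficient is a short difference between periods \(t-1\) and \(t\), while each post-treatment coefficient is a long difference relative to the last untreated period, \(g-1\). The control group (never-treated or not-yet-treated) is the same on both sides; the asymmetry lies in the baseline. A "universal" base period instead measures every coefficient against \(g-1\).
Borusyak-Jaravel-Spiess: imputes untreated outcomes from unit and time fixed effects fit on untreated observations, so with non-staggered timing each post-treatment coefficient compares outcomes to the average over all pre-treatment periods. The pre-treatment coefficients come from a separate regression on untreated observations and are not measured against that baseline.
Sun-Abraham, by contrast, measures every lead and lag against one omitted period (typically \(k = -1\)); with a single treatment cohort its interaction-weighted estimates coincide with the TWFE event-study.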
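
The differences in construction can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any package's implementation: one treated cohort first treated in period `g`, never-treated controls, and a balanced panel, so each estimator depends only on the treated-minus-control gap in group means. The helper names (`twfe`, `cs_varying_base`, `bjs_post`) are hypothetical, and `bjs_post` covers only the post-treatment side of the BJS construction.

```python
# Illustrative sketch only (hypothetical helpers, not a package API).
# Assumes: one treated cohort first treated in period g, never-treated
# controls, balanced panel. Then every estimator is a function of the
# treated-minus-control gap in group means, gap[t] for t = 0..T-1.


def twfe(gap, g):
    """TWFE event-study: every coefficient is relative to the fixed base g - 1."""
    return {t - g: gap[t] - gap[g - 1] for t in range(len(gap))}


def cs_varying_base(gap, g):
    """Callaway-Sant'Anna with a varying base period.

    Pre-period (t < g): short difference, period t versus t - 1.
    Post-period (t >= g): long difference, period t versus g - 1.
    """
    out = {}
    for t in range(1, len(gap)):  # period 0 has no predecessor, so no coefficient
        base = t - 1 if t < g else g - 1
        out[t - g] = gap[t] - gap[base]
    return out


def bjs_post(gap, g):
    """BJS-style post-period coefficients: period t versus the pre-period average."""
    pre_mean = sum(gap[:g]) / g
    return {t - g: gap[t] - pre_mean for t in range(g, len(gap))}


# A pure linear pre-trend (slope 1 per period) and NO treatment effect.
gap = [float(t) for t in range(8)]
g = 4
for name, coefs in [("TWFE", twfe(gap, g)),
                    ("CS varying", cs_varying_base(gap, g)),
                    ("BJS post", bjs_post(gap, g))]:
    print(name, dict(sorted(coefs.items())))
```

With a linear pre-trend and no treatment effect, the TWFE coefficients run along one straight line through zero at \(e = -1\). The varying-base coefficients are flat at 1.0 before treatment (each is just the per-period slope) and then rise, so the plot shows a kink at \(e = 0\) that reflects no treatment effect at all. The BJS-style post-treatment coefficients start at 2.5 because they are measured against the pre-period average of 1.5.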

Practical consequence

The same underlying data can produce differently shaped plots under TWFE and under the new estimators. In particular, a "kink at treatment" that practitioners would read as a treatment effect can be an artifact of how the plot is constructed rather than evidence about the economics.

🛠️ Implications for practice

Show both the TWFE plot AND the heterogeneity-robust plot. If they look different, explain why.
Do not apply TWFE visual heuristics to plots from the new estimators; they are different objects.
When citing pre-trend evidence, state explicitly which estimator's plot you are reading and what its construction implies.
The recommended practice in BCCGS (2026, JEL) is to plot the estimator's own pre-trend diagnostic, not to re-use TWFE intuition.
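
When a plot reports varying-base (short-difference) pre-treatment coefficients, one way to put them on the TWFE scale, so that the two plots can be compared as the first point above suggests, is to cumulate them backward from \(k = -1\). A minimal sketch, assuming a single treatment cohort and the same comparison group throughout; with staggered timing the aggregation weights differ across event times and this identity no longer holds exactly. The function name `rebase_to_universal` is hypothetical. Where the software offers it (e.g. `base_period = "universal"` in the R `did` package), re-estimating with a common base period is the cleaner route.

```python
def rebase_to_universal(varying_pre):
    """Convert varying-base pre-period coefficients to a fixed base at k = -1.

    varying_pre maps event time k (k = -1, -2, ...) to the short difference
    gap[g+k] - gap[g+k-1]. Under the single-cohort assumption, the
    universal-base coefficient gap[g+k] - gap[g-1] equals minus the sum of
    the short differences at event times k+1, ..., -1 (a telescoping sum).
    """
    ks = sorted(varying_pre)
    if ks != list(range(ks[0], 0)):
        raise ValueError("need contiguous pre-period event times ending at -1")
    universal = {-1: 0.0}  # the reference period is normalized to zero
    running = 0.0
    for k in reversed(ks):  # k = -1, -2, ..., earliest lead
        # Subtracting the short difference covering periods g+k-1 -> g+k
        # gives the coefficient for event time k-1 relative to g-1.
        running -= varying_pre[k]
        universal[k - 1] = running
    return dict(sorted(universal.items()))


# Varying-base pre-period coefficients from a linear pre-trend (slope 1).
print(rebase_to_universal({-3: 1.0, -2: 1.0, -1: 1.0}))
# {-4: -3.0, -3: -2.0, -2: -1.0, -1: 0.0}
```

The flat pre-period line becomes the straight TWFE-style line through zero at \(k = -1\). The rebased series also gains one extra lead, because the earliest short difference links two periods.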

🧭 Where this sits in the broader DiD literature

A practical follow-up to de Chaisemartin-D'Haultfœuille (2020), Callaway-Sant'Anna (2021), and Borusyak-Jaravel-Spiess (2024). Read it alongside Roth (2022), "Pretest with Caution", for the full picture on parallel-trends diagnostics. Cited in the BCCGS 2026 JEL guide.

📥 Read the paper

Local PDF (987 KB)
arXiv 2401.12309
Springer
