Literature Readings

Causal Panel Analysis under Parallel Trends: Lessons from a Large Reanalysis Study

Albert Chiu (Stanford) · Xingchen Lan (NYU) · Ziyi Liu (Berkeley) · Yiqing Xu (Stanford)

APSR 2026 · Empirical · Replication Study

Read it

Local PDF (~1.5 MB) · arXiv 2309.15983 · APSR Cambridge

TL;DR

The largest reanalysis study of the post-2020 DiD methodology revolution. It replicates 49 published TWFE panel-data papers in political science using modern heterogeneity-robust estimators (Borusyak-Jaravel-Spiess, Callaway-Sant'Anna, Sun-Abraham) and documents how often headline results are sensitive to estimator choice, providing the first systematic empirical quantification of the methodology revolution's practical importance.

Setup & motivation

Since 2020, the DiD literature has documented that standard TWFE estimates can be biased when treatment effects are heterogeneous across cohorts or over time. The theoretical case is well established, but the empirical question of how often this matters for published research was open until this paper.
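The bias is easy to reproduce in a toy panel. The sketch below is not taken from the paper: it assumes a hypothetical design with two equally sized cohorts adopting in periods 2 and 4, six periods, no never-treated units, no noise, and an effect that grows by 1 for each period under treatment. The function name `twfe_coefficient` and all parameter values are illustrative. On a balanced panel, the TWFE coefficient equals the slope from regressing the two-way demeaned outcome on the two-way demeaned treatment indicator, which is what the function computes.

```python
from statistics import fmean

def twfe_coefficient(y, d):
    """TWFE slope on a balanced panel via two-way demeaning.

    y and d are lists of rows (one row per unit, one column per period).
    """
    def demean(x):
        row_means = [fmean(row) for row in x]
        col_means = [fmean(col) for col in zip(*x)]
        grand = fmean(row_means)  # equals the overall mean on a balanced panel
        return [[v - row_means[i] - col_means[t] + grand
                 for t, v in enumerate(row)] for i, row in enumerate(x)]

    y_dm, d_dm = demean(y), demean(d)
    num = sum(a * b for ry, rd in zip(y_dm, d_dm) for a, b in zip(ry, rd))
    den = sum(b * b for rd in d_dm for b in rd)
    return num / den

# Hypothetical design (an assumption, not the paper's data).
n_periods = 6
adoption = [2] * 50 + [4] * 50

# Absorbing treatment indicator and an effect that grows with time since adoption.
d = [[1.0 if t >= g else 0.0 for t in range(n_periods)] for g in adoption]
effect = [[(t - g + 1) * d_it for t, d_it in enumerate(row)]
          for g, row in zip(adoption, d)]

# Unit and period fixed effects plus the effect: parallel trends hold exactly.
y = [[0.1 * i + 0.3 * t + effect[i][t] for t in range(n_periods)]
     for i in range(len(adoption))]

treated_effects = [e for row_e, row_d in zip(effect, d)
                   for e, d_it in zip(row_e, row_d) if d_it == 1.0]
true_att = fmean(treated_effects)
twfe = twfe_coefficient(y, d)
print(f"true ATT = {true_att:.3f}, TWFE = {twfe:.3f}")
```

In this design the average effect among treated observations is 13/6 (about 2.17), while the TWFE coefficient comes out at 0.5, even though parallel trends hold exactly. The gap arises because late adopters are compared against early adopters whose effects are still growing.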

The authors collected all TWFE-based panel-data papers in the top 4 political-science journals (APSR, AJPS, JOP, BJPS) over a 5-year window, identified the 49 that met inclusion criteria, replicated the original analysis, and re-ran each with three modern heterogeneity-robust estimators. They compare headline coefficient magnitudes, significance levels, and overall conclusions.

๐Ÿ“ Main results

The headline finding

Of 49 papers replicated:

  • ~30% of papers have headline coefficients that change substantially (>50% in magnitude) under a heterogeneity-robust estimator.
  • ~15% of papers have headline coefficients that change sign or lose statistical significance.
  • The remaining ~55% of papers are robust to estimator choice.
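The three buckets above can be expressed as a simple classification rule. The sketch below is one possible reading, not the paper's coding scheme: the 50% magnitude threshold comes from these notes, while the 1.96 critical value, the precedence of the sign/significance check over the magnitude check, and the function name `classify_sensitivity` are assumptions made for illustration.

```python
def classify_sensitivity(twfe_est, twfe_se, robust_est, robust_se,
                         z_crit=1.96, rel_change=0.5):
    """Bucket a headline result by how it moves under a robust estimator.

    z_crit and the ordering of the checks are assumptions, not the paper's rules.
    """
    sign_flip = twfe_est * robust_est < 0
    was_significant = abs(twfe_est) > z_crit * twfe_se
    is_significant = abs(robust_est) > z_crit * robust_se
    if sign_flip or (was_significant and not is_significant):
        return "sign flip or lost significance"
    if abs(robust_est - twfe_est) > rel_change * abs(twfe_est):
        return "substantial magnitude change"
    return "robust"

# Made-up numbers purely to exercise each branch.
print(classify_sensitivity(1.0, 0.2, 0.9, 0.25))   # small move, still significant
print(classify_sensitivity(1.0, 0.2, 0.4, 0.1))    # shrinks by 60%, still significant
print(classify_sensitivity(1.0, 0.2, 0.3, 0.3))    # no longer significant
```

Checking sign and significance first means a paper is counted once, in the more severe bucket, which matches the way the three shares above add up to 100%.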

What predicts sensitivity?

Three factors predict whether a paper's results are sensitive:

  1. Staggered adoption: papers with heavily staggered treatment timing are the most sensitive, because TWFE contamination is largest there.
  2. Treatment-effect heterogeneity: when effects evolve over time, TWFE estimates are attenuated more.
  3. Sample composition: papers with few never-treated units have fewer "clean" comparisons.
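Two of these factors, staggered timing and sample composition, can be read directly off the treatment matrix before any estimation. The helper below is a minimal sketch under stated assumptions: treatment is binary and absorbing, the panel is stored as one row of 0/1 indicators per unit, and the name `adoption_summary` and the example matrix are hypothetical.

```python
from collections import Counter

def adoption_summary(d):
    """Summarize adoption timing for a units x periods 0/1 treatment matrix."""
    # First treated period per unit; None marks a never-treated unit.
    first = [row.index(1) if 1 in row else None for row in d]
    cohort_sizes = Counter(g for g in first if g is not None)
    return {
        "never_treated_share": first.count(None) / len(first),
        "n_adoption_dates": len(cohort_sizes),
        "cohort_sizes": dict(sorted(cohort_sizes.items())),
    }

# Illustrative panel: four units, four periods.
example = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
]
summary = adoption_summary(example)
print(summary)
```

A small never-treated share combined with many adoption dates is the configuration the list above flags as most at risk.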

Recommendations

The paper proposes a checklist of robustness checks: (i) Goodman-Bacon decomposition; (ii) at least one heterogeneity-robust estimator; (iii) Rambachan-Roth sensitivity bounds; (iv) cohort-specific event-study plots. The checklist is now standard practice in political science DiD papers.
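Item (iv) of the checklist can be approximated without any specialized package. The sketch below is a simplified stand-in, not the paper's procedure or any library's API: for each adoption cohort it takes the change in outcomes since the period before adoption and subtracts the same change among never-treated units, with no covariates and no inference. The function name `cohort_event_study` and the toy data are assumptions.

```python
from statistics import fmean

def cohort_event_study(y, first_treated):
    """Cohort-by-event-time DiD estimates against never-treated units.

    y: list of rows (one per unit, one column per period).
    first_treated: adoption period per unit, None for never-treated units.
    Returns {(cohort, event_time): estimate}; baseline is the period before adoption.
    """
    controls = [row for row, g in zip(y, first_treated) if g is None]
    if not controls:
        raise ValueError("needs at least one never-treated unit")
    estimates = {}
    for g in sorted({g for g in first_treated if g is not None}):
        if g < 1:
            continue  # no pre-adoption baseline period available
        treated = [row for row, gi in zip(y, first_treated) if gi == g]
        for t in range(len(y[0])):
            change_treated = fmean(row[t] - row[g - 1] for row in treated)
            change_control = fmean(row[t] - row[g - 1] for row in controls)
            estimates[(g, t - g)] = change_treated - change_control
    return estimates

# Toy data: cohorts adopting in periods 2 and 4, two never-treated units,
# effect equal to the number of periods under treatment, no noise.
first = [2, 2, 4, 4, None, None]
y = [[float(i) + 0.3 * t + ((t - g + 1) if g is not None and t >= g else 0)
      for t in range(6)] for i, g in enumerate(first)]
estimates = cohort_event_study(y, first)
for key in sorted(estimates):
    print(key, round(estimates[key], 3))
```

Estimates at negative event times serve as placebo checks: under parallel trends they should be close to zero, and in this noise-free example they are zero up to floating-point error.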

๐Ÿ› ๏ธ Implications for practice

  • If you have a published TWFE result, run BJS (Borusyak-Jaravel-Spiess) or CS (Callaway-Sant'Anna) as a robustness check; by the tallies in these notes, there is roughly a 1-in-3 chance the magnitude changes substantially.
  • Many TWFE papers from the 5-year window do not replicate under modern estimators, but most (~55%) do.
  • Submission norms in political science (and other fields adopting these checks) now expect at least one modern robust estimator alongside TWFE.

Where this sits in the broader DiD literature

This paper is the empirical counterpart to Goodman-Bacon (2021, J Econometrics) and the post-2020 methodology revolution. It is cited in BCCGS 2026 JEL as the systematic evidence that the revolution matters in practice, and it is methodologically similar to Brodeur et al.'s replication audits of p-values.
