Selection and Parallel Trends — Literature Readings

Dalia Ghanem (UC Davis) · Pedro H. C. Sant'Anna (Emory) · Kaspar Wüthrich (Michigan)

⚡ TL;DR

This is the longer working paper that the AEA P&P 2026 piece distills into 8 pages. It develops a unified framework for analyzing when parallel-trends assumptions hold and when pre-trends tests are informative, indexed by the mechanism through which units select into treatment. The AEA P&P version covers Propositions 1–2 (selection on fixed effects; selection on pre-treatment information); the full arXiv paper goes further into general selection classes, structural breaks, time-varying covariates, and the full set of testable conditional implications.

🧩 Setup & motivation

The framework formalizes the selection mechanism as \(G_i = g(\omega_i, v_i)\), where \(\omega_i\) is a subvector of unobservables the units select on and \(v_i\) is a selection-specific independent error. By varying the contents of \(\omega_i\), the authors characterize different "leading classes" of selection: (i) fixed-effects only; (ii) pre-treatment observables; (iii) time-varying unobservables; (iv) selection on the treatment effect itself.

For each class, the paper provides necessary and sufficient conditions on the data-generating process under which Assumptions PT (parallel post-trends) and PPT (parallel pre-trends) hold jointly for all nondegenerate selection mechanisms within the class. The "for all" quantification is crucial: it gives conditions on the DGP that are robust to the specific selection function the researcher does not see.

📐 Main results

The general lemma (Theorem 3.1)

The paper's central technical contribution is a general lemma that characterizes joint parallelism: for any choice of \(\omega_i\), Assumptions PPT and PT hold jointly for all nondegenerate selection mechanisms in the class \(\mathscr{G}_\omega\) if and only if \(E[\varepsilon_{i1} - \varepsilon_{i0} | \omega_i] = E[\varepsilon_{i0} - \varepsilon_{i,-1} | \omega_i] = 0\).
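
The sufficiency direction of this equivalence can be sketched in a few lines. The sketch assumes the covariate-free outcome model \(Y_{it}(0) = \alpha_i + \lambda_t + \varepsilon_{it}\) and assumes that \(v_i\) is independent of the shocks given \(\omega_i\); both are assumptions of this note, not a restatement of the paper's proof. Because \(G_i\) is a function of \((\omega_i, v_i)\), the law of iterated expectations gives

\[
\begin{aligned}
E[\varepsilon_{i1} - \varepsilon_{i0} \mid G_i = d]
&= E\big[\, E[\varepsilon_{i1} - \varepsilon_{i0} \mid \omega_i, v_i] \,\big|\, G_i = d \,\big] \\
&= E\big[\, E[\varepsilon_{i1} - \varepsilon_{i0} \mid \omega_i] \,\big|\, G_i = d \,\big] = 0, \qquad d \in \{0, 1\},
\end{aligned}
\]

so \(E[Y_{i1}(0) - Y_{i0}(0) \mid G_i = d] = \lambda_1 - \lambda_0\) in both groups. The same argument applied to \(\varepsilon_{i0} - \varepsilon_{i,-1}\) gives parallel pre-trends. The necessity direction relies on the "for all" quantification over selection mechanisms and is not sketched here.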

Specializations

Fixed-effects selection \(\omega_i = (\alpha_i, \mu_i)\): the necessary-and-sufficient condition reduces to a time-homogeneity assumption on the conditional mean of \(\varepsilon_{it}\). Pre-trends tests are informative.
Selection on pre-treatment information \(\omega_i = (\alpha_i, \varepsilon_i^0, \mu_i, \eta_i^0)\): requires a martingale-type restriction; in the AR(1) case, it forces the white-noise innovation variance to zero. Pre-trends tests can be uninformative.
Selection on time-varying unobservables: requires that the selection-driving residuals share the same time-series dependence as the outcome residuals. Pre-trends tests are typically misleading in applied work.
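
A small simulation can make the contrast between these cases concrete. The sketch below is a hypothetical illustration, not code from the paper: the function name `simulate_gaps`, the Gaussian AR(1) shocks, the threshold selection rules, and all parameter values are assumptions chosen for this note. It computes the treated-minus-control difference in mean untreated-outcome changes before treatment (periods -1 to 0) and after (periods 0 to 1).

```python
import numpy as np

# Hypothetical illustration: all names and parameter values below are
# assumptions of this sketch, not taken from the paper.
LAMBDA = {-1: 0.0, 0: 0.3, 1: 0.9}  # arbitrary common time effects


def simulate_gaps(rho, select_on, n=200_000, seed=0):
    """Return (pre_gap, post_gap): treated-minus-control differences in the
    mean change of Y(0) over periods (-1 -> 0) and (0 -> 1)."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(size=n)  # unit fixed effect
    v = rng.normal(size=n)      # selection-specific independent error

    # AR(1) shocks: eps_t = rho * eps_{t-1} + u_t with unit-variance u_t.
    # Start from the stationary variance when |rho| < 1, else unit variance.
    sd_init = 1.0 / np.sqrt(1.0 - rho**2) if abs(rho) < 1 else 1.0
    eps_m1 = rng.normal(scale=sd_init, size=n)
    eps_0 = rho * eps_m1 + rng.normal(size=n)
    eps_1 = rho * eps_0 + rng.normal(size=n)

    # Untreated potential outcomes Y_it(0) = alpha_i + lambda_t + eps_it
    y_m1 = alpha + LAMBDA[-1] + eps_m1
    y_0 = alpha + LAMBDA[0] + eps_0
    y_1 = alpha + LAMBDA[1] + eps_1

    if select_on == "fixed_effect":
        g = (alpha + v) > 0   # selection on the fixed effect only
    elif select_on == "pre_shock":
        g = (eps_0 + v) < 0   # units with a bad period-0 shock opt in
    else:
        raise ValueError(f"unknown selection rule: {select_on}")

    pre, post = y_0 - y_m1, y_1 - y_0
    pre_gap = pre[g].mean() - pre[~g].mean()
    post_gap = post[g].mean() - post[~g].mean()
    return pre_gap, post_gap


for rho, rule in [(0.5, "fixed_effect"), (0.5, "pre_shock"), (1.0, "pre_shock")]:
    pre_gap, post_gap = simulate_gaps(rho, rule)
    print(f"rho={rho}, selection={rule}: "
          f"pre gap={pre_gap:+.3f}, post gap={post_gap:+.3f}")
```

Under these assumed settings, both gaps should be close to zero when selection is on the fixed effect. With selection on the period-0 shock and `rho=0.5`, both gaps should be clearly nonzero. With a random walk (`rho=1.0`), the post-period gap should be close to zero while the pre-period gap is not, which is one way a pre-trends test can fail to track whether parallel post-trends hold.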

Covariate extension

With time-varying exogenous covariates \(X_{it}\), the paper shows that conditional parallel post-trends require a separability assumption on the outcome model: \(Y_{it}(0) = \gamma_t(X_{it}) + \alpha_i + \lambda_t + \varepsilon_{it}\). Conditioning only on pre-treatment covariate values can lead to failures of conditional PT even with exogenous covariates; see Caetano, Callaway, Payne, and Sant'Anna Rodrigues (2022) for practical guidance.
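
To see where conditioning on pre-treatment values alone can go wrong, it helps to write out the trend implied by the separable model. The following is a sketch by this note under that model, not the paper's own derivation:

\[
E[Y_{i1}(0) - Y_{i0}(0) \mid G_i = d, X_{i0}]
= E[\gamma_1(X_{i1}) \mid G_i = d, X_{i0}] - \gamma_0(X_{i0}) + (\lambda_1 - \lambda_0) + E[\varepsilon_{i1} - \varepsilon_{i0} \mid G_i = d, X_{i0}].
\]

Even if the last term is the same for \(d = 0\) and \(d = 1\), the first term differs across groups whenever the distribution of \(X_{i1}\) given \(X_{i0}\) differs between treated and untreated units. Conditioning on \((X_{i0}, X_{i1})\) instead replaces that term with \(\gamma_1(X_{i1})\), which does not depend on \(d\).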

🛠️ Implications for practice

Treat pre-trends tests as conditional on the selection mechanism, and articulate that mechanism explicitly.
If selection is plausibly on pre-treatment outcomes (program selection, Roy-model selection), be sceptical that passing a pre-trends test implies that PT holds.
If selection is on fixed effects and the shocks follow a standard time-series process, pre-trends tests provide genuine evidence, but you should still pair them with Rambachan-Roth honest bounds and Roth-Sant'Anna functional-form checks.
For covariate-adjusted designs, prefer conditioning on the entire covariate time series over conditioning on pre-period values only.

🧭 Where this sits in the broader DiD literature

This paper provides the full theoretical machinery behind the more digestible AEA P&P piece (GSW 2026). Together with Roth-Sant'Anna 2023 Econometrica (functional-form sensitivity), Roth 2022 AER: Insights (statistical issues with pre-tests), and Rambachan-Roth 2023 REStud (honest sensitivity bounds), it forms the modern foundational set on the parallel-trends assumption. It also forms the theoretical basis for the upcoming BCCGS JEL 2026 practitioner's guide.
