Literature Readings

When should pre-trends be parallel?

Dalia Ghanem (UC Davis) · Pedro H. C. Sant'Anna (Emory) · Kaspar Wüthrich (Michigan)

AEA Papers & Proceedings 2026 · Methodology · Parallel Trends · May 2026

⚑ TL;DR

This paper asks a deceptively simple methodological question: when is a pre-trend test informative about whether the post-trend parallel-trends assumption holds? The answer is: only under specific selection mechanisms. If treated and control units differ along time-invariant unobservables (classical fixed effects), pre-trend tests can be informative. If selection is based on pre-treatment information, pre-trend tests can be uninformative β€” researchers may wrongly discard valid DiD designs based on a failing pre-test. If selection is on time-varying unobservables, pre-trend tests can be misleading. Pre-trends tests should not be treated as a binary go/no-go gate; they require an explicit selection model to interpret.

This paper is a sophisticated companion to two recent papers that triangulate on the same theme from different angles: Roth (2022) "Pretest with Caution" warns about statistical issues with pre-tests (low power, gating-induced bias); Roth & Sant'Anna (2023, Econometrica) "When Is Parallel Trends Sensitive to Functional Form?" shows that levels vs logs are non-nested assumptions; and GSW (this paper) shows that the identification content of pre-trend tests depends on the underlying selection mechanism.

🧩 Setup

Consider a DiD setting with \(n\) units indexed by \(i = 1, \dots, n\) observed for three periods \(t \in \{-1, 0, 1\}\). In \(t \in \{-1, 0\}\) no unit is treated. In \(t = 1\), some units select into treatment. Let \(G_i = 1\) if treated and \(G_i = 0\) otherwise.

The parameter of interest is the ATT at period \(t = 1\):

\[\text{ATT} = E[Y_{i1}(1) - Y_{i1}(0) \mid G_i = 1]\]

To identify the ATT, DiD methods rely on the following parallel post-trends assumption:

Assumption PT: \(E[Y_{i1}(0) - Y_{i0}(0) \mid G_i = 1] = E[Y_{i1}(0) - Y_{i0}(0) \mid G_i = 0]\)
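To see why Assumption PT is what delivers identification, the DiD estimand can be decomposed as below. This is a standard derivation, sketched under the assumption (implicit in the setup) that observed outcomes equal \(Y_{it}(0)\) for every untreated unit-period and \(Y_{i1}(1)\) for treated units at \(t = 1\); the symbols \(\theta_{\text{DiD}}\) and \(\Delta_{\text{post}}\) are labels introduced here for illustration.

\[\begin{aligned}
\theta_{\text{DiD}} &= E[Y_{i1} - Y_{i0} \mid G_i = 1] - E[Y_{i1} - Y_{i0} \mid G_i = 0] \\
&= E[Y_{i1}(1) - Y_{i0}(0) \mid G_i = 1] - E[Y_{i1}(0) - Y_{i0}(0) \mid G_i = 0] \\
&= \text{ATT} + \underbrace{E[Y_{i1}(0) - Y_{i0}(0) \mid G_i = 1] - E[Y_{i1}(0) - Y_{i0}(0) \mid G_i = 0]}_{\Delta_{\text{post}}}
\end{aligned}\]

Under Assumption PT, \(\Delta_{\text{post}} = 0\) and the DiD estimand equals the ATT; otherwise \(\Delta_{\text{post}}\) is exactly the bias.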

Assumption PT is fundamentally untestable. Researchers routinely assess its validity via the pre-treatment analog:

Assumption PPT (Parallel Pre-Trends): \(E[Y_{i0}(0) - Y_{i,-1}(0) \mid G_i = 1] = E[Y_{i0}(0) - Y_{i,-1}(0) \mid G_i = 0]\)

Unlike PT, Assumption PPT is directly testable: because no unit is treated before \(t = 1\), it implies \(E[Y_{i0} - Y_{i,-1} \mid G_i = 1] = E[Y_{i0} - Y_{i,-1} \mid G_i = 0]\) in observed outcomes, which is what the standard pre-trend regression test checks.
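As a rough illustration of the pre-trend test, the sketch below compares mean pre-period changes across groups with a Welch-type z-statistic. Its point estimate equals the slope from regressing \(Y_{i0} - Y_{i,-1}\) on \(G_i\). The function name `pretrend_test`, the simulated data, and the 0.5 pre-trend gap are hypothetical choices made for this example, not anything taken from the paper.

```python
import random
from statistics import NormalDist, mean, variance


def pretrend_test(y_pre, y_base, treated):
    """Large-sample test of equal mean pre-period changes across groups.

    y_pre, y_base: outcomes at t = -1 and t = 0; treated: 0/1 indicators G_i.
    Returns (gap, z, p_value) using a normal approximation.
    """
    d1 = [b - a for a, b, g in zip(y_pre, y_base, treated) if g == 1]
    d0 = [b - a for a, b, g in zip(y_pre, y_base, treated) if g == 0]
    gap = mean(d1) - mean(d0)
    # Welch-type standard error: group variances are not pooled.
    se = (variance(d1) / len(d1) + variance(d0) / len(d0)) ** 0.5
    z = gap / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return gap, z, p_value


# Hypothetical data: treated units carry an extra pre-period trend of 0.5.
rng = random.Random(0)
treated = [i % 2 for i in range(2000)]
y_pre = [rng.gauss(0, 1) for _ in treated]
y_base = [y + 0.5 * g + rng.gauss(0, 1) for y, g in zip(y_pre, treated)]

gap, z, p_value = pretrend_test(y_pre, y_base, treated)
print(f"pre-trend gap = {gap:.3f}, z = {z:.2f}, p = {p_value:.4f}")
```

A rejection here only says that PPT fails in the data; whether that bears on PT is the question the rest of these notes address.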

The authors maintain a standard TWFE data-generating model for untreated potential outcomes:

\[Y_{it}(0) = \alpha_i + \lambda_t + \varepsilon_{it}, \quad E[\varepsilon_{it}] = 0\]

and a flexible selection mechanism \(G_i = g(\omega_i, v_i)\) where \(\omega_i\) is some subvector of unobservables the units select on and \(v_i\) is a selection-specific error.

🎯 The four scenarios + the central question

For any DiD application, exactly one of the following must hold:

  (a) Pre- AND post-trends parallel: both Assumptions PPT and PT hold.
  (b) Pre-trends NOT parallel but post-trends parallel: the pre-test fails, but PT still identifies the ATT.
  (c) Pre-trends parallel but post-trends NOT parallel: the pre-test passes, but DiD is biased.
  (d) Neither parallel.

The paper's central question: under what selection mechanisms does scenario (a), joint parallelism, actually hold, so that a passing pre-test is genuine evidence in favor of PT? A closely related question is when the selection mechanism rules out scenario (c), the "false-positive" case where the pre-test passes but PT fails.

📐 Main results

Result 1: Selection on time-invariant unobservables only (Proposition 1)

Suppose units select on only fixed effects: \(\omega_i = (\alpha_i, \mu_i)\) where \(\mu_i\) is some other time-invariant characteristic. This is the classical case: job-training programs where selection is on permanent earnings (Ashenfelter-Card 1985), or state policies where selection is on time-invariant state characteristics.

Proposition 1: Assumptions PPT and PT hold jointly for all nondegenerate selection mechanisms \(g\) if and only if:

\[E[\varepsilon_{i1} \mid \alpha_i, \mu_i] = E[\varepsilon_{i0} \mid \alpha_i, \mu_i] = E[\varepsilon_{i,-1} \mid \alpha_i, \mu_i]\]

This is a time-homogeneity assumption on the conditional mean of \(\varepsilon_{it}\) given \((\alpha_i, \mu_i)\). It is closely related to classical strict-exogeneity assumptions in fixed-effects models.

Takeaway: When selection is on fixed effects (and absent structural breaks), Assumptions PPT and PT hold jointly under standard time-series restrictions, so pre-trend tests can be informative. ✓
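A small simulation can make Result 1 concrete. The sketch below assumes a stationary AR(1) error that is independent of \(\alpha_i\) (so the time-homogeneity condition holds) and a threshold selection rule on \(\alpha_i\) plus noise. The sample size, the AR coefficient, the time effects and the selection rule are illustrative assumptions, not values from the paper.

```python
import random

rng = random.Random(1)
n, rho = 100_000, 0.5               # hypothetical sample size and AR(1) coefficient
lam = {-1: 0.0, 0: 0.3, 1: 0.7}     # hypothetical time effects lambda_t
sd0 = (1 - rho**2) ** -0.5          # stationary standard deviation of eps_it

pre, post, G = [], [], []
for _ in range(n):
    alpha = rng.gauss(0, 1)                                # fixed effect alpha_i
    eps = {-1: rng.gauss(0, sd0)}
    eps[0] = rho * eps[-1] + rng.gauss(0, 1)
    eps[1] = rho * eps[0] + rng.gauss(0, 1)
    y0 = {t: alpha + lam[t] + eps[t] for t in (-1, 0, 1)}  # untreated outcome Y_it(0)
    G.append(int(alpha + rng.gauss(0, 1) > 0))             # G_i = g(alpha_i, v_i)
    pre.append(y0[0] - y0[-1])
    post.append(y0[1] - y0[0])


def gap(changes):
    """Treated-minus-control difference in mean changes."""
    s1 = sum(c for c, g in zip(changes, G) if g == 1)
    s0 = sum(c for c, g in zip(changes, G) if g == 0)
    n1 = sum(G)
    return s1 / n1 - s0 / (len(G) - n1)


pre_gap, post_gap = gap(pre), gap(post)
print(f"pre-trend gap = {pre_gap:.3f}, post-trend gap = {post_gap:.3f}")
```

Both gaps should be close to zero up to sampling noise, which corresponds to scenario (a).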

Result 2: Selection on pre-treatment information (Proposition 2)

Suppose units select based on information available in the pre-treatment period: \(\omega_i = (\alpha_i, \varepsilon_i^0, \mu_i, \eta_i^0)\), where the superscript 0 denotes pre-treatment values, e.g. \(\varepsilon_i^0 = (\varepsilon_{i,-1}, \varepsilon_{i0})\). This is plausible in many applications (e.g., job-training program selection based on pre-program earnings, or selection on the discounted sum of future earnings conditional on the pre-treatment information set).

Proposition 2: Assumptions PPT and PT hold jointly for all nondegenerate \(g\) if and only if:

\[\varepsilon_{i,-1} = \varepsilon_{i0} \text{ and } E[\varepsilon_{i1} \mid \alpha_i, \varepsilon_i^0, \mu_i, \eta_i^0] = \varepsilon_{i0}\]

The condition \(\varepsilon_{i,-1} = \varepsilon_{i0}\) means there are no time-varying shocks in the pre-treatment period. This is highly restrictive: an AR(1) error \(\varepsilon_{it} = \rho \varepsilon_{i,t-1} + \zeta_{it}\) with \(\zeta_{it} \sim WN(0, \sigma^2)\) only satisfies this when \(\sigma^2 = 0\), i.e., the white-noise process is degenerate.
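One way to verify the AR(1) claim, sketched here under the added assumption that the process is stationary with \(|\rho| < 1\) (an assumption not stated above), is to compute the variance of the pre-period change:

\[\operatorname{Var}(\varepsilon_{i0} - \varepsilon_{i,-1}) = 2\operatorname{Var}(\varepsilon_{it}) - 2\operatorname{Cov}(\varepsilon_{i0}, \varepsilon_{i,-1}) = \frac{2(1 - \rho)\sigma^2}{1 - \rho^2} = \frac{2\sigma^2}{1 + \rho}\]

The condition \(\varepsilon_{i,-1} = \varepsilon_{i0}\) requires this variance to be zero, which forces \(\sigma^2 = 0\).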

Takeaway: When selection is based on pre-treatment information, the conditions for PPT and PT to hold jointly are very restrictive and likely fail in most applications. Pre-trend tests can therefore be uninformative about post-trends and can lead to valid DiD designs being wrongly discarded. ✗
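The following simulation sketches one way the "wrongly discarded" case, scenario (b), can arise. It assumes random-walk errors, so that \(E[\varepsilon_{i1} \mid \text{past}] = \varepsilon_{i0}\) while \(\varepsilon_{i,-1} \neq \varepsilon_{i0}\), and a threshold selection rule on \(\alpha_i + \varepsilon_{i0}\) plus noise. The random-walk design and the selection rule are illustrative assumptions of this example, not the paper's specification.

```python
import random

rng = random.Random(2)
n = 100_000                          # hypothetical sample size

pre, post, G = [], [], []
for _ in range(n):
    alpha = rng.gauss(0, 1)          # fixed effect alpha_i
    e_m1 = rng.gauss(0, 1)
    e_0 = e_m1 + rng.gauss(0, 1)     # random walk: E[eps_i1 | past] = eps_i0
    e_1 = e_0 + rng.gauss(0, 1)
    # Selection on pre-treatment information (alpha_i, eps_i0) plus noise v_i.
    G.append(int(alpha + e_0 + rng.gauss(0, 1) > 0))
    # alpha_i differences out; common lambda_t changes cancel in group gaps.
    pre.append(e_0 - e_m1)
    post.append(e_1 - e_0)


def gap(changes):
    """Treated-minus-control difference in mean changes."""
    s1 = sum(c for c, g in zip(changes, G) if g == 1)
    s0 = sum(c for c, g in zip(changes, G) if g == 0)
    n1 = sum(G)
    return s1 / n1 - s0 / (len(G) - n1)


pre_gap, post_gap = gap(pre), gap(post)
print(f"pre-trend gap = {pre_gap:.3f}, post-trend gap = {post_gap:.3f}")
```

In this design the pre-trend gap is clearly positive while the post-trend gap is near zero, so a researcher who gates on the pre-test would reject a design in which PT actually holds.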

Result 3: Selection on time-varying unobservables (Section II.C)

The conditions are even more restrictive here. Consider the case where units select on the treatment effect \(\tau_{i1} = Y_{i1}(1) - Y_{i1}(0)\), which is known to them. The necessary and sufficient conditions become:

  • (i) \(E[\varepsilon_{i0} - \varepsilon_{i,-1} \mid \tau_{i1}] = 0\)
  • (ii) \(E[\varepsilon_{i1} - \varepsilon_{i0} \mid \tau_{i1}] = 0\)

Under white-noise errors, condition (ii) generally fails because \(\tau_{i1}\) is typically a function of \(\varepsilon_{i1}\). Pre-trend tests can be misleading in this case. ✗
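The misleading case, scenario (c), can be illustrated with a short simulation. The sketch assumes white-noise errors, a hypothetical treatment effect \(\tau_{i1} = 1 + \varepsilon_{i1}\), and selection into treatment when \(\tau_{i1}\) plus noise exceeds 1. These functional forms are assumptions chosen for the example only.

```python
import random

rng = random.Random(3)
n = 100_000                          # hypothetical sample size

pre, post, G = [], [], []
for _ in range(n):
    e_m1 = rng.gauss(0, 1)           # white-noise errors eps_it
    e_0 = rng.gauss(0, 1)
    e_1 = rng.gauss(0, 1)
    tau = 1.0 + e_1                  # hypothetical: tau_i1 is a function of eps_i1
    G.append(int(tau + rng.gauss(0, 1) > 1.0))   # selection on tau_i1 plus noise v_i
    # Changes in untreated potential outcomes; alpha_i and common lambda_t drop out.
    pre.append(e_0 - e_m1)
    post.append(e_1 - e_0)


def gap(changes):
    """Treated-minus-control difference in mean changes."""
    s1 = sum(c for c, g in zip(changes, G) if g == 1)
    s0 = sum(c for c, g in zip(changes, G) if g == 0)
    n1 = sum(G)
    return s1 / n1 - s0 / (len(G) - n1)


pre_gap, post_gap = gap(pre), gap(post)
print(f"pre-trend gap = {pre_gap:.3f}, post-trend gap = {post_gap:.3f}")
```

Here the pre-trend gap is near zero, so the pre-test passes, yet the post-trend gap in untreated potential outcomes is large, so DiD would be biased.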

📊 Additional issues with time-varying covariates

Section III of the paper shows that pre-trends tests can also be misleading when researchers condition on pre-treatment values of time-varying covariates, even if those covariates are exogenous.

The model becomes \(Y_{it}(0) = \gamma_t(X_{it}) + \alpha_i + \lambda_t + \varepsilon_{it}\) with \(E[\varepsilon_{it} \mid X_i] = 0\). Suppose we control only for the pre-treatment values \(X_i^0 = (X_{i,-1}, X_{i0})\), so that the conditional analogs of PT and PPT are PTX and PPTX. The necessary and sufficient conditions then involve two equations, one for pre-trends and one for post-trends. The pre-trends condition involves only pre-period \(\varepsilon\)'s; the post-trends condition has an additional term:

\[0 = E[\varepsilon_{i1} - \varepsilon_{i0} \mid X_i^0, \omega_i] + E[\gamma_1(X_{i1}) \mid X_i^0, \omega_i] - E[\gamma_1(X_{i1}) \mid X_i^0]\]

If \(X_{i1}\) is correlated with the fixed effect \(\alpha_i\) (which is precisely why one uses fixed effects in the first place), the additional term \(E[\gamma_1(X_{i1}) \mid X_i^0, \omega_i] - E[\gamma_1(X_{i1}) \mid X_i^0]\) need not vanish, so PTX can fail even when PPTX holds.

Mitigation: condition on the entire time series of covariates \(X_i = (X_{i,-1}, X_{i0}, X_{i1})\), not just the pre-period values. This requires the covariates to be exogenous to the treatment, a potentially strong assumption (see Caetano et al. 2022).
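The covariate problem and its mitigation can be sketched in a simulation with a binary covariate. The example assumes \(\gamma_t(x) = \beta_t x\) with hypothetical coefficients, a covariate \(X_{it}\) that depends on \(\alpha_i\), selection on \(\alpha_i\) only, and \(\lambda_t = 0\). None of these choices come from the paper. It compares treated-minus-control gaps within cells of \(X_i^0\) and within cells of the full covariate path \(X_i\).

```python
import random
from collections import defaultdict

rng = random.Random(4)
n = 150_000                           # hypothetical sample size
beta = {-1: 1.0, 0: 1.0, 1: 2.0}      # hypothetical gamma_t(x) = beta_t * x
periods = (-1, 0, 1)

rows = []
for _ in range(n):
    alpha = rng.gauss(0, 1)
    # Binary covariate correlated with the fixed effect alpha_i.
    x = {t: int(alpha + rng.gauss(0, 1) > 0) for t in periods}
    # Untreated outcome with lambda_t = 0 and white-noise eps_it.
    y = {t: beta[t] * x[t] + alpha + rng.gauss(0, 0.5) for t in periods}
    g = int(alpha + rng.gauss(0, 1) > 0)          # selection on alpha_i only
    rows.append((x[-1], x[0], x[1], g, y[0] - y[-1], y[1] - y[0]))


def cell_gaps(n_cov, col):
    """Treated-minus-control gap in mean changes within each covariate cell.

    n_cov = 2 conditions on (X_{-1}, X_0); n_cov = 3 on the full path.
    col = 4 selects the pre-period change, col = 5 the post-period change.
    """
    acc = defaultdict(lambda: [0.0, 0, 0.0, 0])   # [sum1, n1, sum0, n0]
    for r in rows:
        cell = acc[r[:n_cov]]
        offset = 0 if r[3] == 1 else 2
        cell[offset] += r[col]
        cell[offset + 1] += 1
    return {k: c[0] / c[1] - c[2] / c[3] for k, c in acc.items()}


pre_x0 = cell_gaps(2, 4)      # PPTX check: pre-trend gaps given X^0
post_x0 = cell_gaps(2, 5)     # PTX check: post-trend gaps given X^0 only
post_full = cell_gaps(3, 5)   # post-trend gaps given the full covariate path

for name, res in [("pre | X0", pre_x0), ("post | X0", post_x0), ("post | X", post_full)]:
    print(name, {k: round(v, 3) for k, v in sorted(res.items())})
```

In this design the pre-trend gaps given \(X_i^0\) are near zero and the post-trend gaps given \(X_i^0\) are not, while conditioning on the full path \(X_i\) brings the post-trend gaps back to roughly zero.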

🛠️ Implications for practice (Section IV)

  1. Pre-trend tests can be uninformative even without structural breaks. Flat pre-trends do not by themselves mean that post-trends will be parallel; that depends on selection.
  2. Correctly interpreting pre-trend tests requires understanding the selection mechanism, i.e., why units chose treatment. Pre-trend tests cannot replace economic arguments for parallel post-trends.
  3. If units select on pre-treatment information (a common case), pre-trend tests are typically not informative, and researchers may wrongly discard valid DiD designs (a false negative about the design's validity).
  4. Conditioning only on pre-treatment values of time-varying covariates can lead to PT failures even if pre-trends are parallel and covariates are exogenous. Solution: control for the entire time series of exogenous covariates.

🧭 How this fits in the broader DiD literature

Three closely related papers triangulate the modern view on parallel trends:

  • Roth (2022, AER: Insights), "Pretest with Caution": addresses the statistical problems with pre-tests, namely low power against the violations that matter and the bias in inference that results from conditioning publication on passing a pre-test.
  • Roth & Sant'Anna (2023, Econometrica), "When Is Parallel Trends Sensitive to Functional Form?": shows that parallel trends in levels vs. logs are non-nested identifying assumptions; the functional form is itself a researcher's choice.
  • Ghanem, Sant'Anna & Wüthrich (2026, AEA P&P), THIS PAPER: shows that the identification content of pre-trend tests depends on the selection mechanism. Pre-trends are informative only when selection is on fixed effects (under time-homogeneity).

The companion paper, Ghanem, Sant'Anna & Wüthrich (2025), "Selection and Parallel Trends" (arXiv 2203.09001), provides the full theoretical machinery; the AEA P&P piece is the digestible summary of the main implications.

The natural complementary tools for handling situations where pre-tests are uninformative or post-trends may fail:

  • Rambachan & Roth (2023, REStud), honest sensitivity bounds: report what post-treatment effects are consistent with a smoothness restriction on possible PT violations.
  • Kwon & Roth (2024, AEA P&P), empirical Bayes approaches: shrink the PT violation toward a prior calibrated from pre-period data.
  • Lu (2026, arXiv), "In Defense of the Pre-Test": modern pushback that argues conditional extrapolation makes the pre-test informative under transparent assumptions.

📥 Read the paper

All versions of this paper:

  • Local PDF (140 KB): instant, no external request
  • Sant'Anna's site (canonical version)
  • AEA Papers & Proceedings journal landing page
  • Companion paper: GSW (2025) "Selection and Parallel Trends" (the longer paper with full theory)