When should pre-trends be parallel?

Dalia Ghanem (UC Davis) · Pedro H. C. Sant'Anna (Emory) · Kaspar Wüthrich (Michigan)

⚡ TL;DR

This paper asks a deceptively simple methodological question: when is a pre-trend test informative about whether parallel trends holds in the post-treatment period? The answer: only under specific selection mechanisms. If treated and control units differ along time-invariant unobservables (classical fixed effects), pre-trend tests can be informative. If selection is based on pre-treatment information, pre-trend tests can be uninformative, and researchers may wrongly discard valid DiD designs because a pre-test fails. If selection is on time-varying unobservables, pre-trend tests can be misleading. Pre-trend tests should not be treated as a binary go/no-go gate; they require an explicit selection model to interpret.

This paper is a sophisticated companion to two recent papers that triangulate on the same theme from different angles: Roth (2022) "Pretest with Caution" warns about statistical issues with pre-tests (low power, gating-induced bias); Roth & Sant'Anna (2023, Econometrica) "When Is Parallel Trends Sensitive to Functional Form?" shows that levels vs logs are non-nested assumptions; and GSW (this paper) shows that the identification content of pre-trend tests depends on the underlying selection mechanism.

🧩 Setup

Consider a DiD setting with \(n\) units indexed by \(i = 1, \dots, n\) observed for three periods \(t \in \{-1, 0, 1\}\). In \(t \in \{-1, 0\}\) no unit is treated. In \(t = 1\), some units select into treatment. Let \(G_i = 1\) if treated and \(G_i = 0\) otherwise.

The parameter of interest is the ATT at period \(t = 1\):

\[\text{ATT} = E[Y_{i1}(1) - Y_{i1}(0) \mid G_i = 1]\]

To identify the ATT, DiD methods rely on the following parallel post-trends assumption:

Assumption PT: \(E[Y_{i1}(0) - Y_{i0}(0) \mid G_i = 1] = E[Y_{i1}(0) - Y_{i0}(0) \mid G_i = 0]\)
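As a short worked step, here is how Assumption PT turns the ATT into a difference of observed mean changes. It assumes no anticipation, so that \(Y_{i0} = Y_{i0}(0)\) for every unit, which matches the setup in which no unit is treated at \(t = 0\):

\[\begin{aligned}
\text{ATT} &= E[Y_{i1} - Y_{i0} \mid G_i = 1] - E[Y_{i1}(0) - Y_{i0}(0) \mid G_i = 1] \\
&= E[Y_{i1} - Y_{i0} \mid G_i = 1] - E[Y_{i1} - Y_{i0} \mid G_i = 0]
\end{aligned}\]

The first line uses \(Y_{i1} = Y_{i1}(1)\) for treated units; the second replaces the unobserved treated-group trend with the control-group trend, which is exactly what PT permits.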

Assumption PT is fundamentally untestable because \(Y_{i1}(0)\) is never observed for treated units. Researchers therefore routinely assess its plausibility via the pre-treatment analog:

Assumption PPT (Parallel Pre-Trends): \(E[Y_{i0}(0) - Y_{i,-1}(0) \mid G_i = 1] = E[Y_{i0}(0) - Y_{i,-1}(0) \mid G_i = 0]\)

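Unlike PT, Assumption PPT is directly testable: no unit is treated in periods \(-1\) and \(0\), so observed outcomes equal untreated potential outcomes there, and PPT implies \(E[Y_{i0} - Y_{i,-1} \mid G_i = 1] = E[Y_{i0} - Y_{i,-1} \mid G_i = 0]\). This equality is what the standard pre-trend regression test checks.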

The authors maintain a standard TWFE data-generating model for untreated potential outcomes:

\[Y_{it}(0) = \alpha_i + \lambda_t + \varepsilon_{it}, \quad E[\varepsilon_{it}] = 0\]

and a flexible selection mechanism \(G_i = g(\omega_i, v_i)\) where \(\omega_i\) is some subvector of unobservables the units select on and \(v_i\) is a selection-specific error.
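The sample analogs of the two quantities in this setup can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names and the four-unit toy data set are hypothetical, and a real pre-trend test would also report a standard error.

```python
from statistics import fmean

def group_mean_change(y_start, y_end, g, group):
    """Average change y_end - y_start among units with g == group."""
    return fmean(e - s for s, e, gi in zip(y_start, y_end, g) if gi == group)

def pretrend_and_did(y_m1, y_0, y_1, g):
    """Return (pre-trend gap, DiD estimate) from three periods of observed outcomes."""
    pre_gap = group_mean_change(y_m1, y_0, g, 1) - group_mean_change(y_m1, y_0, g, 0)
    did = group_mean_change(y_0, y_1, g, 1) - group_mean_change(y_0, y_1, g, 0)
    return pre_gap, did

# Hypothetical toy data: two treated units followed by two control units.
y_m1 = [1.0, 2.0, 0.0, 1.0]   # outcomes in period -1
y_0 = [2.0, 3.0, 1.0, 2.0]    # outcomes in period 0
y_1 = [5.0, 6.0, 2.0, 3.0]    # outcomes in period 1
g = [1, 1, 0, 0]

pre_gap, did = pretrend_and_did(y_m1, y_0, y_1, g)
print(pre_gap, did)  # 0.0 2.0
```

A pre-trend gap of zero is the sample version of PPT; whether it says anything about PT is the question the rest of the paper addresses.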

🎯 The four scenarios + the central question

For any DiD application, exactly one of the following must hold:

(a) Pre- AND post-trends parallel — both Assumptions PPT and PT hold.
(b) Pre-trends NOT parallel but post-trends parallel — pre-test fails, but PT still identifies ATT.
(c) Pre-trends parallel but post-trends NOT parallel — pre-test passes, but DiD is biased.
(d) Neither parallel.

The paper's central question: under what selection mechanisms is scenario (a) — joint parallelism — what we actually have, so that a passing pre-test is genuine evidence in favor of PT? Equivalently: when does scenario (c) — the "false-positive" case where pre-trends pass but PT fails — get ruled out by the selection mechanism?

📐 Main results

Result 1 — Selection on time-invariant unobservables only (Proposition 1)

Suppose units select on only fixed effects: \(\omega_i = (\alpha_i, \mu_i)\) where \(\mu_i\) is some other time-invariant characteristic. This is the classical case: job-training programs where selection is on permanent earnings (Ashenfelter-Card 1985), or state policies where selection is on time-invariant state characteristics.

Proposition 1: Assumptions PPT and PT hold jointly for all nondegenerate selection mechanisms \(g\) if and only if:

\[E[\varepsilon_{i1} \mid \alpha_i, \mu_i] = E[\varepsilon_{i0} \mid \alpha_i, \mu_i] = E[\varepsilon_{i,-1} \mid \alpha_i, \mu_i]\]

This is a time-homogeneity assumption on the conditional mean of \(\varepsilon_{it}\) given \((\alpha_i, \mu_i)\). It is closely related to classical strict-exogeneity assumptions in fixed-effects models.

Takeaway: In settings where selection is on fixed effects (and absent structural breaks), Assumptions PPT and PT hold jointly under standard time-series restrictions. Pre-trends tests can be informative in this case. ✓
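A small simulation can illustrate this case. It is a sketch under assumed primitives that are not taken from the paper: normal fixed effects, i.i.d. standard normal errors (which satisfy the time-homogeneity condition), hypothetical time effects, and a selection rule that depends only on \(\alpha_i\) and independent noise.

```python
import random
from statistics import fmean

random.seed(0)
n = 50_000
lam = {-1: 0.0, 0: 0.5, 1: 1.5}  # hypothetical time effects lambda_t

rows = []
for _ in range(n):
    alpha = random.gauss(0, 1)
    # Selection depends only on the fixed effect and independent noise v_i.
    g = 1 if alpha + random.gauss(0, 1) > 0 else 0
    # Untreated potential outcomes with i.i.d. errors.
    y = {t: alpha + lam[t] + random.gauss(0, 1) for t in (-1, 0, 1)}
    rows.append((g, y))

def trend_gap(t_start, t_end):
    """Treated-minus-control difference in the mean change of Y(0)."""
    def avg(group):
        return fmean(y[t_end] - y[t_start] for g, y in rows if g == group)
    return avg(1) - avg(0)

pre_gap_fe = trend_gap(-1, 0)
post_gap_fe = trend_gap(0, 1)
level_gap = (fmean(y[0] for g, y in rows if g == 1)
             - fmean(y[0] for g, y in rows if g == 0))
print(round(pre_gap_fe, 3), round(post_gap_fe, 3), round(level_gap, 3))
```

In this design both trend gaps should be close to zero even though the groups differ clearly in levels, which is the sense in which a passing pre-test and PT go together under selection on fixed effects.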

Result 2 — Selection on pre-treatment information (Proposition 2)

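Suppose units select based on information available in the pre-treatment period: \(\omega_i = (\alpha_i, \varepsilon_i^0, \mu_i, \eta_i^0)\). Here the superscript 0 collects pre-treatment values, so \(\varepsilon_i^0 = (\varepsilon_{i,-1}, \varepsilon_{i0})\), and \(\eta_i^0\) denotes other time-varying unobservables from the pre-treatment periods. This is plausible in many applications (e.g., job-training program selection based on pre-program earnings, or selection on the discounted sum of future earnings conditional on the pre-treatment information set).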

Proposition 2: Assumptions PPT and PT hold jointly for all nondegenerate \(g\) if and only if:

\[\varepsilon_{i,-1} = \varepsilon_{i0} \text{ and } E[\varepsilon_{i1} \mid \alpha_i, \varepsilon_i^0, \mu_i, \eta_i^0] = \varepsilon_{i0}\]

The condition \(\varepsilon_{i,-1} = \varepsilon_{i0}\) means there are no time-varying shocks in the pre-treatment period. This is highly restrictive: an AR(1) error \(\varepsilon_{it} = \rho \varepsilon_{i,t-1} + \zeta_{it}\) with \(\zeta_{it} \sim WN(0, \sigma^2)\) only satisfies this when \(\sigma^2 = 0\), i.e., the white-noise process is degenerate.
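A short worked step shows why. It assumes, as is standard for AR(1) models though not stated above, that the innovation \(\zeta_{i0}\) is uncorrelated with \(\varepsilon_{i,-1}\):

\[\varepsilon_{i0} - \varepsilon_{i,-1} = (\rho - 1)\,\varepsilon_{i,-1} + \zeta_{i0}, \qquad \text{Var}(\varepsilon_{i0} - \varepsilon_{i,-1}) = (\rho - 1)^2\,\text{Var}(\varepsilon_{i,-1}) + \sigma^2\]

The requirement \(\varepsilon_{i,-1} = \varepsilon_{i0}\) forces this variance to zero, which is possible only if \(\sigma^2 = 0\) (and, in addition, either \(\rho = 1\) or \(\varepsilon_{i,-1}\) is itself degenerate).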

Takeaway: When selection is based on pre-treatment information, the joint parallel-trends conditions are very restrictive and likely fail in most applications. Pre-trend tests can therefore be uninformative about post-trends — and can lead to DiD designs being wrongly discarded. ✗
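The "wrongly discarded" case can be illustrated with a simulation. It is a sketch under assumptions chosen for illustration rather than taken from the paper: random-walk errors, so that \(E[\varepsilon_{i1} \mid \text{pre-treatment information}] = \varepsilon_{i0}\), and a hypothetical rule in which units with a low period-0 shock enroll.

```python
import random
from statistics import fmean

random.seed(1)
n = 50_000
rows = []
for _ in range(n):
    alpha = random.gauss(0, 1)
    eps_m1 = random.gauss(0, 1)
    eps_0 = eps_m1 + random.gauss(0, 1)  # random walk: eps_t = eps_{t-1} + zeta_t
    eps_1 = eps_0 + random.gauss(0, 1)
    # Hypothetical selection on pre-treatment information: low period-0 shock -> enroll.
    g = 1 if eps_0 + random.gauss(0, 1) < 0 else 0
    # Untreated potential outcomes; time effects are set to zero for simplicity.
    rows.append((g, alpha + eps_m1, alpha + eps_0, alpha + eps_1))

def gap(start, end):
    """Treated-minus-control difference in the mean change between two columns."""
    def avg(group):
        return fmean(r[end] - r[start] for r in rows if r[0] == group)
    return avg(1) - avg(0)

pre_gap_rw = gap(1, 2)   # change from period -1 to period 0
post_gap_rw = gap(2, 3)  # change from period 0 to period 1
print(round(pre_gap_rw, 3), round(post_gap_rw, 3))
```

Here the pre-trend gap should be clearly negative while the post-trend gap of untreated outcomes should be close to zero: a pre-test would reject, yet the DiD comparison itself is valid in this design.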

Result 3 — Selection on time-varying unobservables (Section II.C)

The conditions are even more restrictive here. Consider the case where units select on the (known to them) treatment effect \(\tau_{i1} = Y_{i1}(1) - Y_{i1}(0)\). Necessary & sufficient conditions become:

(i) \(E[\varepsilon_{i0} - \varepsilon_{i,-1} \mid \tau_{i1}] = 0\)
(ii) \(E[\varepsilon_{i1} - \varepsilon_{i0} \mid \tau_{i1}] = 0\)

Under white-noise errors, condition (ii) generally fails since \(\tau_{i1}\) is typically a function of \(\varepsilon_{i1}\). Pre-trend tests can be misleading in this case. ✗
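A simulation sketch of this failure mode follows. The functional form \(\tau_{i1} = 1 + \varepsilon_{i1}\) and the rule "select if the gain is above average" are hypothetical choices made only so that the treatment effect depends on the period-1 shock.

```python
import random
from statistics import fmean

random.seed(2)
n = 50_000
units = []
for _ in range(n):
    alpha = random.gauss(0, 1)
    eps = {t: random.gauss(0, 1) for t in (-1, 0, 1)}  # white-noise errors
    tau = 1.0 + eps[1]      # hypothetical: gain depends on the period-1 shock
    g = int(tau > 1.0)      # hypothetical: select iff gain is above average
    y0 = {t: alpha + eps[t] for t in (-1, 0, 1)}  # untreated potential outcomes
    units.append((g, tau, y0))

def gap(t_start, t_end):
    """Treated-minus-control difference in the mean change of Y(0)."""
    def avg(group):
        return fmean(y0[t_end] - y0[t_start] for g, _, y0 in units if g == group)
    return avg(1) - avg(0)

pre_gap_tau = gap(-1, 0)
post_gap_tau = gap(0, 1)
att = fmean(tau for g, tau, _ in units if g == 1)
# DiD on observed outcomes: treated units realize Y(0) + tau in period 1.
did = (fmean(y0[1] + tau - y0[0] for g, tau, y0 in units if g == 1)
       - fmean(y0[1] - y0[0] for g, _, y0 in units if g == 0))
print(round(pre_gap_tau, 3), round(post_gap_tau, 3), round(did - att, 3))
```

In this design the pre-trend gap should be near zero, so a pre-test would pass, while the DiD estimate overstates the ATT by the post-trend gap in untreated outcomes.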

📊 Additional issues with time-varying covariates

Section III of the paper shows that pre-trends tests can also be misleading when researchers condition on pre-treatment values of time-varying covariates, even if those covariates are exogenous.

The model becomes \(Y_{it}(0) = \gamma_t(X_{it}) + \alpha_i + \lambda_t + \varepsilon_{it}\) with \(E[\varepsilon_{it} \mid X_i] = 0\). Suppose we control only for the pre-treatment values \(X_i^0 = (X_{i,-1}, X_{i0})\), so that PT and PPT are replaced by their conditional analogs, PTX and PPTX. The necessary & sufficient conditions then involve two equations, one for pre-trends and one for post-trends. The pre-trends condition involves only pre-period \(\varepsilon\)'s; the post-trends condition has an additional term:

\[0 = E[\varepsilon_{i1} - \varepsilon_{i0} \mid X_i^0, \omega_i] + E[\gamma_1(X_{i1}) \mid X_i^0, \omega_i] - E[\gamma_1(X_{i1}) \mid X_i^0]\]

If \(X_{i1}\) is correlated with the fixed effect \(\alpha_i\) — which is precisely why one uses fixed effects in the first place — the additional term \(E[\gamma_1(X_{i1}) \mid X_i^0, \omega_i] - E[\gamma_1(X_{i1}) \mid X_i^0]\) need not vanish, so PTX can fail even when PPTX holds.

Mitigation: condition on the entire time series of covariates \(X_i = (X_{i,-1}, X_{i0}, X_{i1})\), not just the pre-period values. This requires the covariates to be exogenous to the treatment — a potentially strong assumption (see Caetano et al. 2022).
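The covariate problem and the mitigation can both be seen in a simulation sketch. Every primitive here is a hypothetical choice for illustration: a binary covariate whose probability depends on the sign of \(\alpha_i\), \(\gamma_t(x) = 4x\), i.i.d. errors independent of the covariates, and selection on \(\alpha_i\) plus noise. Only the covariate cell \(X_i^0 = (0, 0)\) is examined.

```python
import random
from statistics import fmean

random.seed(3)
n = 200_000
pre = {0: [], 1: []}        # pre-period changes, cell X^0 = (0, 0)
post_x0 = {0: [], 1: []}    # post-period changes, cell X^0 = (0, 0)
post_full = {0: [], 1: []}  # post-period changes, cell X = (0, 0, 0)

for _ in range(n):
    alpha = random.gauss(0, 1)
    p = 0.8 if alpha > 0 else 0.2  # covariate is correlated with the fixed effect
    x = {t: int(random.random() < p) for t in (-1, 0, 1)}
    g = int(alpha + random.gauss(0, 1) > 0)  # selection on alpha and noise only
    # Untreated outcomes with gamma_t(x) = 4x and exogenous errors.
    y = {t: 4.0 * x[t] + alpha + random.gauss(0, 1) for t in (-1, 0, 1)}
    if x[-1] == 0 and x[0] == 0:
        pre[g].append(y[0] - y[-1])
        post_x0[g].append(y[1] - y[0])
        if x[1] == 0:
            post_full[g].append(y[1] - y[0])

def gap(d):
    """Treated-minus-control difference in mean changes within a covariate cell."""
    return fmean(d[1]) - fmean(d[0])

pre_gap_x, post_gap_x0, post_gap_full = gap(pre), gap(post_x0), gap(post_full)
print(round(pre_gap_x, 3), round(post_gap_x0, 3), round(post_gap_full, 3))
```

Conditional on pre-period covariates only, the pre-trend gap should be near zero while the post-trend gap is positive, because \(\alpha_i\) still predicts \(X_{i1}\) within the cell. Conditioning on the full covariate path should bring the post-trend gap back toward zero in this design.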

🛠️ Implications for practice (Section IV)

Pre-trends tests can be uninformative even without structural breaks. The fact that pre-trends are flat does not by itself mean post-trends will be parallel — it depends on selection.
Correctly interpreting pre-trends tests requires understanding the selection mechanism — i.e., why units chose treatment. Pre-trend tests cannot replace economic arguments for parallel post-trends.
If units select on pre-treatment information (a common case), pre-trend tests are typically not informative, and researchers may wrongly discard valid DiD designs (Type-II error).
Conditioning only on pre-treatment values of time-varying covariates can lead to PT failures even if pre-trends are parallel and covariates are exogenous. Solution: control for the entire time series of exogenous covariates.

🧭 How this fits in the broader DiD literature

Three closely related papers triangulate the modern view on parallel trends:

Roth (2022, AER: Insights) — "Pretest with Caution" — addresses the statistical problems with pre-tests: low power against the violations that matter; conditioning publication on passing a pre-test biases inference.
Roth & Sant'Anna (2023, Econometrica) — "When Is Parallel Trends Sensitive to Functional Form?" — shows that parallel trends in levels vs. logs are non-nested identifying assumptions; the functional form is itself a researcher's choice.
Ghanem, Sant'Anna & Wüthrich (2026, AEA P&P) — THIS PAPER — shows that the identification content of pre-trend tests depends on the selection mechanism. Pre-trends are informative only when selection is on fixed effects (under time-homogeneity).

The companion paper Ghanem-Sant'Anna-Wüthrich (2025) "Selection and Parallel Trends" (arXiv 2203.09001) provides the full theoretical machinery; the AEA P&P piece is the digestible summary of the main implications.

The natural complementary tools for handling situations where pre-tests are uninformative or post-trends may fail:

Rambachan & Roth (2023, REStud) — Honest sensitivity bounds: report what post-treatment effects are consistent with a smoothness restriction on possible PT violations.
Kwon & Roth (2024, AEA P&P) — Empirical Bayes approaches: shrink the PT violation toward a prior calibrated from pre-period data.
Lu (2026, arXiv) — "In Defense of the Pre-Test" — modern pushback: argues conditional extrapolation makes the pre-test informative under transparent assumptions.

📥 Read the paper

All versions of this paper:

Sant'Anna's site (canonical version)
AEA Papers & Proceedings journal landing page
Companion paper: GSW (2025) "Selection and Parallel Trends" — the longer paper with full theory
