⚡ TL;DR
This paper asks a deceptively simple methodological question: when is a pre-trend test informative about whether the parallel post-trends assumption holds? The answer: only under specific selection mechanisms. If treated and control units differ along time-invariant unobservables (classical fixed effects), pre-trend tests can be informative. If selection is based on pre-treatment information, pre-trend tests can be uninformative, and researchers may wrongly discard valid DiD designs because of a failing pre-test. If selection is on time-varying unobservables, pre-trend tests can be misleading. Pre-trend tests should not be treated as a binary go/no-go gate; they require an explicit selection model to interpret.
This paper complements two recent papers that approach the same theme from different angles. Roth (2022), "Pretest with Caution", warns about statistical issues with pre-tests (low power, gating-induced bias). Roth & Sant'Anna (2023, Econometrica), "When Is Parallel Trends Sensitive to Functional Form?", shows that parallel trends in levels and in logs are non-nested assumptions. Ghanem, Sant'Anna & Wüthrich (GSW, this paper) show that the identification content of pre-trend tests depends on the underlying selection mechanism.
🧩 Setup
Consider a DiD setting with \(n\) units indexed by \(i = 1, \dots, n\) observed for three periods \(t \in \{-1, 0, 1\}\). In \(t \in \{-1, 0\}\) no unit is treated. In \(t = 1\), some units select into treatment. Let \(G_i = 1\) if treated and \(G_i = 0\) otherwise.
The parameter of interest is the ATT at period \(t = 1\):
\[\text{ATT} = E[Y_{i1}(1) - Y_{i1}(0) \mid G_i = 1]\]
To identify the ATT, DiD methods rely on the following parallel post-trends assumption:
Assumption PT: \(E[Y_{i1}(0) - Y_{i0}(0) \mid G_i = 1] = E[Y_{i1}(0) - Y_{i0}(0) \mid G_i = 0]\)
Assumption PT is fundamentally untestable, because \(Y_{i1}(0)\) is never observed for treated units. Researchers therefore routinely assess its plausibility via the pre-treatment analog:
Assumption PPT (Parallel Pre-Trends): \(E[Y_{i0}(0) - Y_{i,-1}(0) \mid G_i = 1] = E[Y_{i0}(0) - Y_{i,-1}(0) \mid G_i = 0]\)
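Unlike PT, Assumption PPT is directly testable. No unit is treated in \(t \in \{-1, 0\}\), so observed outcomes equal untreated potential outcomes in those periods, and PPT implies \(E[Y_{i0} - Y_{i,-1} \mid G_i = 1] = E[Y_{i0} - Y_{i,-1} \mid G_i = 0]\), which is what the standard pre-trend regression test checks.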
The authors maintain a standard TWFE data-generating model for untreated potential outcomes:
\[Y_{it}(0) = \alpha_i + \lambda_t + \varepsilon_{it}, \quad E[\varepsilon_{it}] = 0\]
and a flexible selection mechanism \(G_i = g(\omega_i, v_i)\), where \(\omega_i\) is some subvector of the unobservables that units select on and \(v_i\) is a selection-specific error.
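A minimal sketch of the pre-trend test as a two-sample comparison of pre-period changes, run on data simulated from the TWFE model above. The function name `pretrend_gap`, the parameter values, and the selection rule are assumptions made for illustration, not taken from the paper; the standard error is a simple unequal-variance formula.

```python
import numpy as np

def pretrend_gap(y_pre, y_base, g):
    """Treated-minus-control difference in mean pre-period changes, with a t-statistic.

    y_pre, y_base: outcomes at t = -1 and t = 0; g: 0/1 treatment-group indicator.
    """
    y_pre, y_base, g = map(np.asarray, (y_pre, y_base, g))
    d = y_base - y_pre                          # Y_{i0} - Y_{i,-1}
    d1, d0 = d[g == 1], d[g == 0]
    gap = d1.mean() - d0.mean()                 # sample analog of the PPT contrast
    se = np.sqrt(d1.var(ddof=1) / d1.size + d0.var(ddof=1) / d0.size)
    return gap, gap / se

# Hypothetical simulated data: selection depends on the fixed effect only.
rng = np.random.default_rng(0)
n = 5000
alpha = rng.normal(size=n)                      # fixed effect alpha_i
g = (alpha + rng.normal(size=n) > 0).astype(int)
y_pre = alpha + 0.0 + rng.normal(size=n)        # lambda_{-1} = 0.0 (assumed)
y_base = alpha + 0.5 + rng.normal(size=n)       # lambda_0 = 0.5 (assumed)

gap, t_stat = pretrend_gap(y_pre, y_base, g)
print(f"pre-trend gap = {gap:.3f}, t = {t_stat:.2f}")
```

The same contrast is what a regression of the pre-period change on the group indicator estimates; the sketch only makes the sample analog explicit.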
🎯 The four scenarios + the central question
For any DiD application, exactly one of the following must hold:
- (a) Pre- AND post-trends parallel: both Assumptions PPT and PT hold.
- (b) Pre-trends NOT parallel but post-trends parallel: the pre-test fails, but PT still identifies the ATT.
- (c) Pre-trends parallel but post-trends NOT parallel: the pre-test passes, but DiD is biased.
- (d) Neither parallel.
The paper's central question: under what selection mechanisms are we actually in scenario (a), joint parallelism, so that a passing pre-test is genuine evidence in favor of PT? Equivalently: when does the selection mechanism rule out scenario (c), the "false-positive" case where pre-trends pass but PT fails?
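The four scenarios can be written as a small lookup on the two population trend gaps. This is only a bookkeeping sketch: `classify_scenario`, its arguments, and the tolerance are hypothetical names introduced here, and `post_gap` is defined in terms of untreated potential outcomes, so it is never observed in real data.

```python
def classify_scenario(pre_gap: float, post_gap: float, tol: float = 1e-8) -> str:
    """Map trend gaps in untreated potential outcomes to scenarios (a)-(d).

    pre_gap:  E[Y_0(0) - Y_{-1}(0) | G=1] - E[Y_0(0) - Y_{-1}(0) | G=0]
    post_gap: E[Y_1(0) - Y_0(0)  | G=1] - E[Y_1(0) - Y_0(0)  | G=0]
    """
    ppt = abs(pre_gap) <= tol     # Assumption PPT holds
    pt = abs(post_gap) <= tol     # Assumption PT holds
    if ppt and pt:
        return "a"                # both parallel
    if pt:
        return "b"                # pre-test fails, DiD still identifies the ATT
    if ppt:
        return "c"                # pre-test passes, DiD is biased
    return "d"                    # neither parallel

for gaps in [(0.0, 0.0), (0.4, 0.0), (0.0, -0.3), (0.4, -0.3)]:
    print(gaps, "->", classify_scenario(*gaps))
```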
Main results
Result 1: Selection on time-invariant unobservables only (Proposition 1)
Suppose units select on only fixed effects: \(\omega_i = (\alpha_i, \mu_i)\) where \(\mu_i\) is some other time-invariant characteristic. This is the classical case: job-training programs where selection is on permanent earnings (Ashenfelter-Card 1985), or state policies where selection is on time-invariant state characteristics.
Proposition 1: Assumptions PPT and PT hold jointly for all nondegenerate selection mechanisms \(g\) if and only if:
\[E[\varepsilon_{i1} \mid \alpha_i, \mu_i] = E[\varepsilon_{i0} \mid \alpha_i, \mu_i] = E[\varepsilon_{i,-1} \mid \alpha_i, \mu_i]\]
This is a time-homogeneity assumption on the conditional mean of \(\varepsilon_{it}\) given \((\alpha_i, \mu_i)\). It is closely related to classical strict-exogeneity assumptions in fixed-effects models.
Takeaway: In settings where selection is on fixed effects (and absent structural breaks), Assumptions PPT and PT hold jointly under standard time-series restrictions. Pre-trend tests can be informative in this case.
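A simulation sketch of this case, assuming i.i.d. shocks that are independent of \(\alpha_i\) (so the time-homogeneity condition holds trivially) and a threshold selection rule on \(\alpha_i\) plus noise. All parameter values and the helper `trend_gap` are illustrative assumptions. Because the data are simulated, the post-period gap in \(Y_{it}(0)\) can be computed directly, which is not possible with real data.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
alpha = rng.normal(size=n)                      # fixed effect alpha_i
lam = np.array([0.0, 0.3, 0.9])                 # lambda_t for t = -1, 0, 1 (assumed values)
eps = rng.normal(size=(n, 3))                   # i.i.d. shocks, independent of alpha_i
y0 = alpha[:, None] + lam + eps                 # untreated potential outcomes Y_it(0)

# Hypothetical selection rule: depends on alpha_i and selection noise v_i only.
g = alpha + rng.normal(size=n) > 0.5

def trend_gap(y_late, y_early, g):
    """Treated-minus-control difference in mean outcome changes."""
    d = y_late - y_early
    return d[g].mean() - d[~g].mean()

level_gap = y0[g, 1].mean() - y0[~g, 1].mean()  # groups differ in levels
pre_gap = trend_gap(y0[:, 1], y0[:, 0], g)      # PPT contrast
post_gap = trend_gap(y0[:, 2], y0[:, 1], g)     # PT contrast

print(f"level gap = {level_gap:.3f}")
print(f"pre-trend gap = {pre_gap:.3f}, post-trend gap = {post_gap:.3f}")
```

In this design the groups differ markedly in levels while both trend gaps are zero in the population, so the simulated gaps should differ from zero only by sampling noise.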
Result 2: Selection on pre-treatment information (Proposition 2)
Suppose units select based on information available in the pre-treatment period: \(\omega_i = (\alpha_i, \varepsilon_i^0, \mu_i, \eta_i^0)\), where the superscript 0 marks pre-treatment quantities (e.g., \(\varepsilon_i^0 = (\varepsilon_{i,-1}, \varepsilon_{i0})\)). This is plausible in many applications, such as job-training program selection based on pre-program earnings, or selection on the discounted sum of future earnings conditional on the pre-treatment information set.
Proposition 2: Assumptions PPT and PT hold jointly for all nondegenerate \(g\) if and only if:
\[\varepsilon_{i,-1} = \varepsilon_{i0} \text{ and } E[\varepsilon_{i1} \mid \alpha_i, \varepsilon_i^0, \mu_i, \eta_i^0] = \varepsilon_{i0}\]
The condition \(\varepsilon_{i,-1} = \varepsilon_{i0}\) means there are no time-varying shocks in the pre-treatment period. This is highly restrictive: an AR(1) error \(\varepsilon_{it} = \rho \varepsilon_{i,t-1} + \zeta_{it}\) with \(\zeta_{it} \sim WN(0, \sigma^2)\) only satisfies this when \(\sigma^2 = 0\), i.e., the white-noise process is degenerate.
Takeaway: When selection is based on pre-treatment information, the joint parallel-trends conditions are very restrictive and likely fail in most applications. Pre-trend tests can therefore be uninformative about post-trends, and can lead to DiD designs being wrongly discarded.
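A simulation sketch of how a valid design can fail the pre-test. It assumes random-walk shocks, so that \(E[\varepsilon_{i1} \mid \text{pre-period information}] = \varepsilon_{i0}\) holds while \(\varepsilon_{i,-1} \neq \varepsilon_{i0}\), together with a selection rule that depends on \(\alpha_i\) and \(\varepsilon_{i0}\). The random-walk process, the constant treatment effect, and all parameter values are illustrative assumptions, not the paper's example.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
alpha = rng.normal(size=n)                      # fixed effect alpha_i
lam = np.array([0.0, 0.3, 0.9])                 # lambda_t for t = -1, 0, 1 (assumed values)

# Random-walk shocks: the period-1 shock is mean-unpredictable beyond eps_0.
eps_m1 = rng.normal(size=n)
eps_0 = eps_m1 + rng.normal(size=n)
eps_1 = eps_0 + rng.normal(size=n)
y0 = alpha[:, None] + lam + np.column_stack([eps_m1, eps_0, eps_1])

# Hypothetical selection rule: units opt in based on alpha_i and the period-0 shock.
g = alpha + eps_0 + rng.normal(size=n) > 0

def trend_gap(y_late, y_early, g):
    """Treated-minus-control difference in mean outcome changes."""
    d = y_late - y_early
    return d[g].mean() - d[~g].mean()

tau = 1.0                                       # constant treatment effect, so ATT = 1
y1_obs = y0[:, 2] + tau * g                     # observed period-1 outcome
pre_gap = trend_gap(y0[:, 1], y0[:, 0], g)      # what the pre-test sees
post_gap = trend_gap(y0[:, 2], y0[:, 1], g)     # PT violation (unobservable in practice)
did = trend_gap(y1_obs, y0[:, 1], g)            # DiD estimate of the ATT

print(f"pre-trend gap = {pre_gap:.3f}")
print(f"post-trend gap = {post_gap:.3f}, DiD = {did:.3f} (ATT = {tau})")
```

Under these assumptions the pre-trend gap is clearly nonzero while the DiD estimate stays close to the true ATT, which is scenario (b).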
Result 3: Selection on time-varying unobservables (Section II.C)
Even more restrictive. Consider the case where units select on the (known to them) treatment effect \(\tau_{i1} = Y_{i1}(1) - Y_{i1}(0)\). Necessary & sufficient conditions become:
(i) \(E[\varepsilon_{i0} - \varepsilon_{i,-1} \mid \tau_{i1}] = 0\) and (ii) \(E[\varepsilon_{i1} - \varepsilon_{i0} \mid \tau_{i1}] = 0\)
Under white-noise errors, condition (ii) generally fails since \(\tau_{i1}\) is typically a function of \(\varepsilon_{i1}\). Pre-trend tests can be misleading in this case.
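A simulation sketch of the misleading case, assuming white-noise shocks and a treatment effect that depends on the period-1 shock. The functional form of `tau` and the threshold selection rule are hypothetical choices made for illustration; the source only states that \(\tau_{i1}\) is typically a function of \(\varepsilon_{i1}\).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
alpha = rng.normal(size=n)                      # fixed effect alpha_i
lam = np.array([0.0, 0.3, 0.9])                 # lambda_t for t = -1, 0, 1 (assumed values)
eps = rng.normal(size=(n, 3))                   # white-noise shocks
y0 = alpha[:, None] + lam + eps                 # untreated potential outcomes Y_it(0)

# Hypothetical heterogeneity: gains are larger when the untreated period-1 shock is bad.
tau = 1.0 - 0.5 * eps[:, 2] + rng.normal(size=n)
g = tau > 1.0                                   # units select on their own effect

def trend_gap(y_late, y_early, g):
    """Treated-minus-control difference in mean outcome changes."""
    d = y_late - y_early
    return d[g].mean() - d[~g].mean()

y1_obs = y0[:, 2] + tau * g                     # observed period-1 outcome
att = tau[g].mean()                             # true ATT in this sample
pre_gap = trend_gap(y0[:, 1], y0[:, 0], g)      # condition (i): near zero
post_gap = trend_gap(y0[:, 2], y0[:, 1], g)     # condition (ii): violated
did = trend_gap(y1_obs, y0[:, 1], g)            # DiD estimate

print(f"pre-trend gap = {pre_gap:.3f}, post-trend gap = {post_gap:.3f}")
print(f"ATT = {att:.3f}, DiD = {did:.3f}, bias = {did - att:.3f}")
```

Here the pre-test passes while the DiD bias equals the post-trend gap, which is scenario (c).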
Additional issues with time-varying covariates
Section III of the paper shows that pre-trends tests can also be misleading when researchers condition on pre-treatment values of time-varying covariates, even if those covariates are exogenous.
The model becomes \(Y_{it}(0) = \gamma_t(X_{it}) + \alpha_i + \lambda_t + \varepsilon_{it}\) with \(E[\varepsilon_{it} \mid X_i] = 0\). If we control only for the pre-treatment value \(X_i^0 = (X_{i,-1}, X_{i0})\) (the conditional analog of PT becomes PTX, and of PPT becomes PPTX), the necessary & sufficient conditions involve two equations: one for pre-trends, one for post-trends. The pre-trends condition involves only pre-period \(\varepsilon\)'s; the post-trends condition has an additional term:
\[0 = E[\varepsilon_{i1} - \varepsilon_{i0} \mid X_i^0, \omega_i] + E[\gamma_1(X_{i1}) \mid X_i^0, \omega_i] - E[\gamma_1(X_{i1}) \mid X_i^0]\]
If \(X_{i1}\) is correlated with the fixed effect \(\alpha_i\) (which is precisely why one uses fixed effects in the first place), the additional term \(E[\gamma_1(X_{i1}) \mid X_i^0, \omega_i] - E[\gamma_1(X_{i1}) \mid X_i^0]\) need not vanish, so PTX can fail even when PPTX holds.
Mitigation: condition on the entire time series of covariates \(X_i = (X_{i,-1}, X_{i0}, X_{i1})\), not just the pre-period values. This requires the covariates to be exogenous to the treatment, a potentially strong assumption (see Caetano et al. 2022).
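One way to see where the additional term comes from is to difference the model directly. The steps below are a reconstruction under the stated model, assuming (this is not spelled out above) that PTX holding for all \(g\) amounts to the untreated trend being mean-independent of \(\omega_i\) given \(X_i^0\); \(\Delta Y_{i1}(0)\) is shorthand introduced here for \(Y_{i1}(0) - Y_{i0}(0)\).
\[\begin{aligned} \Delta Y_{i1}(0) &= \gamma_1(X_{i1}) - \gamma_0(X_{i0}) + (\lambda_1 - \lambda_0) + (\varepsilon_{i1} - \varepsilon_{i0}), \\ E[\Delta Y_{i1}(0) \mid X_i^0, \omega_i] - E[\Delta Y_{i1}(0) \mid X_i^0] &= E[\varepsilon_{i1} - \varepsilon_{i0} \mid X_i^0, \omega_i] - E[\varepsilon_{i1} - \varepsilon_{i0} \mid X_i^0] \\ &\quad + E[\gamma_1(X_{i1}) \mid X_i^0, \omega_i] - E[\gamma_1(X_{i1}) \mid X_i^0]. \end{aligned}\]
The terms \(\gamma_0(X_{i0})\) and \(\lambda_1 - \lambda_0\) cancel because \(X_{i0}\) is part of \(X_i^0\), and \(E[\varepsilon_{i1} - \varepsilon_{i0} \mid X_i^0] = 0\) by iterated expectations from \(E[\varepsilon_{it} \mid X_i] = 0\). Setting the left-hand side to zero gives the post-trends condition. Repeating the steps for \(Y_{i0}(0) - Y_{i,-1}(0)\) leaves no analogous covariate term, since both \(X_{i,-1}\) and \(X_{i0}\) are in \(X_i^0\).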
🛠️ Implications for practice (Section IV)
- Pre-trend tests can be uninformative even without structural breaks. The fact that pre-trends are flat does not by itself mean post-trends will be parallel; it depends on selection.
- Correctly interpreting pre-trend tests requires understanding the selection mechanism, i.e., why units chose treatment. Pre-trend tests cannot replace economic arguments for parallel post-trends.
- If units select on pre-treatment information (a common case), pre-trend tests are typically not informative, and researchers may wrongly discard valid DiD designs (a false rejection of the design).
- Conditioning only on pre-treatment values of time-varying covariates can lead to PT failures even if pre-trends are parallel and covariates are exogenous. Solution: control for the entire time series of exogenous covariates.
How this fits in the broader DiD literature
Three closely related papers triangulate the modern view on parallel trends:
- Roth (2022, AER: Insights), "Pretest with Caution": addresses the statistical problems with pre-tests (low power against the violations that matter; conditioning publication on passing a pre-test biases inference).
- Roth & Sant'Anna (2023, Econometrica), "When Is Parallel Trends Sensitive to Functional Form?": shows that parallel trends in levels vs. logs are non-nested identifying assumptions; the functional form is itself a researcher's choice.
- Ghanem, Sant'Anna & Wüthrich (2026, AEA P&P), THIS PAPER: shows that the identification content of pre-trend tests depends on the selection mechanism. Pre-trends are informative only when selection is on fixed effects (under time-homogeneity).
The companion paper Ghanem-Sant'Anna-Wüthrich (2025) "Selection and Parallel Trends" (arXiv 2203.09001) provides the full theoretical machinery; the AEA P&P piece is the digestible summary of the main implications.
The natural complementary tools for handling situations where pre-tests are uninformative or post-trends may fail:
- Rambachan & Roth (2023, REStud), honest sensitivity bounds: report what post-treatment effects are consistent with a smoothness restriction on possible PT violations.
- Kwon & Roth (2024, AEA P&P), empirical Bayes approaches: shrink the PT violation toward a prior calibrated from pre-period data.
- Lu (2026, arXiv), "In Defense of the Pre-Test": modern pushback arguing that conditional extrapolation makes the pre-test informative under transparent assumptions.
📥 Read the paper
All versions of this paper:
- Local PDF (140 KB): instant, no external request
- Sant'Anna's site (canonical version)
- AEA Papers & Proceedings journal landing page
- Companion paper: GSW (2025) "Selection and Parallel Trends", the longer paper with full theory