The Best of Both Worlds: Predicting Football Coverages with Supervised and Unsupervised Learning
Rouven Michels
Bielefeld University & TU Dortmund University
joint work with Robert Bajons (WU Vienna) and Jan-Ole Koslik (Bielefeld University)
2025 New England Symposium on Statistics in Sports
Harvard University, Cambridge, September 27, 2025
➡ Set up a model that forecasts the defensive scheme based on pre-snap motion.
Supervised Learning
Model defenders’ pre-snap movements with a hidden Markov model (HMM), whose latent states represent the offensive players to be guarded.
Use summary statistics from the decoded state sequences as additional features to improve the predictive performance of the ML models.
The NFL Big Data Bowl 2025 provided tracking data from the first nine weeks of the 2023 season.
After data cleaning (e.g., omitting plays with bunch formations), we focus on pass plays with pre-snap motion.
Features before motion include
Motion data include
Each \(X_t\) is generated by one of \(N\) state-dependent distributions \(f_1,\ldots,f_N\).
The state process \(S_t\) selects which of the \(N\) distributions is active at any time, with the state at time \(t\) only depending on the state at time \(t-1\).
This first-order Markov chain is described by the initial distribution \(\delta\) and a transition probability matrix (t.p.m.) \(\Gamma = \color{brown}{(\gamma_{ij})}\), which is to be estimated.
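The HMM likelihood described above can be evaluated recursively with the forward algorithm. As an illustrative sketch (in Python/NumPy; the function names and interface are my own, not from the talk), assuming a generic density function that returns the \(N\) state-dependent densities at each time point:

```python
import numpy as np

def hmm_log_likelihood(x, delta, Gamma, dens):
    """Forward algorithm for a discrete-time HMM.
    x:     observations, length T
    delta: initial distribution, shape (N,)
    Gamma: transition probability matrix, shape (N, N)
    dens:  function (x_t, t) -> vector of N state-dependent densities
    """
    T = len(x)
    phi = delta * dens(x[0], 0)          # unnormalised forward probabilities
    llk = np.log(phi.sum())
    phi /= phi.sum()                     # scale to avoid numerical underflow
    for t in range(1, T):
        phi = (phi @ Gamma) * dens(x[t], t)
        llk += np.log(phi.sum())         # accumulate on the log scale
        phi /= phi.sum()
    return llk
```

Maximising this log-likelihood over \(\delta\), \(\Gamma\), and the density parameters (e.g., by numerical optimisation) yields the HMM fit.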
Guarding assignments cannot be observed directly but are inferred from the defenders’ responses to the offensive players’ movements (Franks et al., 2015), with
\(X_t\): y-coordinate of the individual defender at time \(t\),
\(S_t\): offensive player \(j = 1, \ldots, 5\) guarded at time \(t\),
\(N = 5\): one of five offensive players to be guarded.
As the state-dependent distribution, we choose a Gaussian, i.e., \[f(X_t \mid S_t = \color{darkblue}{j}) = \mathcal{N}(\color{darkblue}{\mu_j}, \color{darkgreen}{\sigma^2}), \ j = 1, \ldots, 5, \ \text{with}\]
the mean \(\color{darkblue}{\mu_j = y_{\text{off}_{j, t}}}\) being the y-coordinate of the offensive player \(j\) at time \(t\),
the standard deviation \(\color{darkgreen}{\sigma}\) being estimated.
We expect defenders to trail offensive players, i.e., alter the state-dependent means as \[ \color{darkblue}{\mu_j = y}_{\color{darkblue}{\text{off}}_{\color{darkblue}{j,} \color{red}{t-l}}} \]
thus allowing the state-dependent means at time \(t\) to depend on observations at \(\color{red}{t-l}\).
The HMM with lag \(l = 4\) yields the best fit according to AIC.
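Concretely, the lagged Gaussian state-dependent densities can be sketched as follows (Python/NumPy; the function name, array layout, and the clipping at the series start are my own assumptions for illustration):

```python
import numpy as np

def guarding_dens(y_def_t, t, y_off, sigma, lag=4):
    """State-dependent Gaussian densities for one defender at time t.
    State j means 'guarding offensive player j'; its mean is that
    player's y-coordinate `lag` frames earlier.
    y_off: offensive y-coordinates, shape (5, T); sigma: common sd.
    """
    tl = max(t - lag, 0)                  # clip the lag at the series start
    mu = y_off[:, tl]                     # lagged offensive y-coordinates
    return np.exp(-0.5 * ((y_def_t - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
```

This function plays the role of the generic `dens` in the forward recursion, so lag candidates \(l = 0, 1, 2, \ldots\) can each be fitted and compared via AIC.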
To account for heterogeneity across positions and teams, we include random effects in the transition probabilities \(\color{brown}{\gamma_{ij}}\) as
\[ \text{logit}(\color{brown}{\gamma_{ij}^{(p,t)}}) = \beta_{ij} + u_{ij}^{(\text{position}[p])} + v_{ij}^{(\text{team}[t])} \]
where
\(\beta_{ij}\) is a global fixed effect,
\(u_{ij}^{(\text{position}[p])} \sim \mathcal{N}(0, \sigma^2_{\text{position}})\) captures differences between defensive positions,
\(v_{ij}^{(\text{team}[t])} \sim \mathcal{N}(0, \sigma^2_{\text{team}})\) captures team-specific tendencies in defensive schemes.
We assume independence between defenders, i.e., the joint likelihood is the product of the individual defenders’ time-series likelihoods.
To retain probabilistic information, we perform local decoding (Zucchini et al., 2016):
we construct the conditional distributions \(P(S_t = j \mid X_1, \ldots, X_T)\) for each \(t\),
which assigns, for every defender and time point, a probability of guarding offensive player \(j = 1, \ldots, 5\).
       playId player        p_off1 p_off2 p_off3 p_off4 p_off5
    31    291 Marco Wilson       0 0.0001      0      0 0.9998
    32    291 Marco Wilson       0 0.0001      0      0 0.9998
    33    291 Marco Wilson       0 0.0002      0      0 0.9998
    34    291 Marco Wilson       0 0.0002      0      0 0.9998
    35    291 Marco Wilson       0 0.0002      0      0 0.9998
    36    291 Marco Wilson       0 0.0002      0      0 0.9998
    37    291 Marco Wilson       0 0.0001      0      0 0.9999
    38    291 Marco Wilson       0 0.0001      0      0 0.9999
    39    291 Marco Wilson       0 0.0001      0      0 0.9999
    40    291 Marco Wilson       0 0.0001      0      0 0.9999
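The smoothed probabilities shown above are obtained via the forward–backward recursions. An illustrative sketch (Python/NumPy; function name and scaling choices are my own, not from the talk):

```python
import numpy as np

def local_decoding(x, delta, Gamma, dens):
    """Forward-backward smoothing: P(S_t = j | x_1, ..., x_T) for all t, j."""
    T, N = len(x), len(delta)
    P = np.array([dens(x[t], t) for t in range(T)])  # (T, N) densities
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))
    alpha[0] = delta * P[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):                            # forward pass
        alpha[t] = (alpha[t - 1] @ Gamma) * P[t]
        alpha[t] /= alpha[t].sum()                   # scale against underflow
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):                   # backward pass
        beta[t] = Gamma @ (P[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()
    post = alpha * beta
    return post / post.sum(axis=1, keepdims=True)    # rows sum to 1
```

Each row of the returned array corresponds to one frame and gives the guarding probabilities over the five offensive players, as in the table above.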
A caveat of the decoded state probabilities is their multi-dimensional time-series structure, which complicates their direct inclusion in the ML models. Thus, we aggregate them into summary statistics, such as the entropy of the guarding probabilities.
Here, higher entropy indicates greater unpredictability, i.e., less persistent guarding allocations (and vice versa).
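One such summary statistic can be computed as the mean Shannon entropy of the decoded probabilities over a play. A minimal sketch (the function name and the clipping constant are my own):

```python
import numpy as np

def guarding_entropy(probs, eps=1e-12):
    """Mean Shannon entropy of decoded guarding probabilities over a play.
    probs: (T, 5) array, each row a probability distribution over the
    five offensive players. Zeros are clipped to eps before taking logs."""
    p = np.clip(probs, eps, 1.0)
    return float(np.mean(-np.sum(p * np.log(p), axis=1)))
```

A play where one assignment dominates throughout (as in the table above) yields entropy near 0, while maximally uncertain probabilities yield \(\log 5 \approx 1.61\).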
If you…
have questions,
comments,
or want to go to the Patriots with me tomorrow,
tell me now or send a message to rouven.michels@tu-dortmund.de!
Thank you for your attention!