The Best of Both Worlds: Predicting Football Coverages with Supervised and Unsupervised Learning
Rouven Michels
Bielefeld University & TU Dortmund University
joint work with Robert Bajons (WU Vienna) and Jan-Ole Koslik (Bielefeld University)
2025 New England Symposium on Statistics in Sports
Harvard University, Cambridge, September 27, 2025
➡ Set up a model that forecasts the defensive scheme based on pre-snap motion.
Supervised Learning
Model defenders’ pre-snap player movements with a hidden Markov model, where the latent states represent the offensive players to be guarded.
Use summary statistics from the decoded state sequences as additional features to improve the predictive performance of the ML models.
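As a rough illustration of this step, the sketch below turns one defender's decoded state sequence into a few numeric features. The slides do not say which summary statistics are used, so the function `decoded_state_features` and all four feature names are hypothetical choices made only for this example.

```python
import numpy as np

def decoded_state_features(path, n_states=5):
    """Summarise one defender's decoded state sequence (hypothetical features)."""
    path = np.asarray(path)
    # Number of times the decoded guarding assignment changes between frames.
    n_switches = int(np.sum(path[1:] != path[:-1]))
    # Share of frames spent in each state.
    occupancy = np.bincount(path, minlength=n_states) / len(path)
    return {
        "n_switches": n_switches,
        "n_distinct_states": int(np.count_nonzero(occupancy)),
        "share_most_frequent": float(occupancy.max()),
        "last_state": int(path[-1]),
    }

# Toy decoded sequence: states are indices 0..4 of the offensive players.
example_path = [1, 1, 1, 3, 3, 3, 3, 2]
features = decoded_state_features(example_path)
print(features)
```

Features of this kind could then be appended, per defender or aggregated per play, to the feature matrix of the supervised model.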
The NFL Big Data Bowl 2025 provided tracking data for the first 9 weeks of the 2023 season.
After data cleaning (e.g., omitting plays with bunch formations), we focus on pass plays with pre-snap motion.
Features before motion include
Motion data include
Each \(X_t\) is generated by one of \(N\) state-dependent distributions \(f_1,\ldots,f_N\).
The state process \(S_t\) selects which of the \(N\) distributions is active at any time, with the state at time \(t\) only depending on the state at time \(t-1\).
This first-order Markov chain is described by the initial distribution \(\delta\) and a transition probability matrix (t.p.m.) \(\Gamma = \color{brown}{(\gamma_{ij})}\), which is to be estimated.
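To make the roles of \(\delta\), \(\Gamma\) and the state-dependent distributions concrete, here is a minimal simulation sketch of a generic HMM. It assumes Gaussian state-dependent distributions and uses made-up parameter values for a two-state toy example; the function name `simulate_hmm` is introduced only for illustration.

```python
import numpy as np

def simulate_hmm(delta, Gamma, means, sds, T, seed=0):
    """Simulate T time steps of an N-state HMM with Gaussian state-dependent distributions."""
    rng = np.random.default_rng(seed)
    N = len(delta)
    states = np.empty(T, dtype=int)
    # The initial state is drawn from the initial distribution delta.
    states[0] = rng.choice(N, p=delta)
    # Each later state depends only on the previous one (row of the t.p.m.).
    for t in range(1, T):
        states[t] = rng.choice(N, p=Gamma[states[t - 1]])
    # The active state selects which distribution generates X_t.
    obs = rng.normal(means[states], sds[states])
    return states, obs

# Toy parameters (assumed values, not estimates from the talk).
delta = np.array([0.5, 0.5])
Gamma = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
means = np.array([0.0, 5.0])
sds = np.array([1.0, 1.0])

states, obs = simulate_hmm(delta, Gamma, means, sds, T=200)
```

Each row of `Gamma` sums to one, since row \(i\) holds the probabilities \(\gamma_{ij}\) of moving from state \(i\) to each state \(j\).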
Guarding assignments cannot be observed directly, but they can be inferred from the defenders’ responses to the offensive players’ movements (Franks et al., 2015) with
\(X_t\): y-coordinate of the individual defender at time \(t\),
\(S_t\): offensive player \(j = 1, \ldots, 5\) guarded at time \(t\),
\(N = 5\): the number of offensive players who can be guarded.
As the state-dependent distribution, we choose a Gaussian, i.e., \[f(X_t \mid S_t = \color{darkblue}{j}) = \mathcal{N}(\color{darkblue}{\mu_j}, \color{darkgreen}{\sigma^2}), \ j = 1, \ldots, 5, \ \text{with}\]
the mean \(\color{darkblue}{\mu_j = y_{\text{off}_{j, t}}}\) being the y-coordinate of the offensive player \(j\) at time \(t\),
the standard deviation \(\color{darkgreen}{\sigma}\) being estimated.
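The model above can be sketched in a few lines: the state-dependent density of a defender's y-coordinate is a Gaussian centred on each offensive player's y-coordinate, the forward algorithm gives the log-likelihood as a function of \(\Gamma\) and \(\sigma\), and the Viterbi algorithm decodes the most likely guarding sequence. This is a minimal sketch on synthetic data, not the authors' implementation; the function names, the toy coordinates and the parameter values are all assumptions.

```python
import numpy as np

def log_emission(def_y, off_y, sigma):
    """Log N(def_y[t]; off_y[t, j], sigma^2) for all t and j; returns shape (T, N)."""
    resid = def_y[:, None] - off_y
    return -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * (resid / sigma) ** 2

def forward_loglik(def_y, off_y, delta, Gamma, sigma):
    """HMM log-likelihood via the scaled forward algorithm."""
    logB = log_emission(def_y, off_y, sigma)
    phi, loglik = delta.copy(), 0.0
    for t in range(len(def_y)):
        if t > 0:
            phi = phi @ Gamma
        m = logB[t].max()                    # shift for numerical stability
        phi = phi * np.exp(logB[t] - m)
        c = phi.sum()
        loglik += np.log(c) + m
        phi = phi / c
    return loglik

def viterbi(def_y, off_y, delta, Gamma, sigma):
    """Most likely state sequence (index of the guarded offensive player per frame)."""
    logB = log_emission(def_y, off_y, sigma)
    T, N = logB.shape
    with np.errstate(divide="ignore"):       # allow log(0) = -inf
        log_delta, log_Gamma = np.log(delta), np.log(Gamma)
    xi = np.empty((T, N))
    back = np.zeros((T, N), dtype=int)
    xi[0] = log_delta + logB[0]
    for t in range(1, T):
        cand = xi[t - 1][:, None] + log_Gamma  # cand[i, j]: from state i to state j
        back[t] = cand.argmax(axis=0)
        xi[t] = cand.max(axis=0) + logB[t]
    path = np.empty(T, dtype=int)
    path[-1] = xi[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path

# Synthetic play: five offensive players at fixed y-coordinates (assumed values);
# the defender shadows player 1 for 30 frames, then player 3 for 30 frames.
rng = np.random.default_rng(1)
off_y = np.tile(np.array([10.0, 20.0, 27.0, 35.0, 45.0]), (60, 1))
true_path = np.repeat([1, 3], 30)
def_y = off_y[np.arange(60), true_path] + rng.normal(0.0, 0.5, size=60)

delta = np.full(5, 0.2)
Gamma = np.full((5, 5), 0.0125) + np.eye(5) * (0.95 - 0.0125)
sigma = 1.0

loglik = forward_loglik(def_y, off_y, delta, Gamma, sigma)
path = viterbi(def_y, off_y, delta, Gamma, sigma)
```

In practice \(\Gamma\) and \(\sigma\) would be estimated, for example by numerically maximising `forward_loglik` over all defenders and plays, before decoding; here they are simply fixed to plausible values.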