The Best of Both Worlds: Predicting Football Coverages with Supervised and Unsupervised Learning


Rouven Michels
Bielefeld University & TU Dortmund University

joint work with Robert Bajons (WU Vienna) and Jan-Ole Koslik (Bielefeld University)


2025 New England Symposium on Statistics in Sports
Harvard University, Cambridge, September 27, 2025

  • Deciding between man and zone coverage is a crucial strategic decision.
  • Offenses try to decode these decisions by using pre-snap motion.

➡ Set up a model that forecasts the defensive scheme based on pre-snap motion.

Roadmap

  1. Supervised Learning

    1. Predict man/zone coverage based on features before motion with ML models.
    2. Predict man/zone coverage based on naive post-motion features with ML models.

  2. Unsupervised Learning

    1. Model defenders’ pre-snap player movements with a hidden Markov model, where the latent states represent the offensive players to be guarded.

    2. Use summary statistics from the decoded state sequences as additional features to improve the predictive performance of the ML models.

Data


  • The NFL Big Data Bowl 2025 provided tracking data for the first nine weeks of the 2023 season.

  • After data cleaning (e.g., omitting plays with bunch formations), we focus on pass plays with pre-snap motion.

    • In our data, defenses play \(\approx 74\%\) zone coverage and \(\approx 26\%\) man, per PFF.

Features before motion include

  • play-by-play information such as down and yards to go,
  • convex hulls spanned by both teams,
  • \((x,y)\)-coordinates of the five most relevant players per team.

Motion data include

  • approx. 1,200,000 \((x,y)\)-coordinates of the five most relevant players per team.

Hidden Markov Model


[Figure: dependence structure of a basic hidden Markov model, with an observed sequence \(X_1, \ldots, X_T\) arising from an unobserved sequence of underlying states \(S_1, \ldots, S_T\).]

The likelihood can be evaluated via the forward algorithm at a computational cost that is (only) linear in \(T\) (Zucchini et al., 2016):

\[ \mathcal{L}(\theta \mid x_1, \ldots, x_T) = \delta P(x_1) \Gamma P(x_2) \cdots \Gamma P(x_{T-1}) \Gamma P(x_T) \mathbf{1}, \]

where \(P(x_t)\) is the diagonal matrix of state-dependent densities and \(\mathbf{1}\) is a column vector of ones.
  1. Each \(X_t\) is generated by one of \(N\) state-dependent distributions \(f_1,\ldots,f_N\).

  2. The state process \(S_t\) selects which of the \(N\) distributions is active at any time, with the state at time \(t\) only depending on the state at time \(t-1\).

This first-order Markov chain is described by the initial distribution \(\delta\) and a transition probability matrix (t.p.m.) \(\Gamma = \color{brown}{(\gamma_{ij})}\), which is to be estimated.
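As a minimal illustration of this model structure (plain NumPy, not the actual LaMa/RTMB implementation used later), the likelihood above can be evaluated with the forward algorithm, rescaling at each step to avoid numerical underflow:

```python
import numpy as np

def hmm_log_likelihood(delta, Gamma, dens):
    """Forward algorithm: HMM log-likelihood at cost linear in T.

    delta : (N,) initial state distribution
    Gamma : (N, N) transition probability matrix (rows sum to 1)
    dens  : (T, N) state-dependent densities f_j(x_t)
    """
    alpha = delta * dens[0]
    ll = np.log(alpha.sum())
    alpha = alpha / alpha.sum()            # rescale to prevent underflow
    for t in range(1, len(dens)):
        alpha = (alpha @ Gamma) * dens[t]  # one forward step
        c = alpha.sum()
        ll += np.log(c)
        alpha = alpha / c
    return ll
```

The recursion retains, at each step, the (rescaled) probabilities of being in the different states, which is what makes the linear-in-\(T\) evaluation possible.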


5-State HMM formulation

Guarding assignments cannot be observed but must be inferred from the defenders’ responses to the offensive players’ movements (Franks et al., 2015), with

  • \(X_t\): y-coordinate of the individual defender at time \(t\),

  • \(S_t\): offensive player \(j = 1, \ldots, 5\) guarded at time \(t\),

  • \(N = 5\): one of five offensive players to be guarded.

As the state-dependent distribution, we choose a Gaussian, i.e., \[f(X_t \mid S_t = \color{darkblue}{j}) = \mathcal{N}(\color{darkblue}{\mu_j}, \color{darkgreen}{\sigma^2}), \ j = 1, \ldots, 5, \ \text{with}\]

  • the mean \(\color{darkblue}{\mu_j = y_{\text{off}_{j, t}}}\) being the y-coordinate of the offensive player \(j\) at time \(t\),

  • the standard deviation \(\color{darkgreen}{\sigma}\) being estimated.

Lagged observations within the HMM

We expect defenders to trail offensive players, i.e., alter the state-dependent means as \[ \color{darkblue}{\mu_j = y}_{\color{darkblue}{\text{off}}_{\color{darkblue}{j,} \color{red}{t-l}}} \]

thus allowing the state-dependent distribution at time \(t\) to depend on the offensive player’s position at time \(\color{red}{t-l}\).

The HMM with lag \(l = 4\) yields the best fit according to AIC.
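The lagged Gaussian state-dependent densities can be sketched as follows (illustrative NumPy code; the function name and the edge-padding of the first \(l\) frames are our assumptions, not the fitted model):

```python
import numpy as np

def state_dependent_densities(def_y, off_y, sigma, lag=4):
    """Gaussian densities f(x_t | S_t = j) = N(y_off[j, t - lag], sigma^2).

    def_y : (T,) y-coordinates of one defender
    off_y : (T, 5) y-coordinates of the five offensive players
    Returns a (T, 5) matrix of densities, one column per offensive player.
    """
    T = len(def_y)
    # lagged means; the first `lag` frames are padded with frame 0 (assumption)
    mu = off_y[np.maximum(np.arange(T) - lag, 0)]
    z = (def_y[:, None] - mu) / sigma
    return np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))
```

A density matrix of this shape is exactly what the forward algorithm consumes, so the lag only changes the means and leaves the HMM machinery untouched.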

Including random effects in transition probabilities

To account for heterogeneity across positions and teams, we include random effects in the transition probabilities \(\color{brown}{\gamma_{ij}}\) as

\[ \text{logit}(\color{brown}{\gamma_{ij}^{(p,t)}}) = \beta_{ij} + u_{ij}^{(\text{position}[p])} + v_{ij}^{(\text{team}[t])} \]

where

  • \(\beta_{ij}\) is a global fixed effect,

  • \(u_{ij}^{(\text{position}[p])} \sim \mathcal{N}(0, \sigma^2_{\text{position}})\) captures differences between defensive positions,

  • \(v_{ij}^{(\text{team}[t])} \sim \mathcal{N}(0, \sigma^2_{\text{team}})\) captures team-specific tendencies in defensive schemes.
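A sketch of how such a t.p.m. can be assembled from fixed and random effects. We assume a row-wise multinomial-logit link with the diagonal ("stay" probability) as reference category, which guarantees each row sums to one; this normalization is our reading of the link, not code from the fitted model:

```python
import numpy as np

def transition_matrix(beta, u_pos, v_team):
    """Row-wise multinomial-logit t.p.m. with the diagonal as reference.

    beta, u_pos, v_team : (N, N) arrays of fixed and random effects for
    the off-diagonal entries; diagonals are ignored (reference category).
    """
    eta = beta + u_pos + v_team
    np.fill_diagonal(eta, 0.0)   # reference category: staying in state i
    Gamma = np.exp(eta)
    return Gamma / Gamma.sum(axis=1, keepdims=True)
```

With all off-diagonal effects around \(-5.8\) and no random-effect deviations, this link reproduces a near-diagonal t.p.m. of the kind shown in the results.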

Model fitting and state decoding

We assume independence between defenders, i.e., the joint likelihood is the product of the individual defenders’ likelihoods.

  • for direct numerical likelihood maximization, we use the \(\texttt{LaMa}\) (Koslik, 2025) and \(\texttt{RTMB}\) (Kristensen, 2025) packages in R.

  • to retain probabilistic information, we perform local decoding (Zucchini et al., 2016)

  • we construct the conditional distributions \(P(S_t = j \mid X_1, \ldots, X_T)\) for each \(t\).

  • this assigns for every defender and time point a guarding probability of offensive player \(j = 1, \ldots, 5\).
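Local decoding amounts to computing the smoothing probabilities with the forward–backward algorithm; a minimal sketch with per-step rescaling (illustrative, not the LaMa implementation):

```python
import numpy as np

def local_decoding(delta, Gamma, dens):
    """Smoothing probabilities P(S_t = j | X_1, ..., X_T).

    delta : (N,) initial distribution, Gamma : (N, N) t.p.m.,
    dens  : (T, N) state-dependent densities.
    Returns a (T, N) matrix of guarding probabilities.
    """
    T, N = dens.shape
    alpha = np.zeros((T, N))            # scaled forward probabilities
    a = delta * dens[0]
    a = a / a.sum()
    alpha[0] = a
    for t in range(1, T):
        a = (a @ Gamma) * dens[t]
        a = a / a.sum()
        alpha[t] = a
    beta = np.zeros((T, N))             # scaled backward probabilities
    b = np.ones(N)
    beta[T - 1] = b
    for t in range(T - 2, -1, -1):
        b = Gamma @ (dens[t + 1] * b)
        b = b / b.sum()
        beta[t] = b
    post = alpha * beta                 # combine and renormalize rows
    return post / post.sum(axis=1, keepdims=True)
```

Each row of the output is one time point's probability distribution over the five offensive players, as in the decoded sequence shown below.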

Results

HMM random effects results

A resulting example t.p.m. is
\[
\begin{bmatrix}
0.9883 & 0.0029 & 0.0029 & 0.0029 & 0.0029 \\
0.0029 & 0.9883 & 0.0029 & 0.0029 & 0.0029 \\
0.0029 & 0.0029 & 0.9883 & 0.0029 & 0.0029 \\
0.0029 & 0.0029 & 0.0029 & 0.9883 & 0.0029 \\
0.0029 & 0.0029 & 0.0029 & 0.0029 & 0.9883
\end{bmatrix}
\]

Example decoded state sequence

   playId       player p_off1 p_off2 p_off3 p_off4 p_off5
31    291 Marco Wilson      0 0.0001      0      0 0.9998
32    291 Marco Wilson      0 0.0001      0      0 0.9998
33    291 Marco Wilson      0 0.0002      0      0 0.9998
34    291 Marco Wilson      0 0.0002      0      0 0.9998
35    291 Marco Wilson      0 0.0002      0      0 0.9998
36    291 Marco Wilson      0 0.0002      0      0 0.9998
37    291 Marco Wilson      0 0.0001      0      0 0.9999
38    291 Marco Wilson      0 0.0001      0      0 0.9999
39    291 Marco Wilson      0 0.0001      0      0 0.9999
40    291 Marco Wilson      0 0.0001      0      0 0.9999

Deriving features from decoded states

A caveat of the decoded state probabilities is their multi-dimensional time series structure, which complicates their direct inclusion in the ML models. Thus, we

  1. compute for each defender and time \(t\) the most likely guarded offensive player and derive
    • the sum of state switches per play,
    • the number of defenders that switch offensive players during a play,
  2. calculate the mean entropy across defenders by using the full decoded state probabilities, thus propagating uncertainty.

Here, higher entropy indicates greater unpredictability, i.e., less persistent guarding allocations (and vice versa).
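The mean-entropy feature can be sketched as follows (illustrative; the defender × frame × player array layout is our assumption):

```python
import numpy as np

def mean_entropy(probs):
    """Mean Shannon entropy of decoded guarding probabilities.

    probs : (D, T, 5) array of probabilities per defender, frame, and
            offensive player; rows along the last axis sum to 1.
    Returns the average entropy (in nats) over defenders and frames.
    """
    p = np.clip(probs, 1e-12, 1.0)               # guard against log(0)
    ent = -(p * np.log(p)).sum(axis=-1)          # entropy per defender & frame
    return ent.mean()
```

The feature ranges from 0 (every defender locked onto one player, typical of man looks) up to \(\log 5\) (maximally diffuse assignments).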

Supervised Learning - Accuracy

Supervised Learning - AUC

Supervised Learning - negative logloss

Significance test I

Significance test II

Team analysis

Discussion

Summary

  • Modeling pre-snap motion via HMMs significantly improved the coverage prediction.
  • Random effects give insight into assignments of different positions and teams.

Outlook

  • Include defenders’ x-coordinates to exploit information on press vs. off coverage.
  • Model dependencies between defenders via a multivariate normal distribution (or copulas).
  • Current models are strongly limited by the small sample size (the number of plays), but the HMM framework is modular and thus extendable to more complex models.
  • Employ mixtures of HMMs to decode schemes that combine man and zone coverage (see Groom et al., 2024, for a similar approach in soccer).

References

Franks, A., Miller, A., Bornn, L., & Goldsberry, K. (2015). Characterizing the spatial structure of defensive skill in professional basketball. The Annals of Applied Statistics, 9(1), 94–121.
Groom, S., Morris, D., Anderson, L., & Wang, S. (2024). Modeling defensive dynamics in football: A hidden Markov model-based approach for man-marking and zonal defending corner analysis. The 2nd International Workshop on Intelligent Technologies for Precision Sports Science (IT4PSS) in Conjunction with the 33rd International Joint Conference on Artificial Intelligence (IJCAI’24). https://wasn.csie.ncu.edu.tw/workshop/IT4PSS2024.html
Koslik, J.-O. (2025). LaMa: Fast numerical maximum likelihood estimation for latent Markov models. R package version 2.5.
Kristensen, K. (2025). RTMB: ’R’ bindings for ’TMB’. https://github.com/kaskr/rtmb
Zucchini, W., MacDonald, I. L., & Langrock, R. (2016). Hidden Markov Models for Time Series: An Introduction Using R. Chapman & Hall/CRC. https://doi.org/10.1201/b20790

If you…

  • have questions,

  • comments,

  • or want to go to the Patriots game with me tomorrow,

[QR code]


tell me now or send a message to rouven.michels@tu-dortmund.de!


Thank you for your attention!