The Best of Both Worlds: Predicting Football Coverages with Supervised and Unsupervised Learning


Rouven Michels
Bielefeld University & TU Dortmund University

joint work with Robert Bajons (WU Vienna) and Jan-Ole Koslik (Bielefeld University)


2025 New England Symposium on Statistics in Sports
Harvard University, Cambridge, September 27, 2025

  • Deciding between man and zone coverage is a crucial strategic decision.
  • Offenses try to decode these decisions by using pre-snap motion.

➡ Set up a model that forecasts the defensive coverage scheme based on pre-snap motion.

Roadmap

  1. Supervised Learning

    1. Predict man/zone coverage based on features before motion with ML models.
    2. Predict man/zone coverage based on naive post-motion features with ML models.

  2. Unsupervised Learning

    1. Model defenders’ pre-snap player movements with a hidden Markov model, where the latent states represent the offensive players to be guarded.

    2. Use summary statistics from the decoded state sequences as additional features to improve the predictive performance of the ML models.

Data


  • The NFL Big Data Bowl 2025 provided tracking data for the first nine weeks of the 2023 season.

  • After data cleaning (e.g., omitting plays with bunch formations), we focus on pass plays with pre-snap motion.

    • In our data, defenses play \(\approx 74\%\) zone coverage and \(\approx 26\%\) man coverage (labels per PFF).

Features before motion include

  • play-by-play information such as down and yards to go,
  • convex hulls spanned by both teams,
  • \((x,y)\)-coordinates of the five most relevant players per team.

Motion data include

  • approx. 1,200,000 \((x,y)\)-coordinates of the five most relevant players per team.
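The convex-hull features can be computed directly from the player coordinates. A minimal, dependency-free sketch; the coordinates below are made up for illustration (the tracking data would supply the real ones):

```python
def hull_area(points):
    """Area of the convex hull of 2-D points.

    Builds the hull with Andrew's monotone chain, then applies the
    shoelace formula to the hull vertices.
    """
    pts = sorted(set(points))
    if len(pts) < 3:
        return 0.0

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    hull = lower[:-1] + upper[:-1]

    # shoelace formula over the hull vertices
    area = sum(x1 * y2 - x2 * y1
               for (x1, y1), (x2, y2) in zip(hull, hull[1:] + hull[:1]))
    return abs(area) / 2.0

# illustrative pre-snap (x, y) coordinates of five players per team (yards)
offense = [(10.0, 20.0), (12.0, 25.0), (15.0, 22.0), (11.0, 18.0), (14.0, 27.0)]
defense = [(18.0, 21.0), (20.0, 24.0), (22.0, 20.0), (19.0, 26.0), (23.0, 23.0)]

features = {"hull_area_off": hull_area(offense),
            "hull_area_def": hull_area(defense)}
```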

Hidden Markov Model


  1. Each \(X_t\) is generated by one of \(N\) state-dependent distributions \(f_1,\ldots,f_N\).

  2. The state process \(S_t\) selects which of the \(N\) distributions is active at any time, with the state at time \(t\) only depending on the state at time \(t-1\).

This first-order Markov chain is described by the initial distribution \(\delta\) and a transition probability matrix (t.p.m.) \(\Gamma = \color{brown}{(\gamma_{ij})}\), which is to be estimated.

The likelihood of an HMM can be evaluated at a computational cost that is linear in \(T\) via the forward algorithm, which is equivalent to the matrix product expression
\[
\mathcal{L}(\theta \mid x_1, \ldots, x_T) = \delta P(x_1) \Gamma P(x_2) \Gamma \cdots \Gamma P(x_{T-1}) \Gamma P(x_T) \mathbf{1},
\]
where \(P(x_t) = \operatorname{diag}\bigl(f_1(x_t), \ldots, f_N(x_t)\bigr)\) and \(\mathbf{1}\) is a column vector of ones (Zucchini et al., 2016). The parameter vector \(\theta\) collects all unknown parameters in \(\delta\), \(\Gamma\), and the state-dependent distributions, and can be estimated by direct numerical maximisation of this likelihood. Given the fitted model, the most likely state sequence is decoded with the Viterbi algorithm.
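Evaluating the HMM likelihood with the forward algorithm takes only a few lines of numpy. The 2-state Gaussian parameters below are illustrative, not values from our application:

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def hmm_log_likelihood(x, delta, Gamma, mus, sigmas):
    """Forward algorithm with scaling:
    log L(theta) = log( delta P(x_1) Gamma P(x_2) ... Gamma P(x_T) 1 )."""
    log_lik = 0.0
    phi = delta * normal_pdf(x[0], mus, sigmas)   # delta P(x_1)
    for t in range(1, len(x)):
        s = phi.sum()                             # rescale to avoid underflow
        log_lik += np.log(s)
        phi = (phi / s) @ Gamma * normal_pdf(x[t], mus, sigmas)  # ... Gamma P(x_t)
    return log_lik + np.log(phi.sum())

# illustrative 2-state Gaussian HMM
delta = np.array([0.5, 0.5])        # initial distribution
Gamma = np.array([[0.9, 0.1],
                  [0.2, 0.8]])      # transition probability matrix
mus = np.array([0.0, 5.0])
sigmas = np.array([1.0, 1.0])

x = np.array([0.1, -0.2, 4.8, 5.3])
ll = hmm_log_likelihood(x, delta, Gamma, mus, sigmas)
```

The per-step rescaling is the standard trick to keep the forward probabilities from underflowing on long sequences.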


5-State HMM formulation

Guarding assignments cannot be observed directly but must be inferred from the defenders’ responses to the offensive players’ movements (Franks et al., 2015), with

  • \(X_t\): y-coordinate of the individual defender at time \(t\),

  • \(S_t\): offensive player \(j = 1, \ldots, 5\) guarded at time \(t\),

  • \(N = 5\): one state for each of the five offensive players to be guarded.

As the state-dependent distribution, we choose a Gaussian with a time-varying mean, i.e., \[f(X_t \mid S_t = \color{darkblue}{j}) = \mathcal{N}(\color{darkblue}{\mu_{j,t}}, \color{darkgreen}{\sigma^2}), \ j = 1, \ldots, 5, \ \text{with}\]

  • the mean \(\color{darkblue}{\mu_{j,t} = y_{\text{off}_{j, t}}}\) being the y-coordinate of offensive player \(j\) at time \(t\),

  • the standard deviation \(\color{darkgreen}{\sigma}\) being estimated.
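Given the fitted model, each defender's guarding sequence can be decoded with the Viterbi algorithm. A minimal numpy sketch with made-up coordinates and a hypothetical "sticky" transition matrix (0.9 on the diagonal), purely for illustration:

```python
import numpy as np

def viterbi_guarding(y_def, y_off, sigma, delta, Gamma):
    """Most likely sequence of guarded offensive players for one defender.

    y_def : (T,)   defender y-coordinates
    y_off : (T, N) y-coordinates of the N offensive players
                   (the time-varying state-dependent means)
    """
    T, N = y_off.shape
    # Gaussian log-densities, one column per candidate assignment
    log_pdf = (-0.5 * ((y_def[:, None] - y_off) / sigma) ** 2
               - np.log(sigma * np.sqrt(2.0 * np.pi)))
    logv = np.log(delta) + log_pdf[0]
    back = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        scores = logv[:, None] + np.log(Gamma)   # scores[i, j]: best path ending i -> j
        back[t] = scores.argmax(axis=0)
        logv = scores.max(axis=0) + log_pdf[t]
    states = np.zeros(T, dtype=int)
    states[-1] = logv.argmax()
    for t in range(T - 2, -1, -1):
        states[t] = back[t + 1, states[t + 1]]
    return states

N = 5
delta = np.full(N, 1.0 / N)
Gamma = np.full((N, N), 0.1 / (N - 1))   # defenders rarely switch assignments
np.fill_diagonal(Gamma, 0.9)

# offensive players sit at fixed y-positions; the defender first shadows
# the player at y = 10 (state index 1), then the player at y = 20 (index 3)
y_off = np.tile(np.array([5.0, 10.0, 15.0, 20.0, 25.0]), (6, 1))
y_def = np.array([10.1, 9.9, 10.2, 19.8, 20.1, 20.0])

states = viterbi_guarding(y_def, y_off, sigma=1.0, delta=delta, Gamma=Gamma)
```

Summary statistics of such decoded sequences (e.g., how often the defender's assignment switches) are what feed back into the supervised models.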