Vienna University of Economics and Business
Jul 16, 2025
Sports analytics = Statistics + Sports Science:
How to identify important predictors of a sport-specific outcome?
Machine learning models vs parametric models (linear/logistic regression):
Do HMM features help in defensive coverage prediction in the NFL?
Goal: predict coverage type (1: man coverage, 0: zone coverage) for a defense on a play based on pre-snap features (derived from tracking data)
3 types of features:
Typical approach for answering the question:
| Metric | pre | post | HMM |
|---|---|---|---|
| Accuracy | 0.8573 | 0.8837 | 0.8753 |
| AUC | 0.8419 | 0.8907 | 0.8839 |
| Loglik | -0.3834 | -0.3240 | -0.3335 |

Color scale: worst models, medium models, best models
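For reference, the three metrics in the table could be computed from predicted man-coverage probabilities along the following lines. This is a minimal sketch, not the code behind the reported numbers: the function names are hypothetical, the 0.5 classification threshold is an assumption, and "Loglik" is assumed here to be the average Bernoulli log-likelihood per play.

```python
import math

def accuracy(y, p, threshold=0.5):
    """Share of plays whose thresholded prediction matches the label (1 = man, 0 = zone)."""
    return sum((pi >= threshold) == (yi == 1) for yi, pi in zip(y, p)) / len(y)

def auc(y, p):
    """Probability that a random man play is scored above a random zone play (ties count 1/2)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def mean_loglik(y, p):
    """Average Bernoulli log-likelihood; values closer to 0 are better."""
    total = sum(math.log(pi) if yi == 1 else math.log(1.0 - pi) for yi, pi in zip(y, p))
    return total / len(y)

if __name__ == "__main__":
    # Made-up toy labels and predicted probabilities, for illustration only.
    y_toy = [1, 0, 1, 0]
    p_toy = [0.9, 0.7, 0.6, 0.4]
    print(accuracy(y_toy, p_toy), auc(y_toy, p_toy), mean_loglik(y_toy, p_toy))
```

Accuracy and AUC are higher-is-better, and the log-likelihood is better the closer it is to zero, which is the reading used in the table.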
Problems:
How can we identify important predictors of an outcome?
Classical approach: logistic regression:
Partially linear logistic regression model (PLLM)
\[ \log\left(\frac{\pi(X,Z)}{1-\pi(X,Z)}\right) = X^{\top}\beta + g(Z) \]
Interested in testing \(H_0 : \beta = 0\)
Use Generalised Covariance Measure (GCM) test (Shah and Peters 2020) for inference
Proposition
Consider a PLLM and let \(X \in \mathbb{R}^{d_X}\) with \(\operatorname{Var}(X \mid Z)\) a.s. positive definite. Then \[\beta = 0 \Leftrightarrow \mathbb{E}[\operatorname{Cov}(Y,X \mid Z)] = 0\]
Proposition
Consider a PLLM and let \(X \in \mathbb{R}\) with \(\operatorname{Var}(X \mid Z) > 0\) a.s. Then \[\operatorname{sign}(\beta) = \operatorname{sign}(\mathbb{E}[\operatorname{Cov}(Y,X \mid Z)])\]
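Informal sketch of why the sign statement holds (an illustration under the PLLM assumptions, not necessarily the authors' proof). Write \(\pi(X,Z) = \operatorname{expit}(X\beta + g(Z))\) with \(\operatorname{expit}(t) = 1/(1+e^{-t})\). Since \(\mathbb{E}[Y \mid X, Z] = \pi(X,Z)\), the tower property gives
\[ \operatorname{Cov}(Y,X \mid Z) = \mathbb{E}[\pi(X,Z)\,X \mid Z] - \mathbb{E}[\pi(X,Z) \mid Z]\,\mathbb{E}[X \mid Z] = \operatorname{Cov}(\pi(X,Z), X \mid Z) \]
For fixed \(Z\), the map \(x \mapsto \operatorname{expit}(x\beta + g(Z))\) is strictly increasing if \(\beta > 0\), strictly decreasing if \(\beta < 0\), and constant if \(\beta = 0\). The covariance of a non-degenerate random variable with a strictly increasing (decreasing) function of itself is positive (negative), so with \(\operatorname{Var}(X \mid Z) > 0\) a.s.
\[ \operatorname{sign}(\operatorname{Cov}(Y,X \mid Z)) = \operatorname{sign}(\beta) \quad \text{a.s.} \]
and taking the expectation over \(Z\) preserves the sign.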
Takeaways:
Coverage prediction model:
Testing for predictive power of HMM features:
Identifying important factors for a sport-specific outcome is difficult:
Machine learning based statistical inference in a semiparametric model:
R package comets (Kook 2025)
Various use cases:
Thank you for your attention!
Generalised Covariance Measure:
\[ \operatorname{GCM} = \mathbb{E}[\operatorname{Cov}(Y,X \mid Z)] =\mathbb{E}[(Y - \mathbb{E}[Y | Z])(X - \mathbb{E}[X | Z])]\]
Basis for GCM test:
\[Y \perp\!\!\!\perp X \mid Z \Rightarrow \mathbb{E}[\operatorname{Cov}(Y,X \mid Z)] = 0\]
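A one-line check of this implication from the definition of conditional covariance: under \(Y \perp\!\!\!\perp X \mid Z\) we have \(\mathbb{E}[YX \mid Z] = \mathbb{E}[Y \mid Z]\,\mathbb{E}[X \mid Z]\), hence
\[ \operatorname{Cov}(Y,X \mid Z) = \mathbb{E}[YX \mid Z] - \mathbb{E}[Y \mid Z]\,\mathbb{E}[X \mid Z] = 0 \quad \text{a.s.} \]
and taking expectations gives \(\mathbb{E}[\operatorname{Cov}(Y,X \mid Z)] = 0\). The converse does not hold in general, so a test based on this quantity only has power against alternatives with non-zero expected conditional covariance.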
GCM test in practice:
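The procedure could be sketched as follows. This is a minimal illustration in Python, not the implementation in the comets package: the names `fit_linear`, `gcm_test`, and `simulate_pllm` are hypothetical, simple least squares on a single covariate \(Z\) stands in for the machine learning regressions, and the data are simulated. Following Shah and Peters (2020), the residual products \(R_i = (Y_i - \hat{\mathbb{E}}[Y \mid Z_i])(X_i - \hat{\mathbb{E}}[X \mid Z_i])\) are averaged, scaled by \(\sqrt{n}\), divided by their empirical standard deviation, and compared against a standard normal distribution.

```python
import math
import random
from statistics import fmean

def fit_linear(z, t):
    """Least-squares fit of target t on a single covariate z; returns fitted values.
    Stand-in for a flexible regression (assumption made for this sketch)."""
    z_bar, t_bar = fmean(z), fmean(t)
    szz = sum((zi - z_bar) ** 2 for zi in z)
    szt = sum((zi - z_bar) * (ti - t_bar) for zi, ti in zip(z, t))
    slope = szt / szz
    return [t_bar + slope * (zi - z_bar) for zi in z]

def gcm_test(y, x, z, regress=fit_linear):
    """Univariate GCM test: returns (test statistic, two-sided p-value)."""
    n = len(y)
    res_y = [yi - fi for yi, fi in zip(y, regress(z, y))]  # Y - E[Y | Z]
    res_x = [xi - gi for xi, gi in zip(x, regress(z, x))]  # X - E[X | Z]
    prod = [a * b for a, b in zip(res_y, res_x)]
    mean_prod = fmean(prod)
    var_prod = fmean([p * p for p in prod]) - mean_prod ** 2
    stat = math.sqrt(n) * mean_prod / math.sqrt(var_prod)
    p_value = math.erfc(abs(stat) / math.sqrt(2))  # equals 2 * (1 - Phi(|stat|))
    return stat, p_value

def simulate_pllm(n, beta, seed=0):
    """Simulated PLLM data with g(Z) = sin(Z) (hypothetical choice for illustration)."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x = [0.5 * zi + rng.gauss(0.0, 1.0) for zi in z]
    y = []
    for xi, zi in zip(x, z):
        prob = 1.0 / (1.0 + math.exp(-(beta * xi + math.sin(zi))))
        y.append(1 if rng.random() < prob else 0)
    return y, x, z

if __name__ == "__main__":
    for beta in (0.0, 1.0):
        stat, p_value = gcm_test(*simulate_pllm(2000, beta, seed=1))
        print(f"beta={beta}: statistic={stat:.2f}, p-value={p_value:.4f}")
```

Passing a different `regress` function is the hook for swapping in a more flexible learner. The validity of the test rests on the regressions estimating the two conditional means well enough, which the linear fit here only does for \(\mathbb{E}[X \mid Z]\) by construction of the simulation.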
Popular measure for shooting skill of players: Goals above expectation (GAX)
Solution: