Machine learning based statistical inference in sports analytics
A general framework for valid statistical inference in an interpretable semi-parametric model.
| Resource | Date | Link |
|---|---|---|
| Presentation given at the International Workshop on Statistical Modelling 2025 | 2025-07-16 | Presentation (IWSM 25) |
| Contribution to proceedings of the International Workshop on Statistical Modelling 2025 | 2024-07-15 | Short Paper (IWSM 25) |
| Talk presented at the Sports Analytics Workshop (SAW) 2025 at AUEB | 2025-05-06 | Presentation (SAW 25) |
Note
This project is closely related to the project rGAX: Rethinking goals above expectation (GAX). It also has connections to the project HMMotion: Predicting coverage schemes in the NFL.
Overview
Sports analytics, fueled by the recent availability of high-resolution tracking data, has experienced a surge in the use of advanced statistical and machine learning (ML) models. A key focus of these applications is identifying the factors that influence a game, for instance, identifying top players, predictors of injuries, or factors influencing the final score.
Commonly, influential factors are identified by fitting machine learning models and analyzing traditional variable importance measures. Alternatively (or additionally), researchers may compare the predictive power of models with and without the factors of interest. However, both approaches have pitfalls that impede interpretation of the results. More importantly, they make uncertainty quantification and valid statistical inference challenging.
In this project, we use a well-established nonparametric conditional independence test, the Generalised Covariance Measure (GCM) test (Shah and Peters 2020), to perform inference in a partially generalized linear model. This allows us to identify features that may aid in outcome prediction in an interpretable way, without making strong modeling assumptions while maintaining valid statistical inference. The framework has various applications, ranging from identifying important factors for defensive coverage detection in the NFL to player evaluation in many sports by adapting existing and popular metrics such as goals above expectation (GAX) and expected goals added (EGA) in soccer, expected points added (EPA) in American football, shooter impact (SI) in basketball, and many more.
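To make the idea of the GCM test concrete, the following is a minimal sketch for a univariate feature `x`, an outcome `y`, and confounders `Z`, following the test statistic described by Shah and Peters (2020). It is an illustration under stated assumptions, not the project's implementation: the function names (`fit_predict_linear`, `gcm_test`) and the simulated data are hypothetical, and ordinary least squares is used only as a stand-in for the flexible ML regressions one would typically plug in.

```python
import math
import numpy as np


def fit_predict_linear(Z, target):
    """Stand-in regression (assumption): least squares of target on Z with intercept.

    In practice this would be replaced by any flexible ML regressor.
    """
    design = np.column_stack([np.ones(len(Z)), Z])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return design @ coef


def gcm_test(x, y, Z, regress=fit_predict_linear):
    """GCM test of H0: x is independent of y given Z (univariate x and y)."""
    res_x = x - regress(Z, x)   # residuals of the feature given Z
    res_y = y - regress(Z, y)   # residuals of the outcome given Z
    prod = res_x * res_y        # products of residuals
    n = len(prod)
    # Normalized mean of the residual products; approximately N(0, 1) under H0.
    stat = math.sqrt(n) * prod.mean() / math.sqrt((prod**2).mean() - prod.mean() ** 2)
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(stat) / math.sqrt(2))
    return stat, p_value


# Hypothetical simulated data: y depends on x even after adjusting for Z.
rng = np.random.default_rng(0)
n = 2000
Z = rng.normal(size=(n, 2))
x = Z[:, 1] + rng.normal(size=n)
y = Z[:, 0] + 0.5 * x + rng.normal(size=n)

stat, p_value = gcm_test(x, y, Z)
print(f"GCM statistic: {stat:.2f}, p-value: {p_value:.3g}")
```

A small p-value suggests that `x` carries information about `y` beyond what `Z` already explains. The validity of the test rests on the two regressions estimating the conditional means sufficiently well, which is why the choice of learner matters in applications.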