Clustering pass rush routes in the NFL
Analysing pass rushes using a weighted curve clustering approach.
Resources on the Project
| Resource | Date | Link |
|---|---|---|
| Preprint (last updated: 2025-02-28) | 2025 | Preprint |
| Presentation given at CMStatistics Conference 2023 | 2023-12-17 | Presentation (CMstat) |
| Presentation given at New England Symposium on Statistics in Sports (NESSIS) 2023 | 2023-09-23 | Presentation (NESSIS) |
| Presentation given for representatives of the Cleveland Browns | 2023-08-22 | Presentation (Browns) |
| Poster presented at the International Workshop on Statistical Modelling 2023 | 2023-07-18 | Poster (IWSM 23) |
| Contribution to proceedings of the International Workshop on Statistical Modelling 2023 | 2023-07-17 | Short Paper (IWSM 23) |
| Talk given at the research seminar (BBS) of the Institute for Statistics and Mathematics at WU Vienna | 2023-07-15 | Presentation (BBS 23) |
Overview
In this project, we present a weighted \(K\)-means approach for clustering weighted curves, i.e. curves whose individual observations may each be assigned a weight. The methodology is applied to routes of defensive players in American football (obtained from the NFL Big Data Bowl on Kaggle), with the aim of automatically detecting effective pass-rushing routes of specific players or teams. Each route of a defensive player is assigned weights which, at each time point, represent the probability that the player pressures the quarterback. The weights may be derived from any machine learning model of choice (gradient boosted trees are used in this work). Since pass rush routes vary in length because plays differ in duration, the weighted route data are first preprocessed using Bézier curves so that all trajectories have the same length. Results demonstrate that the method clusters pass-rushing routes effectively and much better than a classical (unweighted) \(K\)-means approach. The resulting clusters are then used for various team and player analyses of pass rush plays.
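To make the two steps above more concrete, here is a minimal Python/NumPy sketch. It is not the authors' implementation (which is being packaged for R) and rests on several assumptions: each route is an array of (x, y) positions with one weight per frame, frames are equally spaced in time, the Bézier preprocessing is a least-squares fit of an assumed degree 5 evaluated on an assumed grid of 20 points, and the clustering minimises the weighted within-cluster sum of squares \(\sum_i \sum_t w_{it}\,\lVert x_{it} - c_{k(i),t}\rVert^2\) with a farthest-first initialisation swapped in for simplicity. The function names `resample_bezier` and `weighted_kmeans` and the synthetic routes are hypothetical.

```python
import numpy as np
from math import comb


def bernstein_basis(degree: int, t: np.ndarray) -> np.ndarray:
    """Bernstein basis matrix: B[j, i] = C(degree, i) * t_j**i * (1 - t_j)**(degree - i)."""
    return np.stack(
        [comb(degree, i) * t**i * (1 - t) ** (degree - i) for i in range(degree + 1)],
        axis=1,
    )


def resample_bezier(values: np.ndarray, n_out: int = 20, degree: int = 5) -> np.ndarray:
    """Fit a Bezier curve to (n_obs, d) values by least squares and evaluate it on n_out points.

    Assumes equally spaced frames, so the curve parameter is rescaled to [0, 1];
    plays of different duration therefore end up on the same grid of n_out points.
    """
    t_in = np.linspace(0.0, 1.0, len(values))
    control, *_ = np.linalg.lstsq(bernstein_basis(degree, t_in), values, rcond=None)
    return bernstein_basis(degree, np.linspace(0.0, 1.0, n_out)) @ control


def weighted_distances(curves, weights, centers):
    """Weighted squared distance of every curve to every center, shape (n, k)."""
    sq = ((curves[:, None] - centers[None]) ** 2).sum(axis=-1)  # (n, k, T)
    return (weights[:, None, :] * sq).sum(axis=-1)


def weighted_kmeans(curves, weights, k, n_iter=100, seed=0):
    """curves: (n, T, 2) resampled routes; weights: (n, T) per-time-point weights.

    Lloyd-style iterations that decrease
    sum_i sum_t w[i, t] * ||curves[i, t] - centers[label_i, t]||^2.
    """
    rng = np.random.default_rng(seed)
    # Farthest-first initialisation (a simple, swapped-in choice)
    centers = curves[[rng.integers(len(curves))]].copy()
    while len(centers) < k:
        nearest = weighted_distances(curves, weights, centers).min(axis=1)
        centers = np.concatenate([centers, curves[[nearest.argmax()]]])
    labels = None
    for _ in range(n_iter):
        new_labels = weighted_distances(curves, weights, centers).argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = labels == j
            if not members.any():
                continue  # keep the old center if a cluster becomes empty
            w = weights[members][..., None]  # (m, T, 1)
            denom = w.sum(axis=0)  # (T, 1)
            mean = (w * curves[members]).sum(axis=0) / np.maximum(denom, 1e-12)
            # Pointwise weighted mean; keep the old center where all member weights are 0
            centers[j] = np.where(denom > 0, mean, centers[j])
    return labels, centers


# Synthetic example (hypothetical data): straight rushes vs. wide arcs of varying length
rng = np.random.default_rng(1)
routes, route_weights = [], []
for i in range(20):
    n_obs = int(rng.integers(15, 40))  # plays differ in duration
    s = np.linspace(0.0, 1.0, n_obs)
    if i % 2 == 0:
        xy = np.column_stack([np.zeros(n_obs), 5 * s])
    else:
        xy = np.column_stack([5 * s, 5 * s**2])
    xy += rng.normal(scale=0.1, size=xy.shape)
    w = rng.uniform(0.05, 0.5, size=n_obs)  # stand-in for per-frame pressure probabilities
    routes.append(resample_bezier(xy))
    # Clip because a least-squares fit can overshoot the [0, 1] range of a probability
    route_weights.append(np.clip(resample_bezier(w[:, None])[:, 0], 0.0, 1.0))

labels, centers = weighted_kmeans(np.stack(routes), np.stack(route_weights), k=2)
print(labels)
```

With all weights equal to 1 this reduces to ordinary \(K\)-means on the flattened, resampled curves. With non-uniform weights, time points with low weight contribute little to both the assignment and the center update, so route segments with a low pressure probability barely influence which cluster a route ends up in.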
Some further information
The project specifically focuses on the pass rush in American football. However, in sports analytics it is quite common to observe player trajectories, and, depending on the use case, weights that assign importance to parts of a route can often be derived quite naturally. Our methodology applies to any such situation. Other use cases include, for example, clustering possessions in soccer (see the project player evaluation in football (soccer) for more information on soccer possessions). In fact, clustering possessions in this way yields interesting insights and may even be combined with the player evaluation framework to derive further valuable insight (this, however, is unpublished work in progress). In a finance-related project with a large Austrian banking institution, Kurt and I used a similar methodology to cluster time series of term structure products in order to enhance the risk management of non-maturity deposits in periods of high interest rates (as currently observed in the market).
More details on the approach can be found in the resources listed above, especially the most recent presentations. We are currently wrapping the code for the weighted \(K\)-means, including all its computational refinements, into an R package; since it has already proved useful in several of our own use cases, it may be relevant for others as well. Furthermore, we plan to write a paper about the pass rush application soon (edit: see the preprint above).