A Weighted Curve Clustering Approach for Analyzing Pass Rush Routes in American Football

Robert Bajons¹
robert.bajons@wu.ac.at

Kurt Hornik¹

¹ Institute for Statistics and Mathematics, Vienna University of Economics and Business

Introduction

In team sports, players naturally move across the pitch along specific trajectories.
Studying common patterns in these movements can reveal interesting insights into player strengths and team tactics.
Notable examples in sports: clustering player trajectories in basketball (Miller and Bornn (2017)) and clustering routes of wide receivers in football (Chu et al. (2020)).
This work: clustering weighted player trajectories in American football, i.e. curves to which a weight can be assigned at each point in time.

Data and Preprocessing

Use player tracking data from all matches of the first half of 2021, obtained from the NFL Big Data Bowl on Kaggle.
Add pressure weights to each route of defensive players, using a machine learning model of choice (e.g. gradient boosted trees); see Figure 1.

Figure 1: Left: Routes of a sample play. Right: The yellow box; pass rushing routes with weights.

Formally: Data \(\boldsymbol{Y} = \{\boldsymbol{y}_1,\dots,\boldsymbol{y}_n\}\), with \(\boldsymbol{y}_i = (x_i,y_i,w_i)\) an \(m_i \times 3\) dimensional matrix (\(x\)-coordinates, \(y\)-coordinates, weights vector \(w\)) \(\Rightarrow\) \(m_i\) different for each \(\boldsymbol{y}_i\)!
To use \(K\)-means: Bring all curves to the same length \(\Rightarrow\) approximate each curve by a Bézier curve evaluated at a fixed number \(M\) of points and aggregate the weights accordingly.
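The resampling step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the least-squares Bézier fit, the uniform parameterization, the default degree, and the aggregation of weights by linear interpolation are all assumptions, and the function names are hypothetical.

```python
import numpy as np
from math import comb


def bernstein_basis(t, degree):
    """Bernstein basis matrix of shape (len(t), degree + 1)."""
    t = np.asarray(t, dtype=float)[:, None]
    k = np.arange(degree + 1)[None, :]
    binom = np.array([comb(degree, i) for i in range(degree + 1)], dtype=float)
    return binom[None, :] * t**k * (1.0 - t) ** (degree - k)


def resample_route(route, M=20, degree=5):
    """Map an (m_i, 3) route with columns (x, y, w) to an (M, 3) array.

    Assumptions (not from the poster): uniform parameter values in [0, 1],
    a least-squares Bezier fit for the coordinates, and linear interpolation
    as the way weights are aggregated. Requires at least two observed points.
    """
    route = np.asarray(route, dtype=float)
    m = route.shape[0]
    t_obs = np.linspace(0.0, 1.0, m)
    t_new = np.linspace(0.0, 1.0, M)
    # Cap the degree so the fit never has more control points than observations.
    deg = min(degree, m - 1)
    # Least-squares control points for the x and y coordinates.
    ctrl, *_ = np.linalg.lstsq(bernstein_basis(t_obs, deg), route[:, :2], rcond=None)
    # Evaluate the fitted curve at M evenly spaced parameter values.
    xy = bernstein_basis(t_new, deg) @ ctrl
    # Carry the weights over to the new parameter grid.
    w = np.interp(t_new, t_obs, route[:, 2])
    return np.column_stack([xy, w])
```

After this step every route has the same shape \(M \times 3\), regardless of its original length \(m_i\).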

Methodology

Weighted \(K\)-means for Curves

Adapt classical \(K\)-means algorithm (see e.g. Hastie, Tibshirani, and Friedman (2009)) to account for weighted observations.
Transform observations from preprocessing: \(\boldsymbol{y}_i \in \mathbb{R}^{M \times 3}\) is split into two vectors \(\boldsymbol{z}_i, \boldsymbol{w}_i \in \mathbb{R}^{2M}\).

\[\boldsymbol{z}_i = (x_1,\dots,x_M,y_1,\dots,y_M)\] \[\boldsymbol{w}_i = (w_1,\dots,w_M,w_1,\dots,w_M)\]
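As a small illustration of this transformation, the following sketch stacks the coordinates and repeats the weights. The function name `split_observation` and the toy values are hypothetical and chosen only for illustration.

```python
import numpy as np


def split_observation(y):
    """Turn an (M, 3) array with columns (x, y, w) into vectors z, w of length 2M."""
    y = np.asarray(y, dtype=float)
    z = np.concatenate([y[:, 0], y[:, 1]])  # (x_1, ..., x_M, y_1, ..., y_M)
    w = np.concatenate([y[:, 2], y[:, 2]])  # weights repeated for the x and y parts
    return z, w


# Toy observation with M = 3 points (made-up numbers).
y_i = np.array([[0.0, 1.0, 0.2],
                [1.0, 2.0, 0.5],
                [2.0, 4.0, 0.9]])
z_i, w_i = split_observation(y_i)
```

Repeating the weights ensures that the \(x\)- and \(y\)-coordinate of the same point in time receive the same weight.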

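Find cluster assignments \(g_i\) and prototypes \(p_{k,j}\) that solve:

\[\min_{(p_{k,j}),(g_i)}\sum_{k = 1}^K \sum_{i:g_i = k} \sum_{j = 1}^{\tilde{M}} w_{i,j}(z_{i,j}-p_{k,j})^2,\]

where \(\tilde{M} = 2M\) is the length of \(\boldsymbol{z}_i\) and \(\boldsymbol{w}_i\).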
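For concreteness, the objective can be evaluated for a given assignment and set of prototypes as in the sketch below. The array layout (one row per observation, labels coded as integers \(0,\dots,K-1\)) and the function name are assumptions made for this example.

```python
import numpy as np


def weighted_objective(Z, W, labels, prototypes):
    """Weighted within-cluster sum of squares.

    Z, W: arrays of shape (n, 2M); labels: (n,) integers in 0..K-1;
    prototypes: array of shape (K, 2M).
    """
    Z = np.asarray(Z, dtype=float)
    W = np.asarray(W, dtype=float)
    prototypes = np.asarray(prototypes, dtype=float)
    # Pick, for every observation, the prototype of its assigned cluster.
    diff = Z - prototypes[np.asarray(labels)]
    return float(np.sum(W * diff**2))


# Toy example with n = 2 observations in a single cluster (made-up numbers).
Z = np.array([[0.0, 0.0], [2.0, 2.0]])
W = np.array([[1.0, 1.0], [0.5, 0.5]])
value = weighted_objective(Z, W, labels=[0, 0], prototypes=[[1.0, 1.0]])
```

With all weights equal to one, this reduces to the classical \(K\)-means objective.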

The algorithm iteratively alternates between finding the optimal prototypes for given cluster assignments and finding the optimal cluster assignments for given prototypes.
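A short derivation of the prototype step, assuming non-negative weights whose sum within a cluster is positive for every coordinate \(j\): for fixed assignments \(g_i\), the objective separates over \(k\) and \(j\), so each \(p_{k,j}\) can be optimized on its own.

\[\frac{\partial}{\partial p_{k,j}} \sum_{i:g_i = k} w_{i,j}(z_{i,j}-p_{k,j})^2 = -2\sum_{i:g_i = k} w_{i,j}(z_{i,j}-p_{k,j}) = 0 \quad\Longrightarrow\quad p_{k,j} = \frac{\sum_{i:g_i = k} w_{i,j}z_{i,j}}{\sum_{i:g_i = k} w_{i,j}}\]

The second derivative \(2\sum_{i:g_i = k} w_{i,j}\) is positive under this assumption, so the weighted mean is indeed the minimizer.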

Algorithm

1. Initialize appropriate starting cluster assignments.
2. For the given cluster assignments, find the optimal prototypes:

\[p_{k,j} = \frac{\sum_{i:g_i = k} w_{i,j}z_{i,j}}{\sum_{i:g_i = k} w_{i,j}}\]

3. Given the prototypes, find the optimal cluster assignment of each observation \(i\) by minimizing

\[\sum_{j = 1}^{\tilde{M}} w_{i,j}(z_{i,j}-p_{k,j})^2\] over \(k\).

4. Repeat steps 2-3 until convergence, i.e. until the change in the objective function is below some tolerance level.
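The steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the random starting assignment, the handling of empty clusters, the fallback for coordinates with zero total weight, and all names and default values are choices made for this example.

```python
import numpy as np


def weighted_kmeans(Z, W, K, max_iter=100, tol=1e-8, seed=0):
    """Weighted K-means on rows of Z (n, 2M) with non-negative weights W (n, 2M)."""
    Z = np.asarray(Z, dtype=float)
    W = np.asarray(W, dtype=float)
    n, d = Z.shape
    rng = np.random.default_rng(seed)
    # Step 1: random starting assignment (assumption; any sensible start works).
    labels = rng.integers(0, K, size=n)
    prev_obj = np.inf
    for _ in range(max_iter):
        # Step 2: prototypes as coordinate-wise weighted means.
        prototypes = np.zeros((K, d))
        for k in range(K):
            members = labels == k
            if not members.any():
                # Empty cluster: re-seed with a random observation (assumption).
                prototypes[k] = Z[rng.integers(n)]
                continue
            w_sum = W[members].sum(axis=0)
            wz_sum = (W[members] * Z[members]).sum(axis=0)
            safe = np.where(w_sum > 0, w_sum, 1.0)
            # Zero total weight: fall back to the unweighted mean (assumption).
            prototypes[k] = np.where(w_sum > 0, wz_sum / safe, Z[members].mean(axis=0))
        # Step 3: assign each curve to the prototype with the smallest
        # weighted squared distance; dist has shape (n, K).
        dist = (W[:, None, :] * (Z[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        obj = float(dist[np.arange(n), labels].sum())
        # Step 4: stop once the objective no longer decreases by more than tol.
        if prev_obj - obj < tol:
            break
        prev_obj = obj
    return labels, prototypes, obj
```

The distance computation builds an array of shape \(n \times K \times 2M\), which is convenient for a sketch but may need chunking for a full season of routes. Like classical \(K\)-means, the procedure only finds a local optimum, so several starting assignments are usually worth trying.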

Results

Figure 2: Left: Defensive routes clustered with weighted \(K\)-means. Right: Defensive routes clustered by classical \(K\)-means.

Main Findings

Weighted \(K\)-means algorithm is able to distinguish between important and irrelevant clusters, see Figure 2.
Classical \(K\)-means results in 2 pass rush (important) clusters and 9 coverage (irrelevant) clusters. Weighted \(K\)-means algorithm results in only 3 coverage (irrelevant) clusters (cluster 2, cluster 3, cluster 4 of Figure 2).
Distinguishing and analyzing pass rush routes is only possible with the weighted approach.

Use Cases

Analysing specific players.

Pass rush play database for coaches

Coach wants to prepare for a game:
- Analyse specific defensive player of opponent team.
- Analyse specific strengths of opponent team.
Two options for the coach:
- Go through all of the videotapes of the season to see how opponent player/team handled situations.
- Filter database based on specific cluster routes of interest and only analyse relevant tapes.
Example:
- Analyse a strong pass rusher who generally plays on the left but is used flexibly and is also strong on the right side.
- Filter for strong right side cluster plays of player and analyse only relevant tapes (filter for pressure as well).
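A filter of this kind could look like the sketch below. The record layout (`play_id`, `player`, `cluster`, `pressure`), the player name, and the cluster numbers are entirely hypothetical; the poster does not describe the database schema.

```python
from dataclasses import dataclass


@dataclass
class RushPlay:
    play_id: int    # hypothetical identifier linking to the video tape
    player: str     # defensive player running the route
    cluster: int    # cluster label assigned by weighted K-means
    pressure: bool  # whether the play resulted in pressure


def filter_plays(plays, player, clusters, pressure_only=True):
    """Return the plays of one player that fall into the clusters of interest."""
    return [
        p for p in plays
        if p.player == player
        and p.cluster in clusters
        and (p.pressure or not pressure_only)
    ]


# Made-up records for illustration only.
plays = [
    RushPlay(1, "Player A", cluster=7, pressure=True),
    RushPlay(2, "Player A", cluster=1, pressure=True),
    RushPlay(3, "Player A", cluster=7, pressure=False),
    RushPlay(4, "Player B", cluster=7, pressure=True),
]
relevant = filter_plays(plays, "Player A", clusters={7})
```

Only the tapes belonging to the returned plays would then need to be reviewed.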

References

Chu, Dani, Matthew Reyers, James Thomson, and Lucas Yifan Wu. 2020. “An Application of Model-Based Curve Clustering Using the EM Algorithm.” Journal of Quantitative Analysis in Sports 16 (2): 121–32. https://doi.org/10.1515/jqas-2019-0047.

Hastie, T., R. Tibshirani, and J. Friedman. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition. Springer Series in Statistics. Springer New York. https://books.google.at/books?id=tVIjmNS3Ob8C.

Miller, A. C., and L. Bornn. 2017. “Possession Sketches: Mapping NBA Strategies.” In Proceedings of the 2017 MIT Sloan Sports Analytics Conference.

In American football, clustering of pass rush routes is much more effective with a weighted \(K\)-means approach than with a classical \(K\)-means approach.