class: center, middle, inverse, title-slide

.title[
# Curve Clustering Methods and their Applications to Sports Analytics
]
.author[
### Robert Bajons, joint work with Kurt Hornik
]
.institute[
### BBS 2023
]
.date[
### 2023-07-25
]

---
class: inverse, center, middle

# Motivation

---

# Motivation

<center>
<img src="play1.gif" width=800 height=300>
</center>

---

# Motivation

Routes in the NFL.

<img src="BBS_Bajons_files/figure-html/plot1-1.png" width="1250px" height="433px" style="display: block; margin: auto;" />

---

# Motivation

Routes in the NFL: focus on pass rush.

<img src="BBS_Bajons_files/figure-html/plot1.5-1.png" width="1250px" height="433px" style="display: block; margin: auto;" />

---

# Motivation

Pass rush in the NFL.

<img src="BBS_Bajons_files/figure-html/plot2-1.png" width="576" style="display: block; margin: auto;" />

---

# Motivation

Pass rush in the NFL: add pressure weights.

<img src="BBS_Bajons_files/figure-html/plot2.5-1.png" width="576" style="display: block; margin: auto;" />

---

# Motivation

.pull-left[
- Routes or curves are quite common in sports analysis.
- In many problems the trajectories come as weighted observations.
- Analyse weighted curves instead of curves alone.
- A clustering algorithm for weighted curves is needed.
]

--

.pull-right[
- Examples: pass rush in the NFL, possession in soccer.
- Derive pressure weight models.
- Weights are helpful to distinguish between *useful* routes and *worthless* routes.
- Possible advantage/strength: being able to analyze strong players/teams in more detail.
]

---
class: inverse, center, middle

# Methodology

---

# Formal Description of the Problem

- Observed data `\(\boldsymbol{Y} = \{\boldsymbol{y}_1,\dots,\boldsymbol{y}_n\}\)` of `\(n\)` curves.
- Each `\(\boldsymbol{y}_i\)` is an `\(m_i \times 3\)`-dimensional matrix of `\((x,y)\)`-coordinates and weights `\(w\)`.
- `\(m_i\)` is not fixed but varies for each observation (e.g. routes in the NFL depend on the length of the play).

--

**Goal:** Cluster the routes, i.e. the `\((x,y)\)`-coordinates, while accounting for the information from the weights `\(w\)`.

--

- Initial idea: use a `\(K\)`-means type of algorithm for clustering.
- Adjust the usual procedure by deriving a weighted `\(K\)`-means method.
- Data preprocessing is necessary before the `\(K\)`-means algorithm can be applied.

---

# Bézier Curves

.pull-left[
- Parametric curves defined on `\(t \in [0,1]\)` by control points `\(\boldsymbol{\theta} = (\theta_0,\dots,\theta_P)\)` using Bernstein polynomials `\(b_p^P(t) = \binom{P}{p}t^p(1-t)^{P-p}\)`, `\(p = 0,\dots,P\)`:
`$$B(t,\boldsymbol{\theta}) = \sum_{p = 0}^P \theta_p b_p^P(t), \quad t \in [0,1]$$`
- The control points form a polygon, and the Bézier curve defined by them lies in the convex hull of these points.
]

.pull-right[
<img src="BBS_Bajons_files/figure-html/plotBC-1.png" width="576" style="display: block; margin: auto;" />
]

---

# Preprocessing Data

- The trajectory of each observation `\(\boldsymbol{y}_i\)` is represented by a 2-dimensional Bézier curve:
`$$\boldsymbol{y}_i(t) = \sum_{p = 0}^{P} b_p^{P}(t)\,(x_p,y_p), \quad P = m_i - 1$$`
- The control vector `\(\boldsymbol{\theta}\)` is given by the `\((x,y)\)` points observed from `\(\boldsymbol{y}_i\)`.
- In theory any form of interpolation between the points could be used. Advantages of Bézier curves:
  - Natural starting point ( `\(t = 0\)` ) and end point ( `\(t = 1\)` ).
  - `\(\Rightarrow\)` Bézier curves allow for a very general smooth representation of the curve (helpful in the context of football routes).
  - Flexible usage for curves of different lengths.
- Usage for `\(K\)`-means clustering (see the sketch on the next slide):
  - Use a prespecified number of `\(M\)` equidistant points on the above curve to represent `\(\boldsymbol{y}_i\)`, such that each observation is of dimension `\(M \times 2\)`.
  - Aggregate the weights to obtain `\(M\)` new weights by considering the proximity and order of the points.
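---

# Preprocessing Data: Sketch

A minimal R sketch of the resampling step (illustrative only, not the implementation used for the analysis): the observed `\((x,y)\)` positions serve as control points and the resulting Bézier curve is evaluated at `\(M\)` equidistant values of `\(t\)`. The aggregation of the weights is omitted, and the function names and the toy route are made up.

```r
## Bernstein polynomial b_p^P(t)
bernstein <- function(t, p, P) choose(P, p) * t^p * (1 - t)^(P - p)

## Represent one route by the Bezier curve through its observed positions
## and resample it at M equidistant values of t.
bezier_resample <- function(xy, M = 20) {
  P <- nrow(xy) - 1                           # degree = number of points - 1
  t_grid <- seq(0, 1, length.out = M)
  B <- outer(t_grid, 0:P, bernstein, P = P)   # M x (P + 1) basis matrix
  B %*% xy                                    # M x 2 matrix of resampled points
}

## Toy route of 5 tracked positions (illustrative data only)
route <- cbind(x = c(0, 1, 2, 3, 5), y = c(0, 0.5, 0.2, -0.3, 0))
bezier_resample(route, M = 10)
```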
---

# From `\(K\)`-means to weighted `\(K\)`-means

.panelset[
.panel[.panel-name[K-means]
- Given a set of observations `\(\boldsymbol{Y}\)`, find a partition into `\(K\)` clusters by minimizing the within-cluster sum of squares
`$$S = \sum_{k = 1}^K \sum_{i:g_i = k} (x_i-p_k)^2$$`
- Prototypes (centers) `\(p_k\)` are given as cluster means:
`$$p_k = \frac{1}{N_k}\sum_{i:g_i = k} x_i$$`
- Basic implementation: use an iterative refinement procedure, which alternates between an assignment step and a cluster update step.
]
.panel[.panel-name[weighted K-means]
- Given a set of observations `\(\boldsymbol{Y}\)`, where each `\(y_i \in \boldsymbol{Y}\)` can be assigned a weight `\(w_i\)`, find a partition into `\(K\)` clusters by minimizing the weighted within-cluster sum of squares
`$$S_w = \sum_{k = 1}^K \sum_{i:g_i = k} w_i(x_i-p_k)^2$$`
- Prototypes (centers) `\(p_k\)` are given as weighted cluster means:
`$$p_k = \frac{\sum_{i:g_i = k} w_ix_i}{\sum_{i:g_i = k} w_i}$$`
- Adapt the iterative refinement procedure to the case of weighted observations.
]
]

---

# Weighted `\(K\)`-means for Curves

- Transform the observations from the preprocessing step: `\(\boldsymbol{y}_i \in \mathbb{R}^{M \times 3}\)` is split up into vectors `\(\boldsymbol{z}_i, \boldsymbol{w}_i \in \mathbb{R}^{2M}\)`:
`$$\boldsymbol{z}_i = (x_1,\dots,x_M,y_1,\dots,y_M)$$`
`$$\boldsymbol{w}_i = (w_1,\dots,w_M,w_1,\dots,w_M)$$`
- Find clusters and prototypes such that:
`$$\min_{(p_{k,j}),(g_i)}\sum_{k = 1}^K \sum_{i:g_i = k} \sum_{j = 1}^{2M} w_{i,j}(z_{i,j}-p_{k,j})^2.$$`
- Find the clusters by iteratively alternating between finding the optimal prototypes for a given cluster assignment and finding the optimal cluster assignments for given prototypes.

---

# Weighted `\(K\)`-means Algorithm

1. Initialize appropriate starting cluster assignments.
2. For the given cluster assignment find the optimal prototypes:
`$$p_{k,j} = \frac{\sum_{i:g_i = k} w_{i,j}z_{i,j}}{\sum_{i:g_i = k} w_{i,j}}$$`
3. Given the prototypes find the optimal cluster assignments by minimizing
`$$\sum_{j = 1}^{2M} w_{i,j}(z_{i,j}-p_{k,j})^2$$`
over `\(k\)`.
4. Repeat steps 2-3 until convergence, i.e. until the change in the objective function is below some tolerance level.

A small R sketch of this procedure is given on the next slide.
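---

# Weighted `\(K\)`-means Algorithm: Sketch

A minimal R sketch of the iteration above (illustrative, not the implementation used for the analysis). `Z` holds the stacked coordinates `\(\boldsymbol{z}_i\)` as rows and `W` the corresponding weights `\(\boldsymbol{w}_i\)`; the random initialization is a simplification, and empty clusters are not handled.

```r
weighted_kmeans <- function(Z, W, K, max_iter = 100, tol = 1e-8) {
  n <- nrow(Z)
  g <- sample(rep_len(1:K, n))             # step 1: random starting assignment
  obj_old <- Inf
  for (iter in seq_len(max_iter)) {
    ## step 2: weighted prototypes p_{k,j}
    P <- t(sapply(1:K, function(k) {
      idx <- which(g == k)
      colSums(W[idx, , drop = FALSE] * Z[idx, , drop = FALSE]) /
        colSums(W[idx, , drop = FALSE])
    }))
    ## step 3: assign each curve to the prototype with the smallest
    ## weighted squared distance
    D <- sapply(1:K, function(k) rowSums(W * sweep(Z, 2, P[k, ])^2))
    g <- max.col(-D, ties.method = "first")
    ## step 4: stop once the objective no longer decreases
    obj <- sum(D[cbind(1:n, g)])
    if (obj_old - obj < tol) break
    obj_old <- obj
  }
  list(cluster = g, prototypes = P, objective = obj)
}
```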
---

# Weighted `\(K\)`-means Recap

- The weighted `\(K\)`-means algorithm provides a clustering of the curves, i.e. the `\((x,y)\)`-coordinates, by accounting for the weights `\(w\)`.
- The spatial distance of curves is **less** important when the weights are small and **more** important when the weights are high.
- *Irrelevant* routes are likely to be clustered together, even if they are spatially very far from each other.

Disadvantages of weighted `\(K\)`-means:
- Heavy preprocessing is necessary in order to apply the algorithm to curves of different lengths.
- Hard clustering.

---
class: inverse, center, middle

# Application

---

# Data

- NFL tracking data:
  - Routes for every player on every passing play of the first half of the 2021/22 season (~ 8000 plays).
  - Route information from the start of the play until the relevant event (pass/fumble).
  - A number of player- and play-specific details.
- `\(\sim\)` 80000 routes of defensive players to cluster:
  - Initial preprocessing steps comprise standardizing each play to the line of scrimmage ( `\(x\)`-axis) and the football ( `\(y\)`-axis).
  - Routes are labeled into pass rushers and coverage players.

---

# Clustering Results

<img src="BBS_Bajons_files/figure-html/plot_clusters-1.png" width="1008" style="display: block; margin: auto;" />

[More comparisons in Appendix](#res_pr_only1)

---

# Clustering Results

<img src="BBS_Bajons_files/figure-html/plot_clusters2-1.png" width="1008" style="display: block; margin: auto;" />

---

# Analysing Players

<img src="BBS_Bajons_files/figure-html/plot_players-1.png" width="1008" style="display: block; margin: auto;" />

---

# Analysing Teams

.panelset[
.panel[.panel-name[Defense]
<img src="BBS_Bajons_files/figure-html/plot_teams_def-1.png" width="1008" style="display: block; margin: auto;" />
]
.panel[.panel-name[Offense]
<img src="BBS_Bajons_files/figure-html/plot_teams_off-1.png" width="1008" style="display: block; margin: auto;" />
]
]

---

# Play Database for Coaches

- Suppose a coach wants to prepare for a game:
  - Analyse a specific defensive player of the opponent team.
  - Analyse specific strengths of a team.
- The coach could go through all of the videotapes of the season to see how the opponent player/team handled situations.
- Instead, the coach could filter the database for specific cluster routes of interest and only analyse the relevant tapes.
- Example:
  - Analyse a strong pass rusher who generally plays on the left but is used flexibly and is also strong on the right side.
  - Filter for the player's strong right-side cluster plays and analyse only the relevant tapes (filter for pressure as well).

---
class: inverse, center, middle

# More Work To Do...

---

# Regression Mixture Models

- Following Chu, Reyers, Thomson et al. (2020) consider the following regression model for curves `\(\boldsymbol{y}_i\)`:
`\begin{equation} \boldsymbol{y}_i = \boldsymbol{T}_i\boldsymbol{\theta} + \boldsymbol{\epsilon}_i. \end{equation}`
- `\(\boldsymbol{T}_i\)` is an `\(m_i \times (P+1)\)`-matrix of Bernstein polynomials evaluated at each timepoint `\(t_j, j = 1,\dots,m_i\)`.
- `\(\boldsymbol{\epsilon}_i\)` is a Gaussian error term.
- In the simplest case this yields a probabilistic model for each curve, such that `\(\boldsymbol{y}_i\)` given `\(\boldsymbol{T}_i\)` follows `\(\mathcal{N}(\boldsymbol{T}_i\boldsymbol{\theta}, \sigma^2 I)\)`.
- The curves are assumed to be centered around a Bézier curve, where the control points `\(\boldsymbol{\theta}\)` describing the mean curve (as well as the variance `\(\sigma^2\)`) need to be estimated.

---

# Regression Mixture Models

- Incorporate the cluster structure by assuming a mixture model for each curve, i.e.:
`\begin{equation} P(\boldsymbol{y}_i|\boldsymbol{T}_i,\Theta) = \sum_{k=1}^K \alpha_k p_k(\boldsymbol{y}_i|\boldsymbol{T}_i\boldsymbol{\theta}_k,\sigma_k^2), \end{equation}`
where `\(p_k\)` is the normal density.
- Following Gaffney and Smyth (2004) an EM algorithm can be derived to estimate the cluster-specific parameters (a small sketch follows on the next slide).
- Idea: adapt the procedure using weighted likelihood methods (cf. Newton and Raftery (1994)).
- So far work in progress...
- Advantages:
  - No need for heavy preprocessing.
  - Soft clustering.
  - At the same time, weighting the observations in the likelihood has a similar effect: irrelevant routes have less influence on the cluster estimation.
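---

# Regression Mixture Models: Sketch

An illustrative sketch of one E- and M-step for the (unweighted) regression mixture, following the structure of Gaffney and Smyth (2004); it is not the final method. Curves are stored as a list of `\(m_i \times 2\)` response matrices `y_list` with matching Bernstein design matrices `T_list`, and `theta` is a list of `\((P+1) \times 2\)` control-point matrices. The updates of `\(\alpha_k\)` and `\(\sigma_k^2\)` as well as the weighted-likelihood adaptation are omitted, and all names are made up.

```r
## Bernstein design matrix T_i for observation times t in [0, 1]
bernstein_design <- function(t, P) {
  sapply(0:P, function(p) choose(P, p) * t^p * (1 - t)^(P - p))
}

## E-step: responsibilities r_{ik} proportional to
## alpha_k * N(y_i | T_i theta_k, sigma_k^2 I)
e_step <- function(y_list, T_list, theta, sigma2, alpha) {
  loglik <- sapply(seq_along(alpha), function(k)
    mapply(function(y, Tm)
      sum(dnorm(y, mean = Tm %*% theta[[k]], sd = sqrt(sigma2[k]), log = TRUE)),
      y_list, T_list) + log(alpha[k]))
  r <- exp(loglik - apply(loglik, 1, max))   # stabilize before normalizing
  r / rowSums(r)
}

## M-step for the control points: responsibility-weighted least squares
m_step_theta <- function(y_list, T_list, r) {
  lapply(seq_len(ncol(r)), function(k) {
    TT <- Reduce(`+`, Map(function(Tm, ri) ri * crossprod(Tm), T_list, r[, k]))
    Ty <- Reduce(`+`, Map(function(Tm, y, ri) ri * crossprod(Tm, y),
                          T_list, y_list, r[, k]))
    solve(TT, Ty)                            # (P + 1) x 2 control points
  })
}
```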
---
class: inverse, center, middle

# Thank You!

---
name: mylastslide

# References

Chu, D., M. Reyers, J. Thomson, et al. (2020). "Route identification in the National Football League". In: _Journal of Quantitative Analysis in Sports_ 16.2, pp. 121-132. DOI: [10.1515/jqas-2019-0047](https://doi.org/10.1515/jqas-2019-0047). URL: [https://doi.org/10.1515/jqas-2019-0047](https://doi.org/10.1515/jqas-2019-0047).

Gaffney, S. J. and P. Smyth (2004). "Probabilistic Curve-Aligned Clustering and Prediction with Regression Mixture Models". AAI3119737.

Hastie, T., R. Tibshirani, and J. Friedman (2009). _The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition_. Springer Series in Statistics. Springer New York. ISBN: 9780387848587. URL: [https://books.google.at/books?id=tVIjmNS3Ob8C](https://books.google.at/books?id=tVIjmNS3Ob8C).

James, G., D. Witten, T. Hastie, et al. (2021). _An Introduction to Statistical Learning: with Applications in R_. Springer Texts in Statistics. Springer US. ISBN: 9781071614174. URL: [https://books.google.at/books?id=g5gezgEACAAJ](https://books.google.at/books?id=g5gezgEACAAJ).

Miller, A. C. and L. Bornn (2017). "Possession Sketches: Mapping NBA Strategies". In: _Proceedings of the 2017 MIT Sloan Sports Analytics Conference_.

Newton, M. A. and A. E. Raftery (1994). "Approximate Bayesian Inference with the Weighted Likelihood Bootstrap". In: _Journal of the Royal Statistical Society. Series B (Methodological)_ 56.1, pp. 3-48. ISSN: 00359246. URL: [http://www.jstor.org/stable/2346025](http://www.jstor.org/stable/2346025) (visited on Jun. 15, 2023).

---
class: inverse, center, middle

# Appendix

---
name:res_pr_only1

# Pass Rush Routes only

<img src="BBS_Bajons_files/figure-html/plot_clusters_pr_only-1.png" width="1008" style="display: block; margin: auto;" />

---
name:res_pr_only2

# Pass Rush Routes only

- Analyse the predictive performance of the clusters in terms of correctly classifying pressure on a play.
- Do the clusters help in predicting pressure?
- The table below shows results based on different kinds of metrics (Brier score, Accuracy, Recall, Precision, F-score, Matthews correlation coefficient, Area under the ROC curve); a hedged sketch of how such metrics can be computed from predicted pressure probabilities is given below.
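An illustrative R sketch (made-up inputs, not the evaluation code used for the table): `pressure` holds observed 0/1 pressure outcomes, `p_hat` the predicted pressure probabilities, and the 0.5 cutoff for the class-based metrics is an assumption; the AUC is computed via the rank-based Mann-Whitney statistic.

```r
evaluate_pressure <- function(pressure, p_hat, cutoff = 0.5) {
  pred <- as.integer(p_hat >= cutoff)
  tp <- sum(pred == 1 & pressure == 1); tn <- sum(pred == 0 & pressure == 0)
  fp <- sum(pred == 1 & pressure == 0); fn <- sum(pred == 0 & pressure == 1)
  ## keep counts as doubles so the MCC products cannot overflow integers
  tp <- as.numeric(tp); tn <- as.numeric(tn)
  fp <- as.numeric(fp); fn <- as.numeric(fn)
  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)
  n1 <- sum(pressure == 1); n0 <- sum(pressure == 0)
  c(brier     = mean((p_hat - pressure)^2),
    accuracy  = (tp + tn) / length(pressure),
    recall    = recall,
    precision = precision,
    f_score   = 2 * precision * recall / (precision + recall),
    mcc       = (tp * tn - fp * fn) /
                sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    ## AUC via the Mann-Whitney statistic (rank-based, handles ties)
    auc       = (sum(rank(p_hat)[pressure == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0))
}

## Example with made-up data
set.seed(1)
pressure <- rbinom(200, 1, 0.3)
p_hat    <- plogis(qlogis(0.3) + 2 * pressure + rnorm(200))
evaluate_pressure(pressure, p_hat)
```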