Robert Bajons | Joint work with Kurt Hornik | 17 Dec. 2023
Vienna University of Economics and Business
CMStatistics 2023


Curves (spatio-temporal observations) are common in sports analysis:
⇒ studying patterns in curves allows for interesting insights.
Depending on use case: assign weights to curves at each time point.
Finding structure in augmented data ⇒ clustering algorithm for weighted curves needed.
Weights helpful to distinguish curves according to their worth ⇒ obtain more representative patterns.
Examples:
Routes in the NFL: focus on pass rush.

Pass rush in the NFL: add pressure weights.

Possessions in soccer.

Possessions in soccer with goal scoring weights.

Observed data: $Z = \{z_1, \dots, z_n\}$, set of $n$ curves.
Each $z_i$ is comprised of observations $z_{it}$, $t = 1, \dots, T$.
At each $t$: a weight $w_{it}$ is assigned to $z_{it}$.
Formal goal: Cluster the curves $z_i$ ($(x,y)$-coordinates) while accounting for the information in the weights $w$ ⇒ estimate good prototypes (cluster representatives).
Initial idea: Use a K-means type of algorithm for clustering.
Two adjustments of the usual procedure are necessary:
Note: In many applications, data preprocessing is necessary before the K-means algorithm can be used.
$S = \sum_{k=1}^{K} \sum_{i:\, g_i = k} (x_i - p_k)^2 \to \min_{(p_k),\,(g_i)}$
Hard problem to solve ⇒ iterative procedure:
$p_k = \frac{1}{N_k} \sum_{i:\, g_i = k} x_i$
Repeat until convergence:
$S_w = \sum_{k=1}^{K} \sum_{i:\, g_i = k} w_i (x_i - p_k)^2.$
Solve via iterative procedure:
$p_k = \frac{\sum_{i:\, g_i = k} w_i x_i}{\sum_{i:\, g_i = k} w_i}.$
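The weighted-mean prototype follows from setting the derivative of the weighted objective with respect to a single prototype to zero while the assignments are held fixed. A short derivation, assuming scalar observations and strictly positive weights (both assumptions made here for brevity, not stated on the slides):

```latex
\begin{align*}
\frac{\partial S_w}{\partial p_k}
  &= -2 \sum_{i:\, g_i = k} w_i \,(x_i - p_k) = 0 \\
\Longrightarrow\quad
p_k \sum_{i:\, g_i = k} w_i &= \sum_{i:\, g_i = k} w_i x_i \\
\Longrightarrow\quad
p_k &= \frac{\sum_{i:\, g_i = k} w_i x_i}{\sum_{i:\, g_i = k} w_i}.
\end{align*}
```

Setting all weights to 1 recovers the ordinary cluster mean. As a small check: two observations 0 and 4 with weights 1 and 3 give a prototype of (1·0 + 3·4)/(1 + 3) = 3, which is pulled towards the more heavily weighted observation.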
Observations: time series objects of the same length $z_i = (z_{i1}, \dots, z_{iT})$ (component-wise weighted object).
Additional information from weights $w_i = (w_{i1}, \dots, w_{iT})$:
$\sum_{k=1}^{K} \sum_{i:\, g_i = k} \sum_{t=1}^{T} w_{it} (z_{it} - p_{kt})^2 \to \min_{(p_{kt}),\,(g_i)}.$
Iterative refinement procedure:
1. Initialize appropriate starting assignments of clusters.
2. For given cluster assignment find the optimal prototypes (component-wise): $p_{kt} = \frac{\sum_{i:\, g_i = k} w_{it} z_{it}}{\sum_{i:\, g_i = k} w_{it}}$.
3. Given prototypes find optimal cluster assignments by minimizing $\sum_{t=1}^{T} w_{it} (z_{it} - p_{kt})^2$ over $k$.
4. Repeat steps 2-3 until convergence, i.e. until change in function to optimize is below some tolerance level.
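The iterative refinement above can be sketched in a few lines. This is a minimal illustration in plain Python, not the R implementation from the talk (which the source does not show); the function names, the `g0` argument, the data layout (each curve a list of T coordinate tuples, each weight vector a list of T numbers) and the handling of zero-weight clusters are all assumptions made for this sketch.

```python
import random


def prototypes(z, w, g, K):
    """Step 2: component-wise weighted mean per cluster k and time point t."""
    T, dim = len(z[0]), len(z[0][0])
    p = []
    for k in range(K):
        members = [i for i in range(len(z)) if g[i] == k]
        curve = []
        for t in range(T):
            total = sum(w[i][t] for i in members)
            # Assumption: an empty cluster (or zero total weight at t) gets the
            # origin as prototype point; a real implementation needs a better rule.
            curve.append(tuple(
                sum(w[i][t] * z[i][t][d] for i in members) / total if total > 0 else 0.0
                for d in range(dim)))
        p.append(curve)
    return p


def distance(zi, wi, pk):
    """Weighted squared distance of one curve to one prototype curve."""
    return sum(wt * sum((a - b) ** 2 for a, b in zip(zt, pt))
               for zt, wt, pt in zip(zi, wi, pk))


def objective(z, w, g, p):
    """Value of the weighted objective for assignments g and prototypes p."""
    return sum(distance(z[i], w[i], p[g[i]]) for i in range(len(z)))


def weighted_kmeans(z, w, K, g0=None, tol=1e-9, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step 1: starting assignments (given by the caller, or random).
    g = list(g0) if g0 is not None else [rng.randrange(K) for _ in z]
    old = float("inf")
    for _ in range(max_iter):
        p = prototypes(z, w, g, K)                      # step 2
        g = [min(range(K), key=lambda k: distance(z[i], w[i], p[k]))
             for i in range(len(z))]                    # step 3
        new = objective(z, w, g, p)                     # evaluated after step 3
        if old - new < tol:                             # step 4: stop on small change
            break
        old = new
    return g, p, new
```

Calling `weighted_kmeans(z, w, K=2, g0=[0, 1, 0, 1])` on four short curves, two near the origin and two near (10, 10), with unit weights, separates the two groups after one reassignment. Keeping the prototype, distance and objective computations in separate functions means only those pieces need to change when the distance or the weighting scheme changes.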
Implementation hinges on three functions:
C: computes the optimal prototypes for given cluster assignment.
D: computes (weighted) distance for each observation to cluster.
E: evaluates objective function for given assignments and prototypes.
Advantages:
… R operation.
… C, D and E).
What about soft clustering?
Fuzzy K-means: derive optimal prototypes and allow for mixed membership of the observations to clusters.
Fuzzy extension of weighted K-means:
$\sum_{i=1}^{n} \sum_{k=1}^{K} \sum_{t=1}^{T} u_{ik}^{m} w_{it} (z_{it} - p_{kt})^2 \to \min_{(u_{ik}),\,(p_{kt})}$
Solve via adapted iterative procedure:
Implementation in R follows a similar idea as before ⇒ adjust C and E, implement a function U for the membership degrees.
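The slides do not spell out the update formulas for the fuzzy variant. The sketch below assumes the standard fuzzy K-means updates, with the weighted squared distance in place of the Euclidean one: memberships proportional to d_ik^(-1/(m-1)) and prototypes as means weighted by u_ik^m · w_it. It is written in plain Python for illustration, uses scalar coordinates for brevity, and assumes m > 1 and strictly positive weights; the function names are hypothetical and not taken from the R implementation.

```python
def memberships(d, m=2.0):
    """Membership degrees from weighted squared distances d[i][k] (assumes m > 1)."""
    u = []
    for row in d:
        zeros = [k for k, v in enumerate(row) if v == 0.0]
        if zeros:
            # Curve coincides with one or more prototypes: share membership among them.
            u.append([1.0 / len(zeros) if k in zeros else 0.0
                      for k in range(len(row))])
            continue
        inv = [v ** (-1.0 / (m - 1.0)) for v in row]
        s = sum(inv)
        u.append([x / s for x in inv])   # each row sums to 1
    return u


def fuzzy_prototypes(z, w, u, m=2.0):
    """Prototype p[k][t] as a mean of z[i][t] weighted by u[i][k]**m * w[i][t]."""
    n, K, T = len(z), len(u[0]), len(z[0])
    p = []
    for k in range(K):
        curve = []
        for t in range(T):
            coef = [u[i][k] ** m * w[i][t] for i in range(n)]
            total = sum(coef)            # assumed > 0 (positive weights)
            curve.append(sum(c * z[i][t] for i, c in enumerate(coef)) / total)
        p.append(curve)
    return p
```

Alternating these two updates until the objective changes by less than a tolerance mirrors the hard-clustering loop. With memberships restricted to 0 or 1, `fuzzy_prototypes` reduces to the component-wise weighted mean used there.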
NFL tracking data:
Initial preprocessing:
Augment data with pressure weights:

Event stream data:
Preprocessing:
Augment data with goal scoring weights:

Sports data is full of spatio-temporal measurements (curves) ⇒ Finding patterns in the data allows interesting insights.
In most cases: augmenting curves with weights is quite intuitive ⇒ allows for clustering of interesting routes.
We derived a weighted K-means type of algorithm for component-wise weighted curves/objects.
… R.
Applications:
Future work: