Vienna University of Economics and Business
Sep 27, 2025
How do we commonly evaluate players in sports?
Sports analytics:
Expected value metrics \(\rightarrow\) compare observed and expected outcomes:
Examples:
Goals above expectation (GAX):
over a time frame (e.g. a season), compute differences between goals and xG for all shots of player \(i\)
\[ \operatorname{GAX}_i = \sum_{j=1}^{N_i} (Y_j - \hat h(Z_j))\]
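As a minimal sketch (with hypothetical toy data), GAX is just the per-player sum of observed-minus-predicted outcomes over that player's shots:

```python
from collections import defaultdict

# Hypothetical shot records: (player, goal Y_j, xG prediction h_hat(Z_j)).
shots = [
    ("A", 1, 0.3), ("A", 0, 0.1),
    ("B", 1, 0.6), ("B", 1, 0.2), ("B", 0, 0.4),
]

gax = defaultdict(float)
for player, goal, xg in shots:
    gax[player] += goal - xg   # accumulate Y_j - h_hat(Z_j)

print(dict(gax))
```

In practice \(\hat h\) would come from a fitted xG model; here the predictions are placeholder numbers.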
Recent criticism of GAX (Baron et al. 2024; Davis and Robberechts 2024):
Instability and limited replicability (over seasons)
Low (effective) sample size \(\rightarrow\) high uncertainty, lack of uncertainty quantification
Biases arising from the data
\(\Rightarrow\) GAX have been labeled a poor measure for evaluating shooting skills
Are GAX really a poor metric for player evaluation?
How can we identify outstanding shooters?
Setup:
Logistic regression model:
\(Y \mid X,Z \sim \operatorname{Ber}(\pi(X,Z)), \quad \pi(X,Z) = P(Y=1 \mid X,Z)\) and
\[ \begin{aligned} \log\left(\frac{\pi(X,Z)}{1-\pi(X,Z)}\right) = X\beta + Z^{\top}\gamma. \end{aligned} \]
Goal: Inference on \(\beta\)
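A minimal sketch of fitting this model on simulated toy data by Newton's method (all names and values here are illustrative assumptions, not part of the original analysis):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
X = rng.integers(0, 2, n).astype(float)   # binary player indicator
Z = rng.normal(size=n)                    # a single shot covariate
D = np.column_stack([X, Z, np.ones(n)])   # design: beta, gamma, intercept
eta_true = 0.5 * X - 1.0 * Z              # true linear predictor (toy choice)
y = (rng.random(n) < 1 / (1 + np.exp(-eta_true))).astype(float)

theta = np.zeros(3)
for _ in range(25):                       # Newton-Raphson iterations
    p = 1 / (1 + np.exp(-(D @ theta)))    # pi(X, Z)
    grad = D.T @ (y - p)                  # score vector
    W = p * (1 - p)
    hess = D.T @ (D * W[:, None])         # Fisher information
    theta += np.linalg.solve(hess, grad)

print("beta_hat:", theta[0])
```

The first component of `theta` estimates \(\beta\); Wald or score inference then follows from the Fisher information.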
Given i.i.d. data \((Y_i,X_i,Z_i)_{i = 1}^N\) from the logistic regression model
Score of \(\beta\) (target of score tests):
\[ \sum_{i = 1}^N\frac{\partial\log L(\beta,\gamma \mid Y_i,X_i,Z_i)}{\partial \beta}\]
Score test on \(\beta\) uses score under \(H_0: \beta = 0\):
Recall:
\[ \operatorname{GAX}_i = \sum_{j=1}^{N_i} (Y_j - \hat h(Z_j))\]
Since \(X_j\) is binary (the indicator that shot \(j\) was taken by the player of interest), the score under \(H_0\) is exactly that player's GAX
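To make the connection explicit (a standard derivation, stated here for completeness): the score of the logistic log-likelihood with respect to \(\beta\) is
\[ \frac{\partial}{\partial \beta}\sum_{i=1}^N \log L(\beta,\gamma \mid Y_i,X_i,Z_i) = \sum_{i=1}^N X_i\,\bigl(Y_i - \pi(X_i,Z_i)\bigr), \]
and under \(H_0:\beta = 0\) the success probability no longer depends on \(X\), i.e. \(\pi(X_i,Z_i) = h(Z_i)\), so the score reduces to \(\sum_{i=1}^N X_i\,\bigl(Y_i - h(Z_i)\bigr)\). For binary \(X_i\), only the shots of the indicated player contribute to the sum.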
Conclusion: GAX corresponds to a classical score test in the logistic regression model
Problems:
Linear model assumptions are unrealistic
\(\rightarrow\) traditional GAX instead estimates \(\hat h\) with a machine learning model:
Partially linear logistic regression model (PLLM)
\[ \log\left(\frac{\pi(X,Z)}{1-\pi(X,Z)}\right) = X\beta + g(Z) \]
Goal: Inference on \(\beta\) (testing \(H_0 : \beta = 0\))
The GCM (generalized covariance measure) test uses the empirical GCM:
GAX in parametric model:
\[\sum_{j=1}^{N} (Y_j - \hat h(Z_j))X_j\]
GAX via machine learning:
\[\sum_{j=1}^{N} (Y_j - \hat{h}(Z_j))X_j\]
rGAX:
\[\sum_{j=1}^{N} (Y_j-\hat h(Z_j))(X_j - \hat f(Z_j))\]
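A minimal sketch of the rGAX statistic and its GCM-style normalization, assuming predictions \(\hat h\) (for \(Y\) given \(Z\)) and \(\hat f\) (for \(X\) given \(Z\)) are available; here they are crude in-sample placeholders, whereas a real GCM test would use cross-fitted machine learning estimates:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
y = rng.integers(0, 2, n).astype(float)   # shot outcomes (toy data)
x = rng.integers(0, 2, n).astype(float)   # player indicator (toy data)
h_hat = np.full(n, y.mean())              # placeholder for h_hat(Z_j)
f_hat = np.full(n, x.mean())              # placeholder for f_hat(Z_j)

r = (y - h_hat) * (x - f_hat)             # product of the two residuals
rgax = r.sum()                            # rGAX statistic
t_stat = np.sqrt(n) * r.mean() / r.std()  # normalized GCM test statistic
print(rgax, t_stat)
```

The normalized statistic is asymptotically standard normal under \(H_0\), which supplies the uncertainty quantification that plain GAX lacks.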