Expected Pass Value (xPV)

A Holistic Framework for Evaluating Passing Situations in Soccer

Tobias Harringer | Robert Bajons

May 5, 2026

Motivation

An example: A pass from Enzo Fernandez

Overview

Goal
- Value passes in a way that depends less on their outcome
- Gain deeper insights into passing situations
Framework
- Pitch Control: likelihood of controlling a location
- Pitch Value: value of possession at a location
- Combine models to obtain expected pass value (xPV)
Insights
- Value real passes
- Value hypothetical passes

Data

Source:
2022 World Cup dataset: Event + Tracking data
- Event data: passes, shots, dribbles, …
- Tracking data: \((x,y)\)-coordinates of players and ball

Tracking data capture:

Where are players positioned?
Where are players heading to?

Approximate velocities using centered differences:

\(\hat{v}_x(t) = \frac{x_{t+\Delta} - x_{t-\Delta}}{2\Delta}\)

\(\hat{v}_y(t) = \frac{y_{t+\Delta} - y_{t-\Delta}}{2\Delta}\)
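
As a minimal sketch of the centered-difference estimate above: the function name `centered_velocity`, the frame spacing `dt` and the half-window `k` (so that \(\Delta = k \cdot dt\)) are illustrative assumptions, not values taken from the dataset.

```python
def centered_velocity(xs, ys, dt, k=1):
    """Estimate (v_x, v_y) per frame with centered differences.

    xs, ys: positions sampled every dt seconds (hypothetical frame spacing).
    k: half-window in frames, so Delta = k * dt.
    Frames without both neighbours are returned as None.
    """
    n = len(xs)
    velocities = [None] * n
    for t in range(k, n - k):
        vx = (xs[t + k] - xs[t - k]) / (2 * k * dt)
        vy = (ys[t + k] - ys[t - k]) / (2 * k * dt)
        velocities[t] = (vx, vy)
    return velocities


# A player moving 0.5 m per frame along x at 10 frames per second.
v = centered_velocity([0.0, 0.5, 1.0, 1.5], [0.0, 0.0, 0.0, 0.0], dt=0.1)
```

The first and last `k` frames have no centered estimate; how such edge frames were handled in the original work is not stated.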

Theoretical Framework

Goal: Value passes in expectation
Setup:

Define binary variables:

\[\begin{aligned} C = \begin{cases} 1, & \text{if chance for own team} \\ 0, & \text{otherwise} \end{cases}, \quad S = \begin{cases} 1, & \text{if pass successful} \\ 0, & \text{otherwise} \end{cases} \end{aligned}\]

Expected Pass Value xPV:

\[ xPV = \mathbb{E}(C \mid P, I) \]

where:

\(P\): pass from \((x_0, y_0)\) to \((x_1, y_1)\)
\(I\): all available information at the time of the pass, i.e. tracking information

By law of total expectation:

\[ xPV = \mathbb{E}(C \mid P, I, S = 1) \mathbb{P}(S = 1 \mid P, I) + \mathbb{E}(C \mid P, I, S = 0) \mathbb{P}(S = 0 \mid P, I) \]

Interpreting the terms of the decomposition:

\[ xPV = \underbrace{\mathbb{E}(C \mid P, I, S = 1)}_{\text{Pitch Value}}\underbrace{\mathbb{P}(S = 1 \mid P, I)}_{\text{Pitch Control}} + \underbrace{\mathbb{E}(C \mid P, I, S = 0)}_{\approx \text{ 0}} \underbrace{\mathbb{P}(S = 0 \mid P, I)}_{1 - \text{Pitch Control}} \]

The value of a failed pass is treated as approximately 0, so \( xPV \approx \text{Pitch Value} \times \text{Pitch Control} \)

Pitch Control

Definition: Pitch control, at any location \((x,y)\), is the probability that the team currently in possession keeps possession if the ball were passed to that location

Intuition
Modelling Approach

Simple model: closest distance → control

Velocity?

Ball?

Uncertainty?

Pitch control based on time it takes for players and the ball to reach a location (Wu and Swartz 2024; Spearman 2018)
Data-driven: build a supervised model
- Response: pass outcome (success = 1, fail = 0)
- Inputs: estimated arrival times
Train model with gradient-boosted trees (XGBoost) on observed passes
Compute time-to-location for any hypothetical target location → estimate pitch control

Pitch Value

Goal: estimate \(\mathbb{E}(C \mid P, I, S = 1)\)
What is a chance?
- Ideally: Goals
- Proxy: Shots, idea based on Power et al. (2017)
Define: \(C = \begin{cases} 1, & \text{if shot for own team within 10 seconds} \\ 0, & \text{otherwise} \end{cases}\)
Derive for each successful pass
- Label \(C\)
- Features capturing game context from \(P\) and \(I\) (see Appendix: Pitch Value - Features)
Supervised learning: train XGBoost model to estimate \(C\)
Apply model to any hypothetical location
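
The labelling step can be sketched as follows. The event layout (lists of `(time_in_seconds, team)` tuples) and the function name `label_chances` are hypothetical; the sketch also assumes all timestamps share one clock and ignores period boundaries.

```python
from bisect import bisect_right


def label_chances(passes, shots, window=10.0):
    """Return C for each successful pass.

    passes, shots: lists of (time_s, team) tuples (assumed layout).
    C = 1 if the passing team takes a shot within `window` seconds
    after the pass, otherwise 0.
    """
    shot_times = {}
    for t, team in shots:
        shot_times.setdefault(team, []).append(t)
    for times in shot_times.values():
        times.sort()

    labels = []
    for t, team in passes:
        times = shot_times.get(team, [])
        i = bisect_right(times, t)  # index of first shot strictly after the pass
        has_shot = i < len(times) and times[i] - t <= window
        labels.append(1 if has_shot else 0)
    return labels


labels = label_chances(
    passes=[(100.0, "A"), (105.0, "B"), (200.0, "A")],
    shots=[(108.0, "A")],
)
print(labels)  # [1, 0, 0]
```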

Results - Model Evaluation

Calibration Pitch Control Model

(More details: Appendix: Pitch Control - Model Evaluation)

Calibration Pitch Value Model

(More details: Appendix: Pitch Value - Model Evaluation)

Results - Revisiting Motivation

An example: Evaluating a pass from Enzo Fernandez

Real situation
Pitch control
Pitch value
xPV

Who controls a location? Pitch Control of pass: \(0.74\)

How valuable is a location? Pitch Value of pass: \(0.59\)

How valuable is a pass? xPV of pass: \(0.44\)
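
The three numbers are consistent with the decomposition from the theoretical framework when the failed-pass term is set to 0. A small check, with `expected_pass_value` as a hypothetical helper name:

```python
def expected_pass_value(pitch_control, pitch_value, value_if_failed=0.0):
    """Combine the two model outputs via the law of total expectation.

    pitch_control: P(S = 1 | P, I)
    pitch_value:   E(C | P, I, S = 1)
    value_if_failed: E(C | P, I, S = 0), approximated as 0 in the framework
    """
    return pitch_value * pitch_control + value_if_failed * (1.0 - pitch_control)


xpv = expected_pass_value(pitch_control=0.74, pitch_value=0.59)
print(round(xpv, 2))  # 0.44
```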

Results - Predictive Power

How well can we predict future passing performance?
Aggregated metrics (xPV, expected Assists (xA), Assists) per player and game
Correlation of lagged values (game \(i\!-\!1\)) and current values (game \(i\))

Values are Pearson correlations; brackets show bootstrapped 95% confidence intervals. N = 878 samples (at least 45 minutes played in both games; see Appendix: Sensitivity of Correlation Analysis)

Results suggest that xPV is a stronger predictor of future passing performance than existing metrics
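
A minimal sketch of the lagged-correlation setup, assuming a per-player list of `(minutes_played, metric_value)` tuples in game order. The data layout and the names `pearson` and `lagged_correlation` are illustrative; the bootstrap for the confidence intervals is not shown.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def lagged_correlation(games_by_player, min_minutes=45):
    """Correlate each player's value in game i-1 with the value in game i.

    games_by_player: {player: [(minutes, value), ...]} in game order (assumed).
    A pair is kept only if the player reached min_minutes in both games.
    """
    previous, current = [], []
    for games in games_by_player.values():
        for (m_prev, v_prev), (m_curr, v_curr) in zip(games, games[1:]):
            if m_prev >= min_minutes and m_curr >= min_minutes:
                previous.append(v_prev)
                current.append(v_curr)
    return pearson(previous, current), len(previous)


r, n_samples = lagged_correlation({
    "player_1": [(90, 1.0), (90, 2.0), (90, 3.0)],
    "player_2": [(60, 2.0), (45, 4.0), (30, 9.0)],
})
```

Changing `min_minutes` reproduces the kind of threshold sweep described in the sensitivity appendix.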

Results - Holistic Analysis

Beyond evaluating the pass: full xPV surface enables holistic analysis

Decision making
Defender positioning
Space creation

Was it the right pass?

Alternative positioning?

How much space does an attacker create?

Missing piece: Which space is truly relevant?

Conclusion

Developed Pitch Control and Pitch Value model
Combined models to obtain xPV
xPV shows promising results to value real passes
We can value any hypothetical pass with xPV → allows for holistic analysis of passing situations
Extension and possible future work:
- Which passes are likely and feasible?
- Include risk

Thank you for listening!

Questions?

References

Martens, Florian, Uwe Dick, and Ulf Brefeld. 2021. “Space and Control in Soccer.” Frontiers in Sports and Active Living 3 (July). https://doi.org/10.3389/fspor.2021.676179.

Moura, Felipe Arruda, Luiz Eduardo Barreto Martins, Ricardo De Oliveira Anido, Ricardo Machado Leite De Barros, and Sergio Augusto Cunha. 2012. “Quantitative Analysis of Brazilian Football Players’ Organisation on the Pitch.” Sports Biomechanics 11 (1): 85–96. https://doi.org/10.1080/14763141.2011.637123.

Power, Paul, Hector Ruiz, Xinyu Wei, and Patrick Lucey. 2017. “Not All Passes Are Created Equal: Objectively Measuring the Risk and Reward of Passes in Soccer from Tracking Data.” In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD ’17. ACM. https://doi.org/10.1145/3097983.3098051.

Spearman, William. 2018. “Beyond Expected Goals.” In MIT Sloan Sports Analytics Conference.

Wu, Lucas, and Tim B. Swartz. 2024. “A New Metric for Pitch Control Based on an Intuitive Motion Model.” Computational Statistics, June. https://doi.org/10.1007/s00180-024-01512-2.

Appendix: Top Players World Cup 2022

Who were the best players in terms of xPV in the World Cup 2022?

Total xPV across tournament

Player	Total xPV	MV in million €
Lionel Messi	28.780	50
Rodrigo de Paul	22.109	35
Mateo Kovacic	20.460	40
Pedri	19.900	100
Jamal Musiala	17.143	100
Kylian Mbappé	17.066	160
Luka Modric	16.757	10
Antoine Griezmann	15.939	25
Ousmane Dembele	14.628	60
Bernardo Silva	14.439	80

xPV per 90 Minutes

Player	xPV/90	MV in million €
Jamal Musiala	5.980	100
Pedri	5.031	100
Ilkay Gündogan	4.666	25
Rodrygo	3.941	80
Joshua Kimmich	3.878	80
Lionel Messi	3.754	50
Angel Di Maria	3.722	10
Bernardo Silva	3.650	80
Dusan Tadić	3.612	7
Rodrigo de Paul	3.322	35

Appendix: Market Value Correlation

Tested correlation of different metrics with market value
Aggregated xPV, Goals, Assists and expected Assists per player over the whole tournament
In a tournament, better players usually play more games and so have more playing time to accumulate value. We therefore tested the correlation with market value both for aggregated values and for values normalized per 90 minutes played

Correlation: Aggregated Values

	Market Value
xPV	0.47
Goals	0.31
Assists	0.25
xA	0.33

Correlation: Per 90 Values

	Market Value
xPV/90	0.47
Goals/90	0.25
Assists/90	0.17
xA/90	0.25

Appendix: Sensitivity of Correlation Analysis

In the correlation analysis of xPV with xA and Assists, samples were only considered if the player played at least 45 minutes in both the previous and the current game
The plot shows the correlation for different threshold values (1-60 minutes)


Appendix: Player Time to Location

How long does it take for a player to reach a target location \((x_1, y_1)\) given their current location \((x_0, y_0)\) and current velocity \((v_{x,0}, v_{y,0})\)?

Problem
Approach

Formally: use equation of motion and minimize time to reach \((x_1, y_1)\)
Constraints: maximum speed and acceleration \(\Rightarrow\) no closed-form solution

Numerical optimization

\(\Rightarrow\) Realistic, but slow

Simple analytical model

\(\Rightarrow\) Fast, but unrealistic

How can we get a realistic and fast model?

Decomposed approximation model: \(t_\text{approx} = t_\text{simple} + \alpha \times p\)
- \(t_\text{simple}\): simple time based on distance and current speed
- \(p\): penalty term based on angular mismatch, scaled by current speed
- \(\alpha\): learn scalar from data

Example: Angular mismatch

Overview: Take Wu and Swartz (2024) as baseline approach to obtain \(t_\text{ws}\), approximate \(t_\text{ws}\) with an efficient analytical model

Description of Wu and Swartz (2024) algorithm:

Constraints: maximum speed \(\sqrt{v_x^2 + v_y^2} \leq v_{\text{max}}\), maximum acceleration \(\sqrt{a_x^2 + a_y^2} \leq a_{\text{max}}\)
Two-phase model: player has constant acceleration (\(a_x\), \(a_y\)) until he reaches \(\sqrt{v_x^2 + v_y^2} = v_\text{max}\), then continues to run to the target location with \(v_\text{max}\)
Obtain solution: grid search over possible acceleration pairs (\(a_x\), \(a_y\)), taking minimum time \(t\) such that player reaches target location as \(t_\text{ws}\)

Approximation model:

Decompose time into two analytically solvable components (\(t_\text{simple}\), \(p\)) and include a scalar (\(\alpha\)) which can be learned to approximate \(t_\text{ws}\)
\(t_\text{approx} = t_\text{simple} + \alpha \times p\)

Description of \(t_\text{simple}\):

We ignore the direction of velocity. Player starts with current total speed \(v_0\), and then has constant acceleration \(a\) until he reaches \(v_\text{max}\) and then runs with \(v_\text{max}\) to target location
Two cases: player reaches target location before maximum speed or player reaches maximum speed before target location

Calculation:

Calculate distance to target location: \(d_\text{target} = \sqrt{(x_1 - x_0)^2 + (y_1 - y_0)^2}\)
Check distance player runs while accelerating to maximum speed: \(d_\text{acc} = \frac{v_\text{max} + v_0}{2} \cdot t_\text{acc}\)

where \(v_0 = \sqrt{(v_{x,0}^2 + v_{y,0}^2)}\) is the current total speed and \(t_\text{acc} = (v_\text{max} - v_0)/a\) is the time it takes to reach maximum speed

Case 1: \(d_\text{acc} \geq d_\text{target}\): solve the kinematic equation \(d_\text{target} = v_0 \cdot t + \frac{1}{2} \cdot a \cdot t^2\) using the quadratic formula to obtain \(t_\text{simple} = \frac{-v_0 + \sqrt{v_0^2 + 2 \cdot a \cdot d_\text{target}}}{a}\) (the positive root, since a negative time is not meaningful)
Case 2: \(d_\text{acc} < d_\text{target}\): calculate time it takes to cover remaining distance after acceleration (\(d_\text{rem} = d_\text{target} - d_\text{acc}\)) as \(t_\text{rem} = \frac{d_\text{rem}}{v_\text{max}}\) and add time of acceleration phase to obtain: \(t_\text{simple} = t_\text{rem} + t_\text{acc}\)
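
The two cases translate directly into code. In this sketch the function name `t_simple` and the example values for `v_max` and `a` are assumptions for illustration; clipping \(v_0\) at \(v_\text{max}\) is also an added safeguard, not something stated above.

```python
import math


def t_simple(x0, y0, x1, y1, vx0, vy0, v_max, a):
    """Time to reach (x1, y1), ignoring the direction of the velocity."""
    d_target = math.hypot(x1 - x0, y1 - y0)
    v0 = min(math.hypot(vx0, vy0), v_max)  # assumption: clip at v_max
    t_acc = (v_max - v0) / a               # time needed to reach v_max
    d_acc = 0.5 * (v_max + v0) * t_acc     # distance covered while accelerating

    if d_acc >= d_target:
        # Case 1: target reached before maximum speed (positive root).
        return (-v0 + math.sqrt(v0 * v0 + 2.0 * a * d_target)) / a
    # Case 2: accelerate to v_max, then cover the rest at v_max.
    return t_acc + (d_target - d_acc) / v_max


# Hypothetical limits: v_max = 8 m/s, a = 4 m/s^2, player starting at rest.
print(t_simple(0, 0, 2, 0, 0, 0, v_max=8.0, a=4.0))   # 1.0 (case 1)
print(t_simple(0, 0, 16, 0, 0, 0, v_max=8.0, a=4.0))  # 3.0 (case 2)
```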

Description of the penalty \(p\):

As \(t_\text{simple}\) ignores the direction of velocity, the penalty \(p\) accounts for it
It is calculated as the angular mismatch of a player, i.e. the shortest angular difference between the player's current movement angle and the angle from the player's current location to the target location, scaled by the current total speed
Intuitively: the faster a player runs into the “wrong” direction, the higher the penalty
We test two penalty variants: (i) the angular mismatch, and (ii) the cosine difference of the angular mismatch, which introduces non-linearity

Angular mismatch calculation: \[ \Delta \theta = \min(|\theta_\text{move} - \theta_\text{target}|, 2 \pi - |\theta_\text{move} - \theta_\text{target}|) \]

where \(\theta_\text{move} = \operatorname{atan2}(v_{y,0},v_{x,0})\) is the current movement angle and \(\theta_\text{target} = \operatorname{atan2}((y_1 - y_0) , (x_1 - x_0))\) the angle from the current location to the target location

Penalty calculation:

Version 1: Angular penalty: \(p_1 = \Delta \theta \cdot v_0\)
Version 2: Cosine penalty: \(p_2 = (1 - \cos(\Delta \theta)) \cdot v_0\)
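
Both penalty variants can be computed in a few lines. The function names are illustrative; for a player at rest `math.atan2(0, 0)` returns 0, and both penalties vanish because they are scaled by \(v_0 = 0\).

```python
import math


def angular_mismatch(x0, y0, x1, y1, vx0, vy0):
    """Shortest angular difference between movement and target direction."""
    theta_move = math.atan2(vy0, vx0)
    theta_target = math.atan2(y1 - y0, x1 - x0)
    diff = abs(theta_move - theta_target)
    return min(diff, 2.0 * math.pi - diff)


def penalties(x0, y0, x1, y1, vx0, vy0):
    """Return (p_1, p_2): angular and cosine penalty, scaled by speed."""
    v0 = math.hypot(vx0, vy0)
    d_theta = angular_mismatch(x0, y0, x1, y1, vx0, vy0)
    p1 = d_theta * v0
    p2 = (1.0 - math.cos(d_theta)) * v0
    return p1, p2


# Player at the origin running "north" at 3 m/s, target to the "east".
p1, p2 = penalties(0, 0, 5, 0, 0, 3)
```

In this example the mismatch is \(\pi/2\), so \(p_1 = 3\pi/2\) and \(p_2 = 3\). Running directly away from the target gives the maximum cosine penalty of \(2 v_0\).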

Goal: learn \(\alpha\) such that \(t_\text{approx} = t_\text{simple} + \alpha \times p_i \approx t_\text{ws}\)

Generate training data:

1,000,000 random combinations of current location \((x_0, y_0)\), target location \((x_1, y_1)\) within the pitch dimensions, and current velocity \((v_{x,0}, v_{y,0})\)
Calculate for each sample:
- \(t_\text{ws}\): by grid search using Wu and Swartz (2024) algorithm as described
- \(t_\text{simple}, p_1, p_2\): analytically as described

Train model: linear regression without intercept (in case of no penalty, i.e. 0 velocity or no angular mismatch, \(t_\text{ws} \approx t_\text{simple}\), as running in straight line to target is then optimal)

Angular mismatch model: \(t_\text{ws} - t_\text{simple} = \beta_1 \cdot p_1 + \epsilon\)
Cosine difference model: \(t_\text{ws} - t_\text{simple} = \beta_2 \cdot p_2 + \epsilon\)

where \(\beta_1\) and \(\beta_2\) are coefficients estimated by minimizing the sum of squared residuals \(\sum_{i=1}^n \epsilon_i^2\)

Resulting coefficients: \(\beta_1 = 0.1328\), \(\beta_2 = 0.1986\)
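
For a single regressor without intercept, least squares has the closed form \(\hat\beta = \sum_i p_i r_i / \sum_i p_i^2\) with \(r_i = t_{\text{ws},i} - t_{\text{simple},i}\). A minimal sketch with the hypothetical name `fit_alpha` and toy numbers rather than the simulated training data:

```python
def fit_alpha(penalty, t_ws, t_simple):
    """Least-squares slope through the origin for t_ws - t_simple ~ beta * p."""
    residual_time = [tw - ts for tw, ts in zip(t_ws, t_simple)]
    numerator = sum(p * r for p, r in zip(penalty, residual_time))
    denominator = sum(p * p for p in penalty)
    return numerator / denominator


# Toy data in which the extra time is exactly 0.2 * penalty.
beta = fit_alpha(
    penalty=[1.0, 2.0, 3.0],
    t_ws=[2.2, 3.4, 4.6],
    t_simple=[2.0, 3.0, 4.0],
)
```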

Evaluation:

Generate 300,000 test samples (same method as for the training data)
Calculate for each test sample \(t_\text{ws}\) and \(t_{\text{approx}_1} = t_\text{simple} + 0.1328 \cdot p_1\), \(t_{\text{approx}_2} = t_\text{simple} + 0.1986 \cdot p_2\)
Evaluate \(t_\text{ws} - t_{\text{approx}_1}\) (Angular) and \(t_\text{ws} - t_{\text{approx}_2}\) (Cosine) by using root mean squared error (RMSE), mean absolute error (MAE) and fraction of cases where absolute difference is below \(0.01\) (< 0.01) and \(0.05\) (< 0.05) seconds
Context: \(t_\text{ws}\) is on average 6.23 seconds for the test data
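
The evaluation metrics listed above can be computed as in this sketch; the function name `evaluate` and the toy inputs are illustrative only.

```python
import math


def evaluate(t_ws, t_approx, thresholds=(0.01, 0.05)):
    """RMSE, MAE and share of samples with absolute error below each threshold."""
    errors = [tw - ta for tw, ta in zip(t_ws, t_approx)]
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    below = {th: sum(abs(e) < th for e in errors) / n for th in thresholds}
    return rmse, mae, below


rmse, mae, below = evaluate(
    t_ws=[1.0, 2.0, 3.0, 4.0],
    t_approx=[1.0, 2.02, 3.0, 4.2],
)
```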

Evaluation table:

Metric	Angular	Cosine
RMSE	0.2480	0.1189
MAE	0.1906	0.0678
< 0.01	0.6183	0.7016
< 0.05	0.6875	0.9004

Conclusion:

Cosine difference penalty better than (raw) angular mismatch penalty
With the cosine difference penalty we can approximate \(t_\text{ws}\) very well in an efficient way

Appendix: Ball Time to Location

How long does it take to pass the ball from some location \((x_0, y_0)\) to a target location \((x_1, y_1)\)?

Approach
Learned Function

We model pass duration as a function of distance (Martens, Dick, and Brefeld 2021)
We use a data-driven approach by using observed passes and their duration and distance
To learn the function \(f(d) = t\), where d is the distance of a pass in meters and t the pass duration in seconds, we use cubic smoothing splines, i.e. we fit piecewise cubic polynomials that are joined at knots, with a penalty \(\lambda\) controlling the flexibility of the fit
The function \(f(d)\) minimizes

\[ \sum_i \big(t_{i} - f(d_i)\big)^2 + \lambda \int \big(f''(d)\big)^2 \mathrm{d}d, \]

where \(d_i\) is the distance in meters and \(t_i\) the pass duration of observed pass \(i\) for \(i = 1, \ldots, n\) and the penalty parameter \(\lambda\) controls the smoothness of the fit.

Appendix: Pitch Control - Model Evaluation

Evaluation Metrics
Model Calibration

N = 62,309 samples, 5-fold CV, i.e. to obtain out-of-sample predictions for all samples we split the data into 5 equally sized folds, always train a model on 4 and predict the 5th
Baseline 1: Naive; naive predictor always predicting empirical mean (0.8657)
Baseline 2: GLM; simple GLM using time to location of the 3 closest players of both teams and the ball as features
Model: XGBoost; using time to location of all players and ball as features

Metric	Naive	GLM	XGBoost
Accuracy	0.8657	0.9444	0.9493
Area under the curve	-	0.9730	0.9792
Logloss	0.3944	0.1405	0.1224
Brier score	0.1163	0.0410	0.0369
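
The Naive column follows in closed form from the empirical mean \(p\): a constant predictor has accuracy \(\max(p, 1-p)\), log loss \(-(p \ln p + (1-p)\ln(1-p))\) and Brier score \(p(1-p)\). A short check, with `constant_predictor_metrics` as a hypothetical helper name:

```python
import math


def constant_predictor_metrics(p):
    """Metrics of a predictor that always outputs the empirical mean p."""
    accuracy = max(p, 1.0 - p)
    logloss = -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))
    brier = p * (1.0 - p)
    return accuracy, logloss, brier


accuracy, logloss, brier = constant_predictor_metrics(0.8657)
print(round(accuracy, 4), round(brier, 4))  # 0.8657 0.1163
```

The log loss comes out at roughly 0.3945, matching the table up to rounding of the reported mean. The same formulas with \(p = 0.04\) give a Brier score of 0.0384 and a log loss of about 0.168 for the Pitch Value baseline.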


Calibration plot simple GLM model

Calibration plot XGBoost model

Appendix: Pitch Value - Features

Goal: derive game context

Spatial features
Team context
Team shape
Defensive structure

Spatial features:
Euclidean distance to goal center from pass start and target location
Pass progressive distance: x-distance to goal after pass minus x-distance to goal before pass
Euclidean distance from pass start to target location
Cosine and sine of angle from target location to goal center


Team context:
Support: estimated player time-to-location of 5 earliest arriving players from team in possession
Pressure: estimated player time-to-location of 5 earliest arriving players from team defending
Average x-coordinate of team in possession/defending
Average velocity in x-direction of team in possession/defending
Number of defenders/attackers behind pass start and target location in x-coordinate

Team shape (convex hull):
Jarvis march: find hull vertices \(p_1 = (x_1, y_1), \ldots, p_m = (x_m, y_m)\)
Area: \(A = \frac{1}{2} \sum_{i=1}^m (x_i y_{i+1} - x_{i+1} y_i)\) (positive for counter-clockwise vertex order)
Perimeter: \(P = \sum_{i=1}^m \|p_{i+1} - p_i\|_2\)
where \(p_{m+1} = p_1\)

Note: per team, all outfield players (excl. goalkeeper). Visualization shows example for defending team
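
The hull, area and perimeter computations can be sketched as below. This is a plain gift-wrapping (Jarvis march) implementation written for illustration, with hypothetical function names and made-up player positions; points are expected as `(x, y)` tuples.

```python
import math


def cross(o, a, b):
    """z-component of (a - o) x (b - o); negative if b lies right of o->a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def jarvis_march(points):
    """Return the convex hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts
    hull, current = [], pts[0]  # start at the leftmost point
    while True:
        hull.append(current)
        candidate = pts[0] if pts[0] != current else pts[1]
        for p in pts:
            if p == current:
                continue
            c = cross(current, candidate, p)
            farther = math.dist(current, p) > math.dist(current, candidate)
            if c < 0 or (c == 0 and farther):
                candidate = p  # p is more clockwise, or collinear and farther
        current = candidate
        if current == pts[0]:
            return hull


def hull_area_perimeter(hull):
    """Shoelace area and perimeter of a counter-clockwise polygon."""
    m = len(hull)
    area, perimeter = 0.0, 0.0
    for i in range(m):
        (xi, yi), (xj, yj) = hull[i], hull[(i + 1) % m]
        area += xi * yj - xj * yi
        perimeter += math.hypot(xj - xi, yj - yi)
    return 0.5 * area, perimeter


players = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (2.0, 1.0)]
hull = jarvis_march(players)
print(hull_area_perimeter(hull))  # (12.0, 14.0)
```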

Defensive structure:
Defending: usually 3 lines

Identify lines: k-means clustering on the defenders' x-coordinates, k = 3

\[ \min_{\{C_1,C_2,C_3\}} \; \sum_{k=1}^{3} \sum_{i \in C_k} (x_i - \mu_k)^2 \] where \(\mu_k\) is the mean x-coordinate of cluster \(k\)

Features: Formation, height, compactness

Note: excluding goalkeeper
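
In one dimension the clustering objective above can be minimized exactly, because optimal clusters are contiguous in sorted order. The sketch below swaps in a brute-force search over the two split points instead of an iterative k-means routine, so it is not necessarily the procedure used in the original analysis. The function names and x-coordinates are made up, and reading "formation" as the number of players per line is an assumption.

```python
from itertools import combinations


def sum_squared_error(values):
    """Sum of squared deviations from the cluster mean."""
    mu = sum(values) / len(values)
    return sum((v - mu) ** 2 for v in values)


def three_lines(xs):
    """Split x-coordinates into 3 contiguous groups with minimal total SSE."""
    s = sorted(xs)
    best_cost, best_groups = None, None
    for i, j in combinations(range(1, len(s)), 2):
        groups = (s[:i], s[i:j], s[j:])
        cost = sum(sum_squared_error(g) for g in groups)
        if best_cost is None or cost < best_cost:
            best_cost, best_groups = cost, groups
    return best_groups


# Ten outfield players of the defending team (hypothetical x-coordinates).
lines = three_lines([10, 11, 12, 13, 30, 31, 32, 33, 50, 52])
formation = "-".join(str(len(line)) for line in lines)
print(formation)  # 4-4-2
```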

Appendix: Pitch Value - Model Evaluation

Evaluation Metrics
Model Calibration

N = 53,956 samples, 5-fold CV, i.e. to obtain out-of-sample predictions for all samples we split the data into 5 equally sized folds, always train a model on 4 and predict the 5th
Baseline 1: Naive; naive predictor always predicting empirical mean (0.04)
Baseline 2: GLM; simple GLM using distance to goal as feature
Model: XGBoost; using all derived game context features

Metric	Naive	GLM	XGBoost
Accuracy	0.96	0.9609	0.9621
Area under the curve	-	0.8703	0.8887
Logloss	0.168	0.1250	0.1183
Brier Score	0.0384	0.0326	0.0312


Calibration plot simple GLM model

Calibration plot XGBoost model

Appendix: xPV with Shots vs. xG

Instead of defining reward variable \(C\) as binary (1, if shot happens within 10 seconds, 0 else) we could replace the 1 with the expected goals (xG) value of the shot
Advantage: Shots from worse positions not rated equal to shots from good positions
Change in xPV surface:

Original xPV surface

xPV surface with xG model for training

Similar surface, but all values are much smaller with the xG model (the mean xG of shots is only 0.1, whereas before all shots were encoded as 1)