Expected Pass Value (xPV)

A Holistic Framework for Evaluating Passing Situations in Soccer

Tobias Harringer | Robert Bajons

Sep 27, 2025

Motivation

An example: A pass from Enzo Fernandez

Red: attacking team; Blue: defending team; Arrows: velocity; Black: ball

Unsuccessful…so, a bad pass?

Or a bad run by #20?

Overview

Goal
- Value passes in a less outcome-dependent way
- Gain deeper insights into passing situations

Framework
- Pitch Control: likelihood of controlling a location
- Pitch Value: value of possession at a location
- Combine models to obtain expected pass value (xPV)

Insights
- Value real passes
- Quantify decision making of passer
- Analyze defender positioning
- Situational analysis

Data

Source:
2022 World Cup dataset: Event + Tracking data

- Event data: passes, shots, dribbles, …
- Tracking data: \((x,y)\)-coordinates of players and ball

Tracking data capture:

Where are players positioned?

Where are players heading to?

Approximate using centered differences:

\(\hat{v}_x(t) = \frac{x_{t+\Delta} - x_{t-\Delta}}{2\Delta}\)

\(\hat{v}_y(t) = \frac{y_{t+\Delta} - y_{t-\Delta}}{2\Delta}\)
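
A minimal sketch of the centered-difference estimate, assuming positions are stored as per-frame lists in metres. The function name, the `frame_rate` argument and the 10 Hz example are illustrative assumptions and are not taken from the dataset.

```python
def centered_velocity(xs, ys, frame_rate, delta=1):
    """Estimate (vx, vy) per frame with centered differences.

    xs, ys: positions per frame in metres.
    frame_rate: frames per second (assumed known for the tracking feed).
    delta: half-window in frames; frames without both neighbours get None.
    """
    dt = delta / frame_rate  # the time step Delta, in seconds
    n = len(xs)
    velocities = [None] * n
    for t in range(delta, n - delta):
        vx = (xs[t + delta] - xs[t - delta]) / (2 * dt)
        vy = (ys[t + delta] - ys[t - delta]) / (2 * dt)
        velocities[t] = (vx, vy)
    return velocities


# Hypothetical player moving at a constant 5 m/s along x, sampled at 10 Hz
xs = [0.5 * i for i in range(5)]
ys = [0.0] * 5
v = centered_velocity(xs, ys, frame_rate=10)
```

The first and last `delta` frames have no two-sided neighbourhood, so the sketch leaves them as `None` rather than falling back to one-sided differences.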

Theoretical Framework

Goal: Value passes in expectation
Setup:

Define binary variables:

\[\begin{aligned} C = \begin{cases} 1, & \text{if chance for own team} \\ 0, & \text{otherwise} \end{cases}, \quad S = \begin{cases} 1, & \text{if pass successful} \\ 0, & \text{otherwise} \end{cases} \end{aligned}\]

Expected Pass Value xPV:

\[ xPV = \mathbb{E}(C \mid P, I) \]

where:

\(P\): pass from \((x_0, y_0)\) to \((x_1, y_1)\)
\(I\): all available information at the time of the pass, i.e. tracking information

By law of total expectation:

\[ xPV = \mathbb{E}(C \mid P, I, S = 1) \mathbb{P}(S = 1 \mid P, I) + \mathbb{E}(C \mid P, I, S = 0) \mathbb{P}(S = 0 \mid P, I) \]

Each term of the decomposition maps to one model component:

\[ xPV = \underbrace{\mathbb{E}(C \mid P, I, S = 1)}_{\text{Pitch Value}}\underbrace{\mathbb{P}(S = 1 \mid P, I)}_{\text{Pitch Control}} + \underbrace{\mathbb{E}(C \mid P, I, S = 0)}_{\approx \text{ 0}} \underbrace{\mathbb{P}(S = 0 \mid P, I)}_{1 - \text{Pitch Control}} \]

Because the second term is approximately zero, \(xPV \approx \text{Pitch Value} \times \text{Pitch Control}\)

Pitch Control

Definition: Pitch control, at any location \((x,y)\), is the probability that the team currently in possession keeps possession if the ball were passed to that location

Intuition
Modelling Approach

A simple model: closest distance → control

Velocity?

Uncertainty?

Pitch control based on time it takes for players and the ball to reach a location (Wu and Swartz 2024; Spearman 2018)
Data-driven: build a supervised model
- Response: pass outcome (success = 1, fail = 0)
- Inputs: theoretically calculated arrival times
Train model with gradient-boosted trees (XGBoost) on observed passes
Compute time to location for any hypothetical target location → estimate pitch control there

Player Time to Location

How long does it take for a player to reach a target location \((x_1, y_1)\) given their current location \((x_0, y_0)\) and current velocity \((v_{x,0}, v_{y,0})\)?

Formal problem
Approach
Implementation

Formally: use equation of motion and minimize \(t\)

\[ (x_1, y_1) = (x_0, y_0) + (v_{x,0}, v_{y,0}) t + \tfrac{1}{2}(a_x, a_y) t^2 \]

with constraints on maximum acceleration \(a_\text{max}\) and speed \(v_\text{max}\):

\[ \sqrt{a_x^2 + a_y^2} \leq a_{\text{max}}, \quad \sqrt{v_x^2 + v_y^2} \leq v_{\text{max}} \]

No analytical solution

Numerical optimization, e.g. Wu and Swartz (2024)

Problem: Many calculations per pass → optimization too expensive

Solution: Train fast approximation model
- Generate training data
- Obtain numerical solutions \(t_\text{ws}\) (using the Wu and Swartz (2024) algorithm)
- Construct analytical model: \(t_\text{approx} = t_\text{simple} + \alpha \times p\)
- learn scalar \(\alpha\) such that \(t_\text{approx} \approx t_\text{ws}\)

\(t_\text{simple}\): based on distance to target and current total speed
\(p\): penalty based on angular mismatch, scaled by current speed

Write \(t_\text{approx} = t_\text{simple} + \alpha \times p\)

Find \(\alpha\): regress \(t_\text{ws} - t_\text{simple}\) on \(p\)

(More details)

Example: Angular mismatch

Pitch Control - Model Evaluation

Recall: train XGBoost, features = time to location (players + ball), target = pass success (1/0), n = 62,309 samples (cross fitting)

Model calibration:

(More details)

Pitch Value - Overview

Goal: estimate \(\mathbb{E}(C \mid P, I, S = 1)\)
What is a chance?
- Ideally: Goals
- Proxy: Shots, idea based on Power et al. (2017)
Define: \(C = \begin{cases} 1, & \text{if shot for own team within 10 seconds} \\ 0, & \text{otherwise} \end{cases}\)
Derive for each successful pass
- Label \(C\)
- Features capturing game context from \(P\) and \(I\)
Supervised learning: train XGBoost model
Apply model to any hypothetical location to estimate pitch value

Pitch Value - Features

Goal: derive situational context
Features: location features + team shape + defensive structure

Team shape
Defensive structure

(More details)

Features from convex hull:

Area: \(A = \frac{1}{2} \sum_{i=1}^m (x_i y_{i+1} - x_{i+1} y_i)\)
Perimeter: \(P = \sum_{i=1}^m \|p_{i+1} - p_i\|_2 \quad\) where \(p_{m+1} = p_1\)

Defending: usually 3 lines

(More details)

Identify lines: k-means clustering (\(k=3\), cluster on defenders x-coordinate)

\[ \min_{\{C_1,C_2,C_3\}} \; \sum_{k=1}^{3} \sum_{i \in C_k} (x_i - \mu_k)^2 \] where \(\mu_k\) is the mean x-coordinate of cluster \(k\)

Features: formation, height and compactness of each line

Pitch Value - Model Evaluation

Recall: train XGBoost, features = game context, target = shot (1/0), n = 53,956 samples (cross fitting)

Model calibration:

(More details)

Results - Revisiting Motivation

An example: Evaluating a pass from Enzo Fernandez

Real situation
Pitch control
Pitch value
xPV

Red: attacking team; Blue: defending team; Arrows: velocity; Black: ball

Who controls a location? Pitch Control of pass: \(0.65\)

How valuable is a location? Pitch Value of pass: \(0.61\)

How valuable is a pass? xPV of pass: \(0.40\)
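
As a rough consistency check, assuming the failed-pass term of the decomposition is negligible and keeping in mind that the reported values are rounded, the three numbers fit together as follows:

\[ xPV \approx \underbrace{0.61}_{\text{Pitch Value}} \times \underbrace{0.65}_{\text{Pitch Control}} = 0.3965 \approx 0.40 \]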

Results - Predictive Power

How well can we predict future passing performance?
Aggregated metrics (xPV, expected Assists (xA), Assists) per player and game
Correlation of lagged values (game \(i\!-\!1\)) and current values (game \(i\))

Values are Pearson correlations; brackets show bootstrapped 95% confidence intervals. N = 878 (player played at least 45 minutes in both games; see the sensitivity analysis in the appendix)

Results suggest that xPV is a stronger predictor of future passing performance than existing metrics

Results - Decision Making

Was it the right pass?

Evaluate decision making using xPV:

Pass loss: \(\text{PL} = v(P^\ast) - v(P)\)
Pass efficiency: \(\text{PE} = \frac{v(P)}{v(P^\ast)}\)

where \(A\) is the set of all passes (real and hypothetical), \(P \in A\) the real pass, \(P^\ast \in A\) the best possible pass, \(v(P)\) the xPV of pass \(P\)

Question: Is the “best pass” possible?
- Constrain pass set: Maximum distance, angle, …
- Choose \(P^\ast\) from constrained set \(A_c \subset A\)
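
A minimal sketch of both metrics, assuming the xPV of every candidate pass in the constrained set has already been computed. The function name and the example values are hypothetical.

```python
def pass_loss_and_efficiency(xpv_real, xpv_candidates):
    """Return (PL, PE) for a real pass.

    xpv_real: v(P), the xPV of the pass that was actually played.
    xpv_candidates: xPV of each pass in the constrained set A_c (non-empty).
    The real pass is included when taking the maximum, so v(P*) >= v(P).
    """
    best = max(max(xpv_candidates), xpv_real)  # v(P*)
    pass_loss = best - xpv_real                # PL = v(P*) - v(P)
    # PE is undefined if even the best pass has zero value
    pass_efficiency = xpv_real / best if best > 0 else float("nan")
    return pass_loss, pass_efficiency


# Hypothetical situation: the real pass is worth 0.40, the best option 0.50
pl, pe = pass_loss_and_efficiency(0.40, [0.10, 0.25, 0.50])
```

Including the real pass in the maximum is a design choice of this sketch: it keeps PL non-negative and PE at most 1 even if the constraints would exclude the pass that was played.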

Results - Positional Play

How good is a defender's positional play?

“If I have to make a tackle then I have already made a mistake.”

-Paolo Maldini

How much xPV does a defender prevent?

\(\text{xPV}_\text{prevented} = \sum_{i=1}^n \big( v(P_i^{(\setminus j)}) - v(P_i) \big)\)

where \(P_i^{(\setminus j)}\) is pass \(i\) without defender \(j\)

How would the xPV-surface change if a defender were positioned differently?

\(\text{xPV}_\text{optimal} = \sum_{i=1}^n \big( v(P_i) - v(P^{(j^\ast)}_i) \big)\)

where \(P_i^{(j^\ast)}\) is pass \(i\) with defender \(j\) optimally positioned within radius \(r\) of his true position

xPV-surface: Original situation Total xPV: 1445.4

xPV-surface: without #5 Total xPV: 1543.6 (+98.2)

xPV-surface: Original situation Total xPV: 1445.4

xPV-surface: changed #5 position Total xPV: 1421.8 (-23.6)

Results - Situational analysis

xPV-surface can be used by coaches and analysts to visualize their ideas
We are currently developing a prototype for this use case:

Conclusion

Developed Pitch Control and Pitch Value model
Combined models to obtain xPV
xPV shows promising results to value real passes
We can value any hypothetical pass with xPV → allows for holistic analysis of passing situations
Possible future work:
- include risk
- refine use cases

Thank you for listening!

Questions?

References

Martens, Florian, Uwe Dick, and Ulf Brefeld. 2021. “Space and Control in Soccer.” Frontiers in Sports and Active Living 3 (July). https://doi.org/10.3389/fspor.2021.676179.

Moura, Felipe Arruda, Luiz Eduardo Barreto Martins, Ricardo De Oliveira Anido, Ricardo Machado Leite De Barros, and Sergio Augusto Cunha. 2012. “Quantitative Analysis of Brazilian Football Players’ Organisation on the Pitch.” Sports Biomechanics 11 (1): 85–96. https://doi.org/10.1080/14763141.2011.637123.

Power, Paul, Hector Ruiz, Xinyu Wei, and Patrick Lucey. 2017. “Not All Passes Are Created Equal: Objectively Measuring the Risk and Reward of Passes in Soccer from Tracking Data.” In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD ’17. ACM. https://doi.org/10.1145/3097983.3098051.

Spearman, William. 2018. “Beyond Expected Goals.” In MIT Sloan Sports Analytics Conference.

Wu, Lucas, and Tim B. Swartz. 2024. “A New Metric for Pitch Control Based on an Intuitive Motion Model.” Computational Statistics, June. https://doi.org/10.1007/s00180-024-01512-2.

Appendix: Top Players World Cup 2022

Who were the best players in terms of xPV in the World Cup 2022?

Total xPV across tournament

Player	Total xPV	Market value (million €)
Lionel Messi	28.780	50
Rodrigo de Paul	22.109	35
Mateo Kovacic	20.460	40
Pedri	19.900	100
Jamal Musiala	17.143	100
Kylian Mbappé	17.066	160
Luka Modric	16.757	10
Antoine Griezmann	15.939	25
Ousmane Dembele	14.628	60
Bernardo Silva	14.439	80

xPV per 90 Minutes

Player	xPV/90	Market value (million €)
Jamal Musiala	5.980	100
Pedri	5.031	100
Ilkay Gündogan	4.666	25
Rodrygo	3.941	80
Joshua Kimmich	3.878	80
Lionel Messi	3.754	50
Angel Di Maria	3.722	10
Bernardo Silva	3.650	80
Dusan Tadić	3.612	7
Rodrigo de Paul	3.322	35

Appendix: Market Value Correlation

Tested correlation of different metrics with market value
Aggregated xPV, Goals, Assists and expected Assists per player over the whole tournament
In a tournament, better players usually play more games and therefore have more playing time in which to accumulate value. We therefore tested the correlation with market value both for aggregated values and for values normalized per 90 minutes played

Correlation: Aggregated Values

	Market Value
xPV	0.47
Goals	0.31
Assists	0.25
xA	0.33

Correlation: Per 90 Values

	Market Value
xPV/90	0.47
Goals/90	0.25
Assists/90	0.17
xA/90	0.25

Appendix: Sensitivity of Correlation Analysis

In the correlation analysis of xPV with xA and Assists, samples were only considered if the player played at least 45 minutes in both the previous and the current game
The plot shows the correlation for different threshold values (1-60 minutes)

(Jump back)

Overview: Take Wu and Swartz (2024) as baseline approach to obtain \(t_\text{ws}\), approximate \(t_\text{ws}\) with an efficient analytical model

Description of Wu and Swartz (2024) algorithm:

Constraints: maximum speed \(\sqrt{v_x^2 + v_y^2} \leq v_{\text{max}}\), maximum acceleration \(\sqrt{a_x^2 + a_y^2} \leq a_{\text{max}}\)
Two-phase model: player has constant acceleration (\(a_x\), \(a_y\)) until he reaches \(\sqrt{v_x^2 + v_y^2} = v_\text{max}\), then continues to run to the target location with \(v_\text{max}\)
Obtain solution: grid search over possible acceleration pairs (\(a_x\), \(a_y\)), taking minimum time \(t\) such that player reaches target location as \(t_\text{ws}\)

Approximation model:

Decompose time into two analytically solvable components (\(t_\text{simple}\), \(p\)) and include a scalar (\(\alpha\)) which can be learned to approximate \(t_\text{ws}\)
\(t_\text{approx} = t_\text{simple} + \alpha \times p\)

(Jump back)

Description:

We ignore the direction of velocity. Player starts with current total speed \(v_0\), and then has constant acceleration \(a\) until he reaches \(v_\text{max}\) and then runs with \(v_\text{max}\) to target location
Two cases: player reaches target location before maximum speed or player reaches maximum speed before target location

Calculation:

Calculate distance to target location: \(d_\text{target} = \sqrt{(x_1 - x_0)^2 + (y_1 - y_0)^2}\)
Check distance player runs while accelerating to maximum speed: \(d_\text{acc} = \frac{v_\text{max} + v_0}{2} \cdot t_\text{acc}\)

where \(v_0 = \sqrt{(v_{x,0}^2 + v_{y,0}^2)}\) is the current total speed and \(t_\text{acc} = (v_\text{max} - v_0)/a\) is the time it takes to reach maximum speed

Case 1: \(d_\text{acc} \geq d_\text{target}\): solve kinematic equation \(d_\text{target} = v_0 \cdot t + \frac{1}{2} \cdot a \cdot t^2\) using quadratic formula to obtain: \(t_\text{simple} = \frac{-v_0 + \sqrt{v_0^2 + 2 \cdot a \cdot d_\text{target}}}{a}\) (the positive root, since a negative time is not meaningful)
Case 2: \(d_\text{acc} < d_\text{target}\): calculate time it takes to cover remaining distance after acceleration (\(d_\text{rem} = d_\text{target} - d_\text{acc}\)) as \(t_\text{rem} = \frac{d_\text{rem}}{v_\text{max}}\) and add time of acceleration phase to obtain: \(t_\text{simple} = t_\text{rem} + t_\text{acc}\)
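
The two cases can be sketched as follows. The values used for `a` and `v_max` in the example are hypothetical placeholders, and capping the starting speed at `v_max` is an added assumption to keep the acceleration time non-negative.

```python
import math


def t_simple(x0, y0, x1, y1, vx0, vy0, a, v_max):
    """Direction-agnostic time to reach (x1, y1) from (x0, y0)."""
    d_target = math.hypot(x1 - x0, y1 - y0)
    v0 = min(math.hypot(vx0, vy0), v_max)  # assumption: never start above v_max
    t_acc = (v_max - v0) / a               # time needed to reach v_max
    d_acc = 0.5 * (v_max + v0) * t_acc     # distance covered while accelerating
    if d_acc >= d_target:
        # Case 1: target reached before top speed; positive root of
        # d_target = v0 * t + 0.5 * a * t**2
        return (-v0 + math.sqrt(v0 ** 2 + 2 * a * d_target)) / a
    # Case 2: accelerate to v_max, then cover the rest at constant speed
    return t_acc + (d_target - d_acc) / v_max


# Hypothetical limits: a = 3 m/s^2, v_max = 8 m/s, player starting at rest
t_short = t_simple(0, 0, 5, 0, 0, 0, a=3.0, v_max=8.0)   # Case 1
t_long = t_simple(0, 0, 20, 0, 0, 0, a=3.0, v_max=8.0)   # Case 2
```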

Description:

As \(t_\text{simple}\) ignores the direction of velocity, the penalty \(p\) is a term to account for this
It is calculated from the angular mismatch of a player, i.e. the shortest angular difference between the player's current movement angle and the angle from the player's current location to the target location, and then scaled by the current total speed
Intuitively: the faster a player runs into the “wrong” direction, the higher the penalty
We test two penalty variants: (i) the angular mismatch, and (ii) the cosine difference of the angular mismatch, which introduces non-linearity

Angular mismatch calculation: \[ \Delta \theta = \min(|\theta_\text{move} - \theta_\text{target}|, 2 \pi - |\theta_\text{move} - \theta_\text{target}|) \]

where \(\theta_\text{move} = \operatorname{atan2}(v_{y,0},v_{x,0})\) is the current movement angle and \(\theta_\text{target} = \operatorname{atan2}((y_1 - y_0) , (x_1 - x_0))\) the angle from the current location to the target location

Penalty calculation:

Version 1: Angular penalty: \(p_1 = \Delta \theta \cdot v_0\)
Version 2: Cosine penalty: \(p_2 = (1 - \cos(\Delta \theta)) \cdot v_0\)
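
A short sketch of both penalty variants, following the formulas above. The function names and the example situation are illustrative.

```python
import math


def angular_mismatch(x0, y0, x1, y1, vx0, vy0):
    """Shortest angle (radians, in [0, pi]) between movement and target direction."""
    theta_move = math.atan2(vy0, vx0)
    theta_target = math.atan2(y1 - y0, x1 - x0)
    diff = abs(theta_move - theta_target)
    return min(diff, 2 * math.pi - diff)


def penalties(x0, y0, x1, y1, vx0, vy0):
    """Return (p1, p2): angular and cosine penalty, both scaled by speed."""
    v0 = math.hypot(vx0, vy0)
    d_theta = angular_mismatch(x0, y0, x1, y1, vx0, vy0)
    p1 = d_theta * v0                    # Version 1: angular penalty
    p2 = (1 - math.cos(d_theta)) * v0    # Version 2: cosine penalty
    return p1, p2


# Hypothetical player running "up" the pitch at 4 m/s, target straight to the right
p1, p2 = penalties(0, 0, 10, 0, 0, 4)
```

For a stationary player the speed factor is zero, so both penalties vanish regardless of the angle returned by `atan2`.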

Goal: learn \(\alpha\) such that \(t_\text{approx} = t_\text{simple} + \alpha \times p_i \approx t_\text{ws}\)

Generate training data:

1,000,000 random combinations of current location \((x_0, y_0)\) and target location \((x_1, y_1)\) within the pitch dimensions and current velocity \((v_{x,0}, v_{y,0})\)
Calculate for each sample:
- \(t_\text{ws}\): by grid search using Wu and Swartz (2024) algorithm as described
- \(t_\text{simple}, p_1, p_2\): analytically as described

Train model: linear regression without intercept (in case of no penalty, i.e. 0 velocity or no angular mismatch, \(t_\text{ws} \approx t_\text{simple}\), as running in straight line to target is then optimal)

Angular mismatch model: \(t_\text{ws} - t_\text{simple} = \beta_1 \cdot p_1 + \epsilon\)
Cosine difference model: \(t_\text{ws} - t_\text{simple} = \beta_2 \cdot p_2 + \epsilon\)

where \(\beta_1\) and \(\beta_2\) are coefficients estimated by minimizing the sum of squared residuals \(\sum_{i=1}^n \epsilon_i^2\)

Resulting coefficients: \(\beta_1 = 0.1328\), \(\beta_2 = 0.1986\)
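
For a regression through the origin with a single predictor, the least-squares coefficient has the closed form \(\hat{\beta} = \sum_i p_i r_i / \sum_i p_i^2\) with \(r_i = t_{\text{ws},i} - t_{\text{simple},i}\). A minimal sketch on made-up toy data (not the simulated training set):

```python
def fit_through_origin(p, residual):
    """Least-squares slope of residual ~ beta * p, without intercept."""
    numerator = sum(pi * ri for pi, ri in zip(p, residual))
    denominator = sum(pi * pi for pi in p)
    return numerator / denominator


# Toy values, chosen so that the residual is exactly 0.2 * p
p = [0.0, 1.0, 2.0, 4.0]
t_simple_vals = [2.0, 3.0, 1.5, 5.0]
t_ws_vals = [2.0, 3.2, 1.9, 5.8]
residual = [tw - ts for tw, ts in zip(t_ws_vals, t_simple_vals)]

beta = fit_through_origin(p, residual)
t_approx = [ts + beta * pi for ts, pi in zip(t_simple_vals, p)]
```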

Evaluation:

Generate 300,000 test samples (same method as for the training data)
Calculate for each test sample \(t_\text{ws}\) and \(t_{\text{approx}_1} = t_\text{simple} + 0.1328 \cdot p_1\), \(t_{\text{approx}_2} = t_\text{simple} + 0.1986 \cdot p_2\)
Evaluate \(t_\text{ws} - t_{\text{approx}_1}\) (Angular) and \(t_\text{ws} - t_{\text{approx}_2}\) (Cosine) by using root mean squared error (RMSE), mean absolute error (MAE) and fraction of cases where absolute difference is below \(0.01\) (< 0.01) and \(0.05\) (< 0.05) seconds
Context: \(t_\text{ws}\) is on average 6.23 seconds for the test data
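
The evaluation metrics listed above can be computed as in the following sketch. The helper name and the four toy values are illustrative and unrelated to the reported results.

```python
import math


def approximation_errors(t_ws, t_approx, thresholds=(0.01, 0.05)):
    """Return RMSE, MAE and the share of absolute errors below each threshold."""
    errors = [a - b for a, b in zip(t_ws, t_approx)]
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    within = {thr: sum(abs(e) < thr for e in errors) / n for thr in thresholds}
    return rmse, mae, within


# Toy example with four samples (times in seconds)
t_ws_toy = [1.00, 2.00, 3.00, 4.00]
t_approx_toy = [1.005, 2.03, 2.90, 4.00]
rmse, mae, within = approximation_errors(t_ws_toy, t_approx_toy)
```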

Evaluation table:

Metric	Angular	Cosine
RMSE	0.2480	0.1189
MAE	0.1906	0.0678
< 0.01	0.6183	0.7016
< 0.05	0.6875	0.9004

Conclusion:

Cosine difference penalty better than (raw) angular mismatch penalty
With the cosine difference penalty we can approximate \(t_\text{ws}\) very well in an efficient way

Appendix: Ball Time to Location

How long does it take to pass the ball from some location \((x_0, y_0)\) to a target location \((x_1, y_1)\)?

Approach
Learned Function

We model pass duration as a function of distance (Martens, Dick, and Brefeld 2021)
We use a data-driven approach by using observed passes and their duration and distance
To learn the function \(f(d) = t\), where d is the distance of a pass in meters and t the pass duration in seconds, we use cubic smoothing splines, i.e. we fit piecewise cubic polynomials that are joined at knots, with a penalty \(\lambda\) controlling the flexibility of the fit
The function \(f(d)\) minimizes

\[ \sum_i \big(t_{i} - f(d_i)\big)^2 + \lambda \int \big(f''(d)\big)^2 \mathrm{d}d, \]

where \(d_i\) is the distance in meters and \(t_i\) the pass duration of observed pass \(i\) for \(i = 1, \ldots, n\) and the penalty parameter \(\lambda\) controls the smoothness of the fit.

(Jump back)

Appendix: Pitch Control - Model Evaluation

Evaluation Metrics
Model Calibration

N = 62,309 samples, cross fitting, i.e. to obtain out-of-sample predictions for all samples we split the data into 5 equally sized folds, always train (and tune hyperparameters with 5-fold CV) a model on 4 and predict the 5th
Baseline 1: Naive; naive predictor always predicting empirical mean (0.8657)
Baseline 2: GLM; simple GLM using time to location of 3 closest players of both teams and ball as features
Model: XGBoost; using time to location of all players and ball as features
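
The cross-fitting scheme can be sketched generically as below. The `fit` and `predict` callables are placeholders for any model, the mean predictor is only a stand-in for XGBoost, and the inner hyperparameter tuning is omitted.

```python
import random


def cross_fit(X, y, fit, predict, n_folds=5, seed=0):
    """Out-of-sample predictions for every sample via K-fold cross fitting."""
    n = len(y)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]  # near-equal fold sizes
    preds = [None] * n
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        # Train on the other folds only, then predict the held-out fold
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        fold_preds = predict(model, [X[i] for i in test_idx])
        for i, p in zip(test_idx, fold_preds):
            preds[i] = p
    return preds


def fit_mean(X_train, y_train):
    """Stand-in model: remembers the training success rate."""
    return sum(y_train) / len(y_train)


def predict_mean(model, X_test):
    return [model] * len(X_test)


y_toy = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]
X_toy = [[0.0]] * len(y_toy)
oos = cross_fit(X_toy, y_toy, fit_mean, predict_mean)
```

Each prediction comes from a model that never saw the corresponding sample, which is what makes the evaluation metrics out-of-sample.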

Metric	Naive	GLM	XGBoost
Accuracy	0.8657	0.9444	0.9493
Area under the curve	-	0.9730	0.9792
Logloss	0.3944	0.1405	0.1224
Brier score	0.1163	0.0410	0.0369
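
The Naive column can be sanity-checked in closed form: a constant prediction equal to the base rate \(p\) has expected log loss \(-(p \ln p + (1-p) \ln(1-p))\) and Brier score \(p(1-p)\). The sketch below assumes the rounded base rate 0.8657, so small deviations from the table in the last digit are expected.

```python
import math


def naive_baseline_metrics(p):
    """Metrics of always predicting the base rate p on data with success rate p."""
    accuracy = max(p, 1 - p)  # always predict the majority class
    logloss = -(p * math.log(p) + (1 - p) * math.log(1 - p))
    brier = p * (1 - p) ** 2 + (1 - p) * p ** 2  # simplifies to p * (1 - p)
    return accuracy, logloss, brier


acc, ll, bs = naive_baseline_metrics(0.8657)
```

This gives a log loss of about 0.394 and a Brier score of about 0.116. Calling the same function with the shot base rate 0.04 gives about 0.168 and 0.0384, in line with the Naive column of the Pitch Value evaluation.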

(Jump back)

Calibration plot simple GLM model

Calibration plot XGBoost model

Appendix: Pitch Value - Model Features

Goal: derive situational context
Features: location features + team shape + defensive structure

Location features
Team shape
Defensive structure

For target location \((x_1, y_1)\):

Distance to goal
Angle to goal (cosine and sine angle)
Support (calculated time to target location for 5 closest players in time from possession team)
Pressure (calculated time to target location for 5 closest players in time from defending team)

For start location \((x_0, y_0)\) + target location \((x_1, y_1)\):

Progressive distance (distance to goal from target location - distance to goal from start location)
“Packing”: defenders behind ball at start location and defenders behind ball at target location (behind ball = closer to defending goal in x-coordinate, both values evaluated using player positions at time of pass)
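
A small sketch of the two pass-level features. The goal coordinates and player positions are hypothetical, and the pitch orientation (defending goal at x = 105) is an assumption of the example.

```python
import math


def progressive_distance(start, target, goal):
    """Distance to goal at target minus distance to goal at start.

    Negative values mean the pass moves the ball closer to the goal.
    """
    return math.dist(target, goal) - math.dist(start, goal)


def defenders_behind_ball(ball_x, defender_xs, defending_goal_x):
    """Count defenders closer (in x) to their own goal than the ball."""
    ball_gap = abs(ball_x - defending_goal_x)
    return sum(abs(x - defending_goal_x) < ball_gap for x in defender_xs)


goal = (105.0, 34.0)                        # hypothetical centre of the attacked goal
start, target = (50.0, 34.0), (70.0, 34.0)
defender_xs = [60.0, 65.0, 80.0, 90.0, 100.0]

prog = progressive_distance(start, target, goal)
behind_start = defenders_behind_ball(start[0], defender_xs, goal[0])
behind_target = defenders_behind_ball(target[0], defender_xs, goal[0])
```

In this example the pass takes two defenders out of the game: five are behind the ball at the start location and three at the target location, with all positions taken at the time of the pass.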

Exclude goalkeepers; consider each team's outfield players separately
Jarvis march: find hull vertices \(p_1 = (x_1, y_1), \ldots,\text{ } p_m = (x_m, y_m)\)
Area: \(A = \frac{1}{2} \sum_{i=1}^m (x_i y_{i+1} - x_{i+1} y_i)\)
Perimeter: \(P = \sum_{i=1}^m \|p_{i+1} - p_i\|_2 \quad\) where \(p_{m+1} = p_1\)
Features: Area and perimeter of both teams
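
A compact sketch of the hull features: a gift-wrapping (Jarvis march) hull followed by the shoelace area and the perimeter. The absolute value in the area is an addition that makes the result independent of vertex orientation; the player positions are made up.

```python
import math


def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive if b lies left of o -> a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def jarvis_march(points):
    """Return the convex hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts
    hull, current = [], pts[0]    # the leftmost point is always on the hull
    for _ in range(len(pts)):     # a hull has at most len(pts) vertices
        hull.append(current)
        candidate = pts[0] if pts[0] != current else pts[1]
        for p in pts:
            if p == current:
                continue
            turn = cross(current, candidate, p)
            farther = math.dist(current, p) > math.dist(current, candidate)
            # Wrap: keep the point that has all others on its left
            if turn < 0 or (turn == 0 and farther):
                candidate = p
        current = candidate
        if current == pts[0]:
            break
    return hull


def hull_area_and_perimeter(hull):
    """Shoelace area and perimeter of a polygon given by ordered vertices."""
    m = len(hull)
    twice_area, perimeter = 0.0, 0.0
    for i in range(m):
        (xi, yi), (xj, yj) = hull[i], hull[(i + 1) % m]  # wraps p_{m+1} = p_1
        twice_area += xi * yj - xj * yi
        perimeter += math.hypot(xj - xi, yj - yi)
    return abs(twice_area) / 2, perimeter


players = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1)]  # last point is interior
hull = jarvis_march(players)
area, perimeter = hull_area_and_perimeter(hull)
```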

Defending: usually 3 lines

Identify lines: k-means clustering on defenders x-coordinate with k = 3

\[ \min_{\{C_1,C_2,C_3\}} \; \sum_{k=1}^{3} \sum_{i \in C_k} (x_i - \mu_k)^2 \] where \(\mu_k\) is the mean x-coordinate of cluster \(k\)

Assign cluster with x-coordinate closest to goal line of defending team as “defensive” line, 2nd closest as “midfield” line and furthest as “forward” line
Features: number of players in each line, height (center \(\mu_k\)) of each line and compactness (standard deviation within cluster) per line
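
The line features can be sketched as follows. Instead of iterative k-means, this sketch minimizes the same objective exactly by trying every split of the sorted x-coordinates into three contiguous groups, which is cheap for ten outfield players. The use of the population standard deviation and all names and positions are assumptions of the example.

```python
import statistics


def sse(values):
    """Within-cluster sum of squared deviations from the cluster mean."""
    mu = statistics.fmean(values)
    return sum((v - mu) ** 2 for v in values)


def defensive_lines(xs, own_goal_x):
    """Split defenders' x-coordinates into defensive, midfield and forward line.

    Requires at least three players. Clusters are contiguous in the order of
    distance to the own goal line, so the first group is the defensive line.
    """
    ordered = sorted(xs, key=lambda x: abs(x - own_goal_x))
    n = len(ordered)
    best_cost, best_parts = None, None
    for i in range(1, n - 1):
        for j in range(i + 1, n):
            parts = [ordered[:i], ordered[i:j], ordered[j:]]
            cost = sum(sse(part) for part in parts)
            if best_cost is None or cost < best_cost:
                best_cost, best_parts = cost, parts
    names = ("defensive", "midfield", "forward")
    return {
        name: {
            "n_players": len(part),
            "height": statistics.fmean(part),         # cluster centre mu_k
            "compactness": statistics.pstdev(part),   # spread within the line
        }
        for name, part in zip(names, best_parts)
    }


# Hypothetical 4-3-3 shape, own goal line at x = 0
xs_toy = [20, 22, 21, 19, 35, 36, 34, 50, 52, 51]
lines = defensive_lines(xs_toy, own_goal_x=0)
```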

(Jump back)

Appendix: Pitch Value - Model Evaluation

Evaluation Metrics
Model Calibration

N = 53,956 samples, cross fitting, i.e. to obtain out-of-sample predictions for all samples we split the data into 5 equally sized folds, always train (and tune hyperparameters with 5-fold CV) a model on 4 and predict the 5th
Baseline 1: Naive; naive predictor always predicting empirical mean (0.04)
Baseline 2: GLM; simple GLM using distance to goal as feature
Model: XGBoost; using all derived game context features

Metric	Naive	GLM	XGBoost
Accuracy	0.96	0.9609	0.9621
Area under the curve	-	0.8703	0.8887
Logloss	0.168	0.1250	0.1183
Brier Score	0.0384	0.0326	0.0312

(Jump back)

Calibration plot simple GLM model

Calibration plot XGBoost model