Expected Pass Value (xPV)

A Holistic Framework for Evaluating Passing Situations in Soccer

Tobias Harringer | Robert Bajons

May 5, 2026

Motivation

An example: A pass from Enzo Fernandez


Overview

  • Goal
    • Value passes in a less outcome-dependent way
    • Gain deeper insights into passing situations
  • Framework
    • Pitch Control: likelihood of controlling a location
    • Pitch Value: value of possession at a location
    • Combine models to obtain expected pass value (xPV)
  • Insights
    • Value real passes
    • Value hypothetical passes

Data

  • Source: 2022 World Cup dataset (event + tracking data)

    • Event data: passes, shots, dribbles, …

    • Tracking data: \((x,y)\)-coordinates of players and ball

Tracking data capture:

  • Where are players positioned?

  • Where are players heading to?

Velocities are not observed directly; they are approximated from the tracked positions using centered differences:

\(\hat{v}_x(t) = \frac{x_{t+\Delta} - x_{t-\Delta}}{2\Delta}\)

\(\hat{v}_y(t) = \frac{y_{t+\Delta} - y_{t-\Delta}}{2\Delta}\)
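A minimal sketch of the centered-difference estimate above, assuming positions are sampled at a fixed frame interval. The function name `centered_velocity`, the `lag` parameter (so that \(\Delta\) equals `lag * dt`) and the toy track are illustrative assumptions, not taken from the original analysis.

```python
def centered_velocity(positions, dt, lag=1):
    """Estimate (v_x, v_y) per frame via centered differences.

    positions: list of (x, y) tuples sampled every `dt` seconds.
    lag: frames on each side, so Delta = lag * dt (an assumed choice).
    Frames that lack a neighbour on either side get None.
    """
    n = len(positions)
    velocities = [None] * n
    span = 2 * lag * dt  # corresponds to 2 * Delta in the formula
    for t in range(lag, n - lag):
        x_prev, y_prev = positions[t - lag]
        x_next, y_next = positions[t + lag]
        velocities[t] = ((x_next - x_prev) / span, (y_next - y_prev) / span)
    return velocities


# Toy track: a player moving 0.5 m in x and 0.1 m in y per 0.1 s frame.
track = [(0.0, 0.0), (0.5, 0.1), (1.0, 0.2), (1.5, 0.3)]
vel = centered_velocity(track, dt=0.1)
```

The first and last frames have no centered estimate here; a one-sided difference could be used for them instead.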

Theoretical Framework

  • Goal: Value passes in expectation
  • Setup:

Define binary variables:

\[\begin{aligned} C = \begin{cases} 1, & \text{if chance for own team} \\ 0, & \text{otherwise} \end{cases}, \quad S = \begin{cases} 1, & \text{if pass successful} \\ 0, & \text{otherwise} \end{cases} \end{aligned}\]

Expected Pass Value xPV:

\[ xPV = \mathbb{E}(C \mid P, I) \]

where:

  • \(P\): pass from \((x_0, y_0)\) to \((x_1, y_1)\)
  • \(I\): all available information at the time of the pass, i.e. tracking information

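By law of total expectation:

\[ xPV = \underbrace{\mathbb{E}(C \mid P, I, S = 1)}_{\text{Pitch Value}} \underbrace{\mathbb{P}(S = 1 \mid P, I)}_{\text{Pitch Control}} + \underbrace{\mathbb{E}(C \mid P, I, S = 0)}_{\approx 0} \underbrace{\mathbb{P}(S = 0 \mid P, I)}_{1 - \text{Pitch Control}} \]

Since a failed pass rarely leads to a chance for the own team, the second term is approximately zero, so \(xPV \approx \text{Pitch Value} \times \text{Pitch Control}\).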
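The decomposition can be sketched in a few lines. The function name `expected_pass_value` and the `value_if_failed` parameter are hypothetical; the input values 0.74 and 0.59 are the pitch control and pitch value reported for the Enzo Fernandez pass in the results.

```python
def expected_pass_value(pitch_control, pitch_value, value_if_failed=0.0):
    """Combine both models via the law of total expectation.

    pitch_control:   P(S = 1 | P, I)
    pitch_value:     E(C | P, I, S = 1)
    value_if_failed: E(C | P, I, S = 0), assumed to be 0 by default
    """
    if not 0.0 <= pitch_control <= 1.0:
        raise ValueError("pitch_control must be a probability")
    return pitch_value * pitch_control + value_if_failed * (1.0 - pitch_control)


xpv = expected_pass_value(pitch_control=0.74, pitch_value=0.59)
```

Rounded to two decimals this gives 0.44, in line with the xPV reported for that pass.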

Pitch Control

  • Definition: Pitch control, at any location \((x,y)\), is the probability that the team currently in possession keeps possession if the ball were passed to that location
  • Intuition
  • Modelling Approach
  • Simple model: closest distance → control

  • Velocity?
  • Ball?
  • Uncertainty?
  • Pitch control based on time it takes for players and the ball to reach a location (Wu and Swartz 2024; Spearman 2018)
  • Data-driven: build a supervised model
    • Response: pass outcome (success = 1, fail = 0)
    • Inputs: estimated arrival times
  • Train model with gradient-boosted trees (XGBoost) on observed passes
  • Compute time-to-location for any hypothetical target location → estimate pitch control there
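For contrast with the time-based model, the simple baseline mentioned above (closest distance → control) can be sketched as follows. It ignores velocity, the ball and uncertainty, which is exactly why the slides move beyond it. The function name and the toy coordinates are illustrative assumptions.

```python
import math


def nearest_player_control(target, attackers, defenders):
    """Crude pitch control at `target`.

    Returns 1.0 if an attacker is closest to the target, 0.0 if a defender
    is closest, and 0.5 on an exact tie. Positions are (x, y) tuples.
    """
    d_att = min(math.dist(target, p) for p in attackers)
    d_def = min(math.dist(target, p) for p in defenders)
    if d_att < d_def:
        return 1.0
    if d_att > d_def:
        return 0.0
    return 0.5


attackers = [(45.0, 30.0), (60.0, 40.0)]
defenders = [(52.0, 31.0), (70.0, 30.0)]
control_contested = nearest_player_control((50.0, 30.0), attackers, defenders)
control_safe = nearest_player_control((46.0, 30.0), attackers, defenders)
```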

Pitch Value

  • Goal: estimate \(\mathbb{E}(C \mid P, I, S = 1)\)
  • What is a chance?
    • Ideally: Goals
    • Proxy: Shots, idea based on Power et al. (2017)
  • Define: \(C = \begin{cases} 1, & \text{if shot for own team within 10 seconds} \\ 0, & \text{otherwise} \end{cases}\)
  • Derive for each successful pass
    • Label \(C\)
    • Features capturing game context from \(P\) and \(I\) (details: Appendix: Pitch Value - Features)
  • Supervised learning: train XGBoost model to estimate \(C\)
  • Apply model to any hypothetical location
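The label \(C\) can be derived from event data roughly as follows. This is a sketch under assumptions: shots are given as hypothetical `(time_in_seconds, team)` tuples, and the 10-second window is taken to start at the time of the pass.

```python
def label_chance(pass_time, pass_team, shots, window=10.0):
    """Return 1 if `pass_team` takes a shot within `window` seconds
    after the pass, else 0.

    shots: iterable of (time_in_seconds, team) tuples (assumed format).
    """
    return int(any(
        team == pass_team and pass_time < shot_time <= pass_time + window
        for shot_time, team in shots
    ))


shots = [(105.0, "ARG"), (130.0, "FRA")]
c_arg = label_chance(100.0, "ARG", shots)    # own shot 5 s after the pass
c_fra = label_chance(100.0, "FRA", shots)    # own shot only 30 s later
```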

Results - Model Evaluation


Calibration Pitch Control Model

(More details: Appendix: Pitch Control - Model Evaluation)

Calibration Pitch Value Model

(More details: Appendix: Pitch Value - Model Evaluation)

Results - Revisiting Motivation

An example: Evaluating a pass from Enzo Fernandez

  • Real situation
  • Pitch control
  • Pitch value
  • xPV

Who controls a location? Pitch Control of pass: \(0.74\)

How valuable is a location? Pitch Value of pass: \(0.59\)

How valuable is a pass? xPV of pass: \(0.44\)

Results - Predictive Power

  • How well can we predict future passing performance?

  • Aggregated metrics (xPV, expected Assists (xA), Assists) per player and game

  • Correlation of lagged values (game \(i\!-\!1\)) and current values (game \(i\))

Values represent Pearson correlations; brackets show bootstrapped 95% confidence intervals. N = 878 (players with at least 45 minutes in both the previous and the current game; see Appendix: Sensitivity of Correlation Analysis)



  • Results suggest that xPV is a stronger predictor of future passing performance than existing metrics
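A minimal sketch of the lagged correlation, assuming per-game aggregates are stored as a dictionary from player to a chronologically ordered list of values. The helper names and the toy numbers are hypothetical, and the 45-minute filter and the bootstrap are omitted.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def lagged_pairs(lagged_metric, current_metric):
    """Pair the value of one metric in game i-1 with another in game i."""
    prev, curr = [], []
    for player, lag_values in lagged_metric.items():
        cur_values = current_metric[player]
        for i in range(1, len(cur_values)):
            prev.append(lag_values[i - 1])
            curr.append(cur_values[i])
    return prev, curr


# Toy per-game values for two players (made-up numbers).
xpv_per_game = {"player_a": [1.0, 2.0, 3.0], "player_b": [2.0, 4.0]}
prev, curr = lagged_pairs(xpv_per_game, xpv_per_game)
r = pearson(prev, curr)
```

Passing two different dictionaries (for example lagged xPV and current assists) pairs one metric with another in the same way.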

Results - Holistic Analysis

  • Beyond evaluating the pass: full xPV surface enables holistic analysis
  • Decision making
  • Defender positioning
  • Space creation




  • Was it the right pass?




  • Alternative positioning?




  • How much space does an attacker create?

Missing piece: Which space is truly relevant?

Conclusion

  • Developed Pitch Control and Pitch Value model
  • Combined models to obtain xPV
  • xPV shows promising results to value real passes
  • We can value any hypothetical pass with xPV → allows for holistic analysis of passing situations
  • Extension and possible future work:
    • Which passes are likely and feasible?
    • Include risk




Thank you for listening!

Questions?

References

Martens, Florian, Uwe Dick, and Ulf Brefeld. 2021. “Space and Control in Soccer.” Frontiers in Sports and Active Living 3 (July). https://doi.org/10.3389/fspor.2021.676179.
Moura, Felipe Arruda, Luiz Eduardo Barreto Martins, Ricardo De Oliveira Anido, Ricardo Machado Leite De Barros, and Sergio Augusto Cunha. 2012. “Quantitative Analysis of Brazilian Football Players’ Organisation on the Pitch.” Sports Biomechanics 11 (1): 85–96. https://doi.org/10.1080/14763141.2011.637123.
Power, Paul, Hector Ruiz, Xinyu Wei, and Patrick Lucey. 2017. “Not All Passes Are Created Equal: Objectively Measuring the Risk and Reward of Passes in Soccer from Tracking Data.” In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD ’17. ACM. https://doi.org/10.1145/3097983.3098051.
Spearman, William. 2018. “Beyond Expected Goals.” In MIT Sloan Sports Analytics Conference.
Wu, Lucas, and Tim B. Swartz. 2024. “A New Metric for Pitch Control Based on an Intuitive Motion Model.” Computational Statistics, June. https://doi.org/10.1007/s00180-024-01512-2.

Appendix: Top Players World Cup 2022

  • Who were the best players in terms of xPV in the World Cup 2022?
  • Total xPV across tournament

Player               Total xPV   MV in million €
Lionel Messi         28.780      50
Rodrigo de Paul      22.109      35
Mateo Kovacic        20.460      40
Pedri                19.900      100
Jamal Musiala        17.143      100
Kylian Mbappé        17.066      160
Luka Modric          16.757      10
Antoine Griezmann    15.939      25
Ousmane Dembele      14.628      60
Bernardo Silva       14.439      80

  • xPV per 90 Minutes

Player               xPV/90      MV in million €
Jamal Musiala        5.980       100
Pedri                5.031       100
Ilkay Gündogan       4.666       25
Rodrygo              3.941       80
Joshua Kimmich       3.878       80
Lionel Messi         3.754       50
Angel Di Maria       3.722       10
Bernardo Silva       3.650       80
Dusan Tadić          3.612       7
Rodrigo de Paul      3.322       35

Appendix: Market Value Correlation

  • Tested correlation of different metrics with market value
  • Aggregated xPV, Goals, Assists and expected Assists per player over the whole tournament
  • In a tournament, better players usually play more games and therefore have more playing time to accumulate value. We therefore tested the correlation with market value both for aggregated values and for values normalized per 90 minutes played
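The per-90 normalization is a simple rescaling. In the sketch below, the total of 28.780 is Lionel Messi's tournament xPV from the appendix table, while the 690 minutes played are an assumption that is not stated in the slides.

```python
def per_90(total, minutes_played):
    """Normalize an aggregated value to a per-90-minutes rate."""
    if minutes_played <= 0:
        raise ValueError("minutes_played must be positive")
    return total / minutes_played * 90.0


# 690 minutes is an assumed input, not a figure from the slides.
messi_xpv_per_90 = per_90(28.780, 690)
```

Under that assumption the result rounds to 3.754, which agrees with the xPV/90 value listed for him.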

Correlation: Aggregated Values

Metric       Correlation with market value
xPV          0.47
Goals        0.31
Assists      0.25
xA           0.33

Correlation: Per 90 Values

Metric       Correlation with market value
xPV/90       0.47
Goals/90     0.25
Assists/90   0.17
xA/90        0.25

Appendix: Sensitivity of Correlation Analysis

  • In the correlation analysis of xPV with xA and Assists, samples were only considered if the player played at least 45 minutes in both the previous and the current game
  • The plot shows the correlation for different threshold values (1-60 minutes)


Player Time to Location

How long does it take for a player to reach a target location \((x_1, y_1)\) given their current location \((x_0, y_0)\) and current velocity \((v_{x,0}, v_{y,0})\)?

  • Problem
  • Approach
  • Formally: use equation of motion and minimize time to reach \((x_1, y_1)\)
  • Constraints: maximum speed and acceleration \(\Rightarrow\) no closed-form solution

Approach 1

  • Numerical optimization

\(\Rightarrow\) Realistic, but slow

Approach 2

  • Simple analytical model

\(\Rightarrow\) Fast, but unrealistic

How can we get a realistic and fast model?

  • Decomposed approximation model: \(t_\text{approx} = t_\text{simple} + \alpha \times p\)
    • \(t_\text{simple}\): simple time based on distance and current speed
    • \(p\): penalty term based on angular mismatch, scaled by current speed
    • \(\alpha\): learn scalar from data

Example: Angular mismatch

Appendix: Player Time to Location

  • Approach
  • \(t_\text{simple}\)
  • p
  • Model training
  • Results
  • Overview: Take Wu and Swartz (2024) as baseline approach to obtain \(t_\text{ws}\), approximate \(t_\text{ws}\) with an efficient analytical model

Description of Wu and Swartz (2024) algorithm:

  • Constraints: maximum speed \(\sqrt{v_x^2 + v_y^2} \leq v_{\text{max}}\), maximum acceleration \(\sqrt{a_x^2 + a_y^2} \leq a_{\text{max}}\)
  • Two-phase model: player has constant acceleration (\(a_x\), \(a_y\)) until he reaches \(\sqrt{v_x^2 + v_y^2} = v_\text{max}\), then continues to run to the target location with \(v_\text{max}\)
  • Obtain solution: grid search over possible acceleration pairs (\(a_x\), \(a_y\)), taking minimum time \(t\) such that player reaches target location as \(t_\text{ws}\)

Approximation model:

  • Decompose time into two analytically solvable components (\(t_\text{simple}\), \(p\)) and include a scalar (\(\alpha\)) which can be learned to approximate \(t_\text{ws}\)
  • \(t_\text{approx} = t_\text{simple} + \alpha \times p\)


Description of \(t_\text{simple}\):

  • We ignore the direction of velocity. Player starts with current total speed \(v_0\), and then has constant acceleration \(a\) until he reaches \(v_\text{max}\) and then runs with \(v_\text{max}\) to target location

  • Two cases: player reaches target location before maximum speed or player reaches maximum speed before target location

Calculation:

  • Calculate distance to target location: \(d_\text{target} = \sqrt{(x_1 - x_0)^2 + (y_1 - y_0)^2}\)

  • Check distance player runs while accelerating to maximum speed: \(d_\text{acc} = \frac{v_\text{max} + v_0}{2} \cdot t_\text{acc}\)

where \(v_0 = \sqrt{(v_{x,0}^2 + v_{y,0}^2)}\) is the current total speed and \(t_\text{acc} = (v_\text{max} - v_0)/a\) is the time it takes to reach maximum speed

  • Case 1: \(d_\text{acc} \geq d_\text{target}\): solve the kinematic equation \(d_\text{target} = v_0 \cdot t + \frac{1}{2} \cdot a \cdot t^2\) using the quadratic formula to obtain: \(t_\text{simple} = \frac{-v_0 \pm \sqrt{v_0^2 + 2 \cdot a \cdot d_\text{target}}}{a}\) (take the positive root, as a negative time is nonsensical)

  • Case 2: \(d_\text{acc} < d_\text{target}\): calculate time it takes to cover remaining distance after acceleration (\(d_\text{rem} = d_\text{target} - d_\text{acc}\)) as \(t_\text{rem} = \frac{d_\text{rem}}{v_\text{max}}\) and add time of acceleration phase to obtain: \(t_\text{simple} = t_\text{rem} + t_\text{acc}\)
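The two cases above translate directly into code. This is a sketch: the values \(v_\text{max} = 8\) m/s and \(a = 4\) m/s² in the usage lines are placeholders rather than the parameters used in the analysis, and clamping \(v_0\) to \(v_\text{max}\) is an added safeguard.

```python
import math


def t_simple(x0, y0, x1, y1, vx0, vy0, v_max, a):
    """Direction-agnostic time to reach (x1, y1) from (x0, y0)."""
    d_target = math.hypot(x1 - x0, y1 - y0)
    # Current total speed; clamped to v_max (assumption, avoids negative t_acc).
    v0 = min(math.hypot(vx0, vy0), v_max)
    t_acc = (v_max - v0) / a             # time needed to reach maximum speed
    d_acc = 0.5 * (v_max + v0) * t_acc   # distance covered while accelerating
    if d_acc >= d_target:
        # Case 1: target reached before maximum speed (positive root).
        return (-v0 + math.sqrt(v0 ** 2 + 2 * a * d_target)) / a
    # Case 2: accelerate to v_max, then cover the rest at v_max.
    return (d_target - d_acc) / v_max + t_acc


# Placeholder parameters: v_max = 8 m/s, a = 4 m/s^2, player at rest.
t_short = t_simple(0, 0, 4, 0, 0, 0, v_max=8.0, a=4.0)    # case 1
t_long = t_simple(0, 0, 20, 0, 0, 0, v_max=8.0, a=4.0)    # case 2
```

With these placeholder values the player needs 8 m to reach top speed, so the 4 m target falls into case 1 and the 20 m target into case 2.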

Description of the penalty \(p\):

  • As \(t_\text{simple}\) ignores the direction of velocity, the penalty \(p\) is a term to account for this

  • It is calculated from the angular mismatch of a player, i.e. the shortest angular difference between the player's current movement angle and the angle from the player's current location to the target location, which is then scaled by the current total speed

  • Intuitively: the faster a player runs into the “wrong” direction, the higher the penalty

  • We test two penalty variants: (i) the angular mismatch, and (ii) the cosine difference of the angular mismatch, which introduces non-linearity

Angular mismatch calculation: \[ \Delta \theta = \min(|\theta_\text{move} - \theta_\text{target}|, 2 \pi - |\theta_\text{move} - \theta_\text{target}|) \]

where \(\theta_\text{move} = \operatorname{atan2}(v_{y,0}, v_{x,0})\) is the current movement angle and \(\theta_\text{target} = \operatorname{atan2}(y_1 - y_0, x_1 - x_0)\) the angle from the current location to the target location

Penalty calculation:

  • Version 1: Angular penalty: \(p_1 = \Delta \theta \cdot v_0\)

  • Version 2: Cosine penalty: \(p_2 = (1 - \cos(\Delta \theta)) \cdot v_0\)
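Both penalty variants can be computed as follows. This is a sketch of the formulas above; the function names are illustrative.

```python
import math


def angular_mismatch(x0, y0, x1, y1, vx0, vy0):
    """Shortest angular difference between movement and target direction."""
    theta_move = math.atan2(vy0, vx0)
    theta_target = math.atan2(y1 - y0, x1 - x0)
    diff = abs(theta_move - theta_target)
    return min(diff, 2 * math.pi - diff)


def penalties(x0, y0, x1, y1, vx0, vy0):
    """Return (p_1, p_2): angular and cosine penalty, scaled by speed."""
    v0 = math.hypot(vx0, vy0)
    d_theta = angular_mismatch(x0, y0, x1, y1, vx0, vy0)
    return d_theta * v0, (1.0 - math.cos(d_theta)) * v0


# Player runs along +x at 3 m/s, target lies straight along +y.
p1_side, p2_side = penalties(0, 0, 0, 5, 3, 0)
# Player already runs straight at the target: no penalty.
p1_on, p2_on = penalties(0, 0, 5, 0, 3, 0)
```

For a player at rest both penalties are zero because of the scaling by \(v_0\), regardless of the angle returned by `atan2`.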

Goal: learn \(\alpha\) such that \(t_\text{approx} = t_\text{simple} + \alpha \times p_i \approx t_\text{ws}\)

Generate training data:

  • 1,000,000 random combinations of current location \((x_0, y_0)\) and target location \((x_1, y_1)\) within the pitch dimensions and current velocity \((v_{x,0}, v_{y,0})\)
  • Calculate for each sample:
    • \(t_\text{ws}\): by grid search using Wu and Swartz (2024) algorithm as described
    • \(t_\text{simple}, p_1, p_2\): analytically as described

Train model: linear regression without intercept (in case of no penalty, i.e. 0 velocity or no angular mismatch, \(t_\text{ws} \approx t_\text{simple}\), as running in straight line to target is then optimal)

  • Angular mismatch model: \(t_\text{ws} - t_\text{simple} = \beta_1 \cdot p_1 + \epsilon\)

  • Cosine difference model: \(t_\text{ws} - t_\text{simple} = \beta_2 \cdot p_2 + \epsilon\)

where \(\beta_1\) and \(\beta_2\) are coefficients estimated by minimizing the sum of squared residuals \(\sum_{i=1}^n \epsilon_i^2\)

  • Resulting coefficients: \(\beta_1 = 0.1328\), \(\beta_2 = 0.1986\)
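For a regression without intercept and a single regressor, the least-squares coefficient has the closed form \(\hat{\beta} = \sum_i p_i r_i / \sum_i p_i^2\), where \(r_i = t_{\text{ws},i} - t_{\text{simple},i}\). A minimal sketch with made-up toy data (not the simulated training set):

```python
def fit_through_origin(p, r):
    """Least-squares slope for r = beta * p + eps (no intercept).

    p: penalty values, r: residual times t_ws - t_simple.
    """
    numerator = sum(p_i * r_i for p_i, r_i in zip(p, r))
    denominator = sum(p_i * p_i for p_i in p)
    return numerator / denominator


# Toy data lying exactly on the line r = 0.2 * p.
penalty = [1.0, 2.0, 3.0]
residual = [0.2, 0.4, 0.6]
beta = fit_through_origin(penalty, residual)
```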

Evaluation:

  • Generate 300,000 test samples (same method as for the training data)

  • Calculate for each test sample \(t_\text{ws}\) and \(t_{\text{approx}_1} = t_\text{simple} + 0.1328 \cdot p_1\), \(t_{\text{approx}_2} = t_\text{simple} + 0.1986 \cdot p_2\)

  • Evaluate \(t_\text{ws} - t_{\text{approx}_1}\) (Angular) and \(t_\text{ws} - t_{\text{approx}_2}\) (Cosine) by using root mean squared error (RMSE), mean absolute error (MAE) and fraction of cases where absolute difference is below \(0.01\) (< 0.01) and \(0.05\) (< 0.05) seconds

  • Context: \(t_\text{ws}\) is on average 6.23 seconds for the test data

Evaluation table:

Metric Angular Cosine
RMSE 0.2480 0.1189
MAE 0.1906 0.0678
< 0.01 0.6183 0.7016
< 0.05 0.6875 0.9004
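The metrics in the table can be computed as sketched below. The function name `evaluate` and the four toy values are hypothetical and only illustrate the calculation.

```python
import math


def evaluate(t_ws, t_approx, thresholds=(0.01, 0.05)):
    """RMSE, MAE and share of absolute errors below each threshold (seconds)."""
    errors = [a - b for a, b in zip(t_ws, t_approx)]
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    fractions = {th: sum(abs(e) < th for e in errors) / n for th in thresholds}
    return rmse, mae, fractions


t_ws_toy = [1.0, 2.0, 3.0, 4.0]
t_approx_toy = [1.0, 2.02, 2.9, 4.0]
rmse, mae, fractions = evaluate(t_ws_toy, t_approx_toy)
```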

Conclusion:

  • Cosine difference penalty better than (raw) angular mismatch penalty
  • With the cosine difference penalty we can approximate \(t_\text{ws}\) very well in an efficient way

Appendix: Ball Time to Location

  • How long does it take to pass the ball from some location \((x_0, y_0)\) to a target location \((x_1, y_1)\)?
  • Approach
  • Learned Function
  • We model pass duration as a function of distance (Martens, Dick, and Brefeld 2021)

  • We use a data-driven approach by using observed passes and their duration and distance

  • To learn the function \(f(d) = t\), where d is the distance of a pass in meters and t the pass duration in seconds, we use cubic smoothing splines, i.e. we fit piecewise cubic polynomials that are joined at knots, with a penalty \(\lambda\) controlling the flexibility of the fit

  • The function \(f(d)\) minimizes

\[ \sum_i \big(t_{i} - f(d_i)\big)^2 + \lambda \int \big(f''(d)\big)^2 \mathrm{d}d, \]

where \(d_i\) is the distance in meters and \(t_i\) the pass duration of observed pass \(i\) for \(i = 1, \ldots, n\) and the penalty parameter \(\lambda\) controls the smoothness of the fit.

Appendix: Pitch Control - Model Evaluation

  • Evaluation Metrics
  • Model Calibration
  • N = 62,309 samples, 5-fold CV, i.e. to obtain out-of-sample predictions for all samples we split the data into 5 equally sized folds, always train a model on 4 and predict the 5th
  • Baseline 1: Naive; naive predictor always predicting empirical mean (0.8657)
  • Baseline 2: GLM; simple GLM using time to location of the 3 closest players of both teams and of the ball as features
  • Model: XGBoost; using time to location of all players and ball as features



Metric Naive GLM XGBoost
Accuracy 0.8657 0.9444 0.9493
Area under the curve - 0.9730 0.9792
Logloss 0.3944 0.1405 0.1224
Brier score 0.1163 0.0410 0.0369
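The naive column follows directly from the base rate: a constant prediction \(p\) equal to the empirical mean yields accuracy \(\max(p, 1-p)\), log loss \(-[p \ln p + (1-p) \ln(1-p)]\) and Brier score \(p(1-p)\). The sketch below assumes the constant is evaluated on data with exactly that base rate, so small deviations from fold-wise cross-validation are ignored.

```python
import math


def naive_baseline_metrics(base_rate):
    """Metrics of a constant predictor that always outputs `base_rate`."""
    p = base_rate
    accuracy = max(p, 1.0 - p)  # always predict the majority class
    logloss = -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))
    brier = p * (1.0 - p)
    return accuracy, logloss, brier


acc, logloss, brier = naive_baseline_metrics(0.8657)
```

For a base rate of 0.8657 this reproduces the naive column up to rounding.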




  • Calibration plot simple GLM model

  • Calibration plot XGBoost model

Appendix: Pitch Value - Features

  • Goal: derive game context from four feature groups: spatial features, team context, team shape, defensive structure

Spatial features:

  • Euclidean distance to goal center from pass start and target location
  • Pass progressive distance: x-distance to goal after pass minus x-distance to goal before pass
  • Euclidean distance of pass start to target location
  • Cosine and sine of angle from target location to goal center

Team context:

  • Support: estimated player time-to-location of 5 earliest arriving players from team in possession
  • Pressure: estimated player time-to-location of 5 earliest arriving players from team defending
  • Average x-coordinate of team in possession/defending
  • Average velocity in x-direction of team in possession/defending
  • Number of defenders/attackers behind pass start and target location in x-coordinate

Convex hull: team shape (Moura et al. 2012)

  • Jarvis march: find hull vertices \(p_1 = (x_1, y_1), \ldots,\text{ } p_m = (x_m, y_m)\)

  • Area: \(A = \frac{1}{2} \left| \sum_{i=1}^m (x_i y_{i+1} - x_{i+1} y_i) \right| \quad\) where \((x_{m+1}, y_{m+1}) = (x_1, y_1)\)

  • Perimeter: \(P = \sum_{i=1}^m \|p_{i+1} - p_i\|_2 \quad\) where \(p_{m+1} = p_1\)

  • Note: per team, all outfield players (excl. goalkeeper). Visualization shows example for defending team
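Given the hull vertices in order, area and perimeter follow from the two sums above. A minimal sketch; computing the hull itself is omitted and the rectangle is a toy input.

```python
import math


def hull_area_perimeter(vertices):
    """Area (shoelace formula) and perimeter of a polygon.

    vertices: list of (x, y) hull vertices in order around the hull;
    the polygon is closed by wrapping from the last vertex to the first.
    """
    m = len(vertices)
    twice_area = 0.0
    perimeter = 0.0
    for i in range(m):
        x_i, y_i = vertices[i]
        x_n, y_n = vertices[(i + 1) % m]  # p_{m+1} = p_1
        twice_area += x_i * y_n - x_n * y_i
        perimeter += math.hypot(x_n - x_i, y_n - y_i)
    # abs() makes the result independent of the traversal direction.
    return abs(twice_area) / 2.0, perimeter


area, perimeter = hull_area_perimeter([(0, 0), (4, 0), (4, 3), (0, 3)])
```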

Defending: usually 3 lines

  • Identify lines: k-means clustering on the defenders' x-coordinates, k = 3

\[ \min_{\{C_1,C_2,C_3\}} \; \sum_{k=1}^{3} \sum_{i \in C_k} (x_i - \mu_k)^2 \] where \(\mu_k\) is the mean x-coordinate of cluster \(k\)

  • Features: Formation, height, compactness
  • Note: excluding goalkeeper
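The line detection can be sketched as one-dimensional k-means (Lloyd's algorithm). The initialization from the sorted coordinates, the function names and the toy x-coordinates are assumptions. Reading a formation off the cluster sizes is only one possible way to derive the formation feature.

```python
def assign(xs, centers):
    """Assign each x-coordinate to its nearest center."""
    clusters = [[] for _ in centers]
    for x in xs:
        j = min(range(len(centers)), key=lambda c: (x - centers[c]) ** 2)
        clusters[j].append(x)
    return clusters


def kmeans_1d(xs, k=3, max_iter=100):
    """Lloyd's algorithm on scalar values; returns (centers, clusters)."""
    ordered = sorted(xs)
    n = len(ordered)
    # Assumed initialization: centers spread evenly over the sorted values.
    centers = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    for _ in range(max_iter):
        clusters = assign(xs, centers)
        updated = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    return centers, assign(xs, centers)


# Toy x-coordinates of 10 outfield players of the defending team.
xs = [20, 21, 22, 23, 35, 36, 37, 38, 50, 51]
centers, lines = kmeans_1d(xs, k=3)
formation = "-".join(str(len(line)) for line in lines)
```

For this toy input the three lines have sizes 4, 4 and 2, with mean x-coordinates 21.5, 36.5 and 50.5.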

Appendix: Pitch Value - Model Evaluation

  • Evaluation Metrics
  • Model Calibration
  • N = 53,956 samples, 5-fold CV, i.e. to obtain out-of-sample predictions for all samples we split the data into 5 equally sized folds, always train a model on 4 and predict the 5th
  • Baseline 1: Naive; naive predictor always predicting empirical mean (0.04)
  • Baseline 2: GLM; simple GLM using distance to goal as feature
  • Model: XGBoost; using all derived game context features



Metric Naive GLM XGBoost
Accuracy 0.96 0.9609 0.9621
Area under the curve - 0.8703 0.8887
Logloss 0.168 0.1250 0.1183
Brier Score 0.0384 0.0326 0.0312




  • Calibration plot simple GLM model

  • Calibration plot XGBoost model

Appendix: xPV with Shots vs. xG

  • Instead of defining reward variable \(C\) as binary (1, if shot happens within 10 seconds, 0 else) we could replace the 1 with the expected goals (xG) value of the shot
  • Advantage: Shots from worse positions not rated equal to shots from good positions
  • Change in xPV surface:

Original xPV surface

xPV surface with xG model for training

  • Similar surface, but all values for the xG model are much smaller (the mean xG of shots is only 0.1, while before all shots were encoded as 1)