Over the last few years, I’ve done a lot of thinking around the notion of similarity in sport and I’ve come to believe this is the most important problem in soccer analytics. Whether you’re scouting players, analyzing opponents, or designing a recruitment strategy, you’re repeatedly grappling with the same fundamental question: Is Player A similar to Player B? This might seem straightforward, but as with many things in analytics, the simplicity is deceptive.
Take recruitment, for example. If you’re replacing a player, your immediate question is likely: Who can replicate their role? For central positions where output metrics are less obvious, you’re less focused on raw production and more concerned with understanding how a player might play in a specific tactical context. This isn’t an easy question to answer, and current approaches often fall short.
Here’s why.
Limits of Summary Statistics
In his book Thinking, Fast and Slow, Daniel Kahneman introduced (to me, at least) the concept of substitution bias, where people unconsciously replace a difficult problem with an easier one. This happens all the time in sports analysis.
The hard problem is understanding player similarity in terms of their decision-making tendencies—how they think and act in specific situations. But because that’s challenging to measure, we often substitute it with an easier question: How similar are their statistics?
This substitution creates significant and cascading issues. Advanced process-oriented metrics like expected goals, progressive passes, or on-ball value are helpful, but they’re still outputs shaped by a host of external factors:
Team Context: A player’s statistics are influenced by their team’s style, coaching preferences, and tactical configuration.
Game State: Variance introduced by scorelines, opposition tactics, and randomness adds noise.
Ambiguous Causality: Summary metrics are often the result of team interactions, making it difficult to isolate the player’s individual contribution.
While comparing summary vectors of advanced stats can sometimes approximate similarity, it’s fundamentally a substituted problem—and it often leads to misleading conclusions.
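To make the substituted problem concrete, here is a minimal sketch of what stat-vector similarity usually looks like in practice: standardize each metric across players, then compare players by cosine similarity. The player names and per-90 numbers are entirely made up for illustration; this is the "easier question," not an approach I'm endorsing.

```python
import numpy as np

# Hypothetical per-90 summary vectors (xG, progressive passes, on-ball value)
# for three invented midfielders. All numbers are illustrative.
stats = {
    "player_a": np.array([0.12, 6.1, 0.45]),
    "player_b": np.array([0.10, 5.8, 0.41]),
    "player_c": np.array([0.35, 2.2, 0.30]),
}

# Standardize each metric across players so no single scale dominates.
matrix = np.stack(list(stats.values()))
z = (matrix - matrix.mean(axis=0)) / matrix.std(axis=0)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

sim_ab = cosine(z[0], z[1])
sim_ac = cosine(z[0], z[2])
print(sim_ab > sim_ac)  # True: a and b profile more alike than a and c
```

Note that nothing in this computation knows anything about the situations these players faced, which is exactly the problem.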
Stimulus and Response
Instead of focusing on outputs, I think we should frame this as a stimulus-response problem: when a player is presented with a specific situation, how do they respond? And are there patterns or tendencies that can be uncovered in how individual players react to certain categories of situations?
Imagine a scenario where a midfielder has the ball just outside the penalty area:
Do they attempt a safe pass back to the center-back?
Do they risk a through-ball to split the opposing back line?
Do they switch play with a long cross-field pass?
Each decision tells us something about the player's tendencies. But here's the catch: we've only recently gained the tools to study this. In the event-data era, we were not privy to the full decision space because we lacked the positional context to see what other options were available. Tracking data, however, is changing the game.
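One minimal way to formalize the stimulus-response framing: bucket situations into categories, and within a category describe each player by their empirical distribution over response types. The numbers below are invented, and Jensen-Shannon divergence is just one reasonable choice of distance between such distributions:

```python
import numpy as np

# Illustrative stimulus-response table for one category of situation
# (midfielder on the ball just outside the box): each player's distribution
# over response types. Numbers are made up for the sketch.
actions = ["safe_pass_back", "line_splitting_through_ball", "switch_of_play"]
responses = {
    "player_a": np.array([0.60, 0.25, 0.15]),
    "player_b": np.array([0.55, 0.30, 0.15]),
    "player_c": np.array([0.20, 0.65, 0.15]),
}

def js_divergence(p, q):
    """Jensen-Shannon divergence: a symmetric, bounded (0..1 in bits)
    distance between two response distributions."""
    m = 0.5 * (p + q)
    def kl(a, b):
        return float(np.sum(a * np.log2(a / b)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

d_ab = js_divergence(responses["player_a"], responses["player_b"])
d_ac = js_divergence(responses["player_a"], responses["player_c"])
print(d_ab < d_ac)  # True: a's tendencies sit closer to b's than to c's
```

The catch, of course, is that building these distributions honestly requires knowing which options were actually available in each situation, which is what tracking data finally provides.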
Player Embeddings
With tracking data, we can start building player embeddings—neural network-generated representations of a player’s decision-making tendencies. These embeddings allow us to simulate how a player might act in hypothetical scenarios.
For example, let’s say Player A and Player B are central midfielders. We can use their embeddings to compare how they respond to similar situations—like breaking defensive lines or retaining possession under pressure. This lets us evaluate their decision-making tendencies more directly, side-stepping some of the biases baked into summary stats.
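As a sketch of how such embeddings might arise (this is not the actual model from my NESSIS talk; every size, number, and behavior here is invented), one can train a tiny action classifier that conditions on both the situation and a learned per-player vector. Players whose tendencies differ in identical situations end up with different vectors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup, purely illustrative: 3 players, 4 situation features,
# 3 action classes, 2-dimensional embeddings.
n_players, n_feats, n_actions, dim = 3, 4, 3, 2
E = np.zeros((n_players, dim))                        # learned player embeddings
W = rng.normal(0.0, 0.1, (n_feats + dim, n_actions))  # shared classifier weights

# Fake data: players 0 and 1 behave identically; player 2 shifts every
# choice up one action class in the same situations.
players = rng.integers(0, n_players, 500)
X = rng.normal(size=(500, n_feats))
base = (X[:, 0] > 0).astype(int)
y = np.where(players == 2, base + 1, base)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

for _ in range(1000):
    inp = np.hstack([X, E[players]])      # stimulus + who is on the ball
    p = softmax(inp @ W)
    p[np.arange(len(y)), y] -= 1.0        # softmax cross-entropy gradient
    gW = inp.T @ p / len(y)
    gE = p @ W[n_feats:].T                # per-example embedding gradient
    np.add.at(E, players, -1.0 * gE / len(y))
    W -= 1.0 * gW

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Players 0 and 1 (same tendencies) should embed closer together
# than players 0 and 2.
print(cos(E[0], E[1]), cos(E[0], E[2]))
```

Real versions of this idea would condition on rich tracking-derived features rather than random vectors, but the principle is the same: the embedding absorbs whatever about the player's behavior the shared model cannot explain from the situation alone.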
Below are some example surfaces that were generated for my 2021 NESSIS talk, Player Masks: Encoding Soccer Decision-Making Tendencies, which demonstrated this concept using StatsBomb 360 data.
Strengths, Limitations, and Future Directions
This approach has clear advantages:
Contextual Comparison: Simulated scenarios reduce the noise of team and game-state effects.
Decision Focus: Embeddings capture how players think, not just what they produce.
But it’s not perfect. Even the best embeddings will still reflect some influence of team style, coaching, and the opportunities presented by specific systems. And while embeddings offer a more nuanced view, they’re not always easy to interpret, which can make them tricky to communicate to non-technical audiences.
I imagine it might be a bit difficult to persuade a scouting executive to act on decision-making tendencies calculated from synthetic data that you can't tie directly back to video!
Still, as tracking data becomes more widespread, these methods will only improve. We're not quite at the point where player similarity is a solved problem, but we're getting closer. We also haven't said anything about physical capacity: a player might have elite decision-making, but it doesn't matter if they can't get themselves where they need to go.
Validation?
It is not obvious how you might validate a model like this. But in theory, you would expect a player to be assigned approximately the same embedding by an identical training process run on both a training and a validation data set.
Practically, you will want to selectively freeze certain portions of the model after training to generate embeddings for future players who were not part of the initial training data set. And you will probably need to retrain something like this on a regular basis to account for drift.
Of course, you would see some player-level variation season-to-season, or perhaps when playing for different teams, but hopefully there should be some signal that is retained across all of these iterations.
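One subtlety worth flagging when checking that kind of stability: embeddings trained independently on two splits live in arbitrary coordinate systems (a rotation of the embedding space changes nothing about the model), so it makes more sense to compare the pairwise similarity structure the two sets of embeddings induce than the raw coordinates. A small sketch with synthetic embeddings, where the "validation" embeddings are simulated as a rotation of the "training" ones plus noise:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins for embeddings trained on two separate splits.
n_players, dim = 20, 8
emb_train = rng.normal(size=(n_players, dim))
# Simulate the validation-split embeddings as a random rotation of the
# training ones plus noise (mostly the same signal, different coordinates).
q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))   # random orthogonal matrix
emb_valid = emb_train @ q + 0.1 * rng.normal(size=(n_players, dim))

def pairwise_cosine(e):
    e = e / np.linalg.norm(e, axis=1, keepdims=True)
    return e @ e.T

s1 = pairwise_cosine(emb_train)
s2 = pairwise_cosine(emb_valid)
iu = np.triu_indices(n_players, k=1)               # unique player pairs
stability = np.corrcoef(s1[iu], s2[iu])[0, 1]
print(stability)  # close to 1.0 means the similarity structure is retained
```

A low correlation here would suggest the embeddings are fitting split-specific noise rather than stable player tendencies.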
My toy experiments with this concept almost certainly don't have enough data to construct truly robust embeddings and are probably just overfitting. But the output surfaces look believable, and I think the approach is conceptually strong.
So, What’s Next?
For now, the challenge is twofold: refining these models and integrating them into practical workflows. As a field, we need to consciously resist substitution bias and think deeply about what "similarity" really means and how it applies to the questions we’re trying to answer.
Ultimately, it’s about understanding the game on a deeper level—breaking down the decisions players make and the factors that influence them. If we get this right, the implications go far beyond player recruitment.