Unexpected Origins and the Fermi Paradox
Can foundational soccer ideas achieve escape velocity?
I recently finished Ian Graham's How to Win the Premier League, which recounts his time at Liverpool from an inside perspective. I found it very enjoyable, and you should pick it up.
My European football knowledge is severely limited, so a lot of the specific characters and transfer talk didn't really land with me. This is not a criticism – you need characters for a story. But what I found especially captivating was his retelling of the history of soccer analytics. It felt like I was reading an alternative history of a time period that I thought I knew quite well.
The book retcons the entire canon of provenance regarding soccer analytics. This includes the origin of foundational concepts such as Expected Goals and Possession Value, and probably others that I don't remember off the top of my head.
Personally, I credit Sam Green for the creation of Expected Goals via an OptaPro blog post in 2012, though I concede that the concept wasn’t new. And Sarah Rudd's Markov model from NESSIS 2011 seems to be the progenitor of modern possession value models like Expected Threat.
According to the book, Ian and the teams surrounding him independently invented these concepts well before they were introduced to the public. And Ian suggests that his efforts, just like Sam’s, were performed without knowledge of prior work, such as the research by Richard Pollard and Charles Reep titled Effectiveness of Playing Strategies. I’ve lifted a figure from that research below, which should look remarkably familiar.
The Adjacent Possible
The phenomenon of multiple discovery is deeply fascinating. The most famous example is probably calculus, which was independently formulated by both Newton and Leibniz in the 17th century [1]. There are plenty of other important historical examples, like evolution by natural selection and the light bulb – all similar in cultural magnitude to Expected Goals.
It seems like this occurs when the broader environmental, cultural, and technological conditions make an idea or discovery ripe for development. This is often attributed to the "zeitgeist," or spirit of the times, which fosters the necessary prerequisites for such innovation.
There is a concept coined by theoretical biologist Stuart Kauffman called The Adjacent Possible and it describes how innovation unfolds within the constraints of the current environment while expanding its boundaries. It refers to the set of possibilities that become accessible when a new innovation or discovery is made, opening doors to previously unattainable states.
The introduction of event data into the soccer analytics zeitgeist in the mid-2000s (and its popularization in the early 2010s) was the most obvious catalyst, which opened the door to many of these foundational models. Moneyball, published in 2003, was probably a motivating factor as well, though many executives probably waited for the movie in 2011.
Escape Velocity
It is intriguing how few novel ideas have achieved escape velocity from the gravity of team environments. If teams are genuinely squirrelling away their analytics discoveries, I’d expect to have found a few more breadcrumbs as evidence!
There are some hints that have been exposed. One uncovered crouton is Liverpool's collaboration with DeepMind on set pieces, which I think is a pretty clear statement of intent regarding their strategic research direction. And of course there is all the public work that Javier Fernandez published while employed by FC Barcelona. But there aren't many additional examples to point to, and the exceptions prove the rule.
I want to briefly contrast this with professional baseball. Analytics is a considerably more established discipline and practice within MLB. Therefore, you see a lot more members of staff switching between organizations to fulfill various personal career motivations and aspirations. Stashed in their luggage are uncovered truths about the game of baseball and the state of the "adjacent possible", which causes a proliferation of ideas and best practices.
(There’s probably a missed opportunity to title this section Exit Velocity, as a cheeky nod to advanced baseball statistics, but I prefer the orbital mechanics metaphor.)
My perspective on sharing new research methods has always been a pretty liberal one. I believe that it is in the best interest of the top research teams to steer and cultivate the public ecosystem because they should be best positioned to take immediate advantage of novel innovations that percolate.
I'm unsure whether other organizations or analytics leaders view it the same way that I do. I suspect that many embedded analysts fear arriving empty-handed when they come up for air.
Extraterrestrial Soccer
I'd like to suggest a soccer analytics formulation of the Fermi paradox, which observes the lack of evidence for extraterrestrial life despite its seemingly overwhelming statistical likelihood.
If the growth of soccer analytics is truly intuitive, with its core principles being independently discovered time and again, why hasn’t more of this surfaced?
Here is a non-exhaustive list of plausible explanations.
Perhaps the conditions are no longer fertile for multiple discovery. The distinct lack of tracking data at scale in the public domain really concerns me.
Teams have managed to remain remarkably tight-lipped. And it seems plausible that gambling companies would be particularly motivated to maintain an edge.
Not enough practitioners have bounced from club to club to incite a cascading cross-pollination event across a multitude of organizations.
Teams actually don't have much to share, and the state of analytics inside clubs isn't much more advanced than it is in public. Grace Robertson comedically suggested otherwise in a related blog post from 2022, which is definitely worth a read.
Cutting-edge research is too expensive, and there aren't many team environments that can support it.
There actually is a lot of novel research being produced publicly and I'm just not reading it.
The truth probably includes a mixture of these possibilities, and probably a few others that you should leave in the comments.
How to Win the Premier League suggests an inevitably bright future for soccer analytics, but the reality is likely constrained by the gravitational forces of secrecy, siloed environments, and resource limitations.
If we want to see soccer analytics flourish more openly, it will likely require shifts on multiple fronts: better access to public tracking data, more resource and idea sharing between teams, and a dramatic cultural shift toward valuing open science over guarded advantage.
[1] After writing but before publishing, I discovered that Mark Thompson also made this comparison to calculus in a 2022 blog post, which you can read on his website. Pretty meta, given the subject matter!