By: Stephan Teodosescu
What a month of European football that was. A year after it was supposed to start we finally got what all soccer fans have been waiting for since the final whistle of the last World Cup — Euro 2020.
COVID-19 ravaged Italy last year and forced fans of the nation’s soccer team to wait an extra year to see it play in a major competition. The Azzurri embarrassingly failed to qualify for the 2018 World Cup in Russia, so going without the Italian diving (err, style of play) in a “real” match since the 2016 Euros had to be jarring for fans accustomed to their country’s success on the pitch.
But a true away game against England at London’s Wembley Stadium in the final this year could not have provided a more perfect script. Down 1-0 after just two minutes in front of a hostile crowd, Italy clawed its way back with a second-half equalizer before eventually winning on penalty kicks to cap off its storybook, riches-to-rags-to-riches ending. Italy, one of the most decorated footballing nations in the world, had hit its nadir in the late 2010s and was once again champion.
Prior to the tournament, prognosticators gave Italy about a nine percent chance to win the European Championship. Jan Van Haaren, former Chief Analytics Officer at the data-driven recruiting company SciSports and newly hired data scientist at Belgian champions Club Brugge, put out a call for public forecasts before the tournament started. He gathered predictions from around 20 different sources and made the results available in a Google doc for nerds like me to parse.
Below is what the various forecasts looked like prior to the start of Euro 2020. This included Van Haaren’s team at the KU Leuven DTAI Sports Analytics Lab, as well as the likes of Luke Benz, StatsPerform (The Analyst) and Goldman Sachs among others.
Belgium was the ensemble’s favorite to win the tourney with an average win probability of 15 percent, followed by France and Spain. The finalists — England and Italy — were the fourth and fifth most likely teams with a probability of 11 and nine percent, respectively, to win the tournament. As we know, the Azzurri eventually prevailed against the Three Lions in dramatic fashion.
Belgium had by far the widest spread of championship probabilities at the outset, ranging from a low of 3.0 percent to a high of 28.9 percent. Interestingly, Van Haaren’s DTAI group (based in Belgium) was the one with the maximum forecast. Spain had the narrowest range among the serious contenders (9.3 to 14.1 percent), but Italy had the most compact interquartile range (the span covered by the middle 50 percent of values when ordered from lowest to highest). England and Germany didn’t seem to have much consensus among the experts on whether they had a real shot or would fall short of expectations. You can argue both squads disappointed their countries’ lofty hopes.
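For anyone who wants to compute these spread measures themselves, here is a minimal Python sketch. It is not the code behind the plots. The `spread` function and the forecast list are hypothetical: only the 3.0 and 28.9 endpoints come from the Belgium numbers above, and the values in between are made up purely for illustration. It also assumes the "inclusive" quartile method from Python's standard library, which is only one of several common conventions.

```python
from statistics import quantiles


def spread(probabilities):
    """Return (low, high, full_range, iqr) for win probabilities given in percent."""
    # Quartiles via the standard library; "inclusive" treats the data as the full population.
    q1, _median, q3 = quantiles(probabilities, n=4, method="inclusive")
    low, high = min(probabilities), max(probabilities)
    return low, high, high - low, q3 - q1


# Hypothetical forecasts for one team, in percent.
# Only the endpoints (3.0 and 28.9) are taken from the article; the rest are invented.
belgium_forecasts = [3.0, 9.5, 12.0, 14.0, 15.5, 17.0, 19.0, 28.9]

low, high, full_range, iqr = spread(belgium_forecasts)
print(f"min={low:.1f} max={high:.1f} range={full_range:.1f} IQR={iqr:.2f}")
```

The range reacts to a single outlying forecast, while the interquartile range ignores the top and bottom quarters, which is why a team can have a wide range and still a compact middle.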
I’ve written about Expected Goals (xG) quite a bit in this space in the past, especially in the context of hockey and soccer. Put simply, xG is the probability that a shot will result in a goal based on the characteristics of that shot and the events leading up to it, according to FBref.com, and is meant to evaluate a team’s goal scoring chances. The model that FBref uses is provided by StatsBomb. Some of these characteristics include the location of the shooter when the shot occurred, the body part the shot came off of, and the type of pass and type of attack that led to the shot (think: did it follow a dribble, was it off a rebound, etc.).
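To make the idea concrete: once a model has assigned each shot a goal probability, a team's xG total is just the sum of those probabilities. The sketch below is a simplified illustration and does not reproduce the StatsBomb model. The `Shot` class, the `team_xg` function, the team labels and the per-shot probabilities are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    team: str
    xg: float  # model-estimated probability (0 to 1) that this shot becomes a goal


def team_xg(shots, team):
    """Sum the per-shot goal probabilities for one team."""
    return sum(shot.xg for shot in shots if shot.team == team)


# Hypothetical shots from a single match; the probabilities are invented for illustration.
shots = [
    Shot("A", 0.76),  # e.g. a high-probability chance such as a penalty
    Shot("A", 0.05),  # e.g. a long-range effort
    Shot("B", 0.12),
    Shot("A", 0.31),
]

print(f"Team A xG: {team_xg(shots, 'A'):.2f}")
print(f"Team B xG: {team_xg(shots, 'B'):.2f}")
```

The same sum taken over an opponent's shots gives Expected Goals Against, and the difference between the two is the expected goal differential.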
Italy tied Spain for the most actual goals in the tournament with 13 and was second only to the Spaniards in our xG metric with 10.9 Expected Goals For, according to StatsBomb data. On top of that, the Italians played stifling defense, conceding just 5.9 xG across their seven matches. That works out to an expected goal differential of +5.0 (10.9 minus 5.9), which ranked third among all teams despite Italy playing more games than the two countries ahead of it (Spain and Netherlands).
Meanwhile, Hungary, Slovakia and North Macedonia brought up the rear, each with an expected goal differential worse than -5 despite being knocked out after just three matches in the group stage. The models didn’t give these countries much of a chance anyway.
Italy’s performance put it in the “Fun” category in the charts below, reserved for squads with high expected goals scored and conceded. Keep in mind that these numbers accumulate over time, so the more matches a team plays, the higher they are expected to be. It’s no surprise, then, that Italy allowed this many xG despite fielding one of the best defenses: the Azzurri played the maximum number of games (seven). The top plot shows the nominal values for each squad’s Expected Goals For (xG) and Expected Goals Against (xGA), while the bottom plot shows the same values with the size of each point indicating how many matches that team played.
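One way to take match count out of the picture is to divide the totals by games played. The short Python sketch below does that for Italy using the figures quoted above (10.9 xG for, 5.9 xG against, seven matches). The `per_match` function is a hypothetical helper written for this illustration, not something taken from the plotting code.

```python
def per_match(xg_for, xg_against, matches):
    """Return (xG per match, xGA per match, xG differential per match)."""
    if matches <= 0:
        raise ValueError("matches must be a positive number")
    return (
        xg_for / matches,
        xg_against / matches,
        (xg_for - xg_against) / matches,
    )


# Italy's tournament totals as quoted in the article.
italy_xg, italy_xga, italy_diff = per_match(10.9, 5.9, 7)
print(f"Italy per match: xG={italy_xg:.2f}, xGA={italy_xga:.2f}, diff={italy_diff:+.2f}")
```

Per-match rates make a seven-game finalist comparable with a team that went home after three group games, at the cost of being noisier for the teams with fewer matches.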
Match of the Tournament
Americans (me included) tend to complain about soccer’s low scoring, but you wouldn’t know that was a thing if you watched the Croatia vs. Spain Round of 16 matchup. A total of eight goals were scored, two coming in the match’s final 10 minutes of normal time to tie the game at three for the Croatians before Spain pulled away in extra time 5-3.
As we saw above Spain had the largest expected goal differential (+9.7) in the tournament, which was helped by its signature possession style and several gaudy offensive performances like the one against Croatia. But La Furia Roja ran into the eventual champions in their semifinal match and were forced into a 1-1 draw — despite winning the expected goal battle — before the deciding penalty shootout, which the Italians won to advance to the final.
That game helped lead the way in a tournament that was objectively fun to watch. Not only were matches hosted all over Europe, adding to the intrigue, but the average of 2.78 goals per game was the highest of any European Championship since 1980.
Luke Shaw’s second-minute goal to open the championship final was absolutely stunning. He took a picture-perfect, cross-pitch pass from wing back Kieran Trippier and mashed it home at the far post to give England a 1-0 lead that probably registered a mini earthquake in London.
The only issue? There were 88 minutes of normal time still left to play.
Research has shown that teams that lead early in matches tend to concede more shots, take fewer shots of their own, and overall win less often than you’d expect relative to bookmakers’ odds. In other words, teams with leads exhibit a phenomenon that is familiar to us in our everyday lives: loss aversion. Humans are naturally predisposed to keeping the status quo rather than taking a gamble that might pay off, not unlike how animals treat threats (the other team trying to score goals) as more urgent than opportunities (your team trying to score goals), since it gives them a better chance of surviving in the wild.
And these trends were on display for Gareth Southgate’s team in the final. It felt like the soccer equivalent of prevent defense, a scheme that some say prevents you from winning rather than preventing the opponent from scoring. Southgate sat back and let his defense try to close out his country’s first major tournament trophy since the 1960s instead of opting for a more attacking lineup and scheme. The result was a 1-1 draw in regulation time, and England went on to lose in heartbreaking fashion on penalty kicks, with two 120th-minute subs brought on purely for their scoring prowess missing their spot kicks. The xG scoreboard was much more in favor of the Italians, 2.1-0.4. Italy led England 6-1 in shots on target and 19-6 in total shots, and had the ball for 66 percent of the match vs. England’s 34 percent.
With the victory Italy is now unbeaten in 34 straight matches. In its previous 33 games coming into the final the Azzurri trailed for a total of 44 minutes; they trailed for 65 minutes in the final against England before Leonardo Bonucci evened the scoreline in the 67th minute.
As we mentioned earlier, Italy had an 8.8 percent chance to win the tournament before a ball was kicked. The road to get there was even more brutal than one could have expected, as it ended up going through world No. 1 Belgium, Spain and England at Wembley. Now the Italians are champions of Europe once again.
Talk about a Renaissance story.
[Update 8/1/21] Tony ElHabr had a Twitter thread a couple weeks ago that looked at how each of the pre-tournament forecasts mentioned above performed. His Sports Viz GitHub repo is worth checking out too if you’re into that kind of thing.
The code for the plots above can be found on GitHub.
The header photo was taken at a summer 2016 Steaua Bucuresti-Dinamo Bucuresti match (known as the Eternal Derby) at Arena Naţională in Bucharest, Romania. The stadium played host to several group stage and Round of 16 matches in Euro 2020.