Portugal's Cristiano Ronaldo gestures during a game at the 2018 World Cup. Photo by REUTERS/Hannah McKay

To beat Vegas bookies at the World Cup, these statisticians turned to artificial intelligence

When it comes to sports betting, most people lose. But during the 2014 World Cup, a team of statisticians beat the bookmakers. They correctly predicted Germany — their home country and 6-to-1 underdogs — as the final champions and raked in a 30 percent return with bets placed on regular matches.

This year, the team, led by Andreas Groll of Germany’s Technical University of Dortmund, is back with an artificial intelligence program with even better odds. As of Friday morning, it had correctly predicted 15 of 24 games — the winners, the losers and those who tied.

As for the ultimate victor, their new model has picked Spain, but only if Germany falters before the final.

How soccer predictions get made

This quest for sports domination began with Groll’s wife and fellow statistician, Jasmin Abedieh. For a class assignment, she was asked to analyze a 20-year-old study on predicting the World Cup. She wondered if she and her husband could do better.

All soccer predictions come down to the same thing: Estimating the number of goals a team typically scores in a match.

Once these “expected goals” are calculated, analysts run thousands of simulations of the tournament.

This allows analysts to factor in both the skill levels of the teams and the opponents they need to face in order to advance, said Gunther Schauberger, a statistician at the Technical University of Munich, who teamed with Groll to predict the 2014 World Cup.

Mexico’s Edson Alvarez celebrates after an unexpected victory over Germany in the group stage. Photo by REUTERS/Carl Recine

For example, if Germany and Spain — two of the best teams — play each other in the knockout round, it would significantly reduce their odds of winning the whole competition. But that’s a rare event — it might only happen in one of 10,000 simulations.

By looking at how many times a team is successful across all these simulations, researchers and bookmakers can predict the teams with the best chance to win.
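The counting step can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the four teams and their expected goals are made up, goals are drawn from a Poisson distribution, and tied knockout games are settled by a coin flip standing in for penalties. It is not the researchers' code.

```python
import math
import random
from collections import Counter

# Hypothetical expected goals per match for four made-up semifinalists.
EXPECTED_GOALS = {"Team A": 1.9, "Team B": 1.6, "Team C": 1.2, "Team D": 0.9}

def sample_goals(rng, lam):
    """Draw a Poisson-distributed goal count (Knuth's multiplication method)."""
    limit = math.exp(-lam)
    goals, product = 0, rng.random()
    while product > limit:
        goals += 1
        product *= rng.random()
    return goals

def play_knockout_match(rng, team_1, team_2):
    """Return the winner; a tie is settled by a coin flip (assumed stand-in for penalties)."""
    goals_1 = sample_goals(rng, EXPECTED_GOALS[team_1])
    goals_2 = sample_goals(rng, EXPECTED_GOALS[team_2])
    if goals_1 == goals_2:
        return rng.choice([team_1, team_2])
    return team_1 if goals_1 > goals_2 else team_2

def simulate_tournament(rng):
    """Play two semifinals and a final, returning the champion."""
    finalist_1 = play_knockout_match(rng, "Team A", "Team D")
    finalist_2 = play_knockout_match(rng, "Team B", "Team C")
    return play_knockout_match(rng, finalist_1, finalist_2)

def title_chances(n_simulations=10_000, seed=2018):
    """Share of simulated tournaments won by each team."""
    rng = random.Random(seed)
    champions = Counter(simulate_tournament(rng) for _ in range(n_simulations))
    return {team: champions[team] / n_simulations for team in EXPECTED_GOALS}

chances = title_chances()
for team, share in sorted(chances.items(), key=lambda item: -item[1]):
    print(f"{team}: {share:.1%}")
```

The share of simulated tournaments a team wins serves as its estimated chance of taking the title. A real bracket would also need the group stage and the full draw.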

An element of chance is also introduced into the simulations. In other words, the teams with the highest expected number of goals don’t win every time — they are just more likely to do so.
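To see how a higher expected goal count translates into "more likely, but not certain," here is a rough sketch. It assumes each team's goals follow independent Poisson distributions, a common modeling choice that the article does not attribute to any particular forecaster, and the expected goals of 1.8 and 0.9 are invented.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals when the expected number is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def outcome_probabilities(xg_a: float, xg_b: float, max_goals: int = 15):
    """Return (P(A wins), P(draw), P(B wins)) for independent Poisson scores.

    Scorelines above max_goals per team are ignored; their probability is tiny
    for realistic expected goals.
    """
    win_a = draw = win_b = 0.0
    for goals_a in range(max_goals + 1):
        for goals_b in range(max_goals + 1):
            p = poisson_pmf(goals_a, xg_a) * poisson_pmf(goals_b, xg_b)
            if goals_a > goals_b:
                win_a += p
            elif goals_a == goals_b:
                draw += p
            else:
                win_b += p
    return win_a, draw, win_b

# Hypothetical expected goals: a favorite (1.8) against an underdog (0.9).
win_a, draw, win_b = outcome_probabilities(1.8, 0.9)
print(f"favorite {win_a:.2f}, draw {draw:.2f}, underdog {win_b:.2f}")
```

Even with twice the expected goals, the favorite's win probability stays below 100 percent, which is why upsets still turn up in the simulations.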

Beating Vegas

From Vegas sportsbooks to corporate banks that use the World Cup to test economic theory, the big difference between prediction models is how the number of expected goals is determined.

That’s what Groll’s team aims to improve.

For the 2018 World Cup, Groll and Schauberger recruited two applied mathematicians from Ghent University, who were already developing a way to more accurately rank teams.

They created an “ability score,” a tally of goals scored by each national team over the last eight years. The games were weighted by importance: games that occurred a long time ago or friendly matches, for example, are considered less important.
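A weighted tally of this kind might look something like the sketch below. The article gives no formula, so the details are assumptions: the importance weights, the three-year half-life and the choice of a weighted average rather than a raw tally are all hypothetical, and the researchers' actual ranking method is more involved.

```python
from dataclasses import dataclass

# Hypothetical importance weights; the article does not give the real values.
MATCH_TYPE_WEIGHT = {"friendly": 0.5, "qualifier": 1.0, "world_cup": 2.0}
HALF_LIFE_YEARS = 3.0  # assumed: a match's weight halves every three years
WINDOW_YEARS = 8.0     # the eight-year window mentioned in the article

@dataclass
class Match:
    goals_scored: int
    years_ago: float
    match_type: str

def match_weight(match: Match) -> float:
    """Weight a match by recency and type; matches outside the window get zero."""
    if match.years_ago > WINDOW_YEARS:
        return 0.0
    recency = 0.5 ** (match.years_ago / HALF_LIFE_YEARS)
    return recency * MATCH_TYPE_WEIGHT[match.match_type]

def ability_score(matches: list[Match]) -> float:
    """Weighted average of goals scored; 0.0 if no match carries any weight."""
    total_weight = sum(match_weight(m) for m in matches)
    if total_weight == 0:
        return 0.0
    return sum(match_weight(m) * m.goals_scored for m in matches) / total_weight

# Made-up match history for one team.
history = [
    Match(goals_scored=3, years_ago=0.5, match_type="qualifier"),
    Match(goals_scored=0, years_ago=6.0, match_type="friendly"),
    Match(goals_scored=2, years_ago=4.0, match_type="world_cup"),
]
print(round(ability_score(history), 3))
```

In this made-up history, the scoreless friendly from six years ago barely moves the score, while the recent qualifier counts heavily.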

The International Federation of Association Football (FIFA), which organizes the World Cup, creates its own rankings, but they’re not highly regarded. (“Not even FIFA likes FIFA rankings,” Schauberger said.) FIFA rankings focus on clusters of matches — for example, across a whole year — rather than smoothly tracking a team’s progress. FIFA said it plans to replace its ranking system after this World Cup.

Groll’s team entered its ability scores into a machine learning algorithm called a “random forest.” For the last World Cup, they had used a traditional statistical model without machine learning.

The random forest algorithm makes decisions based on team characteristics from previous World Cup tournaments. It considers a lineup of 20 characteristics, including the ability scores, the GDP of the team’s country, the average age of a team’s players, population, FIFA rankings and bookmakers’ odds. The ability scores created by the researchers ultimately appeared to be the most important indicator of success.

Most likely teams to advance from the group stage, based on the “random forest” prediction model. Chart by Groll et al., arXiv, 2018.

The algorithm’s picks are then organized into “decision trees” — one-way flow charts that separate high-scoring teams from low-scoring teams. Each choice creates a new branch on a tree.

Instead of making a single decision tree, their algorithm generates thousands — hence a random forest.

“You shake up the data a little bit so that you get different trees each time you ‘grow them’,” said Trevor Hastie, a statistician at Stanford University.
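The “shaking” Hastie describes can be illustrated with a stripped-down forest. This sketch is a deliberate simplification and not the researchers' model: each “tree” makes only a single split, the eight training rows are invented, and only two of the 20 characteristics appear. Each tree is grown on a random resample of the data (a bootstrap sample) using one randomly chosen characteristic, and the forest averages the trees' predictions.

```python
import random
from statistics import mean

# Made-up training rows: (ability_score, average_age) -> goals scored in a match.
ROWS = [
    ((2.4, 27.1), 3), ((2.1, 28.3), 2), ((1.9, 26.0), 2), ((1.6, 29.4), 1),
    ((1.4, 27.8), 1), ((1.1, 25.5), 1), ((0.9, 30.2), 0), ((0.7, 28.9), 0),
]

def grow_stump(rows, rng):
    """Fit a one-split tree on a bootstrap sample and one randomly chosen feature."""
    sample = [rng.choice(rows) for _ in rows]   # "shake up" the data
    feature = rng.randrange(len(rows[0][0]))    # random feature choice
    fallback = mean(goals for _, goals in sample)
    best = (float("inf"), None, fallback, fallback)  # (error, threshold, low, high)
    for threshold in sorted({x[feature] for x, _ in sample}):
        low = [g for x, g in sample if x[feature] <= threshold]
        high = [g for x, g in sample if x[feature] > threshold]
        if not high:
            continue  # nothing on the far side of this split
        low_mean, high_mean = mean(low), mean(high)
        error = (sum((g - low_mean) ** 2 for g in low)
                 + sum((g - high_mean) ** 2 for g in high))
        if error < best[0]:
            best = (error, threshold, low_mean, high_mean)
    return feature, best[1], best[2], best[3]

def stump_predict(stump, features):
    """Predict goals with a single one-split tree."""
    feature, threshold, low_mean, high_mean = stump
    if threshold is None:  # the sample had only one distinct value to split on
        return low_mean
    return low_mean if features[feature] <= threshold else high_mean

def forest_predict(forest, features):
    """Average the predictions of all trees in the forest."""
    return mean(stump_predict(stump, features) for stump in forest)

rng = random.Random(0)
forest = [grow_stump(ROWS, rng) for _ in range(500)]
strong = forest_predict(forest, (2.2, 27.0))
weak = forest_predict(forest, (0.8, 29.0))
print(f"strong team: {strong:.2f} expected goals, weak team: {weak:.2f}")
```

Because every tree sees slightly different data, no single quirk of the training set dominates the averaged prediction. Full random forests grow much deeper trees and pick a fresh random subset of characteristics at every split.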

Here’s the kicker: the machine-learning algorithm is trained to make decisions on its own. The program recognizes patterns between the results of previous World Cup games and team characteristics.

For this year’s tournament, the research team grew 5,000 trees and ran 100,000 simulations. Spain came out as the winner most often, but Germany was only half a percent behind.

However, when the researchers looked at the most probable course of the tournament, rather than just the overall winner, Germany came out on top.

Most probable course for the knockout stage, based on 100,000 simulations conducted by Groll’s team. Chart by Groll et al., arXiv, 2018.

Why it matters

“This method is probably going to be amongst the best. It’s really hard to say until somebody else comes along with something better,” Hastie said.

After the first round of matches, half of the World Cup groups are doing what the researchers predicted. Unexpected wins from Iran, Serbia, and Mexico may upend their predictions of who comes out of the group stage, and wins by Japan and Senegal have totally upset their group’s outcome.

But researchers don’t know exactly how the random forest makes its decisions. The machine-learning program is turned loose on a huge quantity of data, more than a human could comprehend.

Both Groll and Hastie said it’s “a black box” as to exactly how this AI system makes decisions. Plus, even with vast computational power, no one — and no machine — can ever predict the winner for any match with 100 percent certainty.