Poker-Playing Algorithm Can Beat Humans By Forgetting Things

To computers, there are basically two kinds of games in this world: games that we’ve figured out how to beat (perfect information games) and games we’re just beginning to solve (imperfect information games like poker).

Perfect information games, from a computer’s perspective, are fairly easily to master. In this type of game, all players know the complete state of the game at all times—which means that a sophisticated enough algorithm could theoretically calculate all (or most) possible subsequent moves. Both chess and the ancient Chinese game ‘Go’ fit into this category; last year, Google’s artificial intelligence software AlphaGo defeated the reigning world Go champion in a dramatic series of matches.

But the second type of game is a different beast. Imperfect information games leave algorithms guessing, since some information is hidden from each player. Artificial intelligence engineers have had a notoriously difficult time trying to crack this style of game. Now, two separate teams think they’ve come up with an answer. In January, scientists at Carnegie Mellon presented its human-beating poker algorithm, Libratus. And this week in Science, an unrelated group of researchers have announced DeepStack. DeepStack is similar to Libratus, but with a twist: DeepStack operates on a kind of persistent amnesia.

poker_1024x576
Poker is an example of an imperfect information game.

Basically, every time an individual plays a card, DeepStack forgets the past.

That’s because at any point during a poker game, assessment of the past is complex. If DeepStack’s decision were based on the past, it would have to rely on a probability distribution over private information—information that is only revealed through its opponent’s past actions, which are in turn influenced by their analysis of DeepStack’s past actions. In other words, if DeepStack’s decision were to take the past into account, it’d wind up in a convoluted mess.

Here’s John Timmer, writing for ArsTechnica:

To avoid getting stuck in an infinite recursion, DeepStack simply forgets the past. “Our goal is to avoid ever maintaining a strategy for the entire game,” its developers write. Instead, each time DeepStack needs to act, it performs a quick search to pick out a strategy based on the current state of the game. That search relies on two major simplifications.

The first is that it only considers a limited number of options. It can fold, call, go all-in, or make only two or three different bets. These limit the future states that have to be considered rather considerably—by about 140 orders of magnitude. It also doesn’t search forward to all possible positions. As a result, the computation of which action to take runs about five seconds on a single Nvidia GeForce GTX 1080.

DeepStack doesn’t rely on a supercomputer to update the system during breaks, which gives it an advantage over Libratus. Plus, the developers say their approach isn’t limited to poker. If both teams are able to swap out certain parts of their respective algorithms, the combined research could have practical applications—for example, in medical and defense decision-making.