A team of computer scientists has developed a program that, in the long run, can play more or less perfect two player limit Texas Hold’em.
One of the goals of game theory is to “solve” games. This entails finding strategies for each player that, if perfectly followed, give the best possible outcome for each player, assuming their opponents are also playing perfectly.
In a paper published in the journal Science, a team of computer scientists led by Michael Bowling of the University of Alberta has announced that they have “essentially solved” heads-up limit Hold’em.
Tic-tac-toe is an extremely simple game, and it has a fairly well known solution. If the two players follow the appropriate strategies, the game will always end in a draw. In 2007, checkers was solved in this fashion as well: through a process of brute force analysing all possible checker games over a number of years, researchers identified strategies that, as in tic-tac-toe, always result in a draw.
Poker, however, has an extra dimension that complicates analysis: while you can look at the cards dealt to you, you can’t see what your opponent has, as opposed to games like checkers or chess where you and your opponent have the same information.
This makes decision making significantly harder. There are a lot of ways to play a pair of tens, and you have to evaluate your options based on how your opponent has been betting, and what community cards come out in the later rounds of the game. If you’re holding that pair of tens, there are plenty of possibilities for what your opponent is holding, and you have no way of distinguishing between those possibilities.
This “imperfect information” nature of poker has been a big part of why, while we have computer programs that cannot lose at checkers, and programs that can beat grandmasters at chess, the development of computer poker players that can’t lose has been slower.
Computer scientist Michael Bowling and his team at the University of Alberta have been working on computer poker for years. Their goal has been to solve a relatively simple version of poker: heads-up limit Texas Hold’em.
In Texas Hold’em games, there are four rounds of dealing cards, each followed by a round of betting. Initially, each player is dealt two cards face down, which form the secret “hole cards” part of their hands. After an initial round of betting, three cards are placed face up (called the “flop”), and are available to all players. Another round of betting follows, and then there are two final rounds (the “turn” and the “river”) where one card is played face up in each round. If there are still multiple players in the game at the end, they reveal their original hidden cards, and the winner is the player who can put together the best five-card poker hand using any combination of their two hidden cards and the five community cards.
In the specific case that Bowling and his peers have been working on, “heads-up” indicates that there are just two players, and “limit” means that the sizes of bets are fixed and the number of possible raises are limited in each round. These restrictions greatly simplify the game: more players make strategic decision making far more complicated, and allowing variable no limit betting leads to far more possibilities in each betting round.
Even with these simplifications, there are still a grand total of about 319,000,000,000,000 situations a player can find themselves in. After simplifying this by taking advantage of game symmetries (things like swapping all spades with clubs and vice versa), there are still around 13,800,000,000,000 game scenarios to consider.
Solving The Game
Bowling and his team constructed a computer program that cleverly split up those 13.8 trillion scenarios into more manageable pieces. The program was designed to gradually improve its strategy over time by playing through all those scenarios over and over again with varying strategies, getting closer to an ideal strategy.
For each game scenario, the program would try a possible strategy. After playing through to the end of that hand, it would then compare its strategy to other possible strategies for that scenario. The program would then update its strategy based on this comparison: It would be more likely in the future to choose actions in that scenario that would have turned out better than what it chose this time around, and would avoid actions that would have turned out worse. Doing this over and over again would eventually lead to a very close approximation of a perfect strategy.
The team then ran this program on a huge computing cluster of 200 powerful computers over a period of more than two months. During that time, the above process of trying and adjusting strategies was repeated 1,579 times, resulting in a final strategy that was extremely strong.
The strategy was strong enough to satisfy Bowling’s criterion for “essentially solving” the game: If a human being were to spend 70 years playing 200 hands of poker an hour for 12 hours a day against a computer running the final strategy, there’s at least a 95% chance that human being wouldn’t be able to tell the difference between an actual perfect strategy and the computer’s strategy.
While Bowling and his team, and other computer poker researchers, were able to beat much more simplified, “synthetic” versions of poker in the past, heads-up limit Texas Hold’em is the first game that is actually played competitively by human beings to be essentially solved in this way.
The paper concludes by noting a couple interesting aspects of the algorithm’s early game pre-flop strategy. Heads-up Texas Hold’em starts off with a “blind” bet: The dealer has to put in a small amount of money, and the other player has to put in twice that amount before the cards are dealt. The dealer then hands out the two hidden pocket cards.
At that point, the dealer starts the first round of betting in one of three ways: Throwing away her hand, losing the small blind, but risking no more money (“folding”); putting in the other half of the big blind bet and passing the decision making process to the other player (“calling”); or matching the big blind, and then adding in more chips to double the big blind bet (“raising”).
The program’s final strategy has the dealer being fairly aggressive on that initial play, and the other player to also tend towards aggressiveness. This chart, from the paper, illustrates the program’s likelihoods of raising (green), calling (blue), or folding (red), based on the two pocket cards they’re holding. The lower left half of the square represents cards of different suits; the upper right half cards of the same suit. On the left is the dealer’s first move; on the right is the other player’s response to a dealer raise:
The dealer nearly always raises, almost never calls, and only folds on the absolute weakest of hands. Assuming a dealer raise, the second player replies with a counter raise with a reasonably decent hand, and calls with most other hands, leading to the flop and the next round of betting.
Some further directions for poker research include expanding analysis to games with more players and more complicated betting structures. Bowling and his coauthors also note that game theoretic analysis along the lines used to study poker are helpful in other areas, like airport security and medical diagnosis.
The paper is online at Science’s website here.
Business Insider Emails & Alerts
Site highlights each day to your inbox.