How A.I. Conquered Poker – The New York Times

Good poker players have always known that they need to maintain a balance between bluffing and playing it straight. Now they can do so perfectly.
Credit…Illustration by Patricia Doria
Last November in the cavernous Amazon Room of Las Vegas’s Rio casino, two dozen men dressed mostly in sweatshirts and baseball caps sat around three well-worn poker tables playing Texas Hold ’em. Occasionally a few passers-by stopped to watch the action, but otherwise the players pushed their chips back and forth in dingy obscurity. Except for the taut, electric stillness with which they held themselves during a hand, there was no outward sign that these were the greatest poker players in the world, nor that they were, as the poker saying goes, “playing for houses,” or at least hefty down payments. This was the first day of a three-day tournament whose official name was the World Series of Poker Super High Roller, though the participants simply called it “the 250K,” after the $250,000 each had put up to enter it.
At one table, a professional player named Seth Davies covertly peeled up the edges of his cards to consider the hand he had just been dealt: the six and seven of diamonds. Over several hours of play, Davies had managed to grow his starting stack of 1.5 million in tournament chips to well over two million, some of which he now slid forward as a raise. A 33-year-old former college baseball player with a trimmed light brown beard, Davies sat upright, intensely following the action as it moved around the table. Two men called his bet before Dan Smith, a fellow pro with a round face, mustache and whimsically worn cowboy hat, put in a hefty reraise. Only Davies called.
The dealer laid out a king, four and five, all clubs, giving Davies a straight draw. Smith checked (bet nothing). Davies bet. Smith called. The turn card was the deuce of diamonds, missing Davies’s draw. Again Smith checked. Again Davies bet. Again Smith called. The last card dealt was the deuce of clubs, one final blow to Davies’s hopes of improving his hand. By now the pot at the center of the faded green-felt-covered table had grown to more than a million in chips. The last deuce had put four clubs on the table, which meant that if Smith had even one club in his hand, he would make a flush.
Davies, who had been betting the whole way needing an eight or a three to turn his hand into a straight, had arrived at the end of the hand with precisely nothing. After Smith checked a third time, Davies considered his options for almost a minute before declaring himself all-in for 1.7 million in chips. If Smith called, Davies would be out of the tournament, his $250,000 entry fee incinerated in a single ill-timed bluff.
Smith studied Davies from under the brim of his cowboy hat, then twisted his face in exasperation at Davies or, perhaps, at luck itself. Finally, his features settling in an irritated scowl, Smith folded and the dealer pushed the pile of multicolored chips Davies’s way. According to Davies, what he felt when the hand was over was not so much triumph as relief.
“You’re playing a pot that’s effectively worth half a million dollars in real money,” he said afterward. “It’s just so much goddamned stress.”
Real validation wouldn’t come until around 2:30 that morning, after the first day of the tournament had come to an end and Davies had made the 15-minute drive from the Rio to his home, outside Las Vegas. There, in an office just in from the garage, he opened a computer program called PioSOLVER, one of a handful of artificial-intelligence-based tools that have, over the last several years, radically remade the way poker is played, especially at the highest levels of the game. Davies input all the details of the hand and then set the program to run. In moments, the solver generated an optimal strategy. Mostly, the program said, Davies had gotten it right. His bet on the turn, when the deuce of diamonds was dealt, should have been 80 percent of the pot instead of 50 percent, but the 1.7 million chip bluff on the river was the right play.
“That feels really good,” Davies said. “Even more than winning a huge pot. The real satisfying part is when you nail one like that.” Davies went to sleep that night knowing for certain that he had played the hand within a few degrees of perfection.
The pursuit of perfect poker goes back at least as far as the 1944 publication of “Theory of Games and Economic Behavior,” by the mathematician John von Neumann and the economist Oskar Morgenstern. The two men wanted to correct what they saw as a fundamental imprecision in the field of economics. “We wish,” they wrote, “to find the mathematically complete principles which define ‘rational behavior’ for the participants in a social economy, and to derive from them the general characteristics of that behavior.” Economic life, they suggested, should be thought of as a series of maximization problems in which individual actors compete to wring as much utility as possible from their daily toil. If von Neumann and Morgenstern could quantify the way good decisions were made, the idea went, they would then be able to build a science of economics on firm ground.
It was this desire to model economic decision-making that led them to game play. Von Neumann rejected most games as unsuitable to the task, especially those like checkers or chess in which both players can see all the pieces on the board and share the same information. “Real life is not like that,” he explained to Jacob Bronowski, a fellow mathematician. “Real life consists of bluffing, of little tactics of deception, of asking yourself what is the other man going to think I mean to do. And that is what games are about in my theory.” Real life, von Neumann thought, was like poker.
Using his own simplified version of the game, in which two players were randomly “dealt” secret numbers and then asked to make bets of a predetermined size on whose number was higher, von Neumann derived the basis for an optimal strategy. Players should bet large both with their very best hands and, as bluffs, with some definable percentage of their very worst hands. (The percentage changed depending on the size of the bet relative to the size of the pot.) Von Neumann was able to demonstrate that by bluffing and calling at mathematically precise frequencies, players would do no worse than break even in the long run, even if they provided their opponents with an exact description of their strategy. And, if their opponents deployed any strategy against them other than the perfect one von Neumann had described, those opponents were guaranteed to lose, given a large enough sample.
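To make the link between bet size and bluffing frequency concrete, here is a minimal sketch of the standard break-even arithmetic for a toy one-bet game. It is not von Neumann's own derivation: it assumes a bettor who holds either an unbeatable hand or nothing, facing a caller whose hand beats only bluffs. The function names and the pot and bet sizes are made up for illustration.

```python
from fractions import Fraction

def bluff_fraction(pot, bet):
    """Share of all bets that can be bluffs while the caller only breaks even.

    The caller risks `bet` to win `pot + bet`, so with bluff share f:
    f * (pot + bet) - (1 - f) * bet = 0  =>  f = bet / (pot + 2 * bet)
    """
    return Fraction(bet, pot + 2 * bet)

def calling_frequency(pot, bet):
    """How often the caller must call so that a bluff only breaks even.

    The bluffer risks `bet` to win `pot`, so with calling frequency c:
    (1 - c) * pot - c * bet = 0  =>  c = pot / (pot + bet)
    """
    return Fraction(pot, pot + bet)

# Hypothetical pot of 100 chips and three bet sizes.
for bet in (50, 100, 200):
    print(f"bet {bet} into 100: bluff share {bluff_fraction(100, bet)}, "
          f"call {calling_frequency(100, bet)} of the time")
```

In this toy model, larger bets allow a larger share of bluffs and require fewer calls, which is the dependence on bet size relative to the pot that the parenthetical above describes.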
“Theory of Games” pointed the way to a future in which all manner of competitive interactions could be modeled mathematically: auctions, submarine warfare, even the way species compete to pass their genes on to future generations. But in strategic terms, poker itself barely advanced in response to von Neumann’s proof until it was taken up by members of the Department of Computing Science at the University of Alberta more than five decades later. The early star of the department’s games research was a professor named Jonathan Schaeffer, who, after 18 years of work, discovered the solution to checkers. Alberta faculty and students also made significant progress on games as diverse as go, Othello, StarCraft and the Canadian pastime of curling. Poker, though, remained a particularly thorny problem, for precisely the reason von Neumann was attracted to it in the first place: the way hidden information in the game acts as an impediment to good decision making.
Unlike in chess or backgammon, in which both players’ moves are clearly legible on the board, in poker a computer has to interpret its opponents’ bets despite never being certain what cards they hold. Neil Burch, a computer scientist who spent nearly two decades working on poker as a graduate student and researcher at Alberta before joining an artificial intelligence company called DeepMind, characterizes the team’s early attempts as pretty unsuccessful. “What we found was if you put a knowledgeable poker player in front of the computer and let them poke at it,” he says, the program got “crushed, absolutely smashed.”
Partly this was just a function of the difficulty of modeling all the decisions involved in playing a hand of poker. Game theorists use a diagram of a branching tree to represent the different ways a game can play out. In a straightforward one like rock-paper-scissors, the tree is small: three branches for the rock, paper and scissors you can play, each with three subsequent branches for the rock, paper and scissors your opponent can play. The more complicated the game, the larger the tree becomes. For even a simplified version of Texas Hold ’em, played “heads up” (i.e., between just two players) and with bets fixed at a predetermined size, a full game tree contains 316,000,000,000,000,000 branches. The tree for no-limit hold ’em, in which players can bet any amount, has even more than that. “It really does get truly enormous,” Burch says. “Like, larger than the number of atoms in the universe.”
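The rock-paper-scissors tree is small enough to write out in full. The sketch below is only an illustration of the idea, with made-up names: it builds the two levels as a nested dictionary and counts the ways the game can end.

```python
MOVES = ("rock", "paper", "scissors")
# Each move beats exactly one other move.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def result(mine, theirs):
    """Return +1 for a win, 0 for a tie, -1 for a loss, from my point of view."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1

# First level: my three moves. Second level: the opponent's three replies.
tree = {mine: {theirs: result(mine, theirs) for theirs in MOVES} for mine in MOVES}

leaf_count = sum(len(replies) for replies in tree.values())
print(leaf_count, "ways the game can play out")
for mine, replies in tree.items():
    print(mine, replies)
```

Every extra decision multiplies the number of branches again, which is why the count for poker grows so quickly.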
At first, the Alberta group’s approach was to try to shrink the game to a more manageable scale — crudely bucketing hands together that were more or less alike, treating a pair of nines and a pair of tens, say, as if they were identical. But as the field of artificial intelligence grew more robust, and as the team’s algorithms became better tuned to the intricacies of poker, its programs began to improve. Crucial to this development was an algorithm called counterfactual regret minimization. Computer scientists tasked their machines with identifying poker’s optimal strategy by having the programs play against themselves billions of times and take note of which decisions in the game tree had been least profitable (the “regrets,” which the A.I. would learn to minimize in future iterations by making other, better choices). In 2015, the Alberta team announced its success by publishing an article in Science titled “Heads-Up Limit Hold’em Poker Is Solved.”
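The core step of regret minimization can be shown on rock-paper-scissors. What follows is a sketch under stated assumptions, not the Alberta team's code: it uses plain regret matching at a single decision, whereas counterfactual regret minimization applies the same update at every decision point of a much larger tree. It also computes expected payoffs exactly instead of sampling hands, and it starts one player with an arbitrary bias toward rock so that there is something to learn.

```python
# PAYOFF[a][b] is the payoff for playing action a against action b,
# with actions indexed 0 = rock, 1 = paper, 2 = scissors.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def strategy_from_regrets(regrets):
    """Play each action in proportion to its positive accumulated regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0:
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]

def train(iterations):
    """Run self-play and return each player's average strategy."""
    # Assumption for illustration: player 0 begins biased toward rock.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sums = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        strategies = [strategy_from_regrets(r) for r in regrets]
        for player in (0, 1):
            opponent = strategies[1 - player]
            # Expected payoff of each pure action against the opponent's mix.
            action_values = [sum(PAYOFF[a][b] * opponent[b] for b in range(3))
                             for a in range(3)]
            # Expected payoff of the mix this player actually used.
            played_value = sum(strategies[player][a] * action_values[a]
                               for a in range(3))
            for a in range(3):
                # Regret: how much better action a would have done.
                regrets[player][a] += action_values[a] - played_value
                strategy_sums[player][a] += strategies[player][a]
    return [[s / iterations for s in sums] for sums in strategy_sums]

average = train(20000)
for player, mix in enumerate(average):
    print("player", player, [round(p, 3) for p in mix])
```

The strategy on any single iteration keeps swinging from one move to another. It is the average over all iterations that settles near an even three-way split, the unexploitable strategy for this game.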
For some players, especially those who made a living playing that variant of poker online, the Alberta group’s triumph represented a serious threat to their livelihood. “I remember when we read about it,” says the former professional Terrence Chan. “We were just like, ‘Oh, good game, it’s been a fun ride.’”
It quickly became clear that academics were not the only ones interested in computers’ ability to discover optimal strategy. One former member of the Alberta team, who asked me not to name him, citing confidentiality agreements with the software company that currently employs him, told me that he had been paid hundreds of thousands of dollars to help poker players develop software that would identify perfect play and to consult with programmers building bots that would be capable of defeating humans in online games. Players unable to front that kind of money didn’t have to wait long before gaining more affordable access to A.I.-based strategies. The same year that Science published the limit hold ’em article, a Polish computer programmer and former online poker player named Piotrek Lopusiewicz began selling the first version of his application PioSOLVER. For $249, players could download a program that approximated the solutions for the far more complicated no-limit version of the game. As of 2015, a practical actualization of John von Neumann’s mathematical proof was available to anyone with a powerful enough personal computer.
One of the earliest and most devoted adopters of what has come to be known as “game theory optimal” poker is Seth Davies’s friend and poker mentor, Jason Koon. On the second day of the three-day Super High Roller tournament, I visited Koon at his multimillion-dollar house, located in a gated community inside a larger gated community next to a Jack Nicklaus-designed golf course. On Day 1, Koon paid $250,000 to play the Super High Roller, then a second $250,000 after he was knocked out four hours in, but again he lost all his chips. “Welcome to the world of nosebleed tourneys,” he texted me afterward. “Just have to play your best — it evens out.”
For Koon, evening out has taken the form of more than $30 million in in-person tournament winnings (and, he says, at least as much from high-stakes cash games in Las Vegas and Macau, the Asian gambling mecca). Koon began playing poker seriously in 2006 while rehabbing an injury at West Virginia Wesleyan College, where he was a sprinter on the track team. He made a good living from cards, but he struggled to win consistently in the highest-stakes games. “I was a pretty mediocre player pre-solver,” he says, “but the second solvers came out, I just buried myself in this thing, and I started to improve like rapidly, rapidly, rapidly, rapidly.”
In a home office decorated mostly with trophies from poker tournaments he has won, Koon turned to his computer and pulled up a hand on PioSOLVER. After specifying the size of the players’ chip stacks and the range of hands they would play from their particular seats at the table, he entered a random three-card flop that both players would see. A 13-by-13 grid illustrated all the possible hands one of the players could hold. Koon hovered his mouse over the square for an ace and queen of different suits. The solver indicated that Koon should check 39 percent of the time; make a bet equivalent to 30 percent the size of the pot 51 percent of the time; and bet 70 percent of the pot the rest of the time. This von Neumann-esque mixed strategy would simultaneously maximize his profit and disguise the strength of his hand.
Thanks to tools like PioSOLVER, Koon has remade his approach to the game, learning what size bets work best in different situations. Sometimes tiny ones, one-fifth or even one-tenth the size of the pot, are ideal; other times, giant bets two or three times the size of the pot are correct. And, while good poker players have always known that they need to maintain a balance between bluffing and playing it straight, solvers define the precise frequency with which Koon should employ one tactic or the other and identify the (sometimes surprising) best and worst hands to bluff with, depending on the cards in play.
Erik Seidel, a pro who learned the game in the 1980s, told me that if players like Koon traveled back in time just 15 years with today’s knowledge, they would crush the best players of that era. “I think also that all the people in the game would think that they were fish,” Seidel said, using the poker argot for bad players. “There are a lot of really strange plays now that these guys are making that are effective — but if people saw them back in the day, I think that they’d be invited into the game every night.”
Against weaker players, Koon will sometimes intentionally diverge from theoretically perfect poker, bluffing more than he should or betting large when the A.I. says he should bet small, to take advantage of his opponents’ mistakes. But against the best professionals, he will mostly just do his best to replicate the solvers’ decisions — to the extent that he is able to remember the A.I.’s preferred bet sizes and the frequencies with which to employ them. Because he knows his own human biases can creep into his decision making, Koon will often randomly select which of the solver’s tactics to employ in a given hand. He’ll glance down at the second hand on his watch, or at a poker chip to note the orientation of the casino logo as if it were a clock face, in order to generate a percentage between 1 and 100. The higher the percentage, the more aggressive the action he’ll take. “I’ll say: OK, well I just rolled 9 o’clock. So that’s 75 percent. That’s a pretty aggressive number.” In that instance, Koon might choose the largest of the solver’s approved bet sizes for his hand, whereas if the second hand had pointed to 3 o’clock, or 25 percent, he might have checked.
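Koon's watch trick amounts to drawing a number and walking through the solver's frequencies from the least to the most aggressive action. The sketch below is a guess at that bookkeeping, not a description of his actual method. The function names are made up, and the frequencies are borrowed from the ace-queen example earlier in the article, so which action a given roll selects would differ for other hands.

```python
def second_hand_to_percent(seconds):
    """Convert a second-hand position (0 to 59) into a percentage of the dial."""
    return seconds / 60 * 100

def pick_action(roll, mixed_strategy):
    """Pick the action whose cumulative frequency band contains the roll.

    mixed_strategy is a list of (action, percent) pairs ordered from least
    to most aggressive, with percents that sum to 100.
    """
    cumulative = 0.0
    for action, percent in mixed_strategy:
        cumulative += percent
        if roll <= cumulative:
            return action
    return mixed_strategy[-1][0]  # guard against rounding at the top end

# Frequencies from the ace-queen example: check 39 percent, bet 30 percent
# of the pot 51 percent of the time, bet 70 percent of the pot otherwise.
ace_queen = [("check", 39), ("bet 30% pot", 51), ("bet 70% pot", 10)]

for seconds in (15, 45, 57):
    roll = second_hand_to_percent(seconds)
    print(f"second hand at {seconds}: roll {roll:.0f} -> {pick_action(roll, ace_queen)}")
```

Because the second hand is effectively random at the moment he looks, each action gets chosen about as often as the solver prescribes over many hands.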
Using optimal strategy is no guarantee, of course, that Koon will win any particular hand. Given enough hands, however, the math says he should do no worse than break even — and will in practice do much better than that, depending on how far his opponents’ strategies diverge from theoretically perfect play. If you were to play thousands of hands against a solver, Koon says, “it’s going to win, I promise.”
Koon is quick to point out that even with access to the solvers’ perfect strategy, poker remains an incredibly difficult game to play well. The emotional swings that come from winning or losing giant pots and the fatigue of 12-hour sessions remain the same challenges as always, but now top players have to put in significant work away from the tables to succeed. Like most top pros, Koon spends a good part of each week studying different situations that might arise, trying to understand the logic behind the programs’ choices. “Solvers can’t tell you why they do what they do — they just do it,” he says. “So now it’s on the poker player to figure out why.”
The best players are able to reverse-engineer the A.I.’s strategy and create heuristics that apply to hands and situations similar to the one they’re studying. Even so, they are working with immense amounts of information. When I suggested to Koon that it was like endlessly rereading a 10,000-page book in order to keep as much of it in his head as possible, he immediately corrected me: “100,000-page book. The game is so damn hard.”
In fact, the store of data Koon draws on is even larger than that. He rents nearly 200 terabytes of cloud storage for the game trees he has developed since he started working with solvers. Players sitting down to in-person games have no way to access all that information at the table, but that limitation does not necessarily apply to poker played over the internet. Automated bots, especially in low-stakes games, have been a problem for internet poker since before the rise of solvers, but now human players willing to skirt the rules can look up A.I. strategies on one screen and then use them to play optimally on a second screen. “Any time there are high stakes and a lot of money to be won, and a device that might be used for good,” Koon says, “people have a way to turn it into a cheating tool.”
Koon isn’t especially worried that people are cheating in the games he plays over the internet, but other players aren’t so sure. “It’s the main reason why I don’t really play much online anymore,” a pro named Ryan Laplante says. In a recent $7,000 buy-in online tournament held as part of the World Series of Poker, Laplante says he recognized the screen names of at least four of the 100 or so competitors as belonging to players who were rumored to have been banned from other sites for using what is called “real-time assistance.” Laplante credits some of the biggest online sites with doing a good job of policing their games, but he worries that as solvers become more ubiquitous, the balance of power will continue to shift toward those who cheat to gain an edge.
“The only thing I’m confident of,” Laplante says, “is that it’s going to get a lot worse very quickly.”
Well after midnight on the Super High Roller’s second day, a German professional named Christoph Vogelsang called a bet for all his chips with a king and a nine versus another player’s ace and jack. According to the solvers, calling was, in fact, the correct play — all the same, Vogelsang lost the hand and was eliminated from the tournament in sixth place. Unlike a regular poker game, where players can leave the table and cash in their chips whenever they feel like it, a poker tournament requires players to continue until they either lose everything or win every single chip in play. Prizes, drawn from the pool created by all the buy-ins, are paid out based on how long players manage to stay in the game. The person who ends with all the chips is awarded the first-place prize ($3.2 million in this tournament), the second-to-last survivor gets second place ($2 million) and so on down to the final in-the-money finisher, in this case, fifth place ($630,000). Vogelsang, and all the players who were eliminated before him, received nothing.
Given the small sample size of several hundred hands that a player will see over the course of three days, a single poker tournament is an incredibly inexact way of identifying the strongest player in the field. Luck will determine much of the outcome for even the best players — if the 26 human players in the tournament were replaced with 26 perfectly programmed poker bots, one bot would win and one would be the first to be eliminated, despite their sharing the same optimal strategy.
Poker players tend to take the long view, speaking of tournament buy-ins as investments with a more or less predictable return when averaged over time. “In a relatively tough tournament, the worst players in the field are losing maybe as much as 30 or 40 percent of their buy-in,” says Ike Haxton, who plays professionally. Stronger amateurs, he says, should expect to lose an average of about 15 percent of the money they put in, while the best pros will earn a return of around 5 to 10 percent over the long run.
To dampen the huge swings of fortune that come in the short term, many professionals agree to swap percentages of any potential prize money with one another before the tournament starts — I agree to give you 5 percent of what I win, say, if you agree to give me 5 percent of what you win — or sell stakes in their future winnings to outside backers, like shares in an old-time whaling voyage. Seth Davies wouldn’t tell me the exact details of his own arrangements, but he admitted that less than half of what he put into this tournament had come out of his own bankroll. Even so, after being knocked out on the first day and then paying a second $250,000 to re-enter, he had “well into six figures” of his own money on the line.
On the third and final day of the Super High Roller, the five remaining players were relocated from the dilapidated outer tables of the Amazon Room to a made-for-television set at its center. Stage lights brightly illuminated the poker table’s gleaming green felt from above, while a 45-foot camera crane swung from side to side to get the best angle on the action. All five players who had made it this far were guaranteed to turn a profit, but there was still a lot of maneuvering left to determine how far up the payout ladder they could climb. As the game got underway, the chip leader, a 27-year-old Spanish pro named Adrián Mateos, kept up a steady barrage of giant bets against the other players, asking them again and again whether this was the hand with which they wanted to make their final stand, or whether, perhaps, they would rather fold and wait for another player or two to bust out so that they could finish fourth or third, instead of fifth, and take home an additional $300,000 or $700,000 in prize money.
Situations like these bend the value of players’ stacks in strange ways, depending on where they are in the payout hierarchy. Even a single chip can be worth an incredible amount of real money if another player is knocked out of the tournament after you’ve folded. There are solvers that can model these circumstances as well, but as the chip stacks get shorter relative to the size of the blind bets and antes players are required to put in the pot before each hand begins, flawless play alone offers no real insurance against what often becomes essentially a game of heads or tails. “When it comes down to it,” Davies says, “you just end up running these million-dollar flips, and you hope you win.”
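The article does not say how tournament solvers put a dollar value on a stack. One widely used approximation is the Independent Chip Model, which treats each player's chance of finishing first as proportional to their chips and then repeats the calculation for each lower place among the players who remain. The sketch below assumes that model. The stacks are hypothetical, and the payouts are rounded from figures in the article, with third place estimated from the "additional $700,000" mentioned above.

```python
def icm_equity(stacks, payouts):
    """Expected prize money for each stack under the Independent Chip Model."""
    equities = [0.0] * len(stacks)

    def assign(remaining, place, probability):
        # Stop once every prize has been handed out or nobody is left.
        if place >= len(payouts) or not remaining:
            return
        total = sum(stacks[i] for i in remaining)
        for i in remaining:
            # Chance that player i takes this place, given the places
            # already assigned on this path.
            p = probability * stacks[i] / total
            equities[i] += p * payouts[place]
            assign([j for j in remaining if j != i], place + 1, p)

    assign(list(range(len(stacks))), 0, 1.0)
    return equities

# Hypothetical stacks, in millions of tournament chips.
stacks = [20.0, 10.0, 6.0, 3.0, 1.0]
# Approximate payouts for first through fifth place (third is an estimate).
payouts = [3_200_000, 2_000_000, 1_330_000, 930_000, 630_000]

for stack, equity in zip(stacks, icm_equity(stacks, payouts)):
    print(f"{stack:>5.1f}M chips -> ${equity:,.0f} "
          f"(${equity / stack:,.0f} per million chips)")
```

Under this model the shortest stack is worth at least the fifth-place prize, so each of its chips carries far more money than a chip in the leader's stack. That is the distortion the paragraph above describes.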
After one competitor was eliminated, Davies found himself with the shortest chip stack at the table. With only one more person still to play behind him, he pushed all-in with the ace and seven of clubs, just as the solvers said he should, given the size of his stack. The remaining player, a ponytailed Englishman named Ben Heath, quickly called and turned over a pair of jacks, making him a 67 percent favorite to win the hand. None of the five cards the dealer laid out improved Davies’s hand, so Heath won the pot, and Davies was eliminated in fourth place. He stood up from the table, collected his backpack and N95 mask and left the stage. “That was some serious gambling up there,” he told me. Davies at least had the satisfaction of knowing how closely his play over the last three days had hewed to the optimal strategy generated on his computer at home. (Another consolation was the $930,791 in prize money he would receive for his fourth-place finish.)
Stowing his cash-out ticket in his pocket, Davies walked over to a nearby $50,000 buy-in tournament already underway. He had planned to get some dinner and rest a little before buying in, but he changed his mind after seeing how many of the players here were the sort most likely to employ decidedly nonoptimal strategies. “This 50K looks incredible,” Davies told me. “I just couldn’t not be in there right away.”
Not every player I spoke to is happy about the way A.I.-based approaches have changed the poker landscape. For one thing, while the tactics employed in most lower-stakes games today look pretty similar to those in use before the advent of solvers, higher-stakes competition has become much tougher. As optimal strategy has become more widely understood, the advantage in skill the very best players once held over the merely quite good players has narrowed considerably. But for Doug Polk, who largely retired from poker in 2017 after winning tens of millions of dollars, the change solvers have wrought is more existential. “I feel like it kind of killed the soul of the game,” Polk says, changing poker “from who can be the most creative problem-solver to who can memorize the most stuff and apply it.”
Piotrek Lopusiewicz, the programmer behind PioSOLVER, counters by arguing that the new generation of A.I. tools is merely a continuation of a longer pattern of technological innovation in poker. Before the advent of solvers, top online players like Polk used software to collect data about their opponents’ past play and analyze it for potential weaknesses. “So now someone brought a bigger firearm to the arms race,” Lopusiewicz says, “and suddenly those guys who weren’t in a position to profit were like: ‘Oh, yeah, but we don’t really mean that arms race. We just want our tools, not the better tools.’”
Besides, for Lopusiewicz, solvers haven’t so much changed poker as revealed its essence. Whether poker players themselves recognized it, or wanted to, at its core the game was always just the maximization problem John von Neumann revealed it to be. “Today, everyone at a certain level is forced to respect the math side,” Lopusiewicz says. “They can’t ignore it anymore.”
Keith Romer is an audio and print reporter whose work has been featured on National Public Radio, Planet Money and ESPN’s “30 for 30” podcasts, and in Rolling Stone and The New Yorker.