It took my computer nearly a day to load all of the hands into a database fast enough to handle the volume of data I was feeding it: 1 million hands of Pot Limit Omaha in all. They were all played between 2004 and 2011. I compiled them so I could run analysis, look back, and ask questions. What stats really matter when it comes to winrate? Did the games change from 2004 to 2011? Are there still inefficiencies in the way people play? Anyone playing poker has a choice. They can bet their money on gut instinct, or they can make decisions by the numbers.
This conflict between ways of deciding (numbers or gut?) has recently been popularized in the movie Moneyball, where Billy Beane, the General Manager of the Oakland A’s, uses stats like on-base percentage to find cheap players. The Oakland Athletics were the most cost-effective team in baseball from 2000 to 2006 using two tools: research and analysis. Imagine the Moneyball model Billy Beane used to buy players applied to poker. Databases are a big part of online poker. Over the past few months I've been getting better with the data analysis program R. With it I can run regressions and bootstrap tests, and build advanced visualizations that show trends in the data. All of these are just tools I use to answer one question: how do I use poker stats to become the most effective player possible given the resources I have? What stats matter and what stats amount to noise?
For example, here's a visualization from an analysis that takes a series of baseball stats and graphs their impact on each other (known as principal component analysis). A principal component analysis (PCA) measures how much the variance of each variable (like walks and runs) contributes to the total variance of the group. It tells you which factors are the most important to the total variance. In R I made a graph of the PCA results, with each variable's variance weighted against the group.
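To make the idea concrete, here is a minimal Python sketch of the first step of a PCA. It is not the R code behind the graphs in this article. The data is synthetic and made up for illustration, the function names are my own, and it uses power iteration to find only the first component rather than a full decomposition.

```python
import math
import random


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length columns."""
    n = len(columns[0])
    standardized = []
    for col in columns:
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
        standardized.append([(x - mean) / sd for x in col])
    return [
        [sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in standardized]
        for ci in standardized
    ]


def first_component(matrix, iterations=200):
    """Largest eigenvalue and its unit eigenvector, found by power iteration."""
    size = len(matrix)
    v = [1.0] * size
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(size)) for i in range(size)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(size)) for i in range(size)]
    eigenvalue = sum(a * b for a, b in zip(v, mv))
    return eigenvalue, v


# Synthetic, made-up data: two stats driven by one shared trait, one unrelated stat.
rng = random.Random(0)
power = [rng.gauss(0, 1) for _ in range(500)]
homeruns = [p + 0.3 * rng.gauss(0, 1) for p in power]
walks = [p + 0.3 * rng.gauss(0, 1) for p in power]
stolen_bases = [rng.gauss(0, 1) for _ in range(500)]

corr = correlation_matrix([homeruns, walks, stolen_bases])
eigenvalue, direction = first_component(corr)
# Each standardized stat has variance 1, so total variance equals the number of stats.
share = eigenvalue / len(corr)
print(f"Comp.1 share of total variance: {share:.2f}")
print("Comp.1 weights (homeruns, walks, stolen_bases):",
      [round(x, 2) for x in direction])
```

In this toy setup the two stats that share a cause dominate the first component, while the unrelated stat gets a small weight. That is the kind of pattern the graphs in this article are meant to reveal.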
3 Rules for Interpreting Principal Component Analysis (PCA)
Rule 1: Ignore everything but the length and direction of the arrows in the graph (Figure 1 and Figure 2).
Rule 2: You can read a correlation between variables whose arrows are close together and point in the same direction. For example, in Figure 1 (below), baseball players who hit a lot of homeruns also tend to walk a lot, as both arrows are long and point in the same direction. People who fold their big blind (seen in Figure 2 as co_folded_bb) are correlated with those who fold their small blind a lot. But there isn't a correlation between people who fold their blinds and those who steal a lot (seen as att_steal).
Rule 3: The most important dynamics are the ones with the largest variance. On the graph this means that the most important variables are the longest arrows.
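The three rules can be sketched in a few lines of Python. The arrow coordinates below are hypothetical numbers I made up to mimic the pattern described for Figure 2. They are not read off the actual graph. Treating the cosine of the angle between two arrows as a stand-in for correlation is only a rough guide, and it works best when the first two components capture most of the variance.

```python
import math

# Hypothetical (Comp.1, Comp.2) arrow coordinates, invented for illustration only.
arrows = {
    "co_folded_bb": (0.80, 0.30),
    "co_folded_sb": (0.75, 0.35),
    "att_steal": (-0.20, 0.45),
}


def length(v):
    """Arrow length: longer arrows carry more of the variance (Rule 3)."""
    return math.hypot(v[0], v[1])


def cosine(u, v):
    """Cosine of the angle between two arrows: near 1 suggests correlation (Rule 2)."""
    return (u[0] * v[0] + u[1] * v[1]) / (length(u) * length(v))


# Rule 3: rank the stats from longest arrow to shortest.
ranked = sorted(arrows, key=lambda name: length(arrows[name]), reverse=True)
for name in ranked:
    print(f"{name}: length {length(arrows[name]):.2f}")

# Rule 2: compare directions pairwise.
blinds = cosine(arrows["co_folded_bb"], arrows["co_folded_sb"])
steal = cosine(arrows["co_folded_bb"], arrows["att_steal"])
print(f"fold BB vs fold SB: {blinds:.2f}")
print(f"fold BB vs attempt steal: {steal:.2f}")
```

With these made-up numbers, the two blind-folding arrows point almost the same way, while the steal arrow sits close to a right angle from them, which reads as no correlation.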
So I'll start with classic baseball stats like homeruns, singles, walks, and stolen bases, and graph the results of a principal component analysis. In the end I have a graph showing which stats have the biggest impact on the total variance within my dataset.
Figure 1: PCA of baseball stats (Comp.1 = direction of greatest variance, Comp.2 = direction of second greatest variance)
This is the Moneyball model of baseball, showing that getting to first base (through walks and singles) and hitting homeruns are the most important stats. All of the other stats are bunched in the middle. Using this information the A’s worked out which traits they wanted to buy (on-base percentage and homerun hitting) and which skills were overpriced (stealing bases).
Pot Limit Omaha stats aren’t the same as baseball stats, but the underlying research methods are the same. Here’s what the principal component analysis looks like when applied to a million hands of Pot Limit Omaha using these stats: the percentage you folded your big blind (co_folded_bb), the percentage you folded your small blind (co_folded_sb), the percentage you attempted to steal (att_steal), the percentage you voluntarily saw the flop (vol_saw_flop), how often you saw the flop in the small blind (saw_flop_sb), and pre-flop raise percentage.

Figure 2: PCA of Pot Limit Omaha stats (Comp.1 = direction of greatest variance, Comp.2 = direction of second greatest variance)
What is the source of the largest amount of variance in PLO? Remember the 3 rules for interpreting PCA.
Rule 1: You can ignore everything but the length and direction of the arrows in the graph
Rule 2: There is a correlation between variables that are close together and going in the same direction
Rule 3: The most important dynamics are the ones with the largest variance. On the graph this means that the most important variables are the longest arrows.
The stats that show up as most important in 6-max PLO are the percentage you folded your big blind and the percentage you folded your small blind. Of medium importance are the percentage you attempted to steal, the percentage you voluntarily saw the flop, and how often you saw the flop in the small blind.
What this analysis says is that the critical skill in shorthanded PLO is how you play in and against the blinds, both attacking and defending well. Aggressive players are often trying to steal your blinds for good reason. Imagine for a minute that you were able to steal the blinds in every hand of 6-max Pot Limit Omaha (well, every hand where you weren't in the blinds!). That would make 4 out of every 6 hands where you win 1.5 big blinds each (the small blind plus the big blind), and two hands in the blinds where you lose a combined 1.5 big blinds, for a total of +4.5 big blinds every 6 hands. Every 100 hands you would win 75 big blinds (37.5 big bets per 100). Good players tend to win about 6-15 big blinds per 100 (3-7.5 big bets per 100). So you'd be a huge winner if you could steal at will.
Of the stats that most players rely on to make decisions, voluntarily put money in pot (VPIP), pre-flop raise (PFR), and aggression factor (AF), only one of the three shows up as being very significant. Your opponents will thus tend to overrate (and call too much against) someone who has a high aggression factor and pre-flop raise. If your goal is to play a value betting game, then the best way to generate a wild image is to run a high pre-flop raise and a high aggression factor. My strategy when I first started playing shorthanded Pot Limit Omaha was close to this. I raised a lot and didn't defend as much as I should have. Early on, a pure stealing strategy with value betting was enough to beat the games. But as in every poker game, the players adapted to my play.
Some players adopted a strategy that led me to believe they played passively. Armani was the first player I faced who manufactured the low aggression factor mirage against me (while simultaneously stealing a lot). He was an aggressive pre-flop raiser whose stats would come out at 45/30/1.2. It was because of him that I first started to question the use of the aggression factor stat in Pot Limit Omaha. He was the biggest winner in the 6-max games at the 2/5 500 max and 5/10 1000 max levels. He made some very big and successful bluffs against me. One memorable pot involved him calling a raise out of position, 3 way, in the 5/10 game with a $6000 stack. I ($4000) was the 3-bettor in position with AAT9, and the pot on the flop was 600. I 3-bet with more than just Aces, so I wasn't giving that away as my hand. Armani checked a 976 rainbow flop and called my pot-sized bet. The turn came a 7 and Armani led into the pot of 1800. Since I was beaten by a 7 or better (and it was a common hand) I folded (Armani showed ATJ9). This was just one of many bluffs he made like this, and I feel that part of his success came from the widespread habit, mine and others', of gauging his bets by aggression factor.
It may surprise you to learn that the average PLO player, call him John Smith, didn't get any better from 2004 to 2011.
Average Player Stats for Pot Limit Omaha
| Averages | Sessions | Hands | Hours | Big blind per 100 | VPIP | PFR | Won $ at showdown % |
| 2006 to 2008 | 3.17 | 104.3 | 1.68 | -48.79 | 47.33 | 11.28 | 40.95 |
| 2008 to 2011 | 2.69 | 73.1 | 1.3 | -56.87 | 44.7 | 14.77 | 37.11 |
| Declines | -0.48 | -31.2 | -0.38 | -8.08 | -2.63 | 3.49 | -3.84 |
| Declines (%) | -15% | -30% | -23% | -17% | -6% | 31% | -9% |
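The two "Declines" rows in the table above follow directly from the two period averages. Here is a small Python sketch that recomputes them. The dictionary layout and variable names are my own. The percentage is taken relative to the size of the earlier value, so a win rate that gets more negative shows up as a decline.

```python
# Period averages copied from the table above.
early = {"Sessions": 3.17, "Hands": 104.3, "Hours": 1.68,
         "Big blind per 100": -48.79, "VPIP": 47.33, "PFR": 11.28,
         "Won $ at showdown %": 40.95}
late = {"Sessions": 2.69, "Hands": 73.1, "Hours": 1.3,
        "Big blind per 100": -56.87, "VPIP": 44.7, "PFR": 14.77,
        "Won $ at showdown %": 37.11}

changes = {}
for stat, before in early.items():
    diff = late[stat] - before
    # Divide by the absolute earlier value so the sign reflects direction of change.
    pct = diff / abs(before) * 100
    changes[stat] = (round(diff, 2), round(pct))
    print(f"{stat}: {diff:+.2f} ({pct:+.0f}%)")
```

Every stat falls except pre-flop raise, which rises by about a third.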
What did happen from 2004 to 2011 is that John Smith played less frequently and for 31 fewer hands. What factors are behind this I can only speculate on. Some possibilities are having less time, getting bored with the game, or going through a down economy. All may apply, but one thing is for sure: there's no evidence that the average PLO player got a ton better from 2004 to 2011. John Smith did decide to play fewer hands and did raise more pre-flop, but these changes would have helped him more were he playing No-Limit Hold'em. Yet the games in 2011 were tougher than the ones in 2006 (as I can testify), probably because the games that lasted longer were more often against regular players (with fewer new and average players filling the seats).
People tend to overrate the stats that they place before themselves in PLO. I wrote about this in two previous articles, where I discussed two better stats: one for getting to the heart of aggression (total aggression rank), and one for telling whether someone is creating a false impression, based on breaking apart their big bets and little bets (betting volume). Both of these stats helped me to better defend myself out of position. Ideally it would also help to know how someone behaves in their mid-range game, but calculating that is a bit more difficult and a problem for another day.
Very few people come to the poker table with the mindset to play defense or to watch for defense. Thus much of what you can do defensively will go under the radar. Your opponents will see your offensive stats and moves, and perhaps the percentage of the time you fold your blinds. They won't know how often you check-raise their flop continuation bet as a bluff or semi-bluff, or re-raise them with a marginal but winning hand.
Another game besides baseball where teams are using Moneyball principles is basketball, in the NBA. Good defense generally gets less notice, as many defensive stats aren't even tracked accurately (check out Dean Oliver's Basketball on Paper for details). People see offensive stats and very poor defenses clearly, but good defenses are more difficult to spot. Here's an excerpt from an excellent article about defense in the NBA by Mike Prada on the New Orleans Hornets:
“No team does a better job of dissecting individual scouting reports and putting them into action when they take the court. They know players' tendencies and force those players into spots on the court from which they are less efficient. The Hornets topped the Thunder by preventing Westbrook from getting into the paint. Westbrook is pretty much exclusively a paint scorer at this point of his career. He hits over 55 percent of his shots at the rim and gets to the line 8.3 times a game. He's improved the rest of his game, but he's still below average pretty much anywhere else on the floor. Earlier in the game, he found his way into the paint, but down the stretch, the Hornets laid off him on pick and rolls and made him beat them with jump shots”
Like players in the NBA, poker players fall into certain niches of strategy. Some are high-frequency bettors, meaning they're loose and aggressive on small bets but much more timid on large bets. Alternatively, some players use their tight image to fire off bluffs. So you want to know where players are likely to attack, and try to force them to do something they aren't as comfortable with. You can use total aggression rank and betting volume to improve the defensive side of your game by looking up players who try to bluff you on small bets, playing back at them to cut off their short game, and giving up when they go big. Players may also just call from their blinds but give up too often on the flop, which is the critical street in Omaha. Ideally you would also know just how well your opponents defend compared to other people. Do they only defend small (necessitating multiple bullets) or are they really going all the way? The same way aggression can be broken into frequency and volume, so too can calling and folding (although the latter is trickier to calculate). All of these things, though, require an Omaha player to be able to write custom stats in their poker tracker. Currently trackers use a stat sorting system that is based on No-Limit Hold'em.
A story I read in passing a few months ago involved Patrik Antonius uninstalling his poker tracker the day he installed it. Why? My guess is he can probably keep track of the players in his games more easily through note taking, memory, and review. There are typically fewer than 50 people in the world who play in the same games as him. Size is an important factor for memory. I don't need a GPS to find my local supermarket and post office. The area is small enough that it's easy to store in memory, much like driving through one's hometown. Increase the size of the map to the city of Los Angeles and you've been given a completely different task. In that case I'm going to go to Google Maps to get directions or use a GPS. Similarly, it's hard to play defense in games with large populations because it's often difficult to remember your opponents’ tendencies. When the total player pool is large, you might rarely play the same person in a similar situation twice. In this case you'll want to write custom stats into your tracking and heads-up display programs.
There are undervalued skills that PLO players can develop, mainly defensive ones. The ability to steal and the ability to defend in the correct spots are the place to start building a solid foundation for a mid-stakes PLO game. If you are able to do this at a high percentage (and can avoid tilt) you will do well. In baseball, players who stole bases and had a high batting average were highly priced yet didn't produce many runs. Players in the NBA were able to play a more stifling defense by going over scouting reports of their opponents’ tendencies and forcing them into uncomfortable situations. Similarly, in Pot Limit Omaha you can use the tools and analysis I've described, like principal component analysis, to find out where most of the variance in your game is happening. Along with defensive stats like total aggression rank and betting volume, you can then start developing the undervalued skills within each game and improving your results.
Read more about winning with statistics at Poker Database Analysis


