Editor’s Note: this is a chapter from David’s forthcoming short book Probability for 12 Year Olds (And Maybe You).
We will leave Statistics for a moment so that I can teach you one last probability concept. It’s a bit tricky, especially for students who try to learn it from books or teachers that make the subject more technical than it needs to be (mainly by using intimidating formulas). But framed the way I frame it, it is pretty much logic and common sense, as are most probability, permutation, and combination problems. That distinguishes it from Statistics, a subject whose formulas and algorithms usually can't be deduced through careful thinking alone, without fairly advanced math.
But even clever thinkers sometimes find Bayes' Theorem problems tougher than basic probability problems. So I decided to give you a mental break and switch to some easily memorized Statistics definitions and techniques before now getting to Bayes. Next chapter I will switch back to statistics when I tell you about a terrible error that some people make when they try to use those definitions and techniques to evaluate their experiments or clinical trials. But since avoiding that error requires that you use and understand Bayes' Theorem, I need to insert the subject between the Statistics chapters.
Put simply, you use Bayes' Theorem not to compute the probability that something will happen, but rather to compute the probability that something happened in a particular way once you have found out that it did indeed happen. (Or if you hypothesized that it happened.) How you do this is often pretty obvious. You simply form a ratio between the probability of a successful result happening your way and the probability of that successful result happening some other way. What is the probability that the Mets won (or will have won) the World Series, given that a New York team won it? If you knew that the Yankees were 20% before the season started and the Mets were 10%, then, with no subsequent info other than the city the winner came from, you can say that the Mets were a 20-10, or 2-1, underdog. Interestingly, this is one time that it is OK to use odds rather than probabilities to help you compute your answer. Of course, once you do that it is normally better to change that answer into a fraction or percentage. In this case it is 1/3, since a 2-1 underdog wins one time in three.
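The Mets arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name chance_given_one_of is made up for this sketch, and the 10% and 20% figures are the preseason numbers assumed above.

```python
from fractions import Fraction

def chance_given_one_of(target, rivals):
    """Chance that `target` is what happened, given that either it
    or one of the `rivals` happened (all are mutually exclusive)."""
    return target / (target + sum(rivals))

mets = Fraction(10, 100)     # assumed preseason chance for the Mets
yankees = Fraction(20, 100)  # assumed preseason chance for the Yankees

mets_given_ny = chance_given_one_of(mets, [yankees])
print(mets_given_ny)  # 1/3
```

Because only the ratio matters, you would get the same answer from the raw odds numbers 10 and 20.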
The above example used subjective probabilities: handicappers' opinions. Similar techniques work for pure math probabilities. If you tell me that you will win a prize if you flip ten coins and get at least nine heads, then if you run out of the casino all excited that you won, I will know that there is a 10/11 chance that you got exactly nine heads. If eight heads also won, then your glee would mean that there was a 45/56 chance you hit exactly eight. If you don't see this, you need to reread the Permutations and Combinations chapter (either that or go back to your grade school math book to help you realize that if you are comparing 45/1024, 10/1024, and 1/1024 you can ignore the denominators).
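If you would rather let a computer count the combinations, here is a minimal Python sketch of the ten-coin example. The function name chance_of_exact_count is invented for this illustration; math.comb(n, k) is the standard library call that counts the ways to choose k of n.

```python
from fractions import Fraction
from math import comb

def chance_of_exact_count(exact, minimum, flips=10):
    """Given that at least `minimum` of `flips` fair coins came up heads,
    return the chance that exactly `exact` of them did."""
    # Every sequence has the same 1/2**flips chance, so only the counts matter.
    ways = {k: comb(flips, k) for k in range(minimum, flips + 1)}
    return Fraction(ways[exact], sum(ways.values()))

print(chance_of_exact_count(9, 9))  # 10/11
print(chance_of_exact_count(8, 8))  # 45/56
```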
Another simple example, one that used to come up all the time in the old days, involves Jacks or Better Draw Poker, which was the main game in California card rooms. A player was dealt "openers" about 23% of the time: JJ-KK about 10%, AA about 5% (there was usually a joker that could be used for aces, straights, and flushes), two pairs about 5%, trips about 2%, and a pat hand about 1%. Knowing that, and knowing the minimum hand an opponent needs to open (or call or raise), it is usually a simple matter to calculate the chances your hand is better than his. If he will open with kings or better and you have 3322, you will have him beat almost exactly half the time. (He is the favorite, though, because he improves more often when he is beat than you do.) That is a simple Bayes' Theorem problem. So is calculating your chances if his bet after the draw indicates different possible degrees of improvement on his part.
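Here is a rough Python sketch of the 3322 versus "kings or better" calculation. It rests on two assumptions that are mine rather than anything stated above: that a pair of kings makes up one third of the 10% JJ-KK group, and that every other two pair beats 3322 (the tiny chance of a tie is ignored).

```python
from fractions import Fraction

# Approximate deal frequencies, in percent of all hands.
kings = Fraction(10, 3)  # assumption: KK is one third of the JJ-KK 10%
aces = Fraction(5)
two_pair = Fraction(5)   # assumption: all of these beat 3322
trips = Fraction(2)
pat_hand = Fraction(1)

hands_you_beat = kings + aces                 # he opened with just one pair
hands_that_beat_you = two_pair + trips + pat_hand

chance_ahead = hands_you_beat / (hands_you_beat + hands_that_beat_you)
print(chance_ahead, f"{float(chance_ahead):.2f}")  # 25/49 0.51
```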
Now let’s do four versions of a slightly trickier question.
There are three coins in your pocket. They seem the same but you know that one of them is double headed. If you grab one without looking at it, flip it four times and get four heads, what is the chance you grabbed the double header? There are two ways you could get that result. One way would be to grab a fair coin and then flip four heads. That's 2/3 x 1/2 x 1/2 x 1/2 x 1/2 = 2/48 or 1/24. The other way is if you grabbed the bad coin and then got four heads. That's 1/3 x 1 x 1 x 1 x 1 = 1/3 or 8/24. 8/24 vs. 1/24. It's 8-1 in favor of the bad coin. Eight ninths of the time you will look down and see that it was that coin that you flipped.
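The same two-step reasoning (chance of grabbing each kind of coin, times the chance that kind of coin produces what you saw) can be written as a small Python sketch. The function name posterior and the dictionary labels are just illustrative choices.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """For each explanation, multiply the chance of picking it by the chance
    it produces the observed result, then divide by the sum over all of them."""
    joint = {name: priors[name] * likelihoods[name] for name in priors}
    total = sum(joint.values())
    return {name: value / total for name, value in joint.items()}

# Three coins: one double headed, two fair.
priors = {"double-headed": Fraction(1, 3), "fair": Fraction(2, 3)}
# Chance of four heads in four flips for each kind of coin.
four_heads = {"double-headed": Fraction(1), "fair": Fraction(1, 2) ** 4}

result = posterior(priors, four_heads)
print(result["double-headed"])  # 8/9
print(result["fair"])           # 1/9
```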
But what if there were more than three coins in your pocket, including the double headed one? Say there were 20 total. Flipping four heads is still strong evidence that you picked the bad coin. But you shouldn't bet on it at even odds. The chance you pick the bad coin and then get four heads is 1/20 x 1 x 1 x 1 x 1 = 1/20, or 16/320. But the chance you pick a fair coin and get four out of four heads is 19/20 x 1/2 x 1/2 x 1/2 x 1/2 = 19/320. So one of those 19 good coins is a 19 to 16 favorite to be the one picked compared to the double header. A good coin will be what you picked 19 out of 35 times when four heads are flipped.
But not if five heads are flipped. Even though there is only one bad coin and 19 good ones, five heads is enough to swing you back to being willing to bet you picked the bad coin. The relevant probabilities are 1/20 vs. 19/640, which is 32/640 vs. 19/640. Do you see why? The bad coin will be the pick 32 out of 51 times that there are five flips and five heads.
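To watch the balance tip as the number of heads grows, here is a short Python sketch for a pocket with one double headed coin and the rest fair. The function name chance_double_headed is hypothetical, chosen only for this example.

```python
from fractions import Fraction

def chance_double_headed(total_coins, heads_in_a_row):
    """Chance the picked coin is the one double header, given that every
    one of `heads_in_a_row` flips came up heads."""
    bad = Fraction(1, total_coins)  # pick the bad coin; heads are certain
    fair = Fraction(total_coins - 1, total_coins) * Fraction(1, 2) ** heads_in_a_row
    return bad / (bad + fair)

for heads in (4, 5):
    print(heads, chance_double_headed(20, heads))
# 4 16/35
# 5 32/51
```

With 20 coins, the fifth head is the first one that makes the bad coin the favorite.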
Finally let’s postulate that all the coins in your pocket are unfair though none are double headed. Rather they are weighted. Two of them will come up heads 60% of the time while the third one produces heads 80%. You pick a coin, flip it four times, and get three heads. What's the chance you picked the 80 percenter? If your guess is that it’s somewhat under 50-50 you have good instincts. Picking the 80% coin and then getting exactly three heads has probability 1/3 x 4/5 x 4/5 x 4/5 x 1/5 (the tail) x 4. (That last "4" is because the tail can be in any of the four spots.) That's 256/1875. Picking one of the 60% coins and flipping three out of four heads is 2/3 x 3/5 x 3/5 x 3/5 x 2/5 (the tail) x 4 or 432/1875. The one 80% coin is more likely to have been picked than either 60% coin individually (256 vs. 216), but less likely than the two 60% coins combined: 256 vs. 432. Its chance is 256/688, or 16/43, about 37%. (Notice though that it would have been the slight favorite, 256 to 216, if there were only one 60% coin in your pocket.)
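The weighted-coin arithmetic is easy to get wrong by hand, so here is a minimal Python sketch that recomputes it with exact fractions. The helper name exact_heads and the list layout are illustrative assumptions; the 80% coin is listed first.

```python
from fractions import Fraction
from math import comb

def exact_heads(p_heads, flips, heads):
    """Chance of exactly `heads` heads in `flips` flips of a coin
    that lands heads with probability `p_heads`."""
    return comb(flips, heads) * p_heads ** heads * (1 - p_heads) ** (flips - heads)

# One 80% coin followed by two 60% coins, each equally likely to be picked.
coins = [Fraction(4, 5), Fraction(3, 5), Fraction(3, 5)]
joint = [Fraction(1, len(coins)) * exact_heads(p, 4, 3) for p in coins]

chance_80 = joint[0] / sum(joint)
print(joint[0])        # 256/1875
print(sum(joint[1:]))  # 144/625, which is 432/1875 in lowest terms
print(chance_80)       # 16/43
```

Dropping one of the 60% coins from the list changes the comparison to 256 against 216.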
Two real life scenarios that cannot be analyzed well without Bayes' Theorem involve coincidences and medical tests for rare diseases. They will be discussed shortly. But first let’s talk about the widespread error that people make when they don't use Bayes as part of their basis for coming to a conclusion. That's next.