Probability theory is always a more difficult topic than it seems at first sight. In another posting, I talked about the „two envelopes problem“ and described my basic approach to explain and understand problems with „probability“. Here, I am going to show you
- a nice problem concerning repeated drawing from an urn,
- the boy and girl problem,
- the Monty Hall problem,
- the Martingale betting system.
Drawing from an Urn
We tackle the following problem. I got it by mail from Jens Schleusener. It seems to be an older problem considered by Laplace and de Morgan, as a colleague of mine told me.
Assume an urn filled with N red and green balls, such that each number R of red balls between 0 and N has the same probability. Now, we draw a ball from the urn randomly and it turns out to be red. We put it back and draw another ball. What is the probability that this second ball is also red?
Although the assumption on the urn is a bit unusual, we can easily see that the probability for a red ball at the first draw is 1/2. We instinctively think that drawing a second time must have the same probability for red as the first time. Why should it change when we put the ball back? Paradoxically, it does!
If that fools you, you are not alone.
The easiest way to make the reasoning precise is to design an experiment which you can carry out over and over again. In our case, the experiment has three steps. The total number of balls is fixed as N. The answer will depend on N.
- We select a number R between 0 and N, each number with equal probability 1/(N+1).
- We draw a ball randomly from the urn and observe its color. This can be simulated by selecting another random number Z1 between 1 and N, and checking Z1<=R. If that is true, we have a red ball. We discard all the cases when it is not true.
- We draw another ball randomly from the urn and observe its color. This can be simulated by selecting another random number Z2 between 1 and N, and checking Z2<=R. If that is true, we have another red ball.
We now have to count how often Z2<=R holds under the condition Z1<=R. We can do this in any programming language. Let us do it in Java. I am using the old-fashioned Random class of Java and run 10 million simulations of the experiment.
import java.util.Random;

public class Test {
    public static void main(String args[]) {
        int N = 1000, Rep = 10000000;
        int Total = 0, TwoRed = 0;
        Random random = new Random();
        for (int i = 0; i < Rep; i++) {
            int R = random.nextInt(N + 1);       // number of red balls, 0..N
            int Z1 = random.nextInt(N) + 1;      // first draw, red if Z1 <= R
            if (Z1 <= R) {
                Total++;
                int Z2 = random.nextInt(N) + 1;  // second draw after putting the ball back
                if (Z2 <= R) TwoRed++;
            }
        }
        System.out.println((double) TwoRed / Total);
    }
}
The surprising output will be close to 2/3, something like „0.6665737814723665“. Let me try to explain this result.
First, we take the case N=2 to convince ourselves that the intuitive answer 1/2 is not correct.
Assume we take the first ball and it is red. Then only two situations remain: (red,green) and (red,red). The first one has a single red ball. It is selected as the urn in 1/3 of all cases, and the first draw is red in only half of these cases, because the contents can be ordered as (red,green) or (green,red). So this situation has probability 1/6. The second one has two red balls, so the first draw is always red, and it has probability 1/3. We note that (red,red) is twice as likely as (red,green).
Now we select another ball, which can be the first or the second because we return the first ball to the urn. If we select the first again, we always get a red ball. If we select the second, we get a red ball in 2/3 of the cases. The average of these numbers is
\frac12 \left( 1 + \frac23 \right) = \frac56
This is also approximately the result of the simulation code above, if we set N=2.
You get the same result by considering all 12 equally likely outcomes of (R,Z1,Z2) as in the simulation above. The following little program carries out that check.
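It simply enumerates the 12 combinations, each of which has probability 1/12, and counts the conditioned cases. The output is the exact fraction 5/6.

public class Enumerate {
    public static void main(String args[]) {
        int Total = 0, TwoRed = 0;
        // R = number of red balls in the urn, Z1/Z2 = drawn positions (red if <= R)
        for (int R = 0; R <= 2; R++)
            for (int Z1 = 1; Z1 <= 2; Z1++)
                for (int Z2 = 1; Z2 <= 2; Z2++)
                    if (Z1 <= R) {
                        Total++;
                        if (Z2 <= R) TwoRed++;
                    }
        System.out.println(TwoRed + "/" + Total);  // prints 5/6
    }
}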
For big N, we can make life easier by taking R, Z1, and Z2 uniformly distributed in the interval [0,1]. For a specific R, the probability to get a red ball is then simply R. If we average over all selections of R, we get
\int_0^1 R \,dR = \frac12
Now, our question can be answered by computing the probability to get two red balls. For a specific R this is R squared. Thus
\int_0^1 R^2 \,dR = \frac13
We have to divide that by the probability that the first ball is red to start with, and get
P\{\text{(red,red)|(red,*)}\} = \frac{1/3}{1/2} = \frac23
The notation on the left means the probability for (red,red) under the condition that the first ball was red.
If you draw more balls and they all turn out to be red, you are in the setup that was studied by Laplace. After, e.g., 10 red balls it should be obvious that the probability for the next ball to be red is not 1/2. In fact, it is (n+1)/(n+2) after n red balls, approaching 1. This is the Rule of Succession, discussed by Laplace.
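In the continuous model above, the same computation gives this formula. Conditioning on n red draws weights each R with R^n, and thus

P\{\text{next ball red} \mid n \text{ red balls}\} = \frac{\int_0^1 R^{n+1} \,dR}{\int_0^1 R^n \,dR} = \frac{1/(n+2)}{1/(n+1)} = \frac{n+1}{n+2}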
Let us turn to another, more realistic setup, which yields a new problem and a new answer.
Assume an urn filled with N red and green balls, where each ball is selected red or green with equal probability. Now, we draw a ball from the urn randomly and get a red one as before. We put it back and draw another ball. What is the probability that this is also a red ball?
This is a fundamentally different setup, since we now color each ball for the urn independently of the others. When the number of red balls is fixed in advance as above, the colors of the balls are no longer independent of each other.
The reasoning is now much easier. But the result will still be paradoxical. It is not true that the second ball has a probability of 1/2 to be red too. Let me explain.
The first ball has a chance of 1/2 to be red, obviously. If we put it back and draw another ball, there are two cases.
- We draw a different ball. The chance for this ball to be red is 1/2, simply because it is another, independently colored ball.
- We draw the same ball again. The chance for it to be red is 1, of course.
The same ball is drawn again with probability 1/N, a different one with probability (N-1)/N. Thus the chance for the second ball to be red can be computed in the following way.
\frac1N \left( 1 + \frac{N-1}{2}\right) = \frac12 + \frac{1}{2N}
In the case of two balls, this is 3/4. The limit for N to infinity is 1/2.
Let me convince you by studying the case N=2. We get four situations with equal probability 1/4.
(red,red), (red,green), (green,red), (green,green)
If we selected the first ball and it was red, only two situations remain.
(red,red), (red,green)
Both are equally likely, and we select one of the two balls equally likely. Thus in 3/4 of the selections we get a red ball, indeed.
The following is a simulation in Java. This time we use a true simulation of the urn, not just a comparison of random numbers. It will yield something like „0.7500778499221501“.
import java.util.Random;

public class Test {
    public static void main(String args[]) {
        int N = 2, Rep = 10000000;
        int Total = 0, TwoRed = 0;
        int Urn[] = new int[N];
        Random random = new Random();
        for (int i = 0; i < Rep; i++) {
            // fill the urn, each ball independently red (1) or green (0)
            for (int j = 0; j < N; j++) Urn[j] = random.nextInt(2);
            int Z1 = Urn[random.nextInt(N)];     // first draw
            if (Z1 == 1) {
                Total++;
                int Z2 = Urn[random.nextInt(N)]; // second draw, the first ball was put back
                if (Z2 == 1) TwoRed++;
            }
        }
        System.out.println((double) TwoRed / Total);
    }
}
The Boy and Girl Paradox
There is a similar problem, called the boy and girl paradox.
We meet a father with his son on the street and he tells us that he has another kid. How likely is it that the other kid is also a son?
The experiment goes like this.
- We select a father who has exactly two kids from the big pool of fathers. We can assume that he has one of the combinations (girl,girl), (girl,boy), (boy,girl), (boy,boy) with equal probability, where we order the pair by age.
- We learn that at least one of the kids is a boy. I.e., we discard all cases where there is no boy.
- We count how many times the other kid is also a boy, i.e., how often both kids are boys.
We immediately note that we can only get another boy when we started with (boy,boy), and never when we started with (girl,boy) or (boy,girl). These three remaining cases are equally likely. Thus the correct answer is 1/3.
This is different from the urn experiment we just studied with the answer 3/4, because we explicitly ask about the other kid and are not allowed to take the same ball as before. Moreover, we only know that at least one of the kids is a boy, not that a randomly selected kid turned out to be a boy. The following coin experiment shows why this second point matters.
What if we try to simulate this problem with two coins? We flip both coins onto a table and uncover one of them. Assume it shows heads. How probable is it that the other coin also shows heads? It is clearly 1/2. Because we uncovered, say, the right hand coin, the result of the left coin is independent of it. This is similar to knowing that the son in front of us is the youngest kid. To simulate the boy and girl paradox, we must uncover both coins and ask how often there are two heads if there is at least one head. This is clearly a different question.
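Here is a small simulation of that question in the style of the urn programs above. It should print a value close to 1/3.

import java.util.Random;

public class Coins {
    public static void main(String args[]) {
        int Rep = 10000000;
        int AtLeastOneHead = 0, TwoHeads = 0;
        Random random = new Random();
        for (int i = 0; i < Rep; i++) {
            // flip two coins, 1 = heads, 0 = tails
            int a = random.nextInt(2), b = random.nextInt(2);
            if (a == 1 || b == 1) {
                AtLeastOneHead++;
                if (a == 1 && b == 1) TwoHeads++;
            }
        }
        System.out.println((double) TwoHeads / AtLeastOneHead);
    }
}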
The way the father selects the kid for the walk is very important. If you know or assume that fathers always walk with the older kid and leave the younger one at home with the mother, the answer is 1/2. The problem also changes if the father tells you that they are on their way to buy a bicycle for his son, who has his birthday tomorrow. The selection of the kid is now fixed by the birthday, and the chances for the sibling of a boy with a birthday tomorrow to be male are clearly 1/2.
The Monty Hall Problem
This famous problem is often told like this:
You are in a game show. The host shows you three closed doors and explains that behind one of them there is a Porsche. Behind the other two are goats, the losing tickets. You select one door. Before it is opened, the host opens another door and shows you a goat. He offers you the chance to change your choice. Should you do it?
The answer depends very much on the rules of the game. If the rules are fixed in advance, the problem is in fact easy to understand.
- If the host has a free choice to open another door or not, I would not change. He may want to mislead you, or he may want to help you. You do not know. Sticking with your door has a 1/3 chance of winning.
- But if the host is forced by the rules to open another door with a goat behind it, you should change. Changing wins whenever you selected a goat door at first, and loses only if you selected the Porsche. So you get a 2/3 chance of winning, as the little simulation below shows.
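Here is a small Monte Carlo sketch of the second set of rules. The host is forced to open a door that is neither your pick nor the Porsche. The output for switching should be close to 2/3, for staying close to 1/3.

import java.util.Random;

public class MontyHall {
    public static void main(String args[]) {
        int Rep = 10000000;
        int StayWins = 0, SwitchWins = 0;
        Random random = new Random();
        for (int i = 0; i < Rep; i++) {
            int Car = random.nextInt(3);     // door hiding the Porsche
            int Choice = random.nextInt(3);  // first pick of the player
            // the host opens a door that is neither the pick nor the Porsche
            int Open;
            do {
                Open = random.nextInt(3);
            } while (Open == Choice || Open == Car);
            int Switched = 3 - Choice - Open; // the remaining closed door
            if (Choice == Car) StayWins++;
            if (Switched == Car) SwitchWins++;
        }
        System.out.println("Stay:   " + (double) StayWins / Rep);
        System.out.println("Switch: " + (double) SwitchWins / Rep);
    }
}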
This „problem“ shows how important it is to know the rules of the game you are playing.
The Martingale Betting Scheme
This is the most difficult of the problems, I think.
In a casino you play a game with a 1/2 chance of winning. If you win, you get back double your stake. You decide to double your stake until you win. Since you must win at some point, you think you have found a sure way of winning.
The arithmetic of the scheme is simple. If you win in the round where your stake is 2^n, you have staked 1+2+...+2^n = 2^{n+1}-1 in total and the win pays back 2·2^n. Your net win is 1.
2 \cdot 2^n - (1 + 2 + 4 + \ldots + 2^n) = 1
The mind-boggling fact is that it is „almost sure“ that you win at some point. We can make the set of all possible infinite sequences of wins and losses into a probability space. Then the one sequence with losses only indeed has probability 0. I cannot go into the details of infinite products of probability spaces here. But the doubling scheme is one of the problems that occur when we construct these spaces. It has no practical meaning whatsoever.
In reality, no game in a casino has an exact chance of 1/2. If you bet on Rouge or Noir in Roulette, you lose when the 0 appears. Moreover, all casinos have a maximum bet to ensure they can never go bankrupt.
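To see the effect of these two limitations, here is a small simulation sketch of the doubling scheme for bets on Rouge. The win probability 18/37 is the one for Roulette with a single zero, and the table limit of 1024 is just an assumption for the sake of the example. Each run either ends with a net win of 1 or, when the limit is reached, with the loss of all stakes, and the average result should come out clearly negative.

import java.util.Random;

public class Martingale {
    public static void main(String args[]) {
        int Rep = 10000000;
        int MaxBet = 1024;            // assumed table limit
        double p = 18.0 / 37.0;       // chance to win a bet on Rouge
        long Balance = 0;
        Random random = new Random();
        for (int i = 0; i < Rep; i++) {
            int Bet = 1;
            while (true) {
                if (random.nextDouble() < p) {
                    Balance += Bet;      // a win recovers all previous losses plus 1
                    break;
                }
                Balance -= Bet;          // a loss, so we double the stake
                Bet *= 2;
                if (Bet > MaxBet) break; // table limit reached, give up
            }
        }
        System.out.println("Average result per run: " + (double) Balance / Rep);
    }
}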