EMT – Simulation of the Two Child Problem

The two-child problem is a problem in probability theory whose solution seems paradoxical at first sight. I wrote about this problem before. Let me repeat the explanation and run a simulation to convince everybody that the solution really is correct. You can find more information on the problem on Wikipedia.

The first simple version starts with this story: You meet a man in a bar and he mentions his daughter. You ask him about his children and he answers that he has two kids. The question is: What is the probability that the other kid (the one he was not talking about) is female?

The answer depends on some important details. Let us assume that you have no information about whether the other kid is younger or older, nor any other special information about the daughter, and that the man is just a random man from the population. These details will turn out to be important! Then the intuitive answer to our question, 1/2, is wrong, or rather, it does not make sense. The correct answer is 1/3.

In my opinion, each probabilistic problem makes sense only if we can devise an experiment that would in theory or in practice simulate the problem. Our computation should predict the outcome of the experiment. Problems that cannot be simulated in any way are not interesting to me. This includes most problems in probability or measure theory which need the help of the axiom of choice.

But our problem can easily be simulated. And setting up the simulation gives great insight into the meaning of our question and of the terms "probability" and "randomly selected". Let us do it in Euler Math Toolbox (EMT). What we do is a Monte Carlo simulation. We need to make the following assumption: The man is drawn at random from all men with two kids of random gender, at least one of whom is female, and the probability for a kid to be either gender is 1/2 (no diverse genders in this posting). So, we draw 1000000 pairs of kids with random gender. Then we count the proportion of pairs with two female kids among all pairs with at least one female kid.

>n = 1000000
 1000000
>K = intrandom(2,n,2)
 Real 2 x 1000000 matrix
 
             2             1             1             2     ...
             1             2             2             2     ...
>i = nonzeros(K[1]==1 || K[2]==1); ni = length(i); ni/n
 0.749596
>sum(K[1,i]==1 && K[2,i]==1) / ni
 0.334372115113

The syntax may seem cryptic, but it is intuitive if you understand the matrix language. K contains a pair of kids in each column (n=1000000 columns), where 1 stands for a girl and 2 for a boy. "K[1]==1" returns a vector of 0/1 with 1 (true) at each position where the vector K[1] (the first row of K) is 1, i.e., a vector indicating the pairs where the first kid is female. "||" is short for "or", and nonzeros() returns the indices of the non-zero elements of a vector. Thus "ni" is the number of pairs in which at least one kid is female. As expected, "ni/n" is approximately 3/4: there are four equally likely cases, and only one of them (two boys) is excluded.

In the final line, we count the number of pairs in the "i"-columns where both kids are female, using sum(), which adds up the ones, and compare that to the total number of admissible cases "ni". The answer is approximately 1/3.
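For readers without EMT, the same experiment can be sketched in plain Python. This is only a cross-check under the same assumptions, not a translation of the EMT session: the function name simulate_two_child, the sample size, and the fixed seed are choices made for this illustration.

```python
import random

def simulate_two_child(n, seed=0):
    """Estimate P(both girls | at least one girl) from n random families."""
    rng = random.Random(seed)      # fixed seed, so the run is repeatable
    at_least_one_girl = 0
    both_girls = 0
    for _ in range(n):
        # Same encoding as in the EMT session: 1 = girl, 2 = boy.
        first = rng.randint(1, 2)
        second = rng.randint(1, 2)
        if first == 1 or second == 1:
            at_least_one_girl += 1
            if first == 1 and second == 1:
                both_girls += 1
    # Share of admissible families, and share of two-girl families among them.
    return at_least_one_girl / n, both_girls / at_least_one_girl

share_selected, p_both = simulate_two_child(200_000)
print(share_selected)   # should be close to 3/4
print(p_both)           # should be close to 1/3
```

The explicit loop is much slower than the vectorized EMT commands, but it makes the two counting steps (select, then count inside the selection) visible.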

This should not surprise us, since there are three equally likely cases in the "i"-columns: (1,2), (2,1), (1,1). Only one of these cases, (1,1), has two girls.
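The case count can also be checked exactly instead of by simulation. The following Python sketch lists all four gender pairs and conditions on at least one girl; the names GIRL, BOY and the variable names are introduced here for illustration only.

```python
from fractions import Fraction
from itertools import product

GIRL, BOY = 1, 2                      # same encoding as in the simulation

# All four equally likely (first kid, second kid) pairs.
families = list(product((GIRL, BOY), repeat=2))

# Condition: at least one of the two kids is a girl.
with_girl = [f for f in families if GIRL in f]

# Among those, the pairs where both kids are girls.
two_girls = [f for f in with_girl if f == (GIRL, GIRL)]

p_selected = Fraction(len(with_girl), len(families))
p_both = Fraction(len(two_girls), len(with_girl))

print(with_girl)    # [(1, 1), (1, 2), (2, 1)]
print(p_selected)   # 3/4
print(p_both)       # 1/3
```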

Why does the problem depend on the details? To see this, assume that it is Tuesday and the man in the bar mentions that his daughter has her birthday today. Surprisingly, this changes the problem completely! Even if we only know that the daughter was born on a Tuesday, the problem changes drastically.

Let us start with the Tuesday problem. We run the same simulation, now also randomly selecting the weekday of birth for both kids.

>n = 1000000;
>K = intrandom(2,n,2);
>D = intrandom(2,n,7)
 Real 2 x 1000000 matrix
 
             7             4             6             4     ...
             7             2             5             5     ...
>i = nonzeros((K[1]==1 && D[1]==2) || (K[2]==1 && D[2]==2)); ni = length(i); ni/n
 0.137799
>sum(K[1,i]==1 && K[2,i]==1) / ni
 0.481679838025

The code did not change very much. But the "i"-columns now contain only the columns with at least one Tuesday girl (2 means Tuesday above). The probability for this is much lower, of course. We have 4*49=196 cases in total (4 gender pairs, 7*7 pairs of weekdays for each). Of these, only 27 contain a Tuesday girl. There are 6 with the first kid a Tuesday girl and the second a girl born on another day, 6 with the roles of the two kids swapped, and one with two Tuesday girls, plus 14 cases with one Tuesday girl and a boy. This is 27/196~0.13776.
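Counting cases by hand is error-prone, so here is a small Python sketch that enumerates all 196 equally likely families and counts the ones with a Tuesday girl. The constants and the helper is_tuesday_girl are illustrative names; weekdays are numbered 1 to 7 with 2 standing for Tuesday, as in the simulation.

```python
from fractions import Fraction
from itertools import product

GIRL, BOY = 1, 2
TUESDAY = 2

# A kid is a (gender, weekday) pair: 2 * 7 = 14 equally likely kinds.
kids = list(product((GIRL, BOY), range(1, 8)))

# A family is an ordered pair of kids: 14 * 14 = 196 equally likely cases.
families = list(product(kids, repeat=2))

def is_tuesday_girl(kid):
    return kid == (GIRL, TUESDAY)

# Condition: at least one of the two kids is a girl born on a Tuesday.
selected = [f for f in families
            if is_tuesday_girl(f[0]) or is_tuesday_girl(f[1])]

# Among those, the families where both kids are girls.
both_girls = [f for f in selected
              if f[0][0] == GIRL and f[1][0] == GIRL]

print(len(families), len(selected), len(both_girls))   # 196 27 13
print(Fraction(len(selected), len(families)))          # 27/196
print(Fraction(len(both_girls), len(selected)))        # 13/27
```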

Out of these 27 cases, 13 have two girls, as counted above. This gives 13/27~0.48148. The simulation is as accurate as we can expect: the accuracy of a Monte Carlo simulation with n samples is only about 1/sqrt(n)=1/1000.
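To put a number on that accuracy, one can compare the simulated value with the standard error of a proportion. The sketch below uses the textbook formula sqrt(p(1-p)/m), which is not part of the EMT session; it is applied here with m = ni, the number of selected columns (137799 in the run shown), since only those columns enter the final ratio.

```python
import math

p = 13 / 27                  # exact probability from the case count
ni = 137799                  # selected columns in the EMT run (ni/n * n)
observed = 0.481679838025    # simulated value printed by EMT

# Standard error of a proportion estimated from ni independent draws.
std_error = math.sqrt(p * (1 - p) / ni)
deviation = abs(observed - p)

print(std_error)               # a bit more than 0.001
print(deviation)               # well below the standard error
print(deviation / std_error)   # deviation in units of the standard error
```

So the rough estimate 1/sqrt(n) has the right order of magnitude, and the simulated value lies comfortably within one standard error of 13/27.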

If we know that the birthday is today, we can just take the probability as 1/2 to a very good approximation. In this case, the pick of the other child is almost independent of the pick of the first child, in contrast to the original problem. And that is the true reason for the confusion. If we knew that the girl is the elder one, the pick of the other child would be independent, and that child would be female with probability 1/2. But we only know that one of the kids is female. That changes the situation completely, and the two picks depend on each other.
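How close to 1/2 the "birthday today" case gets can be sketched by repeating the enumeration with a different number of equally likely days. This assumes, as a simplification not stated in the text, that birthdays are uniform over 365 days and independent between the kids; the function p_both_girls is a name made up for this example.

```python
from fractions import Fraction
from itertools import product

GIRL, BOY = 1, 2

def p_both_girls(days):
    """P(both girls | at least one girl born on one fixed day out of `days`)."""
    selected = both = 0
    for g1, d1, g2, d2 in product((GIRL, BOY), range(days),
                                  (GIRL, BOY), range(days)):
        # Day 0 plays the role of the known day (Tuesday, or today).
        if (g1 == GIRL and d1 == 0) or (g2 == GIRL and d2 == 0):
            selected += 1
            if g1 == GIRL and g2 == GIRL:
                both += 1
    return Fraction(both, selected)

for days in (1, 7, 365):
    p = p_both_girls(days)
    print(days, p, float(p))
# days = 1   gives 1/3 (no extra information, the original problem)
# days = 7   gives 13/27 (the Tuesday problem)
# days = 365 gives 729/1459, just below 1/2
```

The counts follow the pattern (2*days - 1)/(4*days - 1): the more specific the information about the girl, the closer the answer moves to 1/2.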

I recently saw a YouTube video, which I do not link here because it only adds to the confusion. The video could not explain why it makes a difference whether we only know that the girl has a birthday or know the precise birthday. This is easy to explain if you think of an experiment as above. Drawing a man with two kids, one of them being a daughter, is a different experiment from drawing a man with two kids, one of them being a daughter born on a Tuesday.

Published 2019/04/30.
