Statistical Rethinking: Chapter 2

Below are my attempts to work through the solutions for the exercises of Chapter 2 of Richard McElreath’s ‘Statistical Rethinking: A Bayesian course with examples in R and Stan’. If anyone notices any errors (of which there will inevitably be some), I would be happy to be notified!

You can find out more about the book (including links to full-length lectures) here.

Note: I completed Chapter 2 several weeks ago, and I can’t find my answers to the “medium” level questions (I think I calculated some of them by hand), so I’ll start at the “hard” ones this time.

I’m not going to post the full questions. Highlighted syntax refers to R code.

2H1
Species A gives birth to twins 10% of the time, otherwise birthing a single infant.
Species B births twins 20% of the time, otherwise birthing singleton infants.
A female panda of unknown species has just given birth to twins. What is the probability that her next birth will also be twins?

pr_A_T <- 0.1  # probability of species A producing twins
pr_B_T <- 0.2  # probability of species B producing twins
prior_AorB <- 0.5  # prior probability of each species (A and B are equally likely)
# Goal: P(twins next | twins first)
#   = P(A | twins) * P(twins | A) + P(B | twins) * P(twins | B)
# where, by Bayes' theorem, P(A | twins) = P(twins | A) * P(A) / P(twins)

# marginal probability of twins at a single birth (averaged over both species)
pr_twins <- (pr_A_T * prior_AorB + pr_B_T * prior_AorB)

# note the naming: pr_T_A is P(species A | twins), pr_T_B is P(species B | twins)
pr_T_A <- (pr_A_T * prior_AorB)/(pr_twins)
pr_T_B <- (pr_B_T * prior_AorB)/(pr_twins)
# probability that the next birth is twins, given that the first was twins
pr_T_T <- (pr_T_A * pr_A_T) + (pr_T_B * pr_B_T)

# Solution: probability that the next birth is also twins

pr_T_T

[1] 0.1666667

This answer seems plausible: it lies between 0.1 and 0.2 (the respective probabilities of species A and B producing twins), and it sits closer to 0.2 because observing twins makes species B the more likely candidate.
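The same calculation can be written with named vectors, which keeps both species side by side. This is only a sketch: `normalise()` and the variable names are my own illustrative choices, not something from the book.

```r
# Hypothetical helper: rescale a vector of non-negative weights so it sums to 1
normalise <- function(x) x / sum(x)

twin_rate <- c(A = 0.1, B = 0.2)  # P(twins | species)
prior_sp  <- c(A = 0.5, B = 0.5)  # flat prior over the two species

# posterior over species after observing one twin birth
posterior_sp <- normalise(prior_sp * twin_rate)

# probability that the next birth is also twins, averaged over the posterior
pr_next_twins <- sum(posterior_sp * twin_rate)
pr_next_twins  # approximately 0.1667
```

The posterior weights (1/3 for A, 2/3 for B) are exactly what gets reused in the next exercise.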

2H2
What is the probability that the panda we have is from species A, assuming we have observed only the first birth and that it was twins?

# We already computed this above: pr_T_A holds P(species A | twins).
pr_T_A

[1] 0.3333333

It is less likely (below 0.5) that Ms. Panda belongs to species A, since species B is twice as likely to pop out twins. Note that this does not rule out species A: a probability of 1/3 should not be underestimated.

2H3
The same panda mother has a second birth and it is not twins, but a singleton infant. What is the posterior probability that this panda is species A?

# marginal probability of twins first, then a singleton (averaged over both species)
pr_NT_T <- 0.1 * 0.9 * 0.5 + 0.2 * 0.8 * 0.5

# likelihood of twins, then a singleton, for species A
pr_TN_A <- 0.1 * 0.9

# posterior P(species A | twins, then singleton)
pr_A_TN <- (pr_TN_A * prior_AorB) / pr_NT_T
pr_A_TN

[1] 0.36

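We now have a slight increase (from about 0.33 to 0.36) in the probability that we’re dealing with species A, thanks to Baby Singleton: a singleton birth is a little more likely for species A (0.9) than for species B (0.8).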
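As a cross-check, the same number should come out when the two births are processed one at a time, using the posterior after the first birth as the prior for the second. A minimal sketch, assuming the births are independent given the species (variable names are my own):

```r
twin_rate   <- c(A = 0.1, B = 0.2)  # P(twins | species)
single_rate <- 1 - twin_rate        # P(singleton | species)

# Step 1: flat prior updated with the twin birth
post_birth1 <- 0.5 * twin_rate
post_birth1 <- post_birth1 / sum(post_birth1)

# Step 2: the step-1 posterior acts as the prior for the singleton birth
post_birth2 <- post_birth1 * single_rate
post_birth2 <- post_birth2 / sum(post_birth2)

unname(post_birth2["A"])  # approximately 0.36
```

Updating in two steps or in a single step gives the same posterior, which is why the order of bookkeeping does not matter here.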

2H4
We now have a genetic test!
The probability that it correctly identifies a species A panda is 0.8.
The probability that it correctly identifies a species B panda is 0.65.
Our panda tests positive for species A.
Part 1: Disregarding the previous information of the births, what is the posterior probability that our panda is species A?
Part 2: Redo calculation using the birth data.

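# P(test says X | true species Y), stored as pr_X_Y
pr_A_A <- 0.8   # test says A, panda is A (correct)
pr_A_B <- 0.35  # test says A, panda is B (false positive for A: 1 - 0.65)
pr_B_B <- 0.65  # test says B, panda is B (correct)
pr_B_A <- 0.2   # test says B, panda is A (1 - 0.8)

pr_testA_A <- 0.8  # same quantity as pr_A_A, under a more explicit name
# marginal probability that the test says A
pr_testA <- pr_A_A * prior_AorB + pr_A_B * prior_AorB

# posterior P(species A | test says A)
pr_A_testA <- (pr_testA_A * prior_AorB) / pr_testA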
pr_A_testA

[1] 0.6956522

Edit (28.07.2018): It was brought to my attention that the logic of the solution above isn’t quite clear, so here’s an attempt to put the code into words:

Bayes’s Theorem:
P(really species A|test says A) = P(test says A|really species A)*P(really species A) / P(test says A)

or in further detail:

The probability that the panda really is species A when the test says “A”
=
(probability of the test saying A when it really is species A [0.8] * probability of it being species A [our prior of 0.5])
divided by
(probability of the test saying A under any circumstances, regardless of whether it’s a true positive or a false positive result)
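One way to make the denominator concrete is to think in counts rather than probabilities. The sketch below imagines 1000 pandas, which is a number I made up purely for illustration, split evenly between the species as the prior says:

```r
n_pandas <- 1000            # hypothetical population size, for illustration only
n_A <- n_pandas * 0.5       # pandas that really are species A
n_B <- n_pandas * 0.5       # pandas that really are species B

true_pos  <- n_A * 0.8          # species A pandas the test labels "A"
false_pos <- n_B * (1 - 0.65)   # species B pandas the test wrongly labels "A"

# of all pandas labelled "A", the share that really are species A
share_really_A <- true_pos / (true_pos + false_pos)
share_really_A  # approximately 0.6957
```

Out of roughly 575 pandas that test as “A”, about 400 really are species A, which matches the posterior computed above.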

For Part 2, I will simply replace the flat prior of 0.5 with the posterior we obtained in 2H3 (0.36).

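# test characteristics, as in Part 1
pr_A_A <- 0.8
pr_A_B <- 0.35
pr_B_B <- 0.65
pr_B_A <- 0.2

pr_testA_A <- 0.8
# marginal probability that the test says A, now with the 2H3 posterior as the prior
pr_testA <- pr_A_A * pr_A_TN + pr_A_B * (1 - pr_A_TN)

# posterior P(species A | test says A, twins then singleton)
pr_A_testA_TN <- (pr_testA_A * pr_A_TN) / pr_testA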
pr_A_testA_TN

[1] 0.5625

Once the birth data are taken into account, the posterior probability that our panda belongs to species A drops from almost 70% (genetic test alone) to about 56%. This is still higher than the 33-36% we had from the births alone, without the genetic test.
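As a sanity check, the same result should appear when all three pieces of evidence (twins, singleton, test says A) are combined in one step from the flat prior. This sketch assumes the births and the test result are independent given the species; the variable names are my own.

```r
# joint likelihood of (twins, singleton, test says A) for each species
lik_A <- 0.1 * 0.9 * 0.8    # species A: twins, singleton, correct test result
lik_B <- 0.2 * 0.8 * 0.35   # species B: twins, singleton, false "A" result

# flat prior of 0.5 for each species, then normalise
post_A_all <- (0.5 * lik_A) / (0.5 * lik_A + 0.5 * lik_B)
post_A_all  # approximately 0.5625
```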
