Peter Ralph
8 January 2018 – Advanced Biological Statistics
Course structure and introductions
Probability: review and notation
Bayesian analysis
Application: Beta distributed coins
Suppose I have two trick coins: coin A comes up heads 75% of the time, and coin B comes up heads only 25% of the time.
But I lost one, and I don't know which coin I have left! So I flip it 10 times and get 6 heads. Which coin is it, and how sure are you?
Er, probably coin A?
Well, \[\begin{aligned} \P\{ \text{6 H in 10 flips} \given \text{coin A} \} &= \binom{10}{6} (.75)^6 (.25)^4 \\ &= 0.146 \end{aligned}\] and \[\begin{aligned} \P\{ \text{6 H in 10 flips} \given \text{coin B} \} &= \binom{10}{6} (.25)^6 (.75)^4 \\ &= 0.016 \end{aligned}\] … so, probably coin A?
For a precise answer…
Before flipping, each coin seems equally likely. Then
\[\begin{aligned} \P\{ \text{coin A} \given \text{6 H in 10 flips} \} &= \frac{ \frac{1}{2} \times 0.146 }{ \frac{1}{2} \times 0.146 + \frac{1}{2} \times 0.016 } \\ &= 0.9 \end{aligned}\]
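To check this arithmetic in R, here is a minimal sketch (using dbinom for the binomial likelihoods; the variable names are mine):

```r
lik_A <- dbinom(6, size = 10, prob = 0.75)   # about 0.146
lik_B <- dbinom(6, size = 10, prob = 0.25)   # about 0.016
# posterior probability of coin A, with equal (1/2) prior weights
(0.5 * lik_A) / (0.5 * lik_A + 0.5 * lik_B)  # about 0.9
```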
Probabilities are proportions: \(\hspace{2em} 0 \le \P\{A\} \le 1\)
Everything: \(\hspace{2em} \P\{ \Omega \} = 1\)
Complements: \(\hspace{2em} \P\{ \text{not } A\} = 1 - \P\{A\}\)
Disjoint events: If \(\hspace{2em} \P\{A \text{ and } B\} = 0\) then \(\hspace{2em} \P\{A \text{ or } B\} = \P\{A\} + \P\{B\}\).
Independence: \(A\) and \(B\) are independent iff \(\P\{A \text{ and } B\} = \P\{A\} \P\{B\}\).
Conditional probability: \[\P\{A \given B\} = \frac{\P\{A \text{ and } B\}}{ \P\{B\} }\]
A consequence is
\[\P\{B \given A\} = \frac{\P\{B\} \P\{A \given B\}}{ \P\{A\} } .\]
In “Bayesian statistics”, \(B\) plays the role of the hypothesis and \(A\) the data, so this says: \[\text{posterior} \propto \text{prior} \times \text{likelihood} .\]
Suppose instead I had 9 coins, with probabilities of heads 10%, 20%, …, 80%, 90%; as before, I flipped one 10 times and got 6 heads. For each \(\theta\) in \(0.1, 0.2, \ldots, 0.8, 0.9\), find \[\P\{ \text{coin has prob $\theta$} \given \text{6 H in 10 flips} \}\] and plot these as a bar plot.
Question: which coin(s) is it, and how sure are you? (And what do you mean when you say how sure you are?)
Time allowing, do it again with 99 coins, and with 999 coins. Does your answer change? (One way to set up the computation is sketched below.)
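A sketch of one possible setup in R (the variable names are mine):

```r
theta <- seq(0.1, 0.9, by = 0.1)                 # the nine possible coins
prior <- rep(1 / length(theta), length(theta))   # equally likely beforehand
lik <- dbinom(6, size = 10, prob = theta)        # P(6 H in 10 flips | theta)
posterior <- prior * lik / sum(prior * lik)
barplot(posterior, names.arg = theta,
        xlab = expression(theta), ylab = "posterior probability")
```

For 99 or 999 coins, only the first line changes (e.g., seq(0.01, 0.99, by = 0.01)).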
[Figure: for each possible \(\theta\), a series of panels showing prior \(\times\) likelihood \(\propto\) posterior.]
If \[P \sim \text{Beta}(a,b)\] then \(P\) has probability density \[p(\theta) = \frac{ \theta^{a-1} (1 - \theta)^{b-1} }{ B(a,b) } . \]
Takes values between 0 and 1.
If \(U_{(1)} < U_{(2)} < \cdots < U_{(n)}\) are sorted, independent \(\text{Unif}[0,1]\) then \(U_{(k)} \sim \text{Beta}(k, n-k+1)\).
Mean: \(a/(a+b)\).
Larger \(a+b\) means more tightly concentrated: the width scales like \(1/\sqrt{a+b}\) (illustrated below).
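For intuition, a quick sketch (my parameter choices) plotting Beta densities with the mean fixed at \(3/4\) and increasing \(a+b\):

```r
theta <- seq(0, 1, length.out = 501)
plot(0, type = "n", xlim = c(0, 1), ylim = c(0, 8),
     xlab = expression(theta), ylab = "density")
for (n in c(4, 16, 64)) {   # n = a + b, with a = 0.75 n and b = 0.25 n
  lines(theta, dbeta(theta, shape1 = 0.75 * n, shape2 = 0.25 * n))
}
```

The curves pile up around \(3/4\) as \(a+b\) grows, illustrating the \(1/\sqrt{a+b}\) concentration.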
Recall: there were nine possible values of \(\theta\).
Which coin is it, and how sure are you?
Possible types of answer:
a single best guess (e.g., the coin with the highest posterior probability);
a set of coins that together account for most of the posterior probability;
the entire posterior distribution.
Give examples of when each type of answer is the right one.
Now suppose we want to estimate the probability of heads for a coin without knowing the possible values in advance (or a disease incidence, or the error rate in an experiment, …).
We flip it \(n\) times and get \(z\) Heads.
The likelihood of this, given the prob-of-heads \(\theta\), is: \[p(z \given \theta) = \binom{n}{z}\theta^z (1-\theta)^{n-z} . \]
How to weight the possible \(\theta\)? Need a flexible set of weighting functions, i.e., prior distributions on \([0,1]\).
What would we use if: (some candidate choices are sketched after this list)
the coin is probably close to fair.
the disease is probably quite rare.
no idea whatsoever.
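For each of these, one candidate Beta prior (the particular parameter values are my illustrative picks, not prescriptions):

```r
theta <- seq(0, 1, length.out = 501)
plot(theta, dbeta(theta, 10, 10), type = "l", ylim = c(0, 20),
     xlab = expression(theta), ylab = "prior density")  # probably close to fair
lines(theta, dbeta(theta, 1, 20), lty = 2)              # probably quite rare
lines(theta, dbeta(theta, 1, 1), lty = 3)               # no idea: uniform on [0, 1]
legend("topright", lty = 1:3,
       legend = c("Beta(10, 10)", "Beta(1, 20)", "Beta(1, 1)"))
```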
If \[\begin{aligned} P &\sim \text{Beta}(a,b) \\ Z &\sim \text{Binom}(n,P) , \end{aligned}\] then by Bayes’ rule: \[\begin{aligned} \P\{ P = \theta \given Z = z\} &= \frac{\P\{Z = z \given P = \theta \} \P\{P = \theta\}}{\P\{Z = z\}} \\ &= \frac{ \binom{n}{z}\theta^z (1-\theta)^{n-z} \times \frac{\theta^{a-1}(1-\theta)^{b-1}}{B(a,b)} }{ \text{(something)} } \\ &= \text{(something else)} \times \theta^{a + z - 1} (1-\theta)^{b + n - z - 1} . \end{aligned}\]
“Miraculously” (the Beta is the conjugate prior for the Binomial), \[\begin{aligned} P \given Z = z \sim \text{Beta}(a+z, b+n-z) . \end{aligned}\]
Simulate \(10^6\) coin probabilities, called \(\theta\), from Beta(5,5). (rbeta)
For each coin, simulate 10 flips. (rbinom)
Make a histogram of the true probabilities (values of \(\theta\)) of only those coins having 3 of 10 heads.
Compare the distribution to Beta(\(a\),\(b\)) – with what \(a\) and \(b\)? (dbeta)
Explain what happened. (A minimal implementation is sketched below.)
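A minimal sketch of that simulation (variable names are mine; the overlaid curve uses the conjugacy result above):

```r
theta <- rbeta(1e6, 5, 5)                  # 10^6 coin probabilities
z <- rbinom(1e6, size = 10, prob = theta)  # 10 flips of each coin
kept <- theta[z == 3]                      # coins giving 3 heads out of 10
hist(kept, breaks = 50, freq = FALSE,
     xlab = expression(theta), main = "coins with 3 of 10 heads")
# by conjugacy, these should follow Beta(5 + 3, 5 + 10 - 3) = Beta(8, 12)
curve(dbeta(x, 8, 12), add = TRUE, lwd = 2)
```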
We flip an odd-looking coin 100 times, and get 65 heads. What is its true* probability of heads? (One analysis is sketched after the questions below.)
True = ??
What prior to use?
Is it reasonable that \(\theta = 1/2\)?
Best guess at \(\theta\)?
How far off are we, probably?
How much does the answer depend on the prior?
Does our procedure work on simulated data?
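A hedged sketch of one such analysis, assuming a uniform Beta(1,1) prior (any of the priors above could be swapped in):

```r
a <- 1; b <- 1           # uniform Beta(1, 1) prior; try others to compare
n <- 100; z <- 65
pa <- a + z; pb <- b + n - z            # posterior is Beta(a + z, b + n - z)
pa / (pa + pb)                          # posterior mean: a best guess at theta
qbeta(c(0.025, 0.975), pa, pb)          # a 95% credible interval
pbeta(0.5, pa, pb)                      # posterior P(theta <= 1/2)
```

Rerunning with different \(a\) and \(b\) addresses prior sensitivity, and rerunning on data simulated with a known \(\theta\) (via rbinom) checks that the procedure recovers it.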