Peter Ralph
15 January 2018 – Advanced Biological Statistics
Credible intervals
Hierarchical coins
Introduction to MCMC
Stan
If we want a point estimate: the posterior mean, median, or mode.
These all convey “where the posterior distribution is”, more or less.
What about uncertainty?
Definition: A 95% credible region is a portion of parameter space having a total of 95% of the posterior probability.
(same with other numbers for “95%”)
If we construct a 95% credible interval for \(\theta\) for each of many datasets; and the coin in each dataset has \(\theta\) drawn independently from the prior, then the true \(\theta\) will fall in its credible interval for 95% of the datasets.
If we construct a 95% credible interval for \(\theta\) with a dataset, and the distribution of the coin’s true \(\theta\) across many parallel universes is given by the prior, then the true \(\theta\) will be in the credible interval in 95% of those universes.
Given my prior beliefs (prior distribution), the posterior distribution is the most rational\({}^*\) way to update my beliefs to account for the data.
\({}^*\) if you do this many times you will be wrong least often
\({}^*\) or you will be wrong in the fewest possible universes
Definition: The “95% highest density interval” is the 95% credible interval whose posterior probability density is higher everywhere inside it than anywhere outside it (for a unimodal posterior, this is the shortest 95% credible interval).
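Given a set of posterior samples, one simple way to approximate the 95% HDI is to find the shortest interval containing 95% of them. A minimal sketch in R (the function name hdi is made up for illustration):

# approximate the 95% HDI as the shortest interval containing 95% of the samples
hdi <- function (post, prob=0.95) {
    post <- sort(post)
    n <- length(post)
    k <- ceiling(prob * n)          # number of samples inside each candidate interval
    widths <- post[k:n] - post[1:(n - k + 1)]
    i <- which.min(widths)          # left endpoint of the shortest such interval
    c(lower=post[i], upper=post[i + k - 1])
}
hdi(rbeta(1e5, 26, 24))   # e.g., the Beta(26, 24) posterior from the coin example below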
((back to the simple example))
Suppose now we have data from \(n\) different coins from the same source (or, sequences, or …)
Suppose now we have data from \(n\) different coins from the same source. We don’t assume they have the same \(\theta\), but we don’t know what its distribution is, so we try to learn it.
\[\begin{aligned} Z_i &\sim \Binom(N_i, \theta_i) \\ \theta_i &\sim \Beta(\text{mode}=\omega, \text{conc}=\kappa) \\ \omega &\sim \Beta(A, B) \\ \kappa &\sim \Gam(S, R) \end{aligned}\]
note: The “mode” and “concentration” are related to the shape parameters by: \(\alpha = \omega (\kappa - 2) + 1\) and \(\beta = (1 - \omega) (\kappa - 2) + 1\).
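A quick numerical check of that relationship (a sketch; the values of \(\omega\) and \(\kappa\) are arbitrary):

# convert (mode, concentration) to the usual Beta shape parameters
omega <- 0.3; kappa <- 10                # arbitrary example values
alpha <- omega * (kappa - 2) + 1         # 3.4
beta <- (1 - omega) * (kappa - 2) + 1    # 6.6
d <- density(rbeta(1e6, alpha, beta))
d$x[which.max(d$y)]                      # empirical mode: close to omega = 0.3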
What is different between:
Pick a value of \(\theta\) at random from \(\Beta(3,1)\). Flip one thousand \(\theta\)-coins, 100 times each.
Pick one thousand random \(\theta_i \sim \Beta(3,1)\) values. Flip one thousand coins, one for each \(\theta_i\), 100 times each.
In which case does knowing the outcome of one coin give you information about the likely behavior of other coins?
ncoins <- 1000
nflips <- 100
# (a) one shared theta for all coins
theta1 <- rbeta(1, 3, 1)
binom_Z <- rbinom(ncoins, size=nflips, prob=theta1)
# (b) a separate theta for each coin
theta2 <- rbeta(ncoins, 3, 1)
bb_Z <- rbinom(ncoins, size=nflips, prob=theta2)
# overlay the two distributions of Heads counts
hist(binom_Z, breaks=30, col=adjustcolor("blue", 0.5), main='',
     xlim=c(0, nflips), freq=FALSE, xlab='number of Heads')
hist(bb_Z, breaks=30, col=adjustcolor("red", 0.5), add=TRUE, freq=FALSE)
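To quantify the difference (a sketch, continuing the simulation above): with a shared \(\theta\) the counts have roughly binomial spread, while drawing a new \(\theta_i\) for each coin adds a lot of between-coin variance.

# compare the spread of Heads counts under the two schemes
c(shared_theta=var(binom_Z), separate_thetas=var(bb_Z))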
\[\begin{aligned} Z_i &\sim \Binom(N_i, \theta_i) \\ \theta_i &\sim \Beta(\text{mode}=\omega, \text{conc}=\kappa) \\ \omega &\sim \Beta(A, B) \\ \kappa &\sim \Gam(S, R) \end{aligned}\]
Goal: find the posterior distribution of \(\omega\), \(\kappa\), and \(\theta_1, \ldots, \theta_n\).
Problem: we don’t have a nice mathematical expression for this posterior distribution.
Goal: given a prior \(p(\theta)\), a likelihood \(p(D \given \theta)\), and data \(D\), “find”/ask questions of the posterior distribution on \(\theta\),
\[\begin{aligned} p(\theta \given D) = \frac{ p(D \given \theta) p(\theta) }{ p(D) } . \end{aligned}\]
Problem: usually we can’t write down an expression for this (because of the “\(p(D)\)”).
Solution: we’ll make up a way to draw random samples from it.
Toy example:
(return to beta-binomial coin example)
Do we think that \(\theta < 0.5\)?
(before:)
(now:)
i.e., “random-walk-based stochastic integration”
Example: Gibbs sampling for uniform distribution on a region. (picture)
Produces a random sequence of samples \(\theta_1, \theta_2, \ldots, \theta_N\).
At each step, starting at \(\theta_k\):
Propose a new location (nearby?): \(\theta_k'\)
Decide whether to accept it; if accepted, set \(\theta_{k+1} = \theta_k'\), and otherwise set \(\theta_{k+1} = \theta_k\).
Set \(k \leftarrow k+1\); if \(k=N\) then stop.
The magic comes from doing proposals and acceptance so that the \(\theta\)’s are samples from the distribution we want.
Rules are chosen so that \(p(\theta \given D)\) is the stationary distribution (long-run average!) of the random walk (the “Markov chain”).
The chain must mix fast enough so the distribution of visited states converges to \(p(\theta \given D)\).
Because of autocorrelation, \((\theta_1, \theta_2, \ldots, \theta_N)\) are not \(N\) independent samples: they are roughly equivalent to \(N_\text{eff} < N\) independent samples.
For better mixing, acceptance probabilities should not be too high or too low.
Starting several chains far apart can help diagnose failure to mix: Gelman’s \(\hat R\) (the “potential scale reduction factor”, or “shrink factor”) quantifies how different they are.
Three people, with randomness provided by others:
Pick a random \(\{N,S,E,W\}\).
Take a step in that direction,
Question: What distribution will this sample from?
Do this for 10 iterations. Have the chains mixed?
Now:
Pick a random \(\{N,S,E,W\}\).
Take a \(1+\Poisson(5)\) number of steps in that direction,
Does it mix faster?
Would \(1 + \Poisson(50)\) steps be better?
Imagine the walkers are on a hill, and:
Pick a random \(\{N,S,E,W\}\).
If the proposed location is uphill, take the step; if it is downhill, take the step only with some probability \(p\) (otherwise, stay where you are).
What would this do?
Thanks to Metropolis-Hastings, if “elevation” is \(p(\theta \given D)\), then setting \(p = p(\theta' \given D) / p(\theta \given D)\) makes the stationary distribution \(p(\theta \given D)\).
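Here is a minimal sketch of that recipe in R, for the coin example used below (\(\Beta(20,20)\) prior, 6 Heads in 10 flips). The proposal step size and number of steps are illustrative choices, not anything prescribed.

# minimal random-walk Metropolis sampler for:
#   Z ~ Binomial(N, theta), theta ~ Beta(20, 20), with Z = 6 heads in N = 10 flips
log_post <- function (theta) {
    if (theta <= 0 || theta >= 1) return(-Inf)
    dbinom(6, size=10, prob=theta, log=TRUE) + dbeta(theta, 20, 20, log=TRUE)
}
nsteps <- 10000
theta <- rep(NA, nsteps)
theta[1] <- 0.5                                    # starting point
for (k in 1:(nsteps - 1)) {
    prop <- theta[k] + rnorm(1, sd=0.1)            # propose a nearby value
    p <- exp(log_post(prop) - log_post(theta[k]))  # "elevation" (posterior) ratio
    if (runif(1) < p) {
        theta[k + 1] <- prop                       # accept the proposal ...
    } else {
        theta[k + 1] <- theta[k]                   # ... or stay put
    }
}
hist(theta[-(1:1000)])   # discard some warmup; compare to the Stan results below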
data {
// stuff you input
}
transformed data {
// stuff that's calculated from the data (just once, at the start)
}
parameters {
// stuff you want to learn the posterior distribution of
}
transformed parameters {
// stuff that's calculated from the parameters (at every step)
}
model {
// the action!
}
generated quantities {
// stuff you want computed also along the way
}
We’ve flipped a coin 10 times and got 6 Heads. We think the coin is close to fair, so we put a \(\Beta(20,20)\) prior on its probability of heads, and want the posterior distribution.
\[\begin{aligned} Z &\sim \Binom(10, \theta) \\ \theta &\sim \Beta(20, 20) \end{aligned}\] Sample from \[\theta \given Z = 6\]
\[\begin{aligned} Z &\sim \Binom(10, \theta) \\ \theta &\sim \Beta(20, 20) \end{aligned}\]
Sample from \[\theta \given Z = 6\]
data {
// stuff you input
}
parameters {
// stuff you want to learn
// the posterior distribution of
}
model {
// the action!
}
\[\begin{aligned} Z &\sim \Binom(10, \theta) \\ \theta &\sim \Beta(20, 20) \end{aligned}\]
Sample from \[\theta \given Z = 6\]
data {
int N; // number of flips
int Z; // number of heads
}
parameters {
// stuff you want to learn
// the posterior distribution of
}
model {
// the action!
}
\[\begin{aligned} Z &\sim \Binom(10, \theta) \\ \theta &\sim \Beta(20, 20) \end{aligned}\]
Sample from \[\theta \given Z = 6\]
data {
int N; // number of flips
int Z; // number of heads
}
parameters {
// probability of heads
real<lower=0,upper=1> theta;
}
model {
// the action!
}
\[\begin{aligned} Z &\sim \Binom(10, \theta) \\ \theta &\sim \Beta(20, 20) \end{aligned}\]
Sample from \[\theta \given Z = 6\]
data {
int N; // number of flips
int Z; // number of heads
}
parameters {
// probability of heads
real<lower=0,upper=1> theta;
}
model {
Z ~ binomial(N, theta);
theta ~ beta(20, 20);
}
library(rstan)
fit <- stan(model_code=stan_block, # stan block from above
data=list(N=10, Z=6),
chains=3, iter=10000)
##
## SAMPLING FOR MODEL '5f3115ac9593cded393e67d2d0e3ce84' NOW (CHAIN 1).
##
## Gradient evaluation took 6e-06 seconds
## 1000 transitions using 10 leapfrog steps per transition would take 0.06 seconds.
## Adjust your expectations accordingly!
##
##
## Iteration: 1 / 10000 [ 0%] (Warmup)
## Iteration: 1000 / 10000 [ 10%] (Warmup)
## Iteration: 2000 / 10000 [ 20%] (Warmup)
## Iteration: 3000 / 10000 [ 30%] (Warmup)
## Iteration: 4000 / 10000 [ 40%] (Warmup)
## Iteration: 5000 / 10000 [ 50%] (Warmup)
## Iteration: 5001 / 10000 [ 50%] (Sampling)
## Iteration: 6000 / 10000 [ 60%] (Sampling)
## Iteration: 7000 / 10000 [ 70%] (Sampling)
## Iteration: 8000 / 10000 [ 80%] (Sampling)
## Iteration: 9000 / 10000 [ 90%] (Sampling)
## Iteration: 10000 / 10000 [100%] (Sampling)
##
## Elapsed Time: 0.033406 seconds (Warm-up)
## 0.029742 seconds (Sampling)
## 0.063148 seconds (Total)
##
##
## SAMPLING FOR MODEL '5f3115ac9593cded393e67d2d0e3ce84' NOW (CHAIN 2).
##
## Gradient evaluation took 3e-06 seconds
## 1000 transitions using 10 leapfrog steps per transition would take 0.03 seconds.
## Adjust your expectations accordingly!
##
##
## Iteration: 1 / 10000 [ 0%] (Warmup)
## Iteration: 1000 / 10000 [ 10%] (Warmup)
## Iteration: 2000 / 10000 [ 20%] (Warmup)
## Iteration: 3000 / 10000 [ 30%] (Warmup)
## Iteration: 4000 / 10000 [ 40%] (Warmup)
## Iteration: 5000 / 10000 [ 50%] (Warmup)
## Iteration: 5001 / 10000 [ 50%] (Sampling)
## Iteration: 6000 / 10000 [ 60%] (Sampling)
## Iteration: 7000 / 10000 [ 70%] (Sampling)
## Iteration: 8000 / 10000 [ 80%] (Sampling)
## Iteration: 9000 / 10000 [ 90%] (Sampling)
## Iteration: 10000 / 10000 [100%] (Sampling)
##
## Elapsed Time: 0.029308 seconds (Warm-up)
## 0.029064 seconds (Sampling)
## 0.058372 seconds (Total)
##
##
## SAMPLING FOR MODEL '5f3115ac9593cded393e67d2d0e3ce84' NOW (CHAIN 3).
##
## Gradient evaluation took 4e-06 seconds
## 1000 transitions using 10 leapfrog steps per transition would take 0.04 seconds.
## Adjust your expectations accordingly!
##
##
## Iteration: 1 / 10000 [ 0%] (Warmup)
## Iteration: 1000 / 10000 [ 10%] (Warmup)
## Iteration: 2000 / 10000 [ 20%] (Warmup)
## Iteration: 3000 / 10000 [ 30%] (Warmup)
## Iteration: 4000 / 10000 [ 40%] (Warmup)
## Iteration: 5000 / 10000 [ 50%] (Warmup)
## Iteration: 5001 / 10000 [ 50%] (Sampling)
## Iteration: 6000 / 10000 [ 60%] (Sampling)
## Iteration: 7000 / 10000 [ 70%] (Sampling)
## Iteration: 8000 / 10000 [ 80%] (Sampling)
## Iteration: 9000 / 10000 [ 90%] (Sampling)
## Iteration: 10000 / 10000 [100%] (Sampling)
##
## Elapsed Time: 0.029373 seconds (Warm-up)
## 0.031705 seconds (Sampling)
## 0.061078 seconds (Total)
lp__ is the log posterior density. Note n_eff.
## Inference for Stan model: 5f3115ac9593cded393e67d2d0e3ce84.
## 3 chains, each with iter=10000; warmup=5000; thin=1;
## post-warmup draws per chain=5000, total post-warmup draws=15000.
##
## mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
## theta 0.52 0.00 0.07 0.39 0.47 0.52 0.57 0.65 5162 1
## lp__ -35.11 0.01 0.69 -37.05 -35.28 -34.85 -34.67 -34.62 7664 1
##
## Samples were drawn using NUTS(diag_e) at Tue Jan 16 09:26:33 2018.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
Fuzzy caterpillars (well-mixed trace plots) are good.
Stan’s plotting functions (in the rstan package) use ggplot2.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
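(Plots like these can be made with rstan’s built-in plotting functions; a sketch, not necessarily the exact calls used here:)

stan_trace(fit)   # trace plots -- the "fuzzy caterpillars"
stan_hist(fit)    # histograms of the posterior samples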
What’s the posterior probability that \(\theta < 0.5\)?
## [1] 0.3792
## [1] 0.3555356
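One way to compute such an estimate from the fitted model (a sketch; this is not necessarily how the numbers above were produced): take the proportion of posterior draws of \(\theta\) that fall below 0.5.

post <- rstan::extract(fit)    # posterior draws as a named list
mean(post$theta < 0.5)         # proportion of draws with theta < 0.5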
We’ve flipped a coin 100 times and got 23 Heads. We don’t want the prior to affect our results, so we put a \(\Beta(1,1)\) prior on its probability of heads, and want the posterior distribution.
Question: What’s the posterior probability that \(\theta < 0.15\)?
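A sketch of one way to do this, reusing the one-coin Stan program above with the prior changed to beta(1, 1) (here flat_prior_block and fit2 are made-up names):

# same one-coin model as above, but with theta ~ beta(1, 1) and the new data
fit2 <- stan(model_code=flat_prior_block, data=list(N=100, Z=23),
             chains=3, iter=10000)
mean(rstan::extract(fit2)$theta < 0.15)   # MCMC estimate
pbeta(0.15, 1 + 23, 1 + 100 - 23)         # exact answer for this conjugate model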
If \(T \sim \Exp(\text{rate}=\lambda)\), then
\[\begin{aligned} \P\{ T \in dt \} = \lambda e^{-\lambda t} dt . \end{aligned}\]
\(T\) can be any nonnegative real number.
\(T\) is memoryless: \[\begin{aligned} \P\{ T > x + y \given T > x \} = \P\{ T > y \} . \end{aligned}\]
A machine produces \(n\) widgets per second; each widget has probability \(\lambda/n\) of being broken. The time until the first broken widget appears (in seconds) is approximately \(\sim \Exp(\lambda)\).
If \(S \sim \Gam(\text{shape}=\alpha, \text{rate}=\lambda)\), then
\[\begin{aligned} \P\{ S \in dt \} = \frac{\lambda^\alpha}{\Gamma(\alpha)} t^{\alpha - 1} e^{-\lambda t} dt . \end{aligned}\]
If \(T_1, \ldots, T_k\) are independent \(\Exp(\lambda)\), then \(S = T_1 + \cdots + T_k\) is \(\Gam(k, \lambda)\).
A machine produces \(n\) widgets per second; each widget has probability \(\lambda/n\) of being broken. The time until the \(k^\text{th}\) broken widget appears (in seconds) is approximately \(\sim \Gam(k, \lambda)\).
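A quick simulation check of the sum-of-exponentials fact (a sketch; the values of \(k\) and \(\lambda\) are arbitrary):

# sums of k independent Exponential(rate) draws should look Gamma(shape=k, rate)
k <- 5; lambda <- 2
S <- rowSums(matrix(rexp(1e5 * k, rate=lambda), ncol=k))
qqplot(S, rgamma(1e5, shape=k, rate=lambda)); abline(0, 1)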
We have a dataset of batting statistics for baseball players; here are the first few rows:
## Player PriPos Hits AtBats PlayerNumber PriPosNumber
## 1 Fernando Abad Pitcher 1 7 1 1
## 2 Bobby Abreu Left Field 53 219 2 7
## 3 Tony Abreu 2nd Base 18 70 3 4
## 4 Dustin Ackley 2nd Base 137 607 4 4
## 5 Matt Adams 1st Base 21 86 5 3
## 6 Nathan Adcock Pitcher 0 1 6 1
The overall batting average of the 948 players is 0.2546597.
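(That number can be computed directly from the data frame, assuming it is named batting as in the code below; a sketch:)

with(batting, sum(Hits) / sum(AtBats))   # overall batting average, about 0.255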
Here is the average by position.
batting %>% group_by(PriPos) %>%
summarise(num=n(), BatAvg=sum(Hits)/sum(AtBats)) %>%
select(PriPos, num, BatAvg)
## # A tibble: 9 x 3
## PriPos num BatAvg
## <fctr> <int> <dbl>
## 1 1st Base 81 0.259
## 2 2nd Base 72 0.256
## 3 3rd Base 75 0.265
## 4 Catcher 103 0.247
## 5 Center Field 67 0.264
## 6 Left Field 103 0.259
## 7 Pitcher 324 0.129
## 8 Right Field 60 0.264
## 9 Shortstop 63 0.255
What’s the overall batting average?
Do some positions tend to be better batters?
How much variation is there?
first_model <- "
data {
int N;
int hits[N];
int at_bats[N];
}
parameters {
real<lower=0, upper=1> theta;
}
model {
hits ~ binomial(at_bats, theta);
theta ~ beta(1, 1);
} "
first_fit <- stan(model_code=first_model, chains=3, iter=1000,
data=list(N=nrow(batting),
hits=batting$Hits,
at_bats=batting$AtBats))
##
## SAMPLING FOR MODEL '19bcc9c3db3197c73e8a92569f4a671f' NOW (CHAIN 1).
##
## Gradient evaluation took 3.3e-05 seconds
## 1000 transitions using 10 leapfrog steps per transition would take 0.33 seconds.
## Adjust your expectations accordingly!
##
##
## Iteration: 1 / 1000 [ 0%] (Warmup)
## Iteration: 100 / 1000 [ 10%] (Warmup)
## Iteration: 200 / 1000 [ 20%] (Warmup)
## Iteration: 300 / 1000 [ 30%] (Warmup)
## Iteration: 400 / 1000 [ 40%] (Warmup)
## Iteration: 500 / 1000 [ 50%] (Warmup)
## Iteration: 501 / 1000 [ 50%] (Sampling)
## Iteration: 600 / 1000 [ 60%] (Sampling)
## Iteration: 700 / 1000 [ 70%] (Sampling)
## Iteration: 800 / 1000 [ 80%] (Sampling)
## Iteration: 900 / 1000 [ 90%] (Sampling)
## Iteration: 1000 / 1000 [100%] (Sampling)
##
## Elapsed Time: 0.054316 seconds (Warm-up)
## 0.039323 seconds (Sampling)
## 0.093639 seconds (Total)
##
##
## SAMPLING FOR MODEL '19bcc9c3db3197c73e8a92569f4a671f' NOW (CHAIN 2).
##
## Gradient evaluation took 2.5e-05 seconds
## 1000 transitions using 10 leapfrog steps per transition would take 0.25 seconds.
## Adjust your expectations accordingly!
##
##
## Iteration: 1 / 1000 [ 0%] (Warmup)
## Iteration: 100 / 1000 [ 10%] (Warmup)
## Iteration: 200 / 1000 [ 20%] (Warmup)
## Iteration: 300 / 1000 [ 30%] (Warmup)
## Iteration: 400 / 1000 [ 40%] (Warmup)
## Iteration: 500 / 1000 [ 50%] (Warmup)
## Iteration: 501 / 1000 [ 50%] (Sampling)
## Iteration: 600 / 1000 [ 60%] (Sampling)
## Iteration: 700 / 1000 [ 70%] (Sampling)
## Iteration: 800 / 1000 [ 80%] (Sampling)
## Iteration: 900 / 1000 [ 90%] (Sampling)
## Iteration: 1000 / 1000 [100%] (Sampling)
##
## Elapsed Time: 0.046764 seconds (Warm-up)
## 0.038286 seconds (Sampling)
## 0.08505 seconds (Total)
##
##
## SAMPLING FOR MODEL '19bcc9c3db3197c73e8a92569f4a671f' NOW (CHAIN 3).
##
## Gradient evaluation took 2.5e-05 seconds
## 1000 transitions using 10 leapfrog steps per transition would take 0.25 seconds.
## Adjust your expectations accordingly!
##
##
## Iteration: 1 / 1000 [ 0%] (Warmup)
## Iteration: 100 / 1000 [ 10%] (Warmup)
## Iteration: 200 / 1000 [ 20%] (Warmup)
## Iteration: 300 / 1000 [ 30%] (Warmup)
## Iteration: 400 / 1000 [ 40%] (Warmup)
## Iteration: 500 / 1000 [ 50%] (Warmup)
## Iteration: 501 / 1000 [ 50%] (Sampling)
## Iteration: 600 / 1000 [ 60%] (Sampling)
## Iteration: 700 / 1000 [ 70%] (Sampling)
## Iteration: 800 / 1000 [ 80%] (Sampling)
## Iteration: 900 / 1000 [ 90%] (Sampling)
## Iteration: 1000 / 1000 [100%] (Sampling)
##
## Elapsed Time: 0.043877 seconds (Warm-up)
## 0.037571 seconds (Sampling)
## 0.081448 seconds (Total)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
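(A histogram of the posterior like the one referred to above can be made with, for example, the following; a sketch, not necessarily the original call:)

stan_hist(first_fit, pars="theta")   # posterior of the single, shared batting average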
pos_model <- "
data {
int N;
int hits[N];
int at_bats[N];
int npos; // number of positions
int position[N];
}
parameters {
real<lower=0, upper=1> theta[npos];
}
model {
real theta_vec[N];
for (k in 1:N) {
theta_vec[k] = theta[position[k]];
}
hits ~ binomial(at_bats, theta_vec);
theta ~ beta(1, 1);
} "
pos_fit <- stan(model_code=pos_model, chains=3, iter=1000,
data=list(N=nrow(batting),
hits=batting$Hits,
at_bats=batting$AtBats,
npos=nlevels(batting$PriPos),
position=as.numeric(batting$PriPos)))
##
## SAMPLING FOR MODEL '44fb1922c86bd3c60c0d669cfb345a15' NOW (CHAIN 1).
##
## Gradient evaluation took 8.1e-05 seconds
## 1000 transitions using 10 leapfrog steps per transition would take 0.81 seconds.
## Adjust your expectations accordingly!
##
##
## Iteration: 1 / 1000 [ 0%] (Warmup)
## Iteration: 100 / 1000 [ 10%] (Warmup)
## Iteration: 200 / 1000 [ 20%] (Warmup)
## Iteration: 300 / 1000 [ 30%] (Warmup)
## Iteration: 400 / 1000 [ 40%] (Warmup)
## Iteration: 500 / 1000 [ 50%] (Warmup)
## Iteration: 501 / 1000 [ 50%] (Sampling)
## Iteration: 600 / 1000 [ 60%] (Sampling)
## Iteration: 700 / 1000 [ 70%] (Sampling)
## Iteration: 800 / 1000 [ 80%] (Sampling)
## Iteration: 900 / 1000 [ 90%] (Sampling)
## Iteration: 1000 / 1000 [100%] (Sampling)
##
## Elapsed Time: 0.250756 seconds (Warm-up)
## 0.228964 seconds (Sampling)
## 0.47972 seconds (Total)
##
##
## SAMPLING FOR MODEL '44fb1922c86bd3c60c0d669cfb345a15' NOW (CHAIN 2).
##
## Gradient evaluation took 7.4e-05 seconds
## 1000 transitions using 10 leapfrog steps per transition would take 0.74 seconds.
## Adjust your expectations accordingly!
##
##
## Iteration: 1 / 1000 [ 0%] (Warmup)
## Iteration: 100 / 1000 [ 10%] (Warmup)
## Iteration: 200 / 1000 [ 20%] (Warmup)
## Iteration: 300 / 1000 [ 30%] (Warmup)
## Iteration: 400 / 1000 [ 40%] (Warmup)
## Iteration: 500 / 1000 [ 50%] (Warmup)
## Iteration: 501 / 1000 [ 50%] (Sampling)
## Iteration: 600 / 1000 [ 60%] (Sampling)
## Iteration: 700 / 1000 [ 70%] (Sampling)
## Iteration: 800 / 1000 [ 80%] (Sampling)
## Iteration: 900 / 1000 [ 90%] (Sampling)
## Iteration: 1000 / 1000 [100%] (Sampling)
##
## Elapsed Time: 0.225025 seconds (Warm-up)
## 0.420713 seconds (Sampling)
## 0.645738 seconds (Total)
##
##
## SAMPLING FOR MODEL '44fb1922c86bd3c60c0d669cfb345a15' NOW (CHAIN 3).
##
## Gradient evaluation took 6.4e-05 seconds
## 1000 transitions using 10 leapfrog steps per transition would take 0.64 seconds.
## Adjust your expectations accordingly!
##
##
## Iteration: 1 / 1000 [ 0%] (Warmup)
## Iteration: 100 / 1000 [ 10%] (Warmup)
## Iteration: 200 / 1000 [ 20%] (Warmup)
## Iteration: 300 / 1000 [ 30%] (Warmup)
## Iteration: 400 / 1000 [ 40%] (Warmup)
## Iteration: 500 / 1000 [ 50%] (Warmup)
## Iteration: 501 / 1000 [ 50%] (Sampling)
## Iteration: 600 / 1000 [ 60%] (Sampling)
## Iteration: 700 / 1000 [ 70%] (Sampling)
## Iteration: 800 / 1000 [ 80%] (Sampling)
## Iteration: 900 / 1000 [ 90%] (Sampling)
## Iteration: 1000 / 1000 [100%] (Sampling)
##
## Elapsed Time: 0.234072 seconds (Warm-up)
## 0.187733 seconds (Sampling)
## 0.421805 seconds (Total)
Every individual different:
\[\begin{aligned} Z_i &\sim \Binom(N_i, \theta_i) \\ \theta_i &\sim \Beta(\omega_{p_i}, \kappa_{p_i}) \\ \omega_p &\sim \Beta(1, 1) \\ \kappa_p &\sim \Gam(0.1, 0.1) . \end{aligned}\]
Variable types in Stan:
int x;            // an integer
int y[10];        // ten integers
real z;           // a number
real zz[2,5];     // a 2x5 array of numbers
vector[10] u;     // length 10 vector
matrix[10,10] v;  // 10x10 matrix
vector[10] w[10]; // ten length 10 vectors
pos_model <- "
data {
int N; // number of players
int hits[N];
int at_bats[N];
int npos; // number of positions
int position[N];
}
parameters {
real<lower=0, upper=1> theta[N];
real<lower=0, upper=1> omega[npos];
real<lower=0> kappa[npos];
}
model {
// omega is the mean and kappa the concentration of each position's Beta:
//   alpha = omega * kappa,  beta = (1 - omega) * kappa
real alpha[N];
real beta[N];
for (i in 1:N) {
alpha[i] = omega[position[i]] * kappa[position[i]];
beta[i] = (1 - omega[position[i]]) * kappa[position[i]];
}
hits ~ binomial(at_bats, theta);
for (i in 1:N) {
theta[i] ~ beta(alpha[i], beta[i]);
}
omega ~ beta(1,1);
kappa ~ gamma(0.1,0.1);
} "
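A sketch of fitting this hierarchical model, mirroring the earlier calls (hier_fit is a made-up name, and the chains/iter settings are illustrative):

hier_fit <- stan(model_code=pos_model, chains=3, iter=1000,
                 data=list(N=nrow(batting),
                           hits=batting$Hits,
                           at_bats=batting$AtBats,
                           npos=nlevels(batting$PriPos),
                           position=as.numeric(batting$PriPos)))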
(discussion / see Kruschke Ch. 9)
Suppose that I have a large pot of coins that are all similar to each other. I flip each one ten times, and record the number of Heads. What is each coin’s probability of Heads?
Treated separately, we would be very uncertain about each coin.
Together, they should tell us quite accurately which values of \(\theta\) are likely.
This information can improve the estimate of each separate \(\theta\).
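For intuition (a sketch, using the mean/concentration parameterization from the Stan code above): if the source distribution were known to be Beta with mean \(\omega\) and concentration \(\kappa\), then the posterior mean of \(\theta_i\) after seeing \(Z_i\) Heads in \(N_i\) flips would be
\[\begin{aligned} \frac{Z_i + \omega \kappa}{N_i + \kappa} = \frac{N_i}{N_i + \kappa} \cdot \frac{Z_i}{N_i} + \frac{\kappa}{N_i + \kappa} \cdot \omega , \end{aligned}\]
a weighted average of coin \(i\)’s own proportion of Heads and the shared mean \(\omega\): the less data a coin has, the more strongly it is pulled (“shrunk”) towards \(\omega\).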
How does shrinkage affect the baseball inference?