Peter Ralph
15 January 2018 – Advanced Biological Statistics
Credible intervals
Hierarchical coins
Introduction to MCMC
Stan
If we want a point estimate: the posterior mean, median, or mode.
These all convey “where the posterior distribution is”, more or less.
What about uncertainty?
Definition: A 95% credible region is a portion of parameter space having a total of 95% of the posterior probability.
(same with other numbers for “95%”)
If we construct a 95% credible interval for \(\theta\) for each of many datasets; and the coin in each dataset has \(\theta\) drawn independently from the prior, then the true \(\theta\) will fall in its credible interval for 95% of the datasets.
If we construct a 95% credible interval for \(\theta\) with a dataset, and the distribution of the coin’s true \(\theta\) across many parallel universes is given by the prior, then the true \(\theta\) will be in the credible interval in 95% of those universes.
Given my prior beliefs (prior distribution), the posterior distribution is the most rational\({}^*\) way to update my beliefs to account for the data.
\({}^*\) if you do this many times you will be wrong least often
\({}^*\) or you will be wrong in the fewest possible universes
Definition: The “95% highest density interval” is the 95% credible interval whose posterior probability density is higher everywhere inside it than anywhere outside it (for a unimodal posterior, this is the shortest 95% credible interval).
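Given a set of posterior samples, one simple way to approximate the 95% HDI is to find the shortest interval containing 95% of them. A minimal sketch in R (the function name hdi is made up for illustration):

# approximate the 95% HDI as the shortest interval containing 95% of the samples
hdi <- function (post, prob=0.95) {
    post <- sort(post)
    n <- length(post)
    k <- ceiling(prob * n)          # number of samples inside each candidate interval
    widths <- post[k:n] - post[1:(n - k + 1)]
    i <- which.min(widths)          # left endpoint of the shortest such interval
    c(lower=post[i], upper=post[i + k - 1])
}
hdi(rbeta(1e5, 26, 24))   # e.g., the Beta(26, 24) posterior from the coin example below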
((back to the simple example))
Suppose now we have data from \(n\) different coins from the same source (or, sequences, or …)
Suppose now we have data from \(n\) different coins from the same source. We don’t assume they have the same \(\theta\), but we don’t know what its distribution is, so we try to learn it.
\[\begin{aligned} Z_i &\sim \Binom(N_i, \theta_i) \\ \theta_i &\sim \Beta(\text{mode}=\omega, \text{conc}=\kappa) \\ \omega &\sim \Beta(A, B) \\ \kappa &\sim \Gam(S, R) \end{aligned}\]
note: The “mode” and “concentration” are related to the shape parameters by: \(\alpha = \omega (\kappa - 2) + 1\) and \(\beta = (1 - \omega) (\kappa - 2) + 1\).
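A quick numerical check of that relationship (a sketch; the values of \(\omega\) and \(\kappa\) are arbitrary):

# convert (mode, concentration) to the usual Beta shape parameters
omega <- 0.3; kappa <- 10                # arbitrary example values
alpha <- omega * (kappa - 2) + 1         # 3.4
beta <- (1 - omega) * (kappa - 2) + 1    # 6.6
d <- density(rbeta(1e6, alpha, beta))
d$x[which.max(d$y)]                      # empirical mode: close to omega = 0.3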
What is different between:
Pick a value of \(\theta\) at random from \(\Beta(3,1)\). Flip one thousand \(\theta\)-coins, 100 times each.
Pick one thousand random \(\theta_i \sim \Beta(3,1)\) values. Flip one thousand coins, one for each \(\theta_i\), 100 times each.
In which case does knowing the outcome of one coin give you information about the likely behavior of other coins?
ncoins <- 1000
nflips <- 100
# (a) one shared theta for all coins
theta1 <- rbeta(1, 3, 1)
binom_Z <- rbinom(ncoins, size=nflips, prob=theta1)
# (b) a separate theta for each coin
theta2 <- rbeta(ncoins, 3, 1)
bb_Z <- rbinom(ncoins, size=nflips, prob=theta2)
# overlay the two distributions of Heads counts
hist(binom_Z, breaks=30, col=adjustcolor("blue", 0.5), main='',
     xlim=c(0, nflips), freq=FALSE, xlab='number of Heads')
hist(bb_Z, breaks=30, col=adjustcolor("red", 0.5), add=TRUE, freq=FALSE)
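To quantify the difference (a sketch, continuing the simulation above): with a shared \(\theta\) the counts have roughly binomial spread, while drawing a new \(\theta_i\) for each coin adds a lot of between-coin variance.

# compare the spread of Heads counts under the two schemes
c(shared_theta=var(binom_Z), separate_thetas=var(bb_Z))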
\[\begin{aligned} Z_i &\sim \Binom(N_i, \theta_i) \\ \theta_i &\sim \Beta(\text{mode}=\omega, \text{conc}=\kappa) \\ \omega &\sim \Beta(A, B) \\ \kappa &\sim \Gam(S, R) \end{aligned}\]
Goal: find the posterior distribution of \(\omega\), \(\kappa\), and \(\theta_1, \ldots, \theta_n\).
Problem: we don’t have a nice mathematical expression for this posterior distribution.
Goal: given a prior \(p(\theta)\), a likelihood \(p(D \given \theta)\), and data \(D\), “find”/ask questions of the posterior distribution on \(\theta\),
\[\begin{aligned} p(\theta \given D) = \frac{ p(D \given \theta) p(\theta) }{ p(D) } . \end{aligned}\]
Problem: usually we can’t write down an expression for this (because of the “\(p(D)\)”).
Solution: we’ll make up a way to draw random samples from it.
Toy example:
(return to beta-binomial coin example)
Do we think that \(\theta < 0.5\)?
(before:)
(now:)
i.e., “random-walk-based stochastic integration”
Example: Gibbs sampling for uniform distribution on a region. (picture)
Produces a random sequence of samples \(\theta_1, \theta_2, \ldots, \theta_N\).
At each step, starting at \(\theta_k\):
Propose a new location (nearby?): \(\theta_k'\)
Decide whether to accept it; if accepted, set \(\theta_{k+1} = \theta_k'\), and otherwise set \(\theta_{k+1} = \theta_k\).
Set \(k \leftarrow k+1\); if \(k=N\) then stop.
The magic comes from doing proposals and acceptance so that the \(\theta\)’s are samples from the distribution we want.
Rules are chosen so that \(p(\theta \given D)\) is the stationary distribution (long-run average!) of the random walk (the “Markov chain”).
The chain must mix fast enough so the distribution of visited states converges to \(p(\theta \given D)\).
Because of autocorrelation, \((\theta_1, \theta_2, \ldots, \theta_N)\) are not \(N\) independent samples: they are roughly equivalent to \(N_\text{eff} < N\) independent samples.
For better mixing, acceptance probabilities should not be too high or too low.
Starting several chains far apart can help diagnose failure to mix: Gelman’s \(\hat R\) (the “potential scale reduction factor”, or “shrink factor”) quantifies how different they are.
Three people, with randomness provided by others:
Pick a random \(\{N,S,E,W\}\).
Take a step in that direction,
Question: What distribution will this sample from?
Do this for 10 iterations. Have the chains mixed?
Now:
Pick a random \(\{N,S,E,W\}\).
Take a \(1+\Poisson(5)\) number of steps in that direction,
Does it mix faster?
Would \(1 + \Poisson(50)\) steps be better?
Imagine the walkers are on a hill, and:
Pick a random \(\{N,S,E,W\}\).
If the proposed location is uphill, take the step; if it is downhill, take the step only with some probability \(p\) (otherwise, stay where you are).
What would this do?
Thanks to Metropolis-Hastings, if “elevation” is \(p(\theta \given D)\), then setting \(p = p(\theta' \given D) / p(\theta \given D)\) makes the stationary distribution \(p(\theta \given D)\).
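Here is a minimal sketch of that recipe in R, for the coin example used below (\(\Beta(20,20)\) prior, 6 Heads in 10 flips). The proposal step size and number of steps are illustrative choices, not anything prescribed.

# minimal random-walk Metropolis sampler for:
#   Z ~ Binomial(N, theta), theta ~ Beta(20, 20), with Z = 6 heads in N = 10 flips
log_post <- function (theta) {
    if (theta <= 0 || theta >= 1) return(-Inf)
    dbinom(6, size=10, prob=theta, log=TRUE) + dbeta(theta, 20, 20, log=TRUE)
}
nsteps <- 10000
theta <- rep(NA, nsteps)
theta[1] <- 0.5                                    # starting point
for (k in 1:(nsteps - 1)) {
    prop <- theta[k] + rnorm(1, sd=0.1)            # propose a nearby value
    p <- exp(log_post(prop) - log_post(theta[k]))  # "elevation" (posterior) ratio
    if (runif(1) < p) {
        theta[k + 1] <- prop                       # accept the proposal ...
    } else {
        theta[k + 1] <- theta[k]                   # ... or stay put
    }
}
hist(theta[-(1:1000)])   # discard some warmup; compare to the Stan results below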
data {
// stuff you input
}
transformed data {
// stuff that's calculated from the data (just once, at the start)
}
parameters {
// stuff you want to learn the posterior distribution of
}
transformed parameters {
// stuff that's calculated from the parameters (at every step)
}
model {
// the action!
}
generated quantities {
// stuff you want computed also along the way
}
We’ve flipped a coin 10 times and got 6 Heads. We think the coin is close to fair, so we put a \(\Beta(20,20)\) prior on its probability of heads, and want the posterior distribution.
\[\begin{aligned} Z &\sim \Binom(10, \theta) \\ \theta &\sim \Beta(20, 20) \end{aligned}\] Sample from \[\theta \given Z = 6\]
\[\begin{aligned} Z &\sim \Binom(10, \theta) \\ \theta &\sim \Beta(20, 20) \end{aligned}\]
Sample from \[\theta \given Z = 6\]
data {
// stuff you input
}
parameters {
// stuff you want to learn
// the posterior distribution of
}
model {
// the action!
}
\[\begin{aligned} Z &\sim \Binom(10, \theta) \\ \theta &\sim \Beta(20, 20) \end{aligned}\]
Sample from \[\theta \given Z = 6\]
data {
int N; // number of flips
int Z; // number of heads
}
parameters {
// stuff you want to learn
// the posterior distribution of
}
model {
// the action!
}
\[\begin{aligned} Z &\sim \Binom(10, \theta) \\ \theta &\sim \Beta(20, 20) \end{aligned}\]
Sample from \[\theta \given Z = 6\]
data {
int N; // number of flips
int Z; // number of heads
}
parameters {
// probability of heads
real<lower=0,upper=1> theta;
}
model {
// the action!
}
\[\begin{aligned} Z &\sim \Binom(10, \theta) \\ \theta &\sim \Beta(20, 20) \end{aligned}\]
Sample from \[\theta \given Z = 6\]
data {
int N; // number of flips
int Z; // number of heads
}
parameters {
// probability of heads
real<lower=0,upper=1> theta;
}
model {
Z ~ binomial(N, theta);
theta ~ beta(20, 20);
}
library(rstan)
fit <- stan(model_code=stan_block, # stan block from above
data=list(N=10, Z=6),
chains=3, iter=10000)
##
## SAMPLING FOR MODEL '5f3115ac9593cded393e67d2d0e3ce84' NOW (CHAIN 1).
##
## Gradient evaluation took 6e-06 seconds
## 1000 transitions using 10 leapfrog steps per transition would take 0.06 seconds.
## Adjust your expectations accordingly!
##
##
## Iteration: 1 / 10000 [ 0%] (Warmup)
## Iteration: 1000 / 10000 [ 10%] (Warmup)
## Iteration: 2000 / 10000 [ 20%] (Warmup)
## Iteration: 3000 / 10000 [ 30%] (Warmup)
## Iteration: 4000 / 10000 [ 40%] (Warmup)
## Iteration: 5000 / 10000 [ 50%] (Warmup)
## Iteration: 5001 / 10000 [ 50%] (Sampling)
## Iteration: 6000 / 10000 [ 60%] (Sampling)
## Iteration: 7000 / 10000 [ 70%] (Sampling)
## Iteration: 8000 / 10000 [ 80%] (Sampling)
## Iteration: 9000 / 10000 [ 90%] (Sampling)
## Iteration: 10000 / 10000 [100%] (Sampling)
##
## Elapsed Time: 0.033406 seconds (Warm-up)
## 0.029742 seconds (Sampling)
## 0.063148 seconds (Total)
##
##
## SAMPLING FOR MODEL '5f3115ac9593cded393e67d2d0e3ce84' NOW (CHAIN 2).
##
## Gradient evaluation took 3e-06 seconds
## 1000 transitions using 10 leapfrog steps per transition would take 0.03 seconds.
## Adjust your expectations accordingly!
##
##
## Iteration: 1 / 10000 [ 0%] (Warmup)
## Iteration: 1000 / 10000 [ 10%] (Warmup)
## Iteration: 2000 / 10000 [ 20%] (Warmup)
## Iteration: 3000 / 10000 [ 30%] (Warmup)
## Iteration: 4000 / 10000 [ 40%] (Warmup)
## Iteration: 5000 / 10000 [ 50%] (Warmup)
## Iteration: 5001 / 10000 [ 50%] (Sampling)
## Iteration: 6000 / 10000 [ 60%] (Sampling)
## Iteration: 7000 / 10000 [ 70%] (Sampling)
## Iteration: 8000 / 10000 [ 80%] (Sampling)
## Iteration: 9000 / 10000 [ 90%] (Sampling)
## Iteration: 10000 / 10000 [100%] (Sampling)
##
## Elapsed Time: 0.029308 seconds (Warm-up)
## 0.029064 seconds (Sampling)
## 0.058372 seconds (Total)
##
##
## SAMPLING FOR MODEL '5f3115ac9593cded393e67d2d0e3ce84' NOW (CHAIN 3).
##
## Gradient evaluation took 4e-06 seconds
## 1000 transitions using 10 leapfrog steps per transition would take 0.04 seconds.
## Adjust your expectations accordingly!
##
##
## Iteration: 1 / 10000 [ 0%] (Warmup)
## Iteration: 1000 / 10000 [ 10%] (Warmup)
## Iteration: 2000 / 10000 [ 20%] (Warmup)
## Iteration: 3000 / 10000 [ 30%] (Warmup)
## Iteration: 4000 / 10000 [ 40%] (Warmup)
## Iteration: 5000 / 10000 [ 50%] (Warmup)
## Iteration: 5001 / 10000 [ 50%] (Sampling)
## Iteration: 6000 / 10000 [ 60%] (Sampling)
## Iteration: 7000 / 10000 [ 70%] (Sampling)
## Iteration: 8000 / 10000 [ 80%] (Sampling)
## Iteration: 9000 / 10000 [ 90%] (Sampling)
## Iteration: 10000 / 10000 [100%] (Sampling)
##
## Elapsed Time: 0.029373 seconds (Warm-up)
## 0.031705 seconds (Sampling)
## 0.061078 seconds (Total)
lp__ is the log posterior density. Note n_eff.
## Inference for Stan model: 5f3115ac9593cded393e67d2d0e3ce84.
## 3 chains, each with iter=10000; warmup=5000; thin=1;
## post-warmup draws per chain=5000, total post-warmup draws=15000.
##
## mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
## theta 0.52 0.00 0.07 0.39 0.47 0.52 0.57 0.65 5162 1
## lp__ -35.11 0.01 0.69 -37.05 -35.28 -34.85 -34.67 -34.62 7664 1
##
## Samples were drawn using NUTS(diag_e) at Tue Jan 16 09:26:33 2018.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
Fuzzy caterpillars (well-mixed trace plots) are good.
Stan’s plotting functions (in the rstan package) use ggplot2.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
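(Plots like these can be made with rstan’s built-in plotting functions; a sketch, not necessarily the exact calls used here:)

stan_trace(fit)   # trace plots -- the "fuzzy caterpillars"
stan_hist(fit)    # histograms of the posterior samples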
What’s the posterior probability that \(\theta < 0.5\)?
## [1] 0.3792
## [1] 0.3555356
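One way to compute such an estimate from the fitted model (a sketch; this is not necessarily how the numbers above were produced): take the proportion of posterior draws of \(\theta\) that fall below 0.5.

post <- rstan::extract(fit)    # posterior draws as a named list
mean(post$theta < 0.5)         # proportion of draws with theta < 0.5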
We’ve flipped a coin 100 times and got 23 Heads. We don’t want the prior to affect our results, so we put a \(\Beta(1,1)\) prior on its probability of heads, and want the posterior distribution.
Question: What’s the posterior probability that \(\theta < 0.15\)?
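A sketch of one way to do this, reusing the one-coin Stan program above with the prior changed to beta(1, 1) (here flat_prior_block and fit2 are made-up names):

# same one-coin model as above, but with theta ~ beta(1, 1) and the new data
fit2 <- stan(model_code=flat_prior_block, data=list(N=100, Z=23),
             chains=3, iter=10000)
mean(rstan::extract(fit2)$theta < 0.15)   # MCMC estimate
pbeta(0.15, 1 + 23, 1 + 100 - 23)         # exact answer for this conjugate model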
If \(T \sim \Exp(\text{rate}=\lambda)\), then
\[\begin{aligned} \P\{ T \in dt \} = \lambda e^{-\lambda t} dt . \end{aligned}\]
\(T\) can be any nonnegative real number.
\(T\) is memoryless: \[\begin{aligned} \P\{ T > x + y \given T > x \} = \P\{ T > y \} . \end{aligned}\]
A machine produces \(n\) widgets per second; each widget has probability \(\lambda/n\) of being broken. The time until the first broken widget appears (in seconds) is approximately \(\sim \Exp(\lambda)\).
If \(S \sim \Gam(\text{shape}=\alpha, \text{rate}=\lambda)\), then
\[\begin{aligned} \P\{ S \in dt \} = \frac{\lambda^\alpha}{\Gamma(\alpha)} t^{\alpha - 1} e^{-\lambda t} dt . \end{aligned}\]
If \(T_1, \ldots, T_k\) are independent \(\Exp(\lambda)\), then \(S = T_1 + \cdots + T_k\) is \(\Gam(k, \lambda)\).
A machine produces \(n\) widgets per second; each widget has probability \(\lambda/n\) of being broken. The time until the \(k^\text{th}\) broken widget appears (in seconds) is approximately \(\sim \Gam(k, \lambda)\).
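A quick simulation check of the sum-of-exponentials fact (a sketch; the values of \(k\) and \(\lambda\) are arbitrary):

# sums of k independent Exponential(rate) draws should look Gamma(shape=k, rate)
k <- 5; lambda <- 2
S <- rowSums(matrix(rexp(1e5 * k, rate=lambda), ncol=k))
qqplot(S, rgamma(1e5, shape=k, rate=lambda)); abline(0, 1)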
We have a dataset of batting statistics for baseball players; here are the first few rows:
## Player PriPos Hits AtBats PlayerNumber PriPosNumber
## 1 Fernando Abad Pitcher 1 7 1 1
## 2 Bobby Abreu Left Field 53 219 2 7
## 3 Tony Abreu 2nd Base 18 70 3 4
## 4 Dustin Ackley 2nd Base 137 607 4 4
## 5 Matt Adams 1st Base 21 86 5 3
## 6 Nathan Adcock Pitcher 0 1 6 1
The overall batting average of the 948 players is 0.2546597.
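(That number can be computed directly from the data frame, assuming it is named batting as in the code below; a sketch:)

with(batting, sum(Hits) / sum(AtBats))   # overall batting average, about 0.255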
Here is the average by position.
batting %>% group_by(PriPos) %>%
summarise(num=n(), BatAvg=sum(Hits)/sum(AtBats)) %>%
select(PriPos, num, BatAvg)
## # A tibble: 9 x 3
## PriPos num BatAvg
## <fctr> <int> <dbl>
## 1 1st Base 81 0.259
## 2 2nd Base 72 0.256
## 3 3rd Base 75 0.265
## 4 Catcher 103 0.247
## 5 Center Field 67 0.264
## 6 Left Field 103 0.259
## 7 Pitcher 324 0.129
## 8 Right Field 60 0.264
## 9 Shortstop 63 0.255
What’s the overall batting average?
Do some positions tend to be better batters?
How much variation is there?
first_model <- "
data {
int N;
int hits[N];
int at_bats[N];
}
parameters {
real<lower=0, upper=1> theta;
}
model {
hits ~ binomial(at_bats, theta);
theta ~ beta(1, 1);
} "
first_fit <- stan(model_code=first_model, chains=3, iter=1000,
data=list(N=nrow(batting),
hits=batting$Hits,
at_bats=batting$AtBats))
##
## SAMPLING FOR MODEL '19bcc9c3db3197c73e8a92569f4a671f' NOW (CHAIN 1).
##
## Gradient evaluation took 3.3e-05 seconds
## 1000 transitions using 10 leapfrog steps per transition would take 0.33 seconds.
## Adjust your expectations accordingly!
##
##
## Iteration: 1 / 1000 [ 0%] (Warmup)
## Iteration: 100 / 1000 [ 10%] (Warmup)
## Iteration: 200 / 1000 [ 20%] (Warmup)
## Iteration: 300 / 1000 [ 30%] (Warmup)
## Iteration: 400 / 1000 [ 40%] (Warmup)
## Iteration: 500 / 1000 [ 50%] (Warmup)
## Iteration: 501 / 1000 [ 50%] (Sampling)
## Iteration: 600 / 1000 [ 60%] (Sampling)
## Iteration: 700 / 1000 [ 70%] (Sampling)
## Iteration: 800 / 1000 [ 80%] (Sampling)
## Iteration: 900 / 1000 [ 90%] (Sampling)
## Iteration: 1000 / 1000 [100%] (Sampling)
##
## Elapsed Time: 0.054316 seconds (Warm-up)
## 0.039323 seconds (Sampling)
## 0.093639 seconds (Total)
##
##
## SAMPLING FOR MODEL '19bcc9c3db3197c73e8a92569f4a671f' NOW (CHAIN 2).
##
## Gradient evaluation took 2.5e-05 seconds
## 1000 transitions using 10 leapfrog steps per transition would take 0.25 seconds.
## Adjust your expectations accordingly!
##
##
## Iteration: 1 / 1000 [ 0%] (Warmup)
## Iteration: 100 / 1000 [ 10%] (Warmup)
## Iteration: 200 / 1000 [ 20%] (Warmup)
## Iteration: 300 / 1000 [ 30%] (Warmup)
## Iteration: 400 / 1000 [ 40%] (Warmup)
## Iteration: 500 / 1000 [ 50%] (Warmup)
## Iteration: 501 / 1000 [ 50%] (Sampling)
## Iteration: 600 / 1000 [ 60%] (Sampling)
## Iteration: 700 / 1000 [ 70%] (Sampling)
## Iteration: 800 / 1000 [ 80%] (Sampling)
## Iteration: 900 / 1000 [ 90%] (Sampling)
## Iteration: 1000 / 1000 [100%] (Sampling)
##
## Elapsed Time: 0.046764 seconds (Warm-up)
## 0.038286 seconds (Sampling)
## 0.08505 seconds (Total)
##
##
## SAMPLING FOR MODEL '19bcc9c3db3197c73e8a92569f4a671f' NOW (CHAIN 3).
##
## Gradient evaluation took 2.5e-05 seconds
## 1000 transitions using 10 leapfrog steps per transition would take 0.25 seconds.
## Adjust your expectations accordingly!
##
##
## Iteration: 1 / 1000 [ 0%] (Warmup)
## Iteration: 100 / 1000 [ 10%] (Warmup)
## Iteration: 200 / 1000 [ 20%] (Warmup)
## Iteration: 300 / 1000 [ 30%] (Warmup)
## Iteration: 400 / 1000 [ 40%] (Warmup)
## Iteration: 500 / 1000 [ 50%] (Warmup)
## Iteration: 501 / 1000 [ 50%] (Sampling)
## Iteration: 600 / 1000 [ 60%] (Sampling)
## Iteration: 700 / 1000 [ 70%] (Sampling)
## Iteration: 800 / 1000 [ 80%] (Sampling)
## Iteration: 900 / 1000 [ 90%] (Sampling)
## Iteration: 1000 / 1000 [100%] (Sampling)
##
## Elapsed Time: 0.043877 seconds (Warm-up)
## 0.037571 seconds (Sampling)
## 0.081448 seconds (Total)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
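(A histogram of the posterior like the one referred to above can be made with, for example, the following; a sketch, not necessarily the original call:)

stan_hist(first_fit, pars="theta")   # posterior of the single, shared batting average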
pos_model <- "
data {
int N;
int hits[N];
int at_bats[N];
int npos; // number of positions
int position[N];
}
parameters {
real<lower=0, upper=1> theta[npos];
}
model {
real theta_vec[N];
for (k in 1:N) {
theta_vec[k] = theta[position[k]];
}
hits ~ binomial(at_bats, theta_vec);
theta ~ beta(1, 1);
} "
pos_fit <- stan(model_code=pos_model, chains=3, iter=1000,
data=list(N=nrow(batting),
hits=batting$Hits,
at_bats=batting$AtBats,
npos=nlevels(batting$PriPos),
position=as.numeric(batting$PriPos)))
##
## SAMPLING FOR MODEL '44fb1922c86bd3c60c0d669cfb345a15' NOW (CHAIN 1).
##
## Gradient evaluation took 8.1e-05 seconds
## 1000 transitions using 10 leapfrog steps per transition would take 0.81 seconds.
## Adjust your expectations accordingly!
##
##
## Iteration: 1 / 1000 [ 0%] (Warmup)
## Iteration: 100 / 1000 [ 10%] (Warmup)
## Iteration: 200 / 1000 [ 20%] (Warmup)
## Iteration: 300 / 1000 [ 30%] (Warmup)
## Iteration: 400 / 1000 [ 40%] (Warmup)
## Iteration: 500 / 1000 [ 50%] (Warmup)
## Iteration: 501 / 1000 [ 50%] (Sampling)
## Iteration: 600 / 1000 [ 60%] (Sampling)
## Iteration: 700 / 1000 [ 70%] (Sampling)
## Iteration: 800 / 1000 [ 80%] (Sampling)
## Iteration: 900 / 1000 [ 90%] (Sampling)
## Iteration: 1000 / 1000 [100%] (Sampling)
##
## Elapsed Time: 0.250756 seconds (Warm-up)
## 0.228964 seconds (Sampling)
## 0.47972 seconds (Total)
##
##
## SAMPLING FOR MODEL '44fb1922c86bd3c60c0d669cfb345a15' NOW (CHAIN 2).
##
## Gradient evaluation took 7.4e-05 seconds
## 1000 transitions using 10 leapfrog steps per transition would take 0.74 seconds.
## Adjust your expectations accordingly!
##
##
## Iteration: 1 / 1000 [ 0%] (Warmup)
## Iteration: 100 / 1000 [ 10%] (Warmup)
## Iteration: 200 / 1000 [ 20%] (Warmup)
## Iteration: 300 / 1000 [ 30%] (Warmup)
## Iteration: 400 / 1000 [ 40%] (Warmup)
## Iteration: 500 / 1000 [ 50%] (Warmup)
## Iteration: 501 / 1000 [ 50%] (Sampling)
## Iteration: 600 / 1000 [ 60%] (Sampling)
## Iteration: 700 / 1000 [ 70%] (Sampling)
## Iteration: 800 / 1000 [ 80%] (Sampling)
## Iteration: 900 / 1000 [ 90%] (Sampling)
## Iteration: 1000 / 1000 [100%] (Sampling)
##
## Elapsed Time: 0.225025 seconds (Warm-up)
## 0.420713 seconds (Sampling)
## 0.645738 seconds (Total)
##
##
## SAMPLING FOR MODEL '44fb1922c86bd3c60c0d669cfb345a15' NOW (CHAIN 3).
##
## Gradient evaluation took 6.4e-05 seconds
## 1000 transitions using 10 leapfrog steps per transition would take 0.64 seconds.
## Adjust your expectations accordingly!
##
##
## Iteration: 1 / 1000 [ 0%] (Warmup)
## Iteration: 100 / 1000 [ 10%] (Warmup)
## Iteration: 200 / 1000 [ 20%] (Warmup)
## Iteration: 300 / 1000 [ 30%] (Warmup)
## Iteration: 400 / 1000 [ 40%] (Warmup)
## Iteration: 500 / 1000 [ 50%] (Warmup)
## Iteration: 501 / 1000 [ 50%] (Sampling)
## Iteration: 600 / 1000 [ 60%] (Sampling)
## Iteration: 700 / 1000 [ 70%] (Sampling)
## Iteration: 800 / 1000 [ 80%] (Sampling)
## Iteration: 900 / 1000 [ 90%] (Sampling)
## Iteration: 1000 / 1000 [100%] (Sampling)
##
## Elapsed Time: 0.234072 seconds (Warm-up)
## 0.187733 seconds (Sampling)
## 0.421805 seconds (Total)
Every individual different:
\[\begin{aligned} Z_i &\sim \Binom(N_i, \theta_i) \\ \theta_i &\sim \Beta(\omega_{p_i}, \kappa_{p_i}) \\ \omega_p &\sim \Beta(1, 1) \\ \kappa_p &\sim \Gam(0.1, 0.1) . \end{aligned}\]
Variable types in Stan:
int x;            // an integer
int y[10];        // ten integers
real z;           // a number
real zz[2,5];     // a 2x5 array of numbers
vector[10] u;     // length 10 vector
matrix[10,10] v;  // 10x10 matrix
vector[10] w[10]; // ten length 10 vectors
pos_model <- "
data {
int N; // number of players
int hits[N];
int at_bats[N];
int npos; // number of positions
int position[N];
}
parameters {
real<lower=0, upper=1> theta[N];
real<lower=0, upper=1> omega[npos];
real<lower=0> kappa[npos];
}
model {
// omega is the mean and kappa the concentration of each position's Beta:
//   alpha = omega * kappa,  beta = (1 - omega) * kappa
real alpha[N];
real beta[N];
for (i in 1:N) {
alpha[i] = omega[position[i]] * kappa[position[i]];
beta[i] = (1 - omega[position[i]]) * kappa[position[i]];
}
hits ~ binomial(at_bats, theta);
for (i in 1:N) {
theta[i] ~ beta(alpha[i], beta[i]);
}
omega ~ beta(1,1);
kappa ~ gamma(0.1,0.1);
} "
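A sketch of fitting this hierarchical model, mirroring the earlier calls (hier_fit is a made-up name, and the chains/iter settings are illustrative):

hier_fit <- stan(model_code=pos_model, chains=3, iter=1000,
                 data=list(N=nrow(batting),
                           hits=batting$Hits,
                           at_bats=batting$AtBats,
                           npos=nlevels(batting$PriPos),
                           position=as.numeric(batting$PriPos)))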
(discussion / see Kruschke Ch. 9)
Suppose that I have a large pot of coins that are all similar to each other. I flip each one ten times, and record the number of Heads. What is each coin’s probability of Heads?
Treated separately, we would be very uncertain about each coin.
Together, they should tell us quite accurately which values of \(\theta\) are likely.
This information can improve the estimate of each separate \(\theta\).
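For intuition (a sketch, using the mean/concentration parameterization from the Stan code above): if the source distribution were known to be Beta with mean \(\omega\) and concentration \(\kappa\), then the posterior mean of \(\theta_i\) after seeing \(Z_i\) Heads in \(N_i\) flips would be
\[\begin{aligned} \frac{Z_i + \omega \kappa}{N_i + \kappa} = \frac{N_i}{N_i + \kappa} \cdot \frac{Z_i}{N_i} + \frac{\kappa}{N_i + \kappa} \cdot \omega , \end{aligned}\]
a weighted average of coin \(i\)’s own proportion of Heads and the shared mean \(\omega\): the less data a coin has, the more strongly it is pulled (“shrunk”) towards \(\omega\).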
How does shrinkage affect the baseball inference?