Peter Ralph
19 February 2018 – Advanced Biological Statistics
So far we’ve focused on building models for the data.
Models involve parameters, that are often the target of our inference.
We put priors on parameters, for several reasons:
To be able to communicate uncertainty using the posterior.
To incorporate prior information.
To “strongly encourage” certain model requirements (e.g., sparsity).
Dimension reduction and visualization (e.g., PCA)
Clustering and categorization
Time series
Spatial and network models
These all involve new models and mew ways of using priors to achieve analysis goals.
In addition to sampling from the posterior distribution, Stan can do optimization: hill climb to the top.
Definition: the maximum a posteriori (MAP) estimate is the set of parameter values that maximize the posterior likelihood.
Recall that \[\begin{aligned} \text{posterior} = \text{prior} \times \text{likelihood} . \end{aligned}\] … so this is closely related to the maximum likelihood estimate (MLE).
optimizing package:rstan R Documentation
Obtain a point estimate by maximizing the joint posterior
Description:
Obtain a point estimate by maximizing the joint posterior from the
model defined by class ‘stanmodel’.
Usage:
## S4 method for signature 'stanmodel'
optimizing(object, data = list(),
seed = sample.int(.Machine$integer.max, 1), init = 'random',
check_data = TRUE, sample_file = NULL,
algorithm = c("LBFGS", "BFGS", "Newton"),
verbose = FALSE, hessian = FALSE, as_vector = TRUE,
draws = 0, constrained = TRUE, ...)
Arguments:
object: An object of class ‘stanmodel’.
data: A named ‘list’ or ‘environment’ providing the data for the
model or a character vector for all the names of objects used
as data. See the Note section in ‘stan’.
## Initial log joint probability = 78.1921
## Optimization terminated normally:
## Convergence detected: relative gradient magnitude is below tolerance
## $par
## lambda
## 6.150003
##
## $value
## [1] 100.4236
##
## $return_code
## [1] 0
timings <- lapply(10^(1:5),
function (N) {
Z <- rpois(N, 5)
a <- system.time(
optimizing(pois_model,
data=list(N=N,
Z=Z)))
b <- system.time(
stan(model_code=pois_block,
data=list(N=N,
Z=Z)))
list(optim=a, mcmc=b) } )
## Initial log joint probability = -35.4992
## Optimization terminated normally:
## Convergence detected: relative gradient magnitude is below tolerance
## Initial log joint probability = 217.053
## Optimization terminated normally:
## Convergence detected: relative gradient magnitude is below tolerance
## Initial log joint probability = 2972.9
## Optimization terminated normally:
## Convergence detected: relative gradient magnitude is below tolerance
## Initial log joint probability = 28752.7
## Optimization terminated normally:
## Convergence detected: relative gradient magnitude is below tolerance
## Initial log joint probability = 295465
## Optimization terminated normally:
## Convergence detected: relative gradient magnitude is below tolerance
The downside of point estimates is
that you’ve got no estimate of uncertainty.
## ------------------------------------------------------------
## EXPERIMENTAL ALGORITHM:
## This procedure has not been thoroughly tested and may be unstable
## or buggy. The interface is subject to change.
## ------------------------------------------------------------
##
##
##
## Gradient evaluation took 0.002431 seconds
## 1000 transitions using 10 leapfrog steps per transition would take 24.31 seconds.
## Adjust your expectations accordingly!
##
##
## Begin eta adaptation.
## Iteration: 1 / 250 [ 0%] (Adaptation)
## Iteration: 50 / 250 [ 20%] (Adaptation)
## Iteration: 100 / 250 [ 40%] (Adaptation)
## Iteration: 150 / 250 [ 60%] (Adaptation)
## Iteration: 200 / 250 [ 80%] (Adaptation)
## Success! Found best value [eta = 1] earlier than expected.
##
## Begin stochastic gradient ascent.
## iter ELBO delta_ELBO_mean delta_ELBO_med notes
## 100 -2e+05 1.000 1.000
## 200 -2e+05 0.507 1.000
## 300 -2e+05 0.338 0.013
## 400 -2e+05 0.254 0.013
## 500 -2e+05 0.203 0.001 MEDIAN ELBO CONVERGED
##
## Drawing a sample of size 1000 from the approximate posterior...
## COMPLETED.
## Inference for Stan model: f559a4f52ec33e0ff0a8f08350a487f1.
## 1 chains, each with iter=1000; warmup=0; thin=1;
## post-warmup draws per chain=1000, total post-warmup draws=1000.
##
## mean sd 2.5% 25% 50% 75% 97.5%
## lambda 5.03 0.01 5.01 5.03 5.03 5.04 5.05
## lp__ 0.00 0.00 0.00 0.00 0.00 0.00 0.00
##
## Approximate samples were drawn using VB(meanfield) at Mon Feb 19 21:47:45 2018.
## We recommend genuine 'sampling' from the posterior distribution for final inferences!
## Inference for Stan model: f559a4f52ec33e0ff0a8f08350a487f1.
## 1 chains, each with iter=1000; warmup=0; thin=1;
## post-warmup draws per chain=1000, total post-warmup draws=1000.
##
## mean sd 2.5% 25% 50% 75% 97.5%
## lambda 5.03 0.01 5.01 5.03 5.03 5.04 5.05
## lp__ 0.00 0.00 0.00 0.00 0.00 0.00 0.00
##
## Approximate samples were drawn using VB(meanfield) at Mon Feb 19 21:47:45 2018.
## We recommend genuine 'sampling' from the posterior distribution for final inferences!
Let’s practice writing models, and compare the resuls of stan( )
to optimizing( )
.
Write down a model on the whiteboard.
Explain the model, how to find what you want from it, to another pair of people.
Code up the Stan model.
Simulate some test data.
Run optimizing( )
to get point estimates.
Number of mosquitos caught in traps at 20 different time points at 4 locations; temperature and rainfall are also measured.
Transpiration rates of 5 trees each of 100 strains, along with genotype at five SNPs putatively linked to stomatal efficiency.
Presence or absence of Wolbachia parasites in fifty flies are sampled from each of 100 populations, along with the sex and transcription levels of ten immune-related genes of each fly.
Modifications: (a) change the numbers - 1,000 SNPs instead of five? (b) make it robust (to outliers)!
Number of mosquitos caught in traps at 20 different time points at 4 locations; temperature and rainfall are also measured.
\(Z_i\) : number of mosquitos caught in trap \(i\)
\(\text{time}_i\): which day that trap \(i\) was run on (out of 20) - categorical, between 1 and 20, same for each location
\(\text{loc}_i\): location (out of four) of trap \(i\)
\(\text{temp}_i\) : temperature at 7am when trap \(i\) was run
\(\text{rain}_i\) : rainfall in previous 24hrs to running trap \(i\)
include temp and rain as predictors
use an exponential link function so that effects are multiplicative
\[\begin{aligned} Z_i &\sim \Poisson(\exp(\lambda_i)) \\ \lambda_i &\sim \Normal(\mu_i, \sigma) \\ \mu_i &= \exp\left( b_2 \times \text{rain}_i + b_3 \times \text{temp}_i + b_0[\text{loc}_i] + b_1[\text{time}_i] \\ b_0 &\sim \Normal(\nu_0, \sigma_0) \\ b_1 &\sim \Normal(\nu_1, \sigma_1) \\ \end{aligned}\]
moz_block <- "
data {
int N; // number of trap runs
int Z[N]; // number of mozzies
int loc[N]; // location index
int time[N]; // index of sampling day
vector[N] temp; // temperature
vector[N] rain; // rainfall
}
parameters {
vector[N] lambda;
real<lower=0> sigma;
vector[4] b0; // four locations
vector[20] b1; // twenty times
real b2;
real b3;
real nu0;
real nu1;
real<lower=0> sigma0;
real<lower=0> sigma1;
}
model {
vector[N] mu;
Z ~ poisson_log(lambda);
mu = b2 * rain + b3 * temp + b0[loc] + b1[time];
lambda ~ normal(mu, sigma);
b0 ~ normal(nu0, sigma0);
b1 ~ normal(nu1, sigma1);
b2 ~ normal(0, 10);
b3 ~ normal(0, 10);
nu0 ~ normal(0, 10);
nu1 ~ normal(0, 10);
sigma ~ normal(0, 10);
sigma0 ~ normal(0, 10);
sigma1 ~ normal(0, 10);
}"
moz_data <- list(
N=80,
Z=x$Z,
loc=x$loc,
time=x$time,
temp=(x$temp-mean(x$temp))/sd(x$temp),
rain=(x$rain)/10)
moz_fit <- stan(model_code=moz_block,
data=moz_data,
iter=1000, chains=4)
## Warning: There were 157 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## Warning: There were 1169 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
## http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
## Warning: There were 4 chains where the estimated Bayesian Fraction of Missing Information was low. See
## http://mc-stan.org/misc/warnings.html#bfmi-low
## Warning: Examine the pairs() plot to diagnose sampling problems
Conclusion: it runs; but need to adjust things to get good results.
Transpiration rates of 5 trees each of 100 strains, along with genotype at five SNPs putatively linked to stomatal efficiency.
\(T_i\) : transpiration rate of tree \(i\), for \(1 \le i \le 500\)
\(S_i\) : strain of tree \(i\), an index between 1 and 100
\(G_{ij}\) : genotype of SNP \(j\) in tree \(i\), for \(1 \le i \le 500\) and \(1 \le j \le 5\); takes the value 0, 1, or 2.
To make things easier, compute \[\begin{aligned} H_{ij} &= 1 \qquad \text{ if } G_{ij} = 1 \\ D_{ij} &= 1 \qquad \text{ if } G_{ij} = 2 \end{aligned}\] which are zero otherwise.
\[\begin{aligned} T_i &\sim \log\Normal(\mu_i, \sigma) \\ \mu_i &= b_{0,S_i} + b_{1,S_i} H_{i,1} + b_{2,S_i} H_{i,2} + b_{3,S_i} H_{i,3} + b_{4,S_i} H_{i,4} + b_{5,S_i} H_{i,5} \\ & \qquad {} + c_{1,S_i} D_{i,1} + c_{2,S_i} D_{i,2} + c_{3,S_i} D_{i,3} + c_{4,S_i} D_{i,4} + c_{5,S_i} D_{i,5} \\ b_k[s] &= \text{effect of het SNP $k$ in strain $s$} \\ &\sim \Cauchy(\nu_k, \tau_k) \\ c_k[s] &= \text{effect of hom SNP $k$ in strain $s$} \\ &\sim \Cauchy(\omega_k, u_k) \\ \nu &\sim \Normal(0, 10) \\ \omega &\sim \Normal^+(0, 10) \\ \tau &\sim \Normal(0, 10) \\ u &\sim \Normal^+(0, 10) \\ \sigma &\sim \Normal^+(0, 20) \end{aligned}\]
nsnps <- 5
nstrains <- 100
nindivs <- 5
trees <- data.frame(id=1:(nindivs * nstrains),
strain=rep(1:nstrains, each=nindivs))
# strain SNP proportions
snp_p <- matrix(rbeta(nsnps * nstrains, 0.5, 0.5), ncol=nsnps)
geno <- matrix(rbinom(nindivs*nstrains*nsnps, size=2,
prob=snp_p[trees$strain,]), ncol=nsnps)
colnames(geno) <- paste0("g", 1:nsnps)
H <- (geno == 1)
D <- (geno == 2)
# transpiration
true_b0 <- rnorm(nstrains, mean=5, sd=0.1)
# snp effects
true_b <- c(2.0, 0, 0, 0, 0)/100
true_c <- c(4.0, 3.0, 0, 0, 0)/100
# strain * snp effects
true_bmat <- matrix(rep(true_b, each=nstrains), ncol=nsnps)
true_bmat[1:20, 3] <- -2.0/100
true_cmat <- matrix(rep(true_b, each=nstrains), ncol=nsnps)
true_cmat[1:20, 3] <- -4.0/100
# combined
trees$true_mu <- (true_b0[trees$strain] +
rowSums(H * true_bmat[trees$strain,]) +
rowSums(D * true_cmat[trees$strain,]))
# noise
true_sigma <- 0.05
trees$transpiration <- exp(rnorm(nstrains*nindivs,
mean=trees$true_mu, sd=true_sigma))
tree_block <- "
data {
int N; // number of trees
int nsnps; // number of SNPs
vector[N] T; // transp rates
int S[N]; // strain index
matrix[N, nsnps] H; // 0 or 1 if het
matrix[N, nsnps] D; // 0 or 1 if hom
}
parameters {
vector[100] b0;
matrix[100,nsnps] b;
matrix[100,nsnps] c;
real<lower=0> sigma;
vector[nsnps] nu;
vector<lower=0>[nsnps] tau;
vector[nsnps] omega;
vector<lower=0>[nsnps] u;
}
model {
vector[N] mu;
mu = b0[S];
for (j in 1:nsnps) {
mu += (b[S,j] .* H[,j]) + (c[S,j] .* D[,j]);
b[,j] ~ cauchy(nu[j], tau[j]);
c[,j] ~ cauchy(omega[j], u[j]);
}
T ~ lognormal(mu, sigma);
nu ~ normal(0, 10);
omega ~ normal(0, 10);
tau ~ normal(0, 10);
u ~ normal(0, 10);
sigma ~ normal(0, 20);
}"
tree_fit <- stan(model_code=tree_block,
data=list(N=nstrains*nindivs,
nsnps=nsnps,
T=trees$transpiration,
S=trees$strain,
H=matrix(as.numeric(H), ncol=nsnps),
D=matrix(as.numeric(D), ncol=nsnps)),
iter=1000)
## Warning: There were 2000 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
## http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
## Warning: There were 2 chains where the estimated Bayesian Fraction of Missing Information was low. See
## http://mc-stan.org/misc/warnings.html#bfmi-low
## Warning: Examine the pairs() plot to diagnose sampling problems