Optimization, and review

Peter Ralph

19 February 2018 – Advanced Biological Statistics

Overview

Building models

So far we’ve focused on building models for the data.

Models involve parameters, that are often the target of our inference.

We put priors on parameters, for several reasons:

To be able to communicate uncertainty using the posterior.
To incorporate prior information.
To “strongly encourage” certain model requirements (e.g., sparsity).

Some remaining topics: branching out

Dimension reduction and visualization (e.g., PCA)
Clustering and categorization
Time series
Spatial and network models

These all involve new models and mew ways of using priors to achieve analysis goals.

Optimization

Another trick up Stan’s sleeve

In addition to sampling from the posterior distribution, Stan can do optimization: hill climb to the top.

Definition: the maximum a posteriori (MAP) estimate is the set of parameter values that maximize the posterior likelihood.

Recall that \[\begin{aligned} \text{posterior} = \text{prior} \times \text{likelihood} . \end{aligned}\] … so this is closely related to the maximum likelihood estimate (MLE).

optimizing                package:rstan                R Documentation

Obtain a point estimate by maximizing the joint posterior

Description:

     Obtain a point estimate by maximizing the joint posterior from the
     model defined by class ‘stanmodel’.

Usage:

     ## S4 method for signature 'stanmodel'
     optimizing(object, data = list(), 
         seed = sample.int(.Machine$integer.max, 1), init = 'random', 
         check_data = TRUE, sample_file = NULL, 
         algorithm = c("LBFGS", "BFGS", "Newton"),
         verbose = FALSE, hessian = FALSE, as_vector = TRUE, 
         draws = 0, constrained = TRUE, ...)   
     
Arguments:

  object: An object of class ‘stanmodel’.

    data: A named ‘list’ or ‘environment’ providing the data for the
          model or a character vector for all the names of objects used
          as data.  See the Note section in ‘stan’.

How to do it

pois_block <- "
data {
    int N; // number of obs
    int Z[N]; // counts
}
parameters {
    real<lower=0> lambda;
}
model {
    Z ~ poisson(lambda);
}
"
pois_model <- stan_model(
                model_code=pois_block)

Z <- rpois(20, 5)
pois_opt <- optimizing(pois_model,
                       data=list(N=20,
                                 Z=Z))

## Initial log joint probability = 78.1921
## Optimization terminated normally: 
##   Convergence detected: relative gradient magnitude is below tolerance

## $par
##   lambda 
## 6.150003 
## 
## $value
## [1] 100.4236
## 
## $return_code
## [1] 0

It is fast

timings <- lapply(10^(1:5), 
    function (N) {
        Z <- rpois(N, 5)
        a <- system.time(
                 optimizing(pois_model,
                            data=list(N=N,
                                      Z=Z)))
        b <- system.time(
                 stan(model_code=pois_block,
                      data=list(N=N,
                                Z=Z)))
        list(optim=a, mcmc=b) } )

## Initial log joint probability = -35.4992
## Optimization terminated normally: 
##   Convergence detected: relative gradient magnitude is below tolerance
## Initial log joint probability = 217.053
## Optimization terminated normally: 
##   Convergence detected: relative gradient magnitude is below tolerance
## Initial log joint probability = 2972.9
## Optimization terminated normally: 
##   Convergence detected: relative gradient magnitude is below tolerance
## Initial log joint probability = 28752.7
## Optimization terminated normally: 
##   Convergence detected: relative gradient magnitude is below tolerance
## Initial log joint probability = 295465
## Optimization terminated normally: 
##   Convergence detected: relative gradient magnitude is below tolerance

plot of chunk plot_pois_stan

The downside of point estimates is

that you’ve got no estimate of uncertainty.

Another shortcut: “variational Bayes”

## ------------------------------------------------------------
## EXPERIMENTAL ALGORITHM:
##   This procedure has not been thoroughly tested and may be unstable
##   or buggy. The interface is subject to change.
## ------------------------------------------------------------
## 
## 
## 
## Gradient evaluation took 0.002431 seconds
## 1000 transitions using 10 leapfrog steps per transition would take 24.31 seconds.
## Adjust your expectations accordingly!
## 
## 
## Begin eta adaptation.
## Iteration:   1 / 250 [  0%]  (Adaptation)
## Iteration:  50 / 250 [ 20%]  (Adaptation)
## Iteration: 100 / 250 [ 40%]  (Adaptation)
## Iteration: 150 / 250 [ 60%]  (Adaptation)
## Iteration: 200 / 250 [ 80%]  (Adaptation)
## Success! Found best value [eta = 1] earlier than expected.
## 
## Begin stochastic gradient ascent.
##   iter       ELBO   delta_ELBO_mean   delta_ELBO_med   notes 
##    100     -2e+05             1.000            1.000
##    200     -2e+05             0.507            1.000
##    300     -2e+05             0.338            0.013
##    400     -2e+05             0.254            0.013
##    500     -2e+05             0.203            0.001   MEDIAN ELBO CONVERGED
## 
## Drawing a sample of size 1000 from the approximate posterior... 
## COMPLETED.

## Inference for Stan model: f559a4f52ec33e0ff0a8f08350a487f1.
## 1 chains, each with iter=1000; warmup=0; thin=1; 
## post-warmup draws per chain=1000, total post-warmup draws=1000.
## 
##        mean   sd 2.5%  25%  50%  75% 97.5%
## lambda 5.03 0.01 5.01 5.03 5.03 5.04  5.05
## lp__   0.00 0.00 0.00 0.00 0.00 0.00  0.00
## 
## Approximate samples were drawn using VB(meanfield) at Mon Feb 19 21:47:45 2018.

## We recommend genuine 'sampling' from the posterior distribution for final inferences!

pois_vb

## Inference for Stan model: f559a4f52ec33e0ff0a8f08350a487f1.
## 1 chains, each with iter=1000; warmup=0; thin=1; 
## post-warmup draws per chain=1000, total post-warmup draws=1000.
## 
##        mean   sd 2.5%  25%  50%  75% 97.5%
## lambda 5.03 0.01 5.01 5.03 5.03 5.04  5.05
## lp__   0.00 0.00 0.00 0.00 0.00 0.00  0.00
## 
## Approximate samples were drawn using VB(meanfield) at Mon Feb 19 21:47:45 2018.

## We recommend genuine 'sampling' from the posterior distribution for final inferences!

Exercises

Write models, optimize

Let’s practice writing models, and compare the resuls of stan( ) to optimizing( ).

Write down a model on the whiteboard.
Explain the model, how to find what you want from it, to another pair of people.
Code up the Stan model.
Simulate some test data.
Run optimizing( ) to get point estimates.

Pick a situation

Number of mosquitos caught in traps at 20 different time points at 4 locations; temperature and rainfall are also measured.
Transpiration rates of 5 trees each of 100 strains, along with genotype at five SNPs putatively linked to stomatal efficiency.
Presence or absence of Wolbachia parasites in fifty flies are sampled from each of 100 populations, along with the sex and transcription levels of ten immune-related genes of each fly.

Modifications: (a) change the numbers - 1,000 SNPs instead of five? (b) make it robust (to outliers)!

Case 1: Mosquitos

The situation

Number of mosquitos caught in traps at 20 different time points at 4 locations; temperature and rainfall are also measured.

Data

$Z_i$ : number of mosquitos caught in trap $i$
$\text{time}_i$: which day that trap $i$ was run on (out of 20) - categorical, between 1 and 20, same for each location
$\text{loc}_i$: location (out of four) of trap $i$
$\text{temp}_i$ : temperature at 7am when trap $i$ was run
$\text{rain}_i$ : rainfall in previous 24hrs to running trap $i$

include temp and rain as predictors
use an exponential link function so that effects are multiplicative

\[\begin{aligned} Z_i &\sim \Poisson(\exp(\lambda_i)) \\ \lambda_i &\sim \Normal(\mu_i, \sigma) \\ \mu_i &= \exp\left( b_2 \times \text{rain}_i + b_3 \times \text{temp}_i + b_0[\text{loc}_i] + b_1[\text{time}_i] \\ b_0 &\sim \Normal(\nu_0, \sigma_0) \\ b_1 &\sim \Normal(\nu_1, \sigma_1) \\ \end{aligned}\]

moz_block <- "
data {
    int N; // number of trap runs
    int Z[N];  // number of mozzies
    int loc[N]; // location index
    int time[N]; // index of sampling day
    vector[N] temp; // temperature
    vector[N] rain; // rainfall
}
parameters {
    vector[N] lambda;
    real<lower=0> sigma;
    vector[4] b0;  // four locations
    vector[20] b1; // twenty times
    real b2;
    real b3;
    real nu0;
    real nu1;
    real<lower=0> sigma0;
    real<lower=0> sigma1;
}
model {
    vector[N] mu;
    Z ~ poisson_log(lambda);
    mu = b2 * rain + b3 * temp + b0[loc] + b1[time];
    lambda ~ normal(mu, sigma);
    b0 ~ normal(nu0, sigma0);
    b1 ~ normal(nu1, sigma1);
    b2 ~ normal(0, 10);
    b3 ~ normal(0, 10);
    nu0 ~ normal(0, 10);
    nu1 ~ normal(0, 10);
    sigma ~ normal(0, 10);
    sigma0 ~ normal(0, 10);
    sigma1 ~ normal(0, 10);
}"

Simulate data

tp <- list(b2=0, b3=0.25, b0=1:4, b1=rep(0,20))
x <- expand.grid(time=1:20, loc=1:4)
x$temp <- 20 + rnorm(80, 5)
x$rain <- round(rgamma(80, 1/10, 1/10), 2)
x$mu <- exp(tp$b0[x$loc] + tp$b1[x$time] 
            + tp$b2 * x$rain + tp$b3 * x$temp)
x$Z <- rpois(80, x$mu)

Run Stan

moz_data <- list(
              N=80,
              Z=x$Z,
              loc=x$loc,
              time=x$time,
              temp=(x$temp-mean(x$temp))/sd(x$temp),
              rain=(x$rain)/10)

moz_fit <- stan(model_code=moz_block,
                data=moz_data,
                iter=1000, chains=4)

## Warning: There were 157 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup

## Warning: There were 1169 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
## http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded

## Warning: There were 4 chains where the estimated Bayesian Fraction of Missing Information was low. See
## http://mc-stan.org/misc/warnings.html#bfmi-low

## Warning: Examine the pairs() plot to diagnose sampling problems

Conclusion: it runs; but need to adjust things to get good results.

Case #2: trees

The situation

Transpiration rates of 5 trees each of 100 strains, along with genotype at five SNPs putatively linked to stomatal efficiency.

Data

$T_i$ : transpiration rate of tree $i$, for $1 \le i \le 500$
$S_i$ : strain of tree $i$, an index between 1 and 100
$G_{ij}$ : genotype of SNP $j$ in tree $i$, for $1 \le i \le 500$ and $1 \le j \le 5$; takes the value 0, 1, or 2.

To make things easier, compute \[\begin{aligned} H_{ij} &= 1 \qquad \text{ if } G_{ij} = 1 \\ D_{ij} &= 1 \qquad \text{ if } G_{ij} = 2 \end{aligned}\] which are zero otherwise.

\[\begin{aligned} T_i &\sim \log\Normal(\mu_i, \sigma) \\ \mu_i &= b_{0,S_i} + b_{1,S_i} H_{i,1} + b_{2,S_i} H_{i,2} + b_{3,S_i} H_{i,3} + b_{4,S_i} H_{i,4} + b_{5,S_i} H_{i,5} \\ & \qquad {} + c_{1,S_i} D_{i,1} + c_{2,S_i} D_{i,2} + c_{3,S_i} D_{i,3} + c_{4,S_i} D_{i,4} + c_{5,S_i} D_{i,5} \\ b_k[s] &= \text{effect of het SNP $k$ in strain $s$} \\ &\sim \Cauchy(\nu_k, \tau_k) \\ c_k[s] &= \text{effect of hom SNP $k$ in strain $s$} \\ &\sim \Cauchy(\omega_k, u_k) \\ \nu &\sim \Normal(0, 10) \\ \omega &\sim \Normal^+(0, 10) \\ \tau &\sim \Normal(0, 10) \\ u &\sim \Normal^+(0, 10) \\ \sigma &\sim \Normal^+(0, 20) \end{aligned}\]

Simulate data

nsnps <- 5
nstrains <- 100
nindivs <- 5
trees <- data.frame(id=1:(nindivs * nstrains),
                    strain=rep(1:nstrains, each=nindivs))
# strain SNP proportions
snp_p <- matrix(rbeta(nsnps * nstrains, 0.5, 0.5), ncol=nsnps)
geno <- matrix(rbinom(nindivs*nstrains*nsnps, size=2, 
                      prob=snp_p[trees$strain,]), ncol=nsnps)
colnames(geno) <- paste0("g", 1:nsnps)
H <- (geno == 1)
D <- (geno == 2)

# transpiration
true_b0 <- rnorm(nstrains, mean=5, sd=0.1)
# snp effects
true_b <- c(2.0, 0, 0, 0, 0)/100
true_c <- c(4.0, 3.0, 0, 0, 0)/100
# strain * snp effects
true_bmat <- matrix(rep(true_b, each=nstrains), ncol=nsnps)
true_bmat[1:20, 3] <- -2.0/100
true_cmat <- matrix(rep(true_b, each=nstrains), ncol=nsnps)
true_cmat[1:20, 3] <- -4.0/100
# combined
trees$true_mu <- (true_b0[trees$strain] + 
                  rowSums(H * true_bmat[trees$strain,]) + 
                  rowSums(D * true_cmat[trees$strain,]))
# noise
true_sigma <- 0.05
trees$transpiration <- exp(rnorm(nstrains*nindivs, 
                                 mean=trees$true_mu, sd=true_sigma))

Stan block

tree_block <- "
data {
    int N; // number of trees
    int nsnps; // number of SNPs
    vector[N] T; // transp rates
    int S[N]; // strain index
    matrix[N, nsnps] H; // 0 or 1 if het
    matrix[N, nsnps] D; // 0 or 1 if hom
}
parameters {
    vector[100] b0;
    matrix[100,nsnps] b;
    matrix[100,nsnps] c;
    real<lower=0> sigma;
    vector[nsnps] nu;
    vector<lower=0>[nsnps] tau;
    vector[nsnps] omega;
    vector<lower=0>[nsnps] u;
}
model {
    vector[N] mu;
    mu = b0[S];
    for (j in 1:nsnps) {
        mu += (b[S,j] .* H[,j]) + (c[S,j] .* D[,j]);
        b[,j] ~ cauchy(nu[j], tau[j]);
        c[,j] ~ cauchy(omega[j], u[j]);
    }
    T ~ lognormal(mu, sigma);
    nu ~ normal(0, 10);
    omega ~ normal(0, 10);
    tau ~ normal(0, 10);
    u ~ normal(0, 10);
    sigma ~ normal(0, 20);
}"

tree_fit <- stan(model_code=tree_block,
                 data=list(N=nstrains*nindivs,
                           nsnps=nsnps,
                           T=trees$transpiration,
                           S=trees$strain,
                           H=matrix(as.numeric(H), ncol=nsnps),
                           D=matrix(as.numeric(D), ncol=nsnps)),
                 iter=1000)

## Warning: There were 2000 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
## http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded

## Warning: There were 2 chains where the estimated Bayesian Fraction of Missing Information was low. See
## http://mc-stan.org/misc/warnings.html#bfmi-low

## Warning: Examine the pairs() plot to diagnose sampling problems