Peter Ralph
8 March 2018 – Advanced Biological Statistics
A time series is a sequence of observations \[\begin{aligned} (y_1, y_2, \ldots, y_N) , \end{aligned}\] that were taken at some set of times \[\begin{aligned} t_1 < t_2 < \cdots < t_N . \end{aligned}\]
In general, the goal is to understand how what happens next depends on the previous history and maybe some predictor variables \[\begin{aligned} (x_1, x_2, \ldots, x_N) , \end{aligned}\] taken at the same set of times.
The simplest time series model is purely phenomenological: \[\begin{aligned} y_{k+1} &= \alpha + \beta y_k + \epsilon_k \\ \epsilon_k &\sim N(0, \sigma^2) . \end{aligned}\]
This is an autoregressive model of order 1, or “AR(1)”.
Rewriting this as \[\begin{aligned} \left(y_{k+1} - \frac{\alpha}{1-\beta}\right) &= \beta \left(y_k - \frac{\alpha}{1-\beta}\right) + \epsilon_k \\ \epsilon_k &\sim N(0, \sigma^2) , \end{aligned}\] we see that if \(|\beta| < 1\) then the process is stable, fluctuating about its equilibrium \(\alpha / (1-\beta)\).
First, let’s simulate some data.
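The simulation and model code aren’t shown in this excerpt; here is a minimal sketch of the kind of code that could produce a fit like the summary below (the parameter values and all variable names here are assumptions).
library(rstan)
N <- 100
true_ar1 <- list(alpha=4.5, beta=0.3, sigma=0.5)   # assumed "true" values
y <- numeric(N)
y[1] <- true_ar1$alpha / (1 - true_ar1$beta)       # start near the equilibrium
for (k in 1:(N-1)) {
    y[k+1] <- true_ar1$alpha + true_ar1$beta * y[k] + rnorm(1, 0, true_ar1$sigma)
}
ar1_block <- "
data {
    int N;
    vector[N] y;
}
parameters {
    real alpha;
    real beta;
    real<lower=0> sigma;
}
model {
    y[2:N] ~ normal(alpha + beta * y[1:(N-1)], sigma);
}
"
ar1_model <- stan_model(model_code=ar1_block)
ar1_fit <- sampling(ar1_model, data=list(N=N, y=y), iter=3000, chains=3)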
## Inference for Stan model: 6ff5c9ed423ae7b3b71dcd556630e896.
## 3 chains, each with iter=3000; warmup=1500; thin=1;
## post-warmup draws per chain=1500, total post-warmup draws=4500.
##
## mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
## alpha 4.47 0.02 0.61 3.29 4.04 4.45 4.87 5.69 1591 1
## beta 0.29 0.00 0.10 0.10 0.23 0.29 0.36 0.48 1598 1
## sigma 0.48 0.00 0.03 0.42 0.45 0.48 0.50 0.56 2087 1
## lp__ 23.47 0.03 1.22 20.29 22.94 23.77 24.34 24.86 1227 1
##
## Samples were drawn using NUTS(diag_e) at Wed Mar 7 10:46:45 2018.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
Just like before: \[\begin{aligned} y_{k+1} &= A + B y_k + \epsilon_k \\ \epsilon_k &\sim N(0, \Sigma) , \end{aligned}\]
except now \(y_k\) and \(A\) are vectors, \(B\) is a matrix, and the noise \(\epsilon_k\) is multivariate Gaussian with covariance matrix \(\Sigma\).
If a random vector \((\epsilon_1, \ldots, \epsilon_n)\) is multivariate Gaussian with mean vector \(\mu\) and covariance matrix \(\Sigma\) then \[\begin{aligned} \epsilon_k &\sim N(\mu_k, \Sigma_{kk}) , \end{aligned}\] and \[\begin{aligned} \cov[\epsilon_i, \epsilon_j] &= \Sigma_{ij} . \end{aligned}\]
Also (remarkably), for any vector \(z\), \[\begin{aligned} \sum_i z_i \epsilon_i &\sim N(\sum_i z_i \mu_i, \sum_{ij} z_i \Sigma_{ij} z_j) ,\\ \text{i.e.,}\quad z^T \epsilon &\sim N(z^T \mu, z^T \Sigma z) . \end{aligned}\]
In other words, a multivariate Gaussian vector has a distribution that looks Gaussian in every direction.
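As a quick empirical check of this fact (a sketch, with an arbitrary choice of \(\Sigma\) and \(z\)):
library(mvtnorm)
Sigma <- matrix(c(1, 0.5, 0.5, 2), nrow=2)     # an arbitrary covariance matrix
eps <- rmvnorm(1e5, mean=c(0, 0), sigma=Sigma)
z <- c(2, -1)
var(eps %*% z)         # empirical variance of z^T epsilon
t(z) %*% Sigma %*% z   # theoretical value, z^T Sigma z = 4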
The matrix \(\Sigma\) must be symmetric: \[\begin{aligned} \Sigma_{ij} = \cov[\epsilon_i, \epsilon_j] = \cov[\epsilon_j, \epsilon_i] = \Sigma_{ji} . \end{aligned}\]
Since variances are positive, \[\begin{aligned} \var[z^T \epsilon] &= z^T \Sigma z > 0 , \qquad \text{for any nonzero $z$} . \end{aligned}\]
These conditions constrain \(\Sigma\): it must be symmetric and positive definite.
Note: actually, nonnegative definite would suffice, but Stan only deals with “full rank” multivariate Gaussians.
How to declare a covariance matrix:
cov_matrix[K] Sigma;
A useful prior:
Sigma ~ wishart(nu, S);
If \(X\) is an \(N \times K\) matrix, each row containing an independent sample from \(\Normal(0, S)\), then \[\begin{aligned} X^T X &\sim \Wishart(N, S) . \end{aligned}\]
Challenge: sample from the Wishart, with N=10
and \[\begin{aligned}
S = \begin{bmatrix} 1 & 1/2 & 0 \\
1/2 & 1 & 1/4 \\
0 & 1/4 & 2
\end{bmatrix} .
\end{aligned}\]
library(mvtnorm)
S <- matrix( c(1, 1/2, 0, 1/2, 1, 1/4, 0, 1/4, 2), nrow=3)
X <- rmvnorm(10, mean=rep(0,3), sigma=S)  # ten independent rows from N(0, S)
Sigma <- crossprod(X)                     # t(X) %*% X, a draw from Wishart(10, S)
X
## [,1] [,2] [,3]
## [1,] 0.633593508 1.91531130 1.0339688
## [2,] 0.202938277 -0.06591836 1.7092590
## [3,] -1.558708971 -0.77679237 1.1441345
## [4,] 1.616767409 0.84333610 -1.6951287
## [5,] 0.016669834 -0.66704282 -0.8543952
## [6,] 0.338531008 -0.48013294 1.0009107
## [7,] -1.276957390 -1.47534697 0.6566716
## [8,] 0.001619262 -0.27211542 0.1809046
## [9,] -1.232654633 -0.26403322 1.6616964
## [10,] -0.618324299 -0.10891272 -0.5674274
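For comparison, base R can draw Wishart matrices directly with stats::rWishart(); a quick cross-check of the construction above:
stats::rWishart(1, df=10, Sigma=S)[,,1]   # one draw from Wishart(10, S), as a 3x3 matrix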
Just as positive numbers have well-defined square roots, so do symmetric, positive definite matrices: the Cholesky decomposition writes \(\Sigma = R^T R\), where \(R\) is upper triangular.
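The code that produced the output below isn’t shown; here is a sketch of the same computation (the 4x4 covariance matrix is an assumption, built here from random data), printing a covariance matrix, its Cholesky factor from chol(), and the reconstruction:
A <- matrix(rnorm(400), ncol=4)
Sigma <- cov(A)       # a symmetric, positive definite 4x4 matrix
R <- chol(Sigma)      # upper-triangular Cholesky factor: Sigma = t(R) %*% R
Sigma
R
t(R) %*% R            # recovers Sigma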
## [,1] [,2] [,3] [,4]
## [1,] 1.11291048 -0.04853800 0.05299169 -0.05088764
## [2,] -0.04853800 1.03000432 0.02917037 -0.02774929
## [3,] 0.05299169 0.02917037 1.33573885 0.03820457
## [4,] -0.05088764 -0.02774929 0.03820457 0.83080347
## [,1] [,2] [,3] [,4]
## [1,] 1.054946 -0.04600995 0.05023167 -0.04823721
## [2,] 0.000000 1.01384782 0.03105153 -0.02955935
## [3,] 0.000000 0.00000000 1.15423197 0.03599404
## [4,] 0.000000 0.00000000 0.00000000 0.90901448
## [,1] [,2] [,3] [,4]
## [1,] 1.11291048 -0.04853800 0.05299169 -0.05088764
## [2,] -0.04853800 1.03000432 0.02917037 -0.02774929
## [3,] 0.05299169 0.02917037 1.33573885 0.03820457
## [4,] -0.05088764 -0.02774929 0.03820457 0.83080347
Fact: to simulate from \[\begin{aligned} x &\sim \Normal(0, \Sigma) , \end{aligned}\] you can multiply a vector of independent standard Normals by the Cholesky factor of \(\Sigma\):
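The chunk that produced the output below isn’t shown; here is a sketch of the construction, reusing Sigma and R from the previous chunk (the number of samples is a guess):
Z <- matrix(rnorm(1e4 * 4), ncol=4)   # independent standard Normals, one row per sample
W <- Z %*% R                          # each row now has covariance t(R) %*% R = Sigma
cov(W)                                # close to Sigma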
## [,1] [,2] [,3] [,4]
## [1,] 1.11580841 -0.04972242 0.05386599 -0.05080609
## [2,] -0.04972242 1.03417832 0.02980374 -0.02751660
## [3,] 0.05386599 0.02980374 1.33233655 0.03673243
## [4,] -0.05080609 -0.02751660 0.03673243 0.82966394
Recall the multivariate AR(1) model: \[\begin{aligned} y_{k+1} &= A + B y_k + \epsilon_k \\ \epsilon_k &\sim N(0, \Sigma) , \end{aligned}\] with \(y_k\) and \(A\) vectors, \(B\) a matrix, and \(\Sigma\) a covariance matrix.
First, let’s simulate some data, where \[\begin{aligned} y_1(t+1) &= 1 + 0.8 y_1(t) + 0.2 (y_2(t) - y_3(t)) + \epsilon_1(t) \\ y_2(t+1) &= 2 + 0.9 y_2(t) + \epsilon_2(t) \\ y_3(t+1) &= 3 + 0.9 y_3(t) + \epsilon_3(t) \end{aligned}\] and the noise is correlated: \[\begin{aligned} \Sigma &= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0.5 \\ 0 & 0.5 & 1 \end{bmatrix} \end{aligned}\] i.e., \(y_3\) and \(y_2\) tend to go up and down together.
library(mvtnorm) # for rmvnorm
n <- 3
mv_truth <- list(A=1:3,
                 B=matrix(c(0.8, 0.2, -0.2,
                            0.0, 0.9,  0.0,
                            0.0, 0.0,  0.9), ncol=3, byrow=TRUE),
                 Sigma=matrix(c(1, 0,   0,
                                0, 1,   0.5,
                                0, 0.5, 1), ncol=3, byrow=TRUE))
N <- 100
mv_y <- matrix(0, nrow=N, ncol=n)
mv_y[1,] <- rnorm(n, mean=c(5, 25, 25), sd=5)
for (k in 1:(N-1)) {
    # mv_y[k,] is a row vector, so multiply by t(B) to get B %*% y as in the equations above
    mv_y[k+1,] <- (mv_truth$A + mv_y[k,] %*% t(mv_truth$B)
                   + rmvnorm(1, mean=rep(0,n), sigma=mv_truth$Sigma))
}
Write a Stan block for the multivariate AR(1) model.
mv_ar1_block <- "
data {
int N; // number of observations
int n; // the dimension
matrix[N,n] y;
cov_matrix[n] S;
}
parameters {
row_vector[n] A;
matrix[n,n] B;
cov_matrix[n] Sigma;
}
model {
for (k in 1:(N-1)) {
y[k+1,] ~ multi_normal(A + y[k,] * B, Sigma);
}
A ~ normal(0, 20);
for (k in 1:n) {
B[k,] ~ normal(0, 3);
}
// uninformative prior on Sigma
Sigma ~ wishart(3, S);
}
"
mv_ar1_model <- stan_model(model_code=mv_ar1_block)
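The call to sampling() doesn’t appear in this excerpt; it presumably looked something like the following (the Wishart scale matrix S passed as data is an assumption; the identity is used here):
mv_ar1_fit <- sampling(mv_ar1_model,
                       data=list(N=N, n=n, y=mv_y, S=diag(n)),
                       iter=1000, chains=3)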
Hm, the result looks noisy.
## Error in summary(mv_ar1_fit)$summary: $ operator is invalid for atomic vectors
Suppose we have regular, noisy observations from a discrete, noisy oscillator.
The system itself does \[\begin{aligned} x_{t+1} - x_t &= \alpha y_t + \Normal(0, \sigma_{xy}) \\ y_{t+1} - y_t &= - \beta x_t + \Normal(0, \sigma_{xy}) \end{aligned}\] but we only get to observe \[\begin{aligned} X_t &= x_t + \Normal(0, \sigma_\epsilon) \\ Y_t &= y_t + \Normal(0, \sigma_\epsilon) . \end{aligned}\]
Here’s what this looks like.
true_osc <- list(alpha=.1,
                 beta=.05,
                 sigma_xy=.01,
                 sigma_eps=.1)
N <- 500
xy <- matrix(nrow=N, ncol=2)
xy[1,] <- c(3,0)
for (k in 1:(N-1)) {
    xy[k+1,] <- (xy[k,]
                 + c(true_osc$alpha * xy[k,2],
                     (-1) * true_osc$beta * xy[k,1])
                 + rnorm(2, 0, true_osc$sigma_xy))
}
XY <- xy + rnorm(N*2, 0, true_osc$sigma_eps)
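The plot itself isn’t included in this excerpt; one way to draw it (a sketch) is:
matplot(xy, type="l", lty=1, col=c("black", "red"),
        xlab="time", ylab="value")                      # latent trajectories
matpoints(XY, pch=20, cex=0.5, col=c("black", "red"))   # noisy observations
legend("topright", lty=1, col=c("black", "red"), legend=c("x", "y"))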
osc_block <- "
data {
int N;
vector[N] X;
vector[N] Y;
}
parameters {
real alpha;
real beta;
real<lower=0> sigma_xy;
real<lower=0> sigma_eps;
vector[N] x;
vector[N] y;
}
model {
x[2:N] ~ normal(x[1:(N-1)] + alpha * y[1:(N-1)], sigma_xy);
y[2:N] ~ normal(y[1:(N-1)] - beta * x[1:(N-1)], sigma_xy);
X ~ normal(x, sigma_eps);
Y ~ normal(y, sigma_eps);
alpha ~ normal(0, 1);
beta ~ normal(0, 1);
sigma_xy ~ normal(0, 1);
sigma_eps ~ normal(0, 1);
}
"
osc_model <- stan_model(model_code=osc_block)
osc_fit <- sampling(osc_model,
                    data=list(N=N,
                              X=XY[,1],
                              Y=XY[,2]),
                    iter=1000, chains=3,
                    control=list(max_treedepth=12))
## Warning: There were 3 chains where the estimated Bayesian Fraction of Missing Information was low. See
## http://mc-stan.org/misc/warnings.html#bfmi-low
## Warning: Examine the pairs() plot to diagnose sampling problems
cbind(rstan::summary(osc_fit, pars=c("alpha", "beta", "sigma_xy", "sigma_eps"))$summary,
truth=c(true_osc$alpha, true_osc$beta, true_osc$sigma_xy, true_osc$sigma_eps))
## mean se_mean sd 2.5% 25%
## alpha 0.10027484 7.548355e-06 0.0002620254 0.099767459 0.100087329
## beta 0.04959407 3.981173e-06 0.0001453410 0.049302785 0.049497526
## sigma_xy 0.01016856 3.163218e-04 0.0015125117 0.007275602 0.009151821
## sigma_eps 0.10415614 6.739671e-05 0.0024588853 0.099336078 0.102485456
## 50% 75% 97.5% n_eff Rhat truth
## alpha 0.10028131 0.10045248 0.10076468 1204.98657 1.0014733 0.10
## beta 0.04958854 0.04969267 0.04987458 1332.76788 0.9984255 0.05
## sigma_xy 0.01012091 0.01112764 0.01342035 22.86332 1.0703627 0.01
## sigma_eps 0.10410202 0.10583318 0.10921730 1331.06464 1.0038145 0.10
Here is a density plot of 100 estimated trajectories (of x and y) from the Stan fit.
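The plot isn’t reproduced here; a sketch of how such a figure could be drawn from the fit (each posterior draw of the latent paths plotted with heavy transparency, so overlap shows density):
post <- rstan::extract(osc_fit, pars=c("x", "y"))
draws <- sample(nrow(post$x), 100)
matplot(xy, type="l", lty=1, col=c("black", "red"), xlab="time", ylab="value")
for (j in draws) {
    lines(post$x[j,], col=adjustcolor("black", 0.05))
    lines(post$y[j,], col=adjustcolor("red", 0.05))
}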
Let’s try that again, with more noise.
Here’s what this looks like.
true_osc2 <- list(alpha=.1,
                  beta=.05,
                  sigma_xy=.05,
                  sigma_eps=.5)
xy2 <- matrix(nrow=N, ncol=2)
xy2[1,] <- c(3,0)
for (k in 1:(N-1)) {
    xy2[k+1,] <- (xy2[k,]
                  + c(true_osc2$alpha * xy2[k,2],
                      (-1) * true_osc2$beta * xy2[k,1])
                  + rnorm(2, 0, true_osc2$sigma_xy))
}
XY2 <- xy2 + rnorm(N*2, 0, true_osc2$sigma_eps)
osc_fit2 <- sampling(osc_model,
                     data=list(N=N,
                               X=XY2[,1],
                               Y=XY2[,2]),
                     iter=1000, chains=3,
                     control=list(max_treedepth=12))
## Warning: There were 3 chains where the estimated Bayesian Fraction of Missing Information was low. See
## http://mc-stan.org/misc/warnings.html#bfmi-low
## Warning: Examine the pairs() plot to diagnose sampling problems
cbind(rstan::summary(osc_fit2, pars=c("alpha", "beta", "sigma_xy", "sigma_eps"))$summary,
truth=c(true_osc2$alpha, true_osc2$beta, true_osc2$sigma_xy, true_osc2$sigma_eps))
## mean se_mean sd 2.5% 25%
## alpha 0.09977365 3.868091e-05 0.0014981051 0.09693679 0.09871924
## beta 0.05058485 2.181817e-05 0.0008450139 0.04888203 0.05003310
## sigma_xy 0.04888524 1.058616e-03 0.0061682198 0.03836894 0.04434006
## sigma_eps 0.49500456 2.933624e-04 0.0113618758 0.47337393 0.48763187
## 50% 75% 97.5% n_eff Rhat truth
## alpha 0.09976657 0.10080819 0.10273807 1500.00000 0.9992035 0.10
## beta 0.05061366 0.05112784 0.05225583 1500.00000 0.9995298 0.05
## sigma_xy 0.04836393 0.05306193 0.06140414 33.95026 1.0846841 0.05
## sigma_eps 0.49472709 0.50237105 0.51755222 1500.00000 1.0026377 0.50
Here is a density plot of 100 estimated trajectories (of x and y) from the Stan fit.
Now what if we actually don’t observe most of the \(Y\) values?
Here’s what this looks like.
true_osc3 <- list(alpha=.1,
                  beta=.05,
                  sigma_xy=.05,
                  sigma_eps=.5)
xy3 <- matrix(nrow=N, ncol=2)
xy3[1,] <- c(3,0)
for (k in 1:(N-1)) {
    xy3[k+1,] <- (xy3[k,]
                  + c(true_osc3$alpha * xy3[k,2],
                      (-1) * true_osc3$beta * xy3[k,1])
                  + rnorm(2, 0, true_osc3$sigma_xy))
}
XY3 <- xy3 + rnorm(N*2, 0, true_osc3$sigma_eps)
obs_y <- floor(seq(1, N, length.out=5))     # keep only five observation times for Y
XY3[setdiff(1:N, obs_y), 2] <- NA           # the rest of Y is missing
osc_block_missing <- "
data {
int N;
vector[N] X;
int k; // number of observed Y
int obs_y[k]; // which Y values are observed
vector[k] Y;
}
parameters {
real alpha;
real beta;
real<lower=0> sigma_xy;
real<lower=0> sigma_eps;
vector[N] x;
vector[N] y;
}
model {
x[2:N] ~ normal(x[1:(N-1)] + alpha * y[1:(N-1)], sigma_xy);
y[2:N] ~ normal(y[1:(N-1)] - beta * x[1:(N-1)], sigma_xy);
X ~ normal(x, sigma_eps);
Y ~ normal(y[obs_y], sigma_eps);
alpha ~ normal(0, 1);
beta ~ normal(0, 1);
sigma_xy ~ normal(0, 1);
sigma_eps ~ normal(0, 1);
}
"
osc_model_missing <- stan_model(model_code=osc_block_missing)
osc_fit3 <- sampling(osc_model_missing,
                     data=list(N=N,
                               X=XY3[,1],
                               k=length(obs_y),
                               obs_y=obs_y,
                               Y=XY3[obs_y,2]),
                     iter=1000, chains=3,
                     control=list(max_treedepth=12))
## Warning: There were 3 chains where the estimated Bayesian Fraction of Missing Information was low. See
## http://mc-stan.org/misc/warnings.html#bfmi-low
## Warning: Examine the pairs() plot to diagnose sampling problems
cbind(rstan::summary(osc_fit3, pars=c("alpha", "beta", "sigma_xy", "sigma_eps"))$summary,
truth=c(true_osc3$alpha, true_osc3$beta, true_osc3$sigma_xy, true_osc3$sigma_eps))
## mean se_mean sd 2.5% 25%
## alpha 0.10393336 0.0009268078 0.007705552 0.09040166 0.09863952
## beta 0.04721744 0.0004147620 0.003438507 0.04049462 0.04493189
## sigma_xy 0.04301800 0.0052723682 0.008877628 0.03001169 0.03576129
## sigma_eps 0.47543913 0.0005778553 0.016542496 0.44599898 0.46314292
## 50% 75% 97.5% n_eff Rhat truth
## alpha 0.10326524 0.10870398 0.12052463 69.123893 1.043283 0.10
## beta 0.04721705 0.04953988 0.05385456 68.729305 1.040078 0.05
## sigma_xy 0.04213721 0.04798025 0.06384010 2.835192 1.686882 0.05
## sigma_eps 0.47495666 0.48555496 0.50975366 819.528259 1.014492 0.50
Here is a density plot of 100 estimated trajectories (of x and y) from the Stan fit.
Autoregressive models: fit much like a linear regression of each observation on the previous one.
Multivariate AR(1): the same idea, with a covariance matrix for the noise and a prior (e.g., a Wishart) on it.
Oscillator: like an AR(1), but explicitly modeling the underlying, unobserved process.
Missing data: the inference worked well even without observing most of one entire dimension.