\[ %% % Add your macros here; they'll be included in pdf and html output. %% \newcommand{\R}{\mathbb{R}} % reals \newcommand{\E}{\mathbb{E}} % expectation \renewcommand{\P}{\mathbb{P}} % probability \DeclareMathOperator{\logit}{logit} \DeclareMathOperator{\logistic}{logistic} \DeclareMathOperator{\sd}{sd} \DeclareMathOperator{\var}{var} \DeclareMathOperator{\cov}{cov} \DeclareMathOperator{\Normal}{Normal} \DeclareMathOperator{\Poisson}{Poisson} \DeclareMathOperator{\Beta}{Beta} \DeclareMathOperator{\Binom}{Binomial} \DeclareMathOperator{\Gam}{Gamma} \DeclareMathOperator{\Exp}{Exponential} \DeclareMathOperator{\Cauchy}{Cauchy} \DeclareMathOperator{\Unif}{Unif} \DeclareMathOperator{\Dirichlet}{Dirichlet} \DeclareMathOperator{\Wishart}{Wishart} \newcommand{\given}{\;\vert\;} \]

Time series

Peter Ralph

8 March 2018 – Advanced Biological Statistics

Modeling time series

Time series

A time series is a sequence of observations \[\begin{aligned} (y_1, y_2, \ldots, y_N) , \end{aligned}\] that were taken at some set of times \[\begin{aligned} t_1 < t_2 < \cdots < t_N . \end{aligned}\]

In general, the goal is to understand how what happens next depends on the previous history and maybe some predictor variables \[\begin{aligned} (x_1, x_2, \ldots, x_N) , \end{aligned}\] taken at the same set of times.

A simple example

AR(1)

The simplest time series model is purely phenomenological: \[\begin{aligned} y_{k+1} &= \alpha + \beta y_k + \epsilon_k \\ \epsilon_k &\sim N(0, \sigma^2) . \end{aligned}\]

This is “autoregressive, of order 1”.

Rewriting this as \[\begin{aligned} \left(y_{k+1} - \frac{\alpha}{1-\beta}\right) &= \beta \left(y_k - \frac{\alpha}{1-\beta}\right) + \epsilon_k \\ \epsilon_k &\sim N(0, \sigma^2) , \end{aligned}\] we see that if \(|\beta| < 1\) then the process is stable: it fluctuates about, and tends to revert to, its equilibrium mean \(\alpha / (1-\beta)\).

Simulation

First, let’s simulate some data.
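For instance, here is a minimal R sketch (the values of alpha, beta, and sigma are illustrative choices, not necessarily the ones used to make the plots below):

    # simulate an AR(1): y[k+1] = alpha + beta * y[k] + Normal(0, sigma) noise
    N <- 500
    alpha <- 4.5; beta <- 0.3; sigma <- 0.5   # illustrative values
    y <- numeric(N)
    y[1] <- alpha / (1 - beta)                # start at the equilibrium mean
    for (k in 1:(N - 1)) {
        y[k + 1] <- alpha + beta * y[k] + rnorm(1, mean = 0, sd = sigma)
    }
    plot(y, type = "l", xlab = "time", ylab = "y")      # as a time series
    plot(y[-N], y[-1], xlab = "y[t]", ylab = "y[t+1]")  # in phase space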

Plotted as a time series

plot of chunk plot_ar1

Plotted in phase space

plot of chunk plot_ar1_

A Stan model
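The model code itself is not shown in this rendering; a minimal rstan sketch that would produce a summary like the one below (with flat priors, and with data names that are assumptions) is:

    library(rstan)
    ar1_code <- "
    data {
        int N;
        vector[N] y;
    }
    parameters {
        real alpha;
        real beta;
        real<lower=0> sigma;
    }
    model {
        // each value is Normal about a linear function of the previous one
        y[2:N] ~ normal(alpha + beta * y[1:(N-1)], sigma);
    }
    "
    ar1_fit <- stan(model_code = ar1_code, chains = 3, iter = 3000,
                    data = list(N = length(y), y = y))
    print(ar1_fit)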

## Inference for Stan model: 6ff5c9ed423ae7b3b71dcd556630e896.
## 3 chains, each with iter=3000; warmup=1500; thin=1; 
## post-warmup draws per chain=1500, total post-warmup draws=4500.
## 
##        mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
## alpha  4.47    0.02 0.61  3.29  4.04  4.45  4.87  5.69  1591    1
## beta   0.29    0.00 0.10  0.10  0.23  0.29  0.36  0.48  1598    1
## sigma  0.48    0.00 0.03  0.42  0.45  0.48  0.50  0.56  2087    1
## lp__  23.47    0.03 1.22 20.29 22.94 23.77 24.34 24.86  1227    1
## 
## Samples were drawn using NUTS(diag_e) at Wed Mar  7 10:46:45 2018.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

A simple, multivariate model

Multivariate AR(1)

Just like before: \[\begin{aligned} y_{k+1} &= A + B y_k + \epsilon_k \\ \epsilon_k &\sim N(0, \Sigma) , \end{aligned}\]

except now

  • \(y_k\) is a vector of length \(n\)
  • \(A\) is a vector of length \(n\)
  • \(B\) is a \(n \times n\) matrix
  • \(\epsilon_k\) is a vector of \(n\) possibly correlated Gaussians
  • \(\Sigma\) is a \(n \times n\) positive definite matrix

Stochastic minute

The Multivariate Gaussian

If a random vector \((\epsilon_1, \ldots, \epsilon_n)\) is multivariate Gaussian with mean vector \(\mu\) and covariance matrix \(\Sigma\) then \[\begin{aligned} \epsilon_k &\sim N(\mu_k, \Sigma_{kk}) , \end{aligned}\] and \[\begin{aligned} \cov[\epsilon_i, \epsilon_j] &= \Sigma_{ij} . \end{aligned}\]

Also (remarkably), for any vector \(z\), \[\begin{aligned} \sum_i z_i \epsilon_i &\sim N(\sum_i z_i \mu_i, \sum_{ij} z_i \Sigma_{ij} z_j) ,\\ \text{i.e.,}\quad z^T \epsilon &\sim N(z^T \mu, z^T \Sigma z) . \end{aligned}\]

In other words, a multivariate Gaussian vector is a distribution that looks Gaussian in any direction.
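A quick empirical check of this fact (a sketch; the particular mu, Sigma, and z are arbitrary choices), using MASS::mvrnorm():

    library(MASS)
    mu <- c(1, -1, 0)
    Sigma <- matrix(c(1,   0.5,  0,
                      0.5, 1,    0.25,
                      0,   0.25, 2), nrow = 3)
    z <- c(2, -1, 3)
    eps <- mvrnorm(1e5, mu = mu, Sigma = Sigma)  # rows are multivariate Gaussian draws
    w <- as.vector(eps %*% z)                    # z^T epsilon, one value per draw
    c(mean = mean(w), theory = sum(z * mu))
    c(var = var(w), theory = as.numeric(t(z) %*% Sigma %*% z))
    # hist(w) should look Gaussian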

Are you positive definite?

The matrix \(\Sigma\) must be symmetric: \[\begin{aligned} \Sigma_{ij} = \cov[\epsilon_i, \epsilon_j] = \cov[\epsilon_j, \epsilon_i] = \Sigma_{ji} . \end{aligned}\]

Since variances are positive, \[\begin{aligned} \var[z^T \epsilon] &= z^T \Sigma z > 0 , \qquad \text{for any $z \neq 0$} . \end{aligned}\]

These put constraints on \(\Sigma\): it must be symmetric, positive definite.

Note: actually, nonnegative definite would suffice, but Stan only deals with “full rank” multivariate Gaussians.

In Stan

How to declare a covariance matrix:

    cov_matrix[K] Sigma;

A useful prior:

    Sigma ~ wishart(nu, S);

The Wishart distribution

If \(X\) is an \(N \times K\) matrix whose rows are independent samples from \(\Normal(0, S)\), then the matrix of cross-products \[\begin{aligned} X^T X &\sim \Wishart(N, S) . \end{aligned}\]

Challenge: sample from the Wishart, with N=10 and \[\begin{aligned} S = \begin{bmatrix} 1 & 1/2 & 0 \\ 1/2 & 1 & 1/4 \\ 0 & 1/4 & 2 \end{bmatrix} . \end{aligned}\]
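One way to do this (a sketch): draw the \(N\) multivariate normal rows and take their cross-product, or use base R's rWishart(), which does this directly.

    S <- matrix(c(1,   1/2, 0,
                  1/2, 1,   1/4,
                  0,   1/4, 2), nrow = 3)
    N <- 10
    # by hand: N rows of Normal(0, S), then their cross-product
    x <- MASS::mvrnorm(N, mu = rep(0, 3), Sigma = S)
    crossprod(x)                           # t(x) %*% x: one Wishart(N, S) sample
    # or, directly:
    rWishart(1, df = N, Sigma = S)[, , 1]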

##               [,1]        [,2]       [,3]
##  [1,]  0.633593508  1.91531130  1.0339688
##  [2,]  0.202938277 -0.06591836  1.7092590
##  [3,] -1.558708971 -0.77679237  1.1441345
##  [4,]  1.616767409  0.84333610 -1.6951287
##  [5,]  0.016669834 -0.66704282 -0.8543952
##  [6,]  0.338531008 -0.48013294  1.0009107
##  [7,] -1.276957390 -1.47534697  0.6566716
##  [8,]  0.001619262 -0.27211542  0.1809046
##  [9,] -1.232654633 -0.26403322  1.6616964
## [10,] -0.618324299 -0.10891272 -0.5674274

Another useful note:

Just as positive numbers have well-defined square roots, so do symmetric, positive definite matrices: this is the Cholesky decomposition.
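In R it is computed by chol(), which returns the upper-triangular factor. Here is a sketch of the check shown below (Sigma stands for whatever symmetric, positive definite matrix is being factored, e.g. the 4x4 matrix printed first):

    Sigma               # a symmetric, positive definite matrix
    U <- chol(Sigma)    # upper-triangular factor, with t(U) %*% U == Sigma
    U
    t(U) %*% U          # recovers Sigma, up to numerical error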

##             [,1]        [,2]       [,3]        [,4]
## [1,]  1.11291048 -0.04853800 0.05299169 -0.05088764
## [2,] -0.04853800  1.03000432 0.02917037 -0.02774929
## [3,]  0.05299169  0.02917037 1.33573885  0.03820457
## [4,] -0.05088764 -0.02774929 0.03820457  0.83080347
##          [,1]        [,2]       [,3]        [,4]
## [1,] 1.054946 -0.04600995 0.05023167 -0.04823721
## [2,] 0.000000  1.01384782 0.03105153 -0.02955935
## [3,] 0.000000  0.00000000 1.15423197  0.03599404
## [4,] 0.000000  0.00000000 0.00000000  0.90901448
##             [,1]        [,2]       [,3]        [,4]
## [1,]  1.11291048 -0.04853800 0.05299169 -0.05088764
## [2,] -0.04853800  1.03000432 0.02917037 -0.02774929
## [3,]  0.05299169  0.02917037 1.33573885  0.03820457
## [4,] -0.05088764 -0.02774929 0.03820457  0.83080347

Fact: to simulate from

\[\begin{aligned} x &\sim \Normal(0, \Sigma) \end{aligned}\]

you can multiply independent Normals by the Cholesky decomposition of \(\Sigma\):
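A sketch in R (Sigma is any covariance matrix and n the number of samples; recall chol() returns the upper-triangular U with t(U) %*% U equal to Sigma):

    n <- 10000
    k <- nrow(Sigma)
    z <- matrix(rnorm(n * k), nrow = n, ncol = k)   # iid standard Normals
    x <- z %*% chol(Sigma)                          # each row is Normal(0, Sigma)
    cov(x)                                          # should be close to Sigma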

##             [,1]        [,2]       [,3]        [,4]
## [1,]  1.11580841 -0.04972242 0.05386599 -0.05080609
## [2,] -0.04972242  1.03417832 0.02980374 -0.02751660
## [3,]  0.05386599  0.02980374 1.33233655  0.03673243
## [4,] -0.05080609 -0.02751660 0.03673243  0.82966394

A simple, multivariate model

Multivariate AR(1)

Just like before: \[\begin{aligned} y_{k+1} &= A + B y_k + \epsilon_k \\ \epsilon_k &\sim N(0, \Sigma) , \end{aligned}\]

except now

  • \(y_k\) is a vector of length \(n\)
  • \(A\) is a vector of length \(n\)
  • \(B\) is a \(n \times n\) matrix
  • \(\epsilon_k\) is a vector of \(n\) possibly correlated Gaussians
  • \(\Sigma\) is a \(n \times n\) positive definite matrix

Simulation

First, let’s simulate some data, where \[\begin{aligned} y_1(t+1) &= 1 + 0.8 y_1(t) + 0.2 (y_2(t) - y_3(t)) + \epsilon_1(t) \\ y_2(t+1) &= 2 + 0.9 y_2(t) + \epsilon_2(t) \\ y_3(t+1) &= 3 + 0.9 y_3(t) + \epsilon_3(t) \end{aligned}\] and the noise is correlated: \[\begin{aligned} \Sigma &= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0.5 \\ 0 & 0.5 & 1 \end{bmatrix} \end{aligned}\] i.e., \(y_3\) and \(y_2\) tend to go up and down together.
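A minimal R sketch of this simulation (the number of steps and the starting values are arbitrary choices):

    library(MASS)
    A <- c(1, 2, 3)
    B <- matrix(c(0.8, 0.2, -0.2,
                  0,   0.9,  0,
                  0,   0,    0.9), nrow = 3, byrow = TRUE)
    Sigma <- matrix(c(1, 0,   0,
                      0, 1,   0.5,
                      0, 0.5, 1), nrow = 3)
    nsteps <- 500
    y <- matrix(0, nrow = nsteps, ncol = 3)
    y[1, ] <- solve(diag(3) - B, A)   # start at the equilibrium mean
    for (t in 1:(nsteps - 1)) {
        y[t + 1, ] <- A + B %*% y[t, ] + mvrnorm(1, mu = rep(0, 3), Sigma = Sigma)
    }
    matplot(y, type = "l", lty = 1, xlab = "time", ylab = "y")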

Plotted as a time series

plot of chunk plot_ar1_mv

Plotted in phase space

plot of chunk plot_ar1_mv_

Exercise

Write a Stan block for the multivariate AR(1) model.
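One possible sketch of such a block (written here as an rstan model string; the Wishart hyperparameters and the flat priors on A and B are assumptions):

    mv_ar1_code <- "
    data {
        int N;          // number of time steps
        int k;          // number of series
        vector[k] y[N];
    }
    parameters {
        vector[k] A;
        matrix[k, k] B;
        cov_matrix[k] Sigma;
    }
    model {
        for (t in 2:N) {
            y[t] ~ multi_normal(A + B * y[t - 1], Sigma);
        }
        Sigma ~ wishart(k + 1, diag_matrix(rep_vector(1.0, k)));
    }
    "
    # mv_ar1_fit <- stan(model_code = mv_ar1_code, chains = 3,
    #                    data = list(N = nrow(y), k = ncol(y), y = y))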

Hm, the result looks noisy.


An oscillator

A discrete, noisy oscillator

Suppose we have regular, noisy observations from a discrete, noisy oscillator.

The system itself does \[\begin{aligned} x_{t+1} - x_t &= \alpha y_t + \Normal(0, \sigma_{xy}) \\ y_{t+1} - y_t &= - \beta x_t + \Normal(0, \sigma_{xy}) \end{aligned}\] but we only get to observe \[\begin{aligned} X_t &= x_t + \Normal(0, \sigma_\epsilon) \\ Y_t &= y_t + \Normal(0, \sigma_\epsilon) . \end{aligned}\]
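A sketch of simulating this system in R, using the true parameter values reported in the fit summary below (the number of steps and the starting point are assumptions):

    nsteps <- 500
    alpha <- 0.1; beta <- 0.05          # "truth" column of the summary below
    sigma_xy <- 0.01; sigma_eps <- 0.1
    xy <- matrix(0, nrow = nsteps, ncol = 2)
    xy[1, ] <- c(1, 0)                  # arbitrary starting point
    for (t in 1:(nsteps - 1)) {
        xy[t + 1, 1] <- xy[t, 1] + alpha * xy[t, 2] + rnorm(1, 0, sigma_xy)
        xy[t + 1, 2] <- xy[t, 2] - beta * xy[t, 1] + rnorm(1, 0, sigma_xy)
    }
    XY <- xy + rnorm(length(xy), 0, sigma_eps)   # the noisy observations
    matplot(XY, type = "l", lty = 1, xlab = "time", ylab = "observed X and Y")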

Here’s what this looks like.

plot of chunk plot_osc

A Stan block
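The code itself is not shown in this rendering; here is a sketch of such a model, in which the true (latent) trajectory is treated as a parameter (priors and variable names are assumptions):

    osc_code <- "
    data {
        int N;
        vector[N] X;    // noisy observations of x
        vector[N] Y;    // noisy observations of y
    }
    parameters {
        real alpha;
        real beta;
        real<lower=0> sigma_xy;
        real<lower=0> sigma_eps;
        vector[N] x;    // latent, true trajectory
        vector[N] y;
    }
    model {
        // the underlying process
        x[2:N] ~ normal(x[1:(N-1)] + alpha * y[1:(N-1)], sigma_xy);
        y[2:N] ~ normal(y[1:(N-1)] - beta * x[1:(N-1)], sigma_xy);
        // the observation process
        X ~ normal(x, sigma_eps);
        Y ~ normal(y, sigma_eps);
    }
    "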

## Warning: There were 3 chains where the estimated Bayesian Fraction of Missing Information was low. See
## http://mc-stan.org/misc/warnings.html#bfmi-low
## Warning: Examine the pairs() plot to diagnose sampling problems

How’d we do?

##                 mean      se_mean           sd        2.5%         25%
## alpha     0.10027484 7.548355e-06 0.0002620254 0.099767459 0.100087329
## beta      0.04959407 3.981173e-06 0.0001453410 0.049302785 0.049497526
## sigma_xy  0.01016856 3.163218e-04 0.0015125117 0.007275602 0.009151821
## sigma_eps 0.10415614 6.739671e-05 0.0024588853 0.099336078 0.102485456
##                  50%        75%      97.5%      n_eff      Rhat truth
## alpha     0.10028131 0.10045248 0.10076468 1204.98657 1.0014733  0.10
## beta      0.04958854 0.04969267 0.04987458 1332.76788 0.9984255  0.05
## sigma_xy  0.01012091 0.01112764 0.01342035   22.86332 1.0703627  0.01
## sigma_eps 0.10410202 0.10583318 0.10921730 1331.06464 1.0038145  0.10

Here is a density plot of 100 estimated trajectories (of x and y) from the Stan fit.

plot of chunk show_osc_fit

A noisier oscillator

More realism?

Let’s try that again, with more noise.

Here’s what this looks like.

plot of chunk plot_osc2

## Warning: There were 3 chains where the estimated Bayesian Fraction of Missing Information was low. See
## http://mc-stan.org/misc/warnings.html#bfmi-low
## Warning: Examine the pairs() plot to diagnose sampling problems

How’d we do?

##                 mean      se_mean           sd       2.5%        25%
## alpha     0.09977365 3.868091e-05 0.0014981051 0.09693679 0.09871924
## beta      0.05058485 2.181817e-05 0.0008450139 0.04888203 0.05003310
## sigma_xy  0.04888524 1.058616e-03 0.0061682198 0.03836894 0.04434006
## sigma_eps 0.49500456 2.933624e-04 0.0113618758 0.47337393 0.48763187
##                  50%        75%      97.5%      n_eff      Rhat truth
## alpha     0.09976657 0.10080819 0.10273807 1500.00000 0.9992035  0.10
## beta      0.05061366 0.05112784 0.05225583 1500.00000 0.9995298  0.05
## sigma_xy  0.04836393 0.05306193 0.06140414   33.95026 1.0846841  0.05
## sigma_eps 0.49472709 0.50237105 0.51755222 1500.00000 1.0026377  0.50

Here is a density plot of 100 estimated trajectories (of x and y) from the Stan fit.

plot of chunk show_osc_fit2

Missing data

Even more realism?

Now what if we actually don’t observe most of the \(Y\) values?

Here’s what this looks like.

plot of chunk plot_osc3

A new Stan block
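A sketch of how the previous model might change: the latent trajectory is still estimated at every time, but the \(Y\) likelihood only uses the times at which \(Y\) was observed (the data names, in particular obs_y, are assumptions):

    osc_miss_code <- "
    data {
        int N;
        vector[N] X;      // noisy observations of x, at every time
        int k;            // number of observed Y values
        int obs_y[k];     // times at which Y was observed
        vector[k] Y;      // the observed Y values
    }
    parameters {
        real alpha;
        real beta;
        real<lower=0> sigma_xy;
        real<lower=0> sigma_eps;
        vector[N] x;
        vector[N] y;
    }
    model {
        x[2:N] ~ normal(x[1:(N-1)] + alpha * y[1:(N-1)], sigma_xy);
        y[2:N] ~ normal(y[1:(N-1)] - beta * x[1:(N-1)], sigma_xy);
        X ~ normal(x, sigma_eps);
        Y ~ normal(y[obs_y], sigma_eps);   // only the observed Y values
    }
    "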

## Warning: There were 3 chains where the estimated Bayesian Fraction of Missing Information was low. See
## http://mc-stan.org/misc/warnings.html#bfmi-low
## Warning: Examine the pairs() plot to diagnose sampling problems

How’d we do?

##                 mean      se_mean          sd       2.5%        25%
## alpha     0.10393336 0.0009268078 0.007705552 0.09040166 0.09863952
## beta      0.04721744 0.0004147620 0.003438507 0.04049462 0.04493189
## sigma_xy  0.04301800 0.0052723682 0.008877628 0.03001169 0.03576129
## sigma_eps 0.47543913 0.0005778553 0.016542496 0.44599898 0.46314292
##                  50%        75%      97.5%      n_eff     Rhat truth
## alpha     0.10326524 0.10870398 0.12052463  69.123893 1.043283  0.10
## beta      0.04721705 0.04953988 0.05385456  68.729305 1.040078  0.05
## sigma_xy  0.04213721 0.04798025 0.06384010   2.835192 1.686882  0.05
## sigma_eps 0.47495666 0.48555496 0.50975366 819.528259 1.014492  0.50

Here is a density plot of 100 estimated trajectories (of x and y) from the Stan fit.

plot of chunk show_osc_fit3

Summary

Time series:

  1. Autoregressive models: like linear regression.

  2. Multivariate: covariance matrix prior.

  3. Oscillator: like an AR(1) but explicitly modeling the underlying, unobserved process.

    • note: this was a type of Hidden Markov Model (HMM)

  4. Missing data: the inference worked well even without observing most of one entire dimension.

    • our strong idea about the underlying process made inference possible.