\[ %% % Add your macros here; they'll be included in pdf and html output. %% \newcommand{\R}{\mathbb{R}} % reals \newcommand{\E}{\mathbb{E}} % expectation \renewcommand{\P}{\mathbb{P}} % probability \DeclareMathOperator{\logit}{logit} \DeclareMathOperator{\logistic}{logistic} \DeclareMathOperator{\sd}{sd} \DeclareMathOperator{\var}{var} \DeclareMathOperator{\cov}{cov} \DeclareMathOperator{\Normal}{Normal} \DeclareMathOperator{\Poisson}{Poisson} \DeclareMathOperator{\Beta}{Beta} \DeclareMathOperator{\Binom}{Binomial} \DeclareMathOperator{\Gam}{Gamma} \DeclareMathOperator{\Exp}{Exponential} \DeclareMathOperator{\Cauchy}{Cauchy} \DeclareMathOperator{\Unif}{Unif} \DeclareMathOperator{\Dirichlet}{Dirichlet} \newcommand{\given}{\;\vert\;} \]

Spatial, and network models

Peter Ralph

12 March 2018 – Advanced Biological Statistics

Spatial models

A simple scenario

Suppose we have estimates of abundance of a soil microbe from a number of samples across our study area:

plot of chunk show_study

The data

(x,y) : spatial coords; z : abundance

##            x         y        z
## 1  0.5766037 0.5863978 3.845720
## 2  0.2230729 0.2747410 3.571873
## 3  0.3318966 0.1476570 4.683332
## 4  0.7107246 0.8014103 3.410312
## 5  0.8194490 0.3864098 3.726504
## 6  0.4237206 0.8204507 3.356803
## 7  0.9635445 0.6849373 4.424564
## 8  0.9781304 0.8833893 5.795779
## 9  0.8405219 0.1119208 3.962872
## 10 0.9966112 0.7788340 5.155496
## 11 0.8659590 0.6210109 3.911921
## 12 0.7014217 0.2741515 4.235119
## 13 0.3904731 0.3014697 3.657133
## 14 0.3147697 0.3490438 3.342371
## 15 0.8459473 0.3291311 3.835436
## 16 0.1392785 0.8095039 4.035925
## 17 0.5181206 0.2744821 4.694328
## 18 0.5935508 0.9390949 3.054045
## 19 0.9424617 0.9022671 5.518488
## 20 0.6280196 0.1770531 4.617342

Goals:

  1. (descriptive) What spatial scale does abundance vary over?

  2. (predictive) What are the likely (range of) abundances at new locations?

Spatial covariance

Tobler’s First Law of Geography:

Everything is related to everything else, but near things are more related than distant things.

Modeler: Great, covariance is a decreasing function of distance.

A decreasing function of distance.

A convenient choice: the covariance between two points distance \(d\) apart is \[\begin{aligned} \alpha^2 \exp\left(- \frac{1}{2}\left(\frac{d}{\rho}\right)^2 \right) . \end{aligned}\]

  • \(\alpha\) controls the overall variance (amount of noise)

  • \(\rho\) is the spatial scale that covariance decays over

In Stan

cov_exp_quad() documentation
cov_exp_quad() documentation

Here’s an R function that takes a set of locations (xy), a variance scaling alpha, and a spatial scale rho:

Challenge: simulate spatially autocorrelated random Gaussian values, and plot them, in space. Pick parameters so you can tell they are autocorrelated.

to color points by a continuous value:

     colorRampPalette(c('blue', 'red'))(24)[cut(xy$z, breaks=24)]

Simulation

plot of chunk plot_pts

Simulation number 2

plot of chunk plot_pts2

Back to the data

Goals

  1. (descriptive) What spatial scale does abundance vary over?

    \(\Rightarrow\) What is \(\rho\)?

  2. (predictive) What are the likely (range of) abundances at new locations?

    \(\Rightarrow\) Add unobserved abundances as parameters.

A basic Stan block

Challenge: we would like to estimate the abundance at the k locations new_xy. Add this feature to the Stan block.


data {
    int N; // number of obs
    vector[2] xy[N]; // spatial pos
    vector[N] z;
}
parameters {
    real<lower=0> alpha;
    real<lower=0> rho;
}
model {
    matrix[N, N] K;
    K = cov_exp_quad(xy, alpha, rho);

    z ~ multi_normal(rep_vector(0.0, N), K);
    alpha ~ normal(0, 5);
    rho ~ normal(0, 5);
}

A solution

Simulate data

It runs.

## [20,1]
##    user  system elapsed 
##   1.369   0.036   2.447

Does it work?

##       truth mean      se_mean     sd        2.5%      25%       50%      
## alpha 2.5   4.234435  0.08918686  2.023869  1.77594   2.775801  3.721438 
## rho   0.6   0.6926219 0.01211368  0.2500757 0.3555691 0.5261485 0.6432864
## delta 0.1   0.2805626 0.007569845 0.1598728 0.0942439 0.1729031 0.2409412
## mu    5     5.330636  0.09851299  2.568532  0.1735635 3.843481  5.406842 
##       75%       97.5%     n_eff    Rhat    
## alpha 5.101349  9.161582  514.9476 1.009623
## rho   0.8055962 1.368224  426.1778 1.009142
## delta 0.3386181 0.7397388 446.0414 1.004646
## mu    6.868921  9.874479  679.8029 1.000725

plot of chunk plot_pts_interp

Conclusions

  1. (descriptive) What spatial scale does abundance vary over?

    Values are correlated over distances of order \(\rho=0.6926219\) units of distance.

  2. (predictive) What are the likely (range of) abundances at new locations?

    These are

##                   x         y     mean        sd
## new_z[1] 0.01268409 0.3892949 4.120024 0.7242546
## new_z[2] 0.13522095 0.1532280 5.420720 0.6137691
## new_z[3] 0.55666986 0.9800330 2.687552 0.8721023
## new_z[4] 0.54442795 0.4793665 3.824439 0.6073334
## new_z[5] 0.19853665 0.6226803 2.494086 0.6191298

Network models

What we’ve just done

  • Model: the data are multivariate Normal,

  • with nearby locations have more correlated values.

  • To do this, we set the entries of the covariance matrix equal to a decreasing function of distance;

  • then, we could estimate values at unobserved locations.

We could do the same thing in a network model, calculating “nearby” and “distance” with the network.

Example

A network:

plot of chunk plot_network

Here is the adjacency matrix of the network:

##        1     2     3     4     5     6     7     8     9    10    11    12
## 1  FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE
## 2  FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
## 3   TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
## 4  FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## 5  FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
## 6  FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE
## 7  FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## 8  FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## 9   TRUE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE
## 10 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## 11 FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## 12 FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## 13 FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## 14 FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE
## 15 FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
## 16 FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## 17 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE
## 18 FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## 19 FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
## 20 FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
##       13    14    15    16    17    18    19    20
## 1  FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## 2  FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE
## 3  FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE
## 4  FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE
## 5  FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## 6  FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE
## 7   TRUE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE
## 8  FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## 9  FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## 10 FALSE  TRUE FALSE FALSE  TRUE  TRUE FALSE FALSE
## 11 FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE
## 12 FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE
## 13 FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE
## 14 FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
## 15  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
## 16  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
## 17 FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE
## 18 FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE
## 19 FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE
## 20  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE

Our model has random variates at each node of the network, correlated with neighbors.

Suppose that

  1. Every node has mean \(\mu\) and variance \(\alpha^2\).

  2. Neighboring nodes have covariance \(\alpha^2 \times \rho\).

Then, the covariance matrix is

Now we can simulate values on the network:

## Error in rmvnorm(1, mean = rep(truth$mu, N), sigma = truth$covmat): sigma must be a symmetric matrix

Analysis options

  1. Suppose the network is unknown, and infer it.

  2. Suppose the network is known, and use it as a smoothing prior to help infer noisy or unobserved values at some of the nodes.

Summary

We can now

  1. Simulate autocorrelated values on a landscape.

  2. Parameterize and fit a spatial, multivariate Normal model.

  3. Interpolate a noisy landscape to unobserved locations.