Peter Ralph
12 March 2018 – Advanced Biological Statistics
Suppose we have estimates of abundance of a soil microbe from a number of samples across our study area:
(x,y)
: spatial coords; z
: abundance
## x y z
## 1 0.5766037 0.5863978 3.845720
## 2 0.2230729 0.2747410 3.571873
## 3 0.3318966 0.1476570 4.683332
## 4 0.7107246 0.8014103 3.410312
## 5 0.8194490 0.3864098 3.726504
## 6 0.4237206 0.8204507 3.356803
## 7 0.9635445 0.6849373 4.424564
## 8 0.9781304 0.8833893 5.795779
## 9 0.8405219 0.1119208 3.962872
## 10 0.9966112 0.7788340 5.155496
## 11 0.8659590 0.6210109 3.911921
## 12 0.7014217 0.2741515 4.235119
## 13 0.3904731 0.3014697 3.657133
## 14 0.3147697 0.3490438 3.342371
## 15 0.8459473 0.3291311 3.835436
## 16 0.1392785 0.8095039 4.035925
## 17 0.5181206 0.2744821 4.694328
## 18 0.5935508 0.9390949 3.054045
## 19 0.9424617 0.9022671 5.518488
## 20 0.6280196 0.1770531 4.617342
Goals:
(descriptive) What spatial scale does abundance vary over?
(predictive) What are the likely (range of) abundances at new locations?
Everything is related to everything else, but near things are more related than distant things.
Modeler: Great, covariance is a decreasing function of distance.
A convenient choice: the covariance between two points distance \(d\) apart is \[\begin{aligned} \alpha^2 \exp\left(- \frac{1}{2}\left(\frac{d}{\rho}\right)^2 \right) . \end{aligned}\]
\(\alpha\) controls the overall variance (amount of noise)
\(\rho\) is the spatial scale that covariance decays over
Here’s an R
function that takes a set of locations (xy
), a variance scaling alpha
, and a spatial scale rho
:
cov_exp_quad <- function (xy, alpha, rho) {
# return the 'quadratic exponential' covariance matrix
# for spatial positions xy
dxy <- as.matrix(dist(xy))
return( alpha^2 * exp( - (1/2) * dxy^2 / rho^2 ) )
}
Challenge: simulate spatially autocorrelated random Gaussian values, and plot them, in space. Pick parameters so you can tell they are autocorrelated.
to color points by a continuous value:
colorRampPalette(c('blue', 'red'))(24)[cut(xy$z, breaks=24)]
(descriptive) What spatial scale does abundance vary over?
\(\Rightarrow\) What is \(\rho\)?
(predictive) What are the likely (range of) abundances at new locations?
\(\Rightarrow\) Add unobserved abundances as parameters.
sp_block <- "
data {
int N; // number of obs
vector[2] xy[N]; // spatial pos
vector[N] z;
}
parameters {
real<lower=0> alpha;
real<lower=0> rho;
}
model {
matrix[N, N] K;
K = cov_exp_quad(xy, alpha, rho);
z ~ multi_normal(rep_vector(0.0, N), K);
alpha ~ normal(0, 5);
rho ~ normal(0, 5);
}
"
# check this compiles
sp_model <- stan_model(model_code=sp_block)
Challenge: we would like to estimate the abundance at the k
locations new_xy
. Add this feature to the Stan block.
data {
int N; // number of obs
vector[2] xy[N]; // spatial pos
vector[N] z;
}
parameters {
real<lower=0> alpha;
real<lower=0> rho;
}
model {
matrix[N, N] K;
K = cov_exp_quad(xy, alpha, rho);
z ~ multi_normal(rep_vector(0.0, N), K);
alpha ~ normal(0, 5);
rho ~ normal(0, 5);
}
sp_block <- "
data {
int N; // number of obs
vector[2] old_xy[N]; // spatial pos
vector[N] old_z;
int n;
vector[2] new_xy[n]; // new locs
}
transformed data {
vector[2] xy[N+n];
xy[1:N] = old_xy;
xy[(N+1):(N+n)] = new_xy;
print(dims(old_z));
}
parameters {
real<lower=0> alpha;
real<lower=0> rho;
vector[n] new_z;
real<lower=0> delta;
real mu;
}
model {
matrix[N+n, N+n] K;
vector[N+n] z;
K = cov_exp_quad(xy, alpha, rho);
for (k in 1:(N+n)) {
K[k,k] += delta;
}
z[1:N] = old_z;
z[(N+1):(N+n)] = new_z;
z ~ multi_normal(rep_vector(mu, N+n), K);
alpha ~ normal(0, 5);
rho ~ normal(0, 5);
delta ~ normal(0, 5);
mu ~ normal(0, 5);
}
"
new_sp <- stan_model(model_code=sp_block)
library(mvtnorm)
N <- 20
xy <- data.frame(x=runif(N), y=runif(N))
dxy <- as.matrix(dist(xy))
ut <- upper.tri(dxy, diag=TRUE)
truth <- list(rho=.6,
delta=.1,
alpha=2.5,
mu=5)
truth$covmat <- (truth$delta * diag(N)
+ truth$alpha^2 * exp(-(dxy/truth$rho)^2))
xy$z <- as.vector(rmvnorm(1, mean=rep(truth$mu,N), sigma=truth$covmat))
new_xy <- cbind(x=runif(5), y=runif(5))
sp_data <- list(N=nrow(xy),
old_xy=cbind(xy$x, xy$y),
old_z=xy$z,
n=nrow(new_xy),
new_xy=as.matrix(new_xy))
(sp_time <- system.time(
sp_fit <- sampling(new_sp,
data=sp_data,
iter=1000,
chains=2,
control=list(adapt_delta=0.99,
max_treedepth=12))))
## [20,1]
## user system elapsed
## 1.369 0.036 2.447
cbind(truth=truth[c("alpha", "rho", "delta", "mu")],
rstan::summary(sp_fit, pars=c("alpha", "rho", "delta", "mu"))$summary)
## truth mean se_mean sd 2.5% 25% 50%
## alpha 2.5 4.234435 0.08918686 2.023869 1.77594 2.775801 3.721438
## rho 0.6 0.6926219 0.01211368 0.2500757 0.3555691 0.5261485 0.6432864
## delta 0.1 0.2805626 0.007569845 0.1598728 0.0942439 0.1729031 0.2409412
## mu 5 5.330636 0.09851299 2.568532 0.1735635 3.843481 5.406842
## 75% 97.5% n_eff Rhat
## alpha 5.101349 9.161582 514.9476 1.009623
## rho 0.8055962 1.368224 426.1778 1.009142
## delta 0.3386181 0.7397388 446.0414 1.004646
## mu 6.868921 9.874479 679.8029 1.000725
(descriptive) What spatial scale does abundance vary over?
Values are correlated over distances of order \(\rho=0.6926219\) units of distance.
(predictive) What are the likely (range of) abundances at new locations?
These are
## x y mean sd
## new_z[1] 0.01268409 0.3892949 4.120024 0.7242546
## new_z[2] 0.13522095 0.1532280 5.420720 0.6137691
## new_z[3] 0.55666986 0.9800330 2.687552 0.8721023
## new_z[4] 0.54442795 0.4793665 3.824439 0.6073334
## new_z[5] 0.19853665 0.6226803 2.494086 0.6191298
Model: the data are multivariate Normal,
with nearby locations have more correlated values.
To do this, we set the entries of the covariance matrix equal to a decreasing function of distance;
then, we could estimate values at unobserved locations.
We could do the same thing in a network model, calculating “nearby” and “distance” with the network.
A network:
Here is the adjacency matrix of the network:
## 1 2 3 4 5 6 7 8 9 10 11 12
## 1 FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE
## 2 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
## 3 TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
## 4 FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## 5 FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
## 6 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## 7 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## 8 FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## 9 TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## 10 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## 11 FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## 12 FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## 13 FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## 14 FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
## 15 FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
## 16 FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## 17 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
## 18 FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## 19 FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
## 20 FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## 13 14 15 16 17 18 19 20
## 1 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## 2 FALSE FALSE FALSE FALSE FALSE TRUE TRUE FALSE
## 3 FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
## 4 FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE
## 5 FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## 6 FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE
## 7 TRUE FALSE TRUE TRUE FALSE FALSE FALSE FALSE
## 8 FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## 9 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## 10 FALSE TRUE FALSE FALSE TRUE TRUE FALSE FALSE
## 11 FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE
## 12 FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE
## 13 FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE
## 14 FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
## 15 TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
## 16 TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
## 17 FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE
## 18 FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE
## 19 FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
## 20 TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
Our model has random variates at each node of the network, correlated with neighbors.
Suppose that
Every node has mean \(\mu\) and variance \(\alpha^2\).
Neighboring nodes have covariance \(\alpha^2 \times \rho\).
Then, the covariance matrix is
Now we can simulate values on the network:
## Error in rmvnorm(1, mean = rep(truth$mu, N), sigma = truth$covmat): sigma must be a symmetric matrix
Suppose the network is unknown, and infer it.
Suppose the network is known, and use it as a smoothing prior to help infer noisy or unobserved values at some of the nodes.
Simulate autocorrelated values on a landscape.
Parameterize and fit a spatial, multivariate Normal model.
Interpolate a noisy landscape to unobserved locations.