Peter Ralph
University of Oregon
Institute of Ecology and Evolution
5th Workshop Probability and Evolution
slides: github:petrelharp/cirm_2021_talk
Joint work with:
Alison Etheridge
Tom Kurtz
Ian Letter
Terence Tsui
title text: “On one hand, every single one of my ancestors going back billions of years has managed to figure it out. On the other hand, that’s the mother of all sampling biases.”
image by Graham Coop
… describes the relationship of genetic distance and geographic distance, where \[\begin{aligned} d(i,j) &= \text{(genetic distance between individuals $i$ and $j$)} \\ &= \frac{\text{number of nucleotide differences}}{\text{length of the genome}} . \end{aligned}\]
The Wright-Malecót formula says that in 2D: \[\begin{aligned} \mathbb{E}[1 - d(i, j)] &\approx \frac{1}{\mathcal{N} + \log(\sigma_e / (\kappa \sqrt{2\mu}))} K_0\left( \frac{\sqrt{2\mu}|x_i-x_j|}{\sigma_e} \right) . \end{aligned}\] (see e.g., Barton, Depaulis & Etheridge (2002); Smith & Weissman 2020)
But: what’s \(\sigma_e\)?
The lineage of a bit of modern genome is \[\begin{aligned} L_t = \text{(location of the genetic ancestor at time $t$ ago)} . \end{aligned}\]
Statistical properties of \(L\) tell us about:
gene flow
long-term fitness
\(\sigma_e\)
The dispersal distance is \[\begin{aligned} \sigma^2 &= \text{(mean squared displacement} \\ &\qquad \text{between parent and offspring)} \end{aligned}\] and the effective dispersal distance is \[\sigma^2_e = \lim_{t \to \infty} \frac{1}{t} \mathbb{E}[L_t^2] .\]
discrete space and time,
varying population sizes,
nonoverlapping generations, with movement only at reproduction.
Suppose: \[\begin{aligned} N(x,t) &= \text{(number of individuals $t$ generations ago at $x$)} \\ m(x, y) &= \text{(proportion of seeds from $x$ that go to $y$)} \end{aligned}\]
… and surviving individuals are uniform choices out of available seeds.
and so \[\begin{aligned} \mathbb{P}\{ L_t = y \;|\; L_{t-1} = x \} &= \frac{ N(y,t) m(y, x) } { \sum_z N(z,t) m(z, x) } . \end{aligned}\]
Assume \(L_t\) is Markov given \(N\) (generally requires \(N\) to be large).
Assume a 1D grid of populations, and
\[\begin{aligned} P = \begin{bmatrix} 1-p & p \\ q & 1-q \\ \end{bmatrix} . \end{aligned}\]
And,
\[\begin{aligned} \text{Note: forwards dispersal has } \sigma^2 = \frac{2}{3} . \end{aligned}\]
Then \(L_{t+1} \; | L_t = x\):
habitat | probability | \(\mathbb{E}[(L_1 - L_0)^2]\) |
---|---|---|
GGG | \(\frac{p}{p+q} (1-q)^2\) | 2/3 |
GGB/BGG | \(\frac{p}{p+q} (1-q)q\) | 1/2 |
GBB/BBG | \(\frac{p}{p+q} q(1-p)\) | 1 |
GBG | \(\frac{p}{p+q} qp\) | 1 |
BGB | \(\frac{q}{p+q} pq\) | 0 |
BBB | \(\frac{q}{p+q} (1-p)^2\) | 2/3 |
\[\begin{aligned} \mathbb{E}[(L_1 - L_0)^2] = \frac{2}{3} + \frac{ pq } {3 (p + q) } \left(1 - (p + q)\right) . \end{aligned}\]
\[\begin{aligned} &\sigma^2_e - \sigma^2 \\ &\qquad {} = \frac{ pq } {3 (p + q) } \left(1 - (p + q)\right) . \end{aligned}\] and so \[\begin{aligned} \sigma_e > \sigma \qquad \text{iff} \quad p + q < 1, \end{aligned}\]
\[\begin{aligned} &\sigma^2_e - \sigma^2 \\ &\qquad {} = \frac{ pq } {3 (p + q) } \left(1 - (p + q)\right) . \end{aligned}\] and so \[\begin{aligned} \sigma_e > \sigma \qquad \text{iff} \quad p + q < 1, \end{aligned}\]
So: the random environment can either slow down or speed up a lineage
… even with no temporal correlations!
Now suppose
The environment is a stationary, reversible Markov process on \(\{G, B\}^\mathbb{Z}\).
For each \(t\), the environment \(N(t, \cdot)\) is a stationary Markov chain with transition matrix as before.
Now suppose
The environment is a stationary, reversible Markov process on \(\{G, B\}^\mathbb{Z}\).
For each \(t\), the environment \(N(t, \cdot)\) is a stationary Markov chain with transition matrix as before.
Numerically: changing temporal correlations can also increase or decrease speed.
Next: continuous space (and, time).
The (forwards) dispersal distance is: \[ \sigma^2 = \int |y-x|^2 q(x, dy) .\]
Birth, establishment, and establishment rates depend on local population densities (like Bolker-Pacala):
Vital rates at \(x\) will depend on \(\eta\) through \(p_\epsilon * \eta(x)\).
Mortality increases with crowding: \(\gamma\) and \(r\) are constant, while \[\begin{aligned} \mu(x, \eta) &= \mu(x) \left( 1 - \frac{1}{1 + \exp(p_\epsilon * \eta(x))} \right) . \end{aligned}\]
Fecundity decreasing with crowding: \(\mu\) and \(r\) are constant, while \[\begin{aligned} \gamma(x, \eta) &= \gamma(x) \left( \frac{1}{1 + \exp(p_\epsilon * \eta(x))} \right) . \end{aligned}\]
\[\begin{aligned} & \frac{1}{N} \times N \eta(y) \gamma(y, \eta) \; \hphantom{q(y, dx)} &\qquad &\text{(birth at $y$)} \\ & \hphantom{\frac{1}{N}} \int \; \hphantom{N \eta(y) \gamma(y, \eta)} \; q(y, dx) r(x, \eta) &\qquad &\text{(dispersal to $x$)} \end{aligned}\]
and
\[\begin{aligned} & {} - \frac{1}{N} \times N \eta (x) \mu(x, \eta) & \qquad &\text{(death)} \end{aligned}\]
So: for a test function \(f\), \[\begin{aligned} & \lim_{t \searrow 0} \frac{1}{t} \left. \mathbb{E} \left[ \int f(x) \eta_{t}(dx) - \int f(x) \eta_0(dx) \;|\; \eta_0 = \eta \right] \right\vert_{t=0} \\ &\qquad = \int \int f(x) r(x, \eta) q(y, dx) \gamma(y, \eta) \eta(dy) \\ &\qquad \qquad {} - \int f(x) \mu(x, \eta) \eta(dx) . \end{aligned}\]
\[\begin{aligned} &{} = \int \left\{ \int \left( f(x) r(x, \eta) - f(y) r(y, \eta) \right) q(y, dx) \right\} \gamma(y, \eta) \eta(dy) \\ &\qquad \qquad {} + \int f(x) \left\{ r(x, \eta) \gamma(x, \eta) - \mu(x, \eta) \right\} \eta(dx) . \end{aligned}\]
Since reproduction produces single offspring,
\[\begin{aligned} & \lim_{t \searrow 0} \frac{1}{t} \left. \mathbb{E} \left[ \left( \int f(x) \eta_{t}(dx) - \int f(x) \eta_0(dx) \right)^2 \;|\; \eta_0 = \eta \right] \right\vert_{t=0} \\ &\quad {} = \frac{1}{N} \left\{ \int \int f^2(x) r(x, \eta) q(y, dx) \gamma(y, \eta) \eta(dy) \right. \\ &\qquad \qquad \left. {} + \int f^2(x) \mu(x, \eta) \eta(dx) \right\} \end{aligned}\]
\[\begin{aligned} {} \propto \frac{1}{N} . \hphantom{ \left\{ \int \int f^2(x) r(x, \eta) q(y, dx) \gamma(y, \eta) \eta(dy) \right\} } \end{aligned}\]
As \(N \to \infty\), also rescale time by \(\theta\), and let e.g., \[ r_\theta(x, \eta) \to r(x, \eta) \qquad \text{as } \theta \to \infty . \]
As \(\theta \to \infty\), to see lineages moving, we take \(\sigma = 1/\sqrt{\theta}\), so that \[ \theta \int (g(y) - g(x)) q_\theta(x, dy) \to \frac{1}{2} \Delta g(x) , \]
and population density changes on a time scale of \(\theta\) generations: \[ \theta\left( r_\theta(x, \eta) \gamma_\theta(x, \eta) - \mu_\theta(x, \eta) \right) \to F(x, \eta) . \]
Suppose also the population measure converges:
\[ \eta \to \Xi \qquad \text{as} \qquad \theta, N \to \infty \]
and so \[\begin{aligned} & \lim_{t \searrow 0} \frac{1}{t} \left. \mathbb{E} \left[ \int f(x) \eta_{t}(dx) - \int f(x) \eta_0(dx) \;|\; \eta_0 = \eta \right] \right\vert_{t=0} \\ &\qquad \to \int \left\{\vphantom{\int} \gamma(x, \Xi) \Delta\left( f(\cdot) r(\cdot, \Xi) \right)\!(x) \right. \\ &\qquad \qquad \qquad \left. \vphantom{\int} + f(x) F(x, \Xi) \right\} \Xi(dx) . \end{aligned}\]
The limit acts like \[
``\; \dot \Xi = r \Delta(\gamma \Xi) + F \Xi . \text{''}
\] … but recall that the coefficients are “nonlocal”:
\(r\) may be a function of \(p_\epsilon * \Xi\).
Some options for \(\Xi\):
Stochastic, nonlocal coefficients. (superprocess limit)
Deterministic, nonlocal coefficients.
Deterministic, local coefficients. (PDE limit)
Quadratic variation of the limit is nonzero if \[ \frac{N}{\theta} \to \rho, \] for some \(\rho > 0\).
In other words, since \(\sigma = 1/\sqrt{\theta}\), Wright’s neighborhood size is: \[\begin{aligned} \mathcal{N} &:= \text{(mean number of individuals within distance $\sigma$)} \\ &\propto N \sigma^d \end{aligned}\] … which is equal to \(\rho\) in \(d=2\).
If the limiting measure has density \(\Xi_t(x) dx\), then it’s a weak solution to \[\begin{aligned} \frac{d}{dt} \Xi_t(x) &= r(x, \Xi) \Delta\left( \gamma(\cdot, \Xi_t) \Xi_t(\cdot) \right)(x) + F(x, \Xi_t) \Xi_t(x) . \end{aligned}\]
i.e., \[\begin{aligned} \dot \Xi = r \Delta\left( \gamma \Xi \right) + F \Xi . \end{aligned}\]
Recall that e.g., \[\begin{aligned} r(x, \eta) = r(p_\epsilon * \eta(x)) . \end{aligned}\]
… can we also take \(\epsilon \to 0\), getting \[\begin{aligned} \frac{d}{dt} \Xi_t(x) &= r(\Xi_t(x)) \Delta\left( \gamma(\Xi_t(\cdot)) \Xi_t(\cdot) \right)\!(x) \\ &\qquad {} + F(\Xi_t(x)) \Xi_t(x) ? \end{aligned}\]
For fixed \(\epsilon\), we have that \[ \eta_\epsilon \to \Xi_\epsilon \qquad \text{as } N, \theta \to \infty. \]
We need \(p_\epsilon * \eta_\epsilon(x) \to \Xi(x)\), e.g., \[\begin{aligned} & (p_\epsilon * \eta_\epsilon - p_\epsilon * \Xi_\epsilon) \hphantom{+ (p_\epsilon * \Xi_\epsilon - p_\epsilon * \Xi) + (p_\epsilon * \Xi - \Xi)} \\ &\hphantom{(p_\epsilon * \eta_\epsilon - p_\epsilon * \Xi_\epsilon)} + (p_\epsilon * \Xi_\epsilon - p_\epsilon * \Xi) \hphantom{+ (p_\epsilon * \Xi - \Xi) } \\ &\hphantom{(p_\epsilon * \eta_\epsilon - p_\epsilon * \Xi_\epsilon) + (p_\epsilon * \Xi_\epsilon - p_\epsilon * \Xi)} + (p_\epsilon * \Xi - \Xi) \\ &\qquad \to 0 \qquad \text{as} \qquad N, \theta \to \infty, \qquad \epsilon \to 0 . \end{aligned}\]
Goal: rescale population densities while retaining the notion of lineages.
Method: a lookdown construction, following Kurtz & Rodrigues 2011 and Etheridge & Kurtz 2019.
… this also gives us tightness for the population processes themselves!
Theorem: Suppose that for fixed \(\epsilon\), \[\begin{aligned} r(x, \eta) &= r(x), \\ \gamma_\theta(x, \eta) &= \gamma(p_\epsilon * \eta(x)) + \frac{G(p_\epsilon * \eta(x))}{\theta r(x)} \\ \mu_\theta(x, \eta) &= \gamma(p_\epsilon * \eta(x)) + \frac{H(p_\epsilon * \eta(x))}{\theta} \\ q_\theta(x, dy) &= p_{1/\theta}(x, dy) , \end{aligned}\] with \(\gamma\), \(G\), \(H\), and \(r\) uniformly bounded and Lipschitz continuous, and \(0 < r_0 < r(x) \le 1\) twice diff’able. Then a lookdown construction with maximum level \(N \to \infty\), with \(\theta \to \infty\) and \(\theta / N \to 0\), converges to a measure-valued process \((\eta_t^\infty)_{t \ge 0}\).
The limit is a Cox measure with intensity a product of \(\Xi_t \times \Lambda\), and for every continuous, bounded \(f : \mathbb{R}^d \to \mathbb{R}_+\), \[\begin{aligned} & \int f(x) \Xi_t(dx) - \int f(x) \Xi_0(dx) \\ &= \int_0^t \int \gamma(p_\epsilon * \Xi_s(x)) \Delta \left( f(\cdot) r(\cdot) \right)(x) \\ {}&\qquad + f(x) \left\{ G(p_\epsilon * \Xi_s(x)) - H(p_\epsilon * \Xi_s(x)) \right\} \Xi_s(dx) ds . \end{aligned}\]
Theorem: Suppose that \(r(x) = 1\) and \[\begin{aligned} \gamma_\theta(x, \eta) &= 1 + \frac{G(p_\epsilon * \eta(x))}{\theta} \\ \mu_\theta(x, \eta) &= 1 + \frac{H(p_\epsilon * \eta(x))}{\theta} \\ q_\theta(x, dy) &= p_{1/\theta}(x, dy) , \end{aligned}\] with \(G\), \(H\) positive, Lipschitz continuous with \(G(1) = H(1)\) and such that \[ G(u), H(u) \le C \left( 1 + u^p \right) . \]
Then a lookdown construction with maximum level \(N\), as \(N, \theta \to \infty\), if \(\theta/N \to 0\) and \[ \frac{\theta}{N \epsilon^{3dp/2}} + \frac{1}{\theta \epsilon^{dp/2}} \to 0 , \] converges to a measure-valued process \((\eta_t^\infty)_{t \ge 0}\).
The limit is a Cox measure with intensity a product of \(\Xi_t \times \Lambda\), where \(\Xi_t\) has a density that is a weak solution to \[\begin{aligned} \frac{d}{dt} \Xi &= \Delta \Xi + (G(\Xi) - H(\Xi)) \Xi, \end{aligned}\] for \(x \in \mathbb{R}^d\) and \(t > 0\) and appropriate initial conditions.
Gilia Patterson
death: \(\mu = 0.3\) per generation
establishment: \(r = 0.7\)
dispersal: Gaussian with SD \(\sigma\)
local density: in circle of radius \(\epsilon\)
reproduction: with \(K=2\), \(\lambda=3\), \[ \gamma = \frac{\lambda}{1 + \text{(local density)}/K} \]
non-spatial equilibrium density: \[ K \left( \frac{\lambda}{1 - r} - 1 \right) .\]
SLiM code at github.com/petrelharp/cirm_2021_talk
If a deterministic, local limit holds, with dispersal \(N(m, \sigma^2 I)\), lineages should move in a stationary population density \(n(x) = d\Xi/dx\) as \[\begin{aligned} dL_t = r(L_t) \gamma(L_t) \left\{ \left( 2\sigma^2 \nabla \log(n\gamma)(L_t) - m \right) dt + \sigma dB_t \right\} . \end{aligned}\]
i.e., as Brownian motion run at speed \(\sigma r(y) \gamma(y)\) in the potential \[\begin{aligned} n(y) \gamma(y) e^{-my/(2\sigma^2)} , \end{aligned}\] … which has stationary distribution \[\begin{aligned} \frac{n(y)}{r(y)} e^{-my/(2\sigma^2)} . \end{aligned}\]
Suppose the population density has a traveling wave profile: \(\Xi_t(dx) = n(t,x) dx\) with \[\begin{aligned} n(x,t) = w(x - ct), \end{aligned}\] and a determinstic, local limit holds:
… then \(L + ct\) has generator \[\begin{aligned} \phi &\mapsto r(x) \gamma(x) \left\{ 2 \nabla \log(\gamma w)(x) \cdot \nabla \phi(x) + \Delta \phi(x) \right\} \\ &\qquad {} + c \cdot \nabla \phi(x) . \end{aligned}\]
For instance, take the porous medium equation with logistic growth (in 1D): \[\begin{aligned} \partial_t n_t(x) = \partial_x^2 [n_t(x)^2] + n_t(x) (1 - n_t(x)) , \end{aligned}\] with stable solution \[\begin{aligned} n_t(x) = \left( 1 - \exp\left( \frac{1}{2} (x - t) \right)\right)_+ \end{aligned}\]
To get this, we want \(r=1\) and \[\begin{aligned} \gamma(x, n) &= n(x) \\ \mu(x, n) &= 2 n(x) - 1 . \end{aligned}\]
…so in the stationary frame, the lineage’s generator is \[\begin{aligned} \phi &\mapsto w(x) \left( \phi_{xx} + 4 (\log w)_x \phi_x \right) + \phi_x \\ &= \left(1 - e^{x/2}\right) \phi_{xx} + \left(1 - 2 e^{x/2}\right) \phi_x \qquad \text{on } x < 0. \end{aligned}\]
The lineage has stationary distribution \[\begin{aligned} \pi(x) \propto e^x \left(1 - e^{x/2}\right) \end{aligned}\] for \(x < 0\).
… in constrast to the Fisher-KPP.
When is \(\sigma_e > \sigma\)?
How badly are lineages non-Markov?
What can (and can’t) inferences about reverse-time tell us about forwards-time?
image by CJ Battey
and
Simulation methods: