\[ \newcommand{\E}{\mathbb{E}} \renewcommand{\P}{\mathbb{P}} \DeclareMathOperator{\var}{var} \]

Simulation-based inference for ecology and evolution

Peter Ralph // University of Oregon

Ecology, Evolution, and Conservation Biology
Oregon State University // 24 Jan 2024

Both the UO and OSU are located on the traditional indigenous homeland of the Kalapuya people. Kalapuya people were dispossessed of their indigenous homeland by the United States government and forcibly removed. Today, Kalapuya descendants are primarily citizens of the Confederated Tribes of Grand Ronde and the Confederated Tribes of Siletz Indians, and continue to make important contributions to their communities, to the UO, to Oregon, and to the world.

Outline

Outline of the talk

  1. Tree sequences, and simulations

  2. Spatial landscapes of coevolving traits

  3. Landscapes of genetic diversity

slides: github.com/petrelharp/corvallis-jan-2024

Inference, with genomes

Simulation-based inference

  • bespoke confirmatory simulations
  • optimization of one or two parameters
  • Approximate Bayesian Computation (ABC)
  • deep learning

What do we need

  1. Fast simulation of genomes

  2. Fast computation of summary statistics

Wish list:

Whole genomes, thousands of samples,
from millions of individuals.

Demography:

  • life history
  • separate sexes
  • selfing
  • polyploidy
  • species interactions

Geography:

  • discrete populations
  • continuous landscapes
  • barriers

History:

  • ancient samples
  • range shifts

Natural selection:

  • selective sweeps
  • introgressing alleles
  • background selection
  • quantitative traits
  • incompatibilities
  • local adaptation

Genomes:

  • recombination rate variation
  • gene conversion
  • infinite-sites mutation
  • nucleotide models
  • context-dependence
  • mobile elements
  • inversions
  • copy number variation

Enter SLiM

by Ben Haller and Philipp Messer

an individual-based, scriptable forwards simulator

Ben Haller Philipp Messer

  • Whole genomes,*
  • thousands of samples,
  • from millions of individuals.*

Demography:

  • life history
  • separate sexes*
  • selfing
  • polyploidy*
  • species interactions

Geography:

  • discrete populations
  • continuous landscapes
  • barriers*

History:

  • ancient samples
  • range shifts

Natural selection:

  • selective sweeps
  • introgressing alleles
  • background selection
  • quantitative traits*
  • incompatibilities*
  • local adaptation*

Genomes:

  • recombination rate variation
  • gene conversion
  • infinite-sites mutation
  • nucleotide models
  • context-dependence*
  • mobile elements*
  • inversions*
  • copy number variation

The tree sequence

History is a sequence of trees

For a set of sampled chromosomes, at each position along the genome there is a genealogical tree that says how they are related.

Trees along a chromosome

The succinct tree sequence

is a way to succinctly describe this, er, sequence of trees

and the resulting genome sequences.

tskit logo
jerome kelleher

jerome kelleher

How’s it work? File sizes:

file sizes
genotypes
genotypes and a tree
genotypes and the next tree

For \(N\) samples genotyped at \(M\) sites

Genotype matrix:

\(N \times M\) things.

Tree sequence:

  • \(2N-2\) edges for the first tree
  • \(\sim 4\) edges per each of \(T\) trees
  • \(M\) mutations

\(O(N + T + M)\) things

genotypes and a tree

How’s it work, 2? Computation time:

efficiency of treestat computation

Application to genomic simulations

The main idea

Genomes are very big.

But, if we record the tree sequence that relates everyone to everyone else,

after the simulation is over we can put neutral mutations down on the trees.

Since neutral mutations don’t affect demography,

this is equivalent to having kept track of them throughout.

This means recording the entire genetic history of everyone in the population, ever.

It is not clear this is a good idea.

But, with a few tricks…

A 100x speedup!

SLiM logo

Spatial coevolution: snakes and newts

Battling newts and snakes
Victoria Caudill

Victoria Caudill

Fig 1 from Reimche et al 2020
Fig 2 from Hanifin et al 2008

Why, and how?

A spatial co-evolutionary simulation

  • continuous space

  • local density-dependent mortality

  • additive, costly traits (“toxicity” and “resistance”)

  • various genetic architectures

  • snakes may encounter nearby newts, outcome depends on difference in traits:

    • snake eats newt, gets fitness benefit, or
    • snake dies, newt escapes
Screenshot of a SLiM simulation

No spatial correlation on a flat map

Add some heterogeneity

Conclusions

Fig 2 from Hanifin et al 2008
  • such large correlated differences across the landscape unlikely to be due to nonadaptive forces

  • spatial heterogeneity in ecological factors much more plausible

  • trait genetic architecture has little effect, given sufficient variation

Victoria Caudill

Victoria Caudill

Landscapes of genetic diversity

https://www.biorxiv.org/content/10.1101/2023.02.07.527547v2

Landscapes of genetic diversity within groups of species

The study system: “us”

  • High quality genomic data for 5 species
  • Deep divergence, spanning ~60N generations
  • Conservation of genes, recombination rate and other genomic features
Phylogeny of the great apes from Javier-Prado et al. (2013) Nature

Genetic diversity in the great apes: chromosome 12

Genetic divergence between the great apes: chromosome 12

Goals

  • How correlated are landscapes for closely related species?
  • How much is due to shared footprints of
    • history?
    • selection? what kind?
    • mutational processes?

A quick primer

  • Genetic diversity (\(\pi\)) and divergence (\(d_{XY}\)) are time since common ancestor multiplied by effective mutation rate: \[ \pi, d_{XY} = T_\text{MRCA} \times \mu_\text{effective} . \]
  • Higher mutation rate \(\Rightarrow\) higher \(\pi\) and faster increase of \(d_{XY}\).
  • Positive selection increases fixation rate of new mutations \(\Rightarrow\) \(d_{XY}\) goes up faster,
  • … but, it decreases \(\pi\) nearby.

linked selection: The indirect effects of selection on genomic locations that are linked to the sites under selection by a lack of recombination.

Selection tends to decrease diversity, over a distance determined by recombination rate.

Diversity correlates with recombination rate

Corbett-Detig et al

Hudson 1994; Cutter & Payseur 2013; Corbett-Detig et al 2015

How diversity landscapes change

How diversity landscapes change

How diversity landscapes change

How can landscapes stay correlated?

  • shared processes:
    • positive, negative selection
    • mutation rate variation
    • GC-biased gene conversion

How can landscapes stay correlated?

  • shared processes:
    • positive, negative selection
    • mutation rate variation
    • GC-biased gene conversion

How can landscapes stay correlated?

  • shared processes:
    • positive, negative selection
    • mutation rate variation
    • GC-biased gene conversion

SLiMulations

  • Chromosome-scale simulation (chr12)
  • selection on annotated exons (from Ensembl via stdpopsim)
  • deCode genetic map
  • branches simulated independently with SLiM and merged tskit
  • runtime: days to weeks

Back to the data

The data: empirical correlations for chromosome 12 in the great apes

No correlation under neutrality

  • correlations decay to zero quickly with split time

Mutation rate variation can contribute

  • but correlations do not decay with time

Negative selection produces weak correlations

Positive selection produces strong correlations

  • sometimes too strong

\[ \vphantom{ d_{xy} = \pi_\text{anc} \nearrow + \mu T_\text{MRCA} \searrow } \]

The best fit: both!

  • chosen from 37 distinct simulations
  • deleterious mutations: \(1.2 \times 10^{-8}\)
  • beneficial mutations: \(10^{-12}\)

Conclusions

  • positive selection most necessary for a good fit
  • best guess: \(\approx 10\%\) of fixations on human lineage due to positive selection (1/250 years);
  • \(\approx\) 70% of mutations in exons deleterious
  • fixation rate in exons reduced by half
  • GC-biased gene conversion causes “smile” at ends of chromosomes

Wrap-up

Other applications

  • history of Nebria beetles (Gilia Patterson, Sean Schoville, Yi-Ming Weng)
  • inference of mean dispersal distance (Chris Smith; disperseNN)
  • and of maps of effective density and dispersal distance (Chris Smith; mapNN)
  • inference of recombination rate maps (Jeff Adrion; ReLERNN)
  • inference of maps of population density (Gilia Patterson)
  • and individual movement
  • identification of regions under selection

Software development goals

  • open
  • welcoming and supportive
  • reproducible and well-tested
  • backwards compatible
  • well-documented
  • capacity building
popsim logo

PopSim Consortium

tskit logo

tskit.dev

SLiM logo

Thanks!

  • Andy Kern
  • Victoria Caudill
  • Murillo Rodrigues
  • Gilia Patterson
  • Chris Smith
  • Nate Pope
  • Jiseon Min
  • Clara Rehmann
  • Bruce Edelman
  • Anastasia Teterina
  • Matt Lukac
  • the rest of the Co-Lab

Funding:

  • NIH NIGMS
  • NSF DBI
  • Jerome Kelleher
  • Ben Haller
  • Ben Jeffery
  • Yan Wong
  • Elsie Chevy
  • Madeline Chase
  • Sean Stankowski
  • Matt Streisfeld

tskit logo SLiM logo

Beetles in the mountains

Nebria ingens/riversi

  • ground beetles
  • live on flowing snowmelt in the Sierra Nevada of California
  • cannot fly
  • live 1–2 years

Nebria habitat in the LGM

Good habitat followed the glaciers uphill (SDM by Yi-Ming Weng).

SDM

Goals

  • What was the spatial demographic history of Nebria since the LGM?
  • Where did the ancestors of today’s beetles live in the past?
  • Were the ancestors of N. ingens and N. riversi distinct at the LGM?
SDM

Data

Collected by Yi-Ming Weng and Sean Schoville:

  • 384 beetles at 27 sites
  • low coverage WGS
  • some presence/absence data
  • expert knowledge about good locations
  • a species distribution model

Methods overview

  1. Develop an individual-based SLiMulation.

SLiM screenshot

Methods overview

  1. Develop an individual-based SLiMulation.
  2. Use presence-absence estimates to narrow the range of parameter values based on 200 years of simulation.

SLiM screenshot

Methods overview

  1. Develop an spatial, individual-based SLiMulation.
  2. Do ABC on presence-absence estimates to narrow the range of parameter values based on 200 years of simulation.
  3. Run simulations since LGM, and do ABC on mean heterozygosity and divergence.

SLiM screenshot

Simulations

Simulations

Simulations