ecology, evolution, conservation, and simulation
Peter Ralph
Program in Ecology, Evolution & Conservation Biology
UIUC // 3 February 2021
Human G6PD variants (Howes et al 2013)
How much of the genome is under selection?
Hudson 1994; Cutter & Payseur 2013; Corbett-Detig et al 2015
Linked selection: the indirect effects of selection on genomic locations that are linked to the sites under selection by a lack of recombination.
The data:
\[ \begin{aligned} \pi &= \text{ (within-pop genetic distance) } \\ d_{xy} &= \text{ (between-pop genetic distance) } \end{aligned} \]
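One common way to make these two statistics concrete is as mean pairwise differences per site, within a population for \(\pi\) and between populations for \(d_{xy}\). The sketch below assumes that reading; the function names and the toy haplotypes are made up for illustration and are not data from the talk.

```python
from itertools import combinations, product


def hamming(a, b):
    """Number of sites at which two equal-length haplotypes differ."""
    if len(a) != len(b):
        raise ValueError("haplotypes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def pi(haps, seq_len):
    """Mean pairwise differences per site among haplotypes of one population."""
    pairs = list(combinations(haps, 2))
    return sum(hamming(a, b) for a, b in pairs) / (len(pairs) * seq_len)


def dxy(haps_x, haps_y, seq_len):
    """Mean pairwise differences per site between haplotypes of two populations."""
    pairs = list(product(haps_x, haps_y))
    return sum(hamming(a, b) for a, b in pairs) / (len(pairs) * seq_len)


# Hypothetical toy data: three haplotypes of 10 sites from each population.
pop_x = ["AAAAAAAAAA", "AAAAATAAAA", "AAAAAAAAAA"]
pop_y = ["AACAAAAAAG", "AACAAAAAAA", "AACAATAAAA"]

print("pi (x): ", pi(pop_x, 10))
print("pi (y): ", pi(pop_y, 10))
print("dxy:    ", dxy(pop_x, pop_y, 10))
```

In this toy example the fixed difference at the third site raises \(d_{xy}\) above both within-population values.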
But: what kind of selection?
To test theories and fit models, we need simulations with realistic
For a set of sampled chromosomes, at each position along the genome there is a genealogical tree that says how they are related.
The tree sequence is a way to describe this, er, sequence of trees.
Kelleher, Etheridge, and McVean introduced the tree sequence data structure for a fast coalescent simulator, msprime.
stores sequence and genealogical data very efficiently
tree-based sequence storage closely related to haplotype-matching compression
tskit: Python/C tools
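A minimal sketch of the idea, assuming the usual edge-based encoding in which each edge says "this child inherits from this parent on the genomic interval [left, right)". This is a standalone toy in plain Python, not the tskit API; the `Edge` tuple, the helper names, and the six-edge example are all hypothetical.

```python
from collections import namedtuple

# One edge: `child` inherits from `parent` on the half-open interval [left, right).
Edge = namedtuple("Edge", ["left", "right", "parent", "child"])

# Hypothetical example: samples 0, 1, 2 on a genome of length 10.
# Samples 0 and 1 share parent 3 everywhere, so those two edges are stored once;
# only the edges above node 3 and sample 2 change at position 5.
EDGES = [
    Edge(0, 10, 3, 0),
    Edge(0, 10, 3, 1),
    Edge(0, 5, 4, 3),
    Edge(0, 5, 4, 2),
    Edge(5, 10, 5, 3),
    Edge(5, 10, 5, 2),
]


def tree_at(edges, position):
    """Return the tree at `position` as a child -> parent mapping."""
    return {e.child: e.parent for e in edges if e.left <= position < e.right}


def breakpoints(edges):
    """Positions at which the tree may change, including both genome ends."""
    return sorted({e.left for e in edges} | {e.right for e in edges})


def root_of(tree, node):
    """Follow parent pointers from `node` up to the root of its tree."""
    while node in tree:
        node = tree[node]
    return node


for pos in (2, 7):
    tree = tree_at(EDGES, pos)
    print(pos, tree, "root:", root_of(tree, 0))
```

Storing the two trees separately would take eight edges; sharing the unchanged ones takes six, and that saving grows with the number of samples and trees.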
If we record the tree sequence that relates everyone to everyone else,
after the simulation is over we can put neutral mutations down on the trees.
Since neutral mutations don’t affect demography,
this is equivalent to having kept track of them throughout.
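To illustrate why this works, here is a standalone sketch of adding neutral mutations after the fact: each edge receives a Poisson number of mutations with mean (rate × genomic span × branch length), placed uniformly along the span and along the branch. The edge list, node times, rate, and function names are hypothetical, and this is plain Python rather than the msprime or tskit API, which provide their own tools for this step.

```python
import random

# Hypothetical recorded genealogy: (left, right, parent, child) edges and node times.
RECORDED_EDGES = [
    (0, 10, 3, 0), (0, 10, 3, 1),
    (0, 5, 4, 3), (0, 5, 4, 2),
    (5, 10, 5, 3), (5, 10, 5, 2),
]
NODE_TIME = {0: 0.0, 1: 0.0, 2: 0.0, 3: 1.0, 4: 2.0, 5: 3.0}


def poisson_draw(rng, mean):
    """Poisson sample: count unit-rate exponential arrivals before `mean`."""
    count = 0
    total = rng.expovariate(1.0)
    while total < mean:
        count += 1
        total += rng.expovariate(1.0)
    return count


def add_neutral_mutations(edges, node_time, rate, seed=None):
    """Return sorted (position, node, time) mutations dropped onto the edges.

    `rate` is the assumed mutation rate per unit of genome per unit of time.
    """
    rng = random.Random(seed)
    mutations = []
    for left, right, parent, child in edges:
        branch_length = node_time[parent] - node_time[child]
        span = right - left
        for _ in range(poisson_draw(rng, rate * span * branch_length)):
            position = rng.uniform(left, right)
            time = rng.uniform(node_time[child], node_time[parent])
            mutations.append((position, child, time))
    return sorted(mutations)


muts = add_neutral_mutations(RECORDED_EDGES, NODE_TIME, rate=0.5, seed=42)
print(len(muts), "mutations added after the fact")
```

Nothing about the genealogy is consulted except edge spans and node times, which is why the mutations can be deferred until the simulation has finished.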
This means recording the entire genetic history of everyone in the population, ever.
It is not clear this is a good idea.
But, with a few tricks…
For example:
Runtime: 8 hours
\(N=10{,}000\) diploids
burn-in for \(10N\) generations
population split, with either:
Conclusions:
How strongly does selection enhance or constrain genetic variation?
How much genetic variation is locally adaptive?
How will populations respond to changes in the future?
train inference methods
predict management outcomes
do power analyses
develop intuition
The Co-Lab:
Funding:
Other collaborators: