Peter Ralph
Center for Genome Research and Biocomputing
Oregon State // 26 May 2021
What forces contribute to the variation in genetic diversity along the genome? (explaining variation in diversity)
Which locations along the genome have been the recent targets of positive natural selection? (identifying sweeps)
Where did this individual come from? (inferring location)
How do organisms disperse across the landscape? (dispersal maps)
Whole genomes, thousands of samples,
from millions of individuals.
Demography:
Geography:
History:
Natural selection:
Genomes:
SLiM, by Ben Haller and Philipp Messer
Ben Haller
Idea: if we record how everyone is related to everyone else,
we can put down neutral mutations after the simulation is over instead of carrying them along.
Since neutral mutations don’t affect demography,
this is equivalent to having kept track of them throughout.
For a set of sampled chromosomes, at each position along the genome there is a genealogical tree that says how they are related.
The succinct tree sequence
is a way to succinctly describe this, er, sequence of trees
and the resulting genome sequences.
Jerome Kelleher
Edges: who inherits from whom.
Records: interval (left, right); parent node; child node.
Nodes: the ancestors those inheritance events happen in.
Records: time ago (of birth); ID (implicit).
Mutations: when state changes along the tree.
Records: site it occurred at; node it occurred in; derived state.
Sites: where mutations fall on the genome.
Records: genomic position; ancestral (root) state; ID (implicit).
The result: an encoding of the genomes and all the genealogical trees.
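To make the four kinds of records concrete, here is a minimal pure-Python sketch of a toy tree sequence and of how genotypes can be read back off it. The names (`Node`, `Edge`, `Site`, `Mutation`, `parent_map`, `state_at`) and the three-sample example are made up for illustration; this is not the tskit API, and it assumes at most one mutation per node per site.

```python
from collections import namedtuple

# Hypothetical record types mirroring the four tables; IDs are list indices.
Node = namedtuple("Node", ["time"])
Edge = namedtuple("Edge", ["left", "right", "parent", "child"])
Site = namedtuple("Site", ["position", "ancestral_state"])
Mutation = namedtuple("Mutation", ["site", "node", "derived_state"])

# Toy example: samples are nodes 0, 1, 2 on a genome of length 10.
nodes = [Node(0.0), Node(0.0), Node(0.0), Node(1.0), Node(2.0), Node(3.0)]
edges = [
    Edge(0, 10, 3, 0), Edge(0, 10, 3, 1),  # 0 and 1 share parent 3 everywhere
    Edge(0, 5, 4, 2), Edge(0, 5, 4, 3),    # on [0, 5) the root is node 4
    Edge(5, 10, 5, 2), Edge(5, 10, 5, 3),  # on [5, 10) the root is node 5
]
sites = [Site(2.0, "A"), Site(7.0, "G")]
mutations = [Mutation(0, 3, "T"), Mutation(1, 2, "C")]


def parent_map(position):
    """Child -> parent for the tree covering this genomic position."""
    return {e.child: e.parent for e in edges if e.left <= position < e.right}


def state_at(sample, site_id):
    """Walk from the sample towards the root; the first mutation met wins."""
    site = sites[site_id]
    parent = parent_map(site.position)
    muts = {m.node: m.derived_state for m in mutations if m.site == site_id}
    node = sample
    while node is not None:
        if node in muts:
            return muts[node]
        node = parent.get(node)  # None once we step above the root
    return site.ancestral_state


# One row per sample, one column per site.
genotypes = [[state_at(s, j) for j in range(len(sites))] for s in range(3)]
print(genotypes)
```

Only the edges whose interval covers a position are used to build the tree there, which is why edges shared by neighboring trees are stored once.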
100Mb chromosomes; from Kelleher et al 2018, Inferring whole-genome histories in large population datasets, Nature Genetics
Genotype matrix:
\(N \times M\) things.
Tree sequence:
\(O(N + T + M)\) things
from Ralph, Thornton and Kelleher 2019, Efficiently summarizing relationships in large samples
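A rough back-of-the-envelope sketch of the two sizes, in Python. It reads \(N\) as the number of samples, \(T\) as the number of distinct trees, and \(M\) as the number of variant sites (one mutation each); that reading, the per-tree constants, and the example numbers are all assumptions for illustration, not figures from the papers cited here.

```python
def genotype_matrix_entries(n_samples, n_sites):
    """One entry per sample per site: N * M."""
    return n_samples * n_sites


def tree_sequence_rows(n_samples, n_trees, n_sites,
                       nodes_per_new_tree=1, edges_per_new_tree=4):
    """Rough row count across the node, edge, site and mutation tables.

    Assumes a binary first tree (2N - 1 nodes, 2N - 2 edges) and that each
    later tree adds only a constant number of new nodes and edges; the
    constants are hypothetical defaults.
    """
    nodes = 2 * n_samples - 1 + nodes_per_new_tree * (n_trees - 1)
    edges = 2 * n_samples - 2 + edges_per_new_tree * (n_trees - 1)
    sites_and_mutations = 2 * n_sites  # one site row plus one mutation row
    return nodes + edges + sites_and_mutations


# Hypothetical sizes, chosen only to show the scaling.
matrix = genotype_matrix_entries(1_000, 50_000)
rows = tree_sequence_rows(1_000, 10_000, 50_000)
print(matrix, rows)
```

The matrix grows with the product of samples and sites, while the table rows grow with their sum, which is where the savings come from.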
If we record the tree sequence that relates everyone to everyone else,
after the simulation is over we can put neutral mutations down on the trees.
Since neutral mutations don’t affect demography,
this is equivalent to having kept track of them throughout.
From Kelleher, Thornton, Ashander, and Ralph 2018, Efficient pedigree recording for fast population genetics simulation.
and Haller, Galloway, Kelleher, Messer, and Ralph 2018, Tree‐sequence recording in SLiM opens new horizons for forward‐time simulation of whole genomes
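A minimal sketch of putting neutral mutations down afterwards, in pure Python. Each edge receives mutations as a Poisson process along its genomic interval, with intensity equal to the mutation rate times the branch length. The function name `overlay_neutral_mutations`, the toy edges, and the rate are hypothetical; real tools work on full tree sequences rather than plain tuples.

```python
import random

# Toy genealogy: node birth times (ago) and edges (left, right, parent, child).
node_time = [0.0, 0.0, 0.0, 1.0, 2.0, 3.0]
edges = [
    (0, 10, 3, 0), (0, 10, 3, 1),
    (0, 5, 4, 2), (0, 5, 4, 3),
    (5, 10, 5, 2), (5, 10, 5, 3),
]


def overlay_neutral_mutations(edges, node_time, rate, rng):
    """Return sorted (position, node) pairs for mutations placed on edges.

    `rate` is per unit of genome length per unit of time (an assumption of
    this sketch). Mutations on an edge are assigned to its child node.
    """
    mutations = []
    for left, right, parent, child in edges:
        branch_length = node_time[parent] - node_time[child]
        intensity = rate * branch_length  # expected mutations per unit length
        if intensity <= 0:
            continue
        # Exponential gaps between positions give a Poisson process.
        position = left + rng.expovariate(intensity)
        while position < right:
            mutations.append((position, child))
            position += rng.expovariate(intensity)
    return sorted(mutations)


rng = random.Random(42)  # fixed seed so the sketch is repeatable
mutations = overlay_neutral_mutations(edges, node_time, rate=0.1, rng=rng)
print(len(mutations), "mutations placed")
```

Because the genealogy is fixed before any mutation is drawn, the placement cannot feed back into who had which offspring, which is the sense in which this matches carrying neutral mutations along.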
This means recording the entire genetic history of everyone in the population, ever.
It is not clear this is a good idea.
But, with a few tricks…
For example:
Runtime: 8 hours
Hudson 1994; Cutter & Payseur 2013; Corbett-Detig et al 2015
\(N=10,000\) diploids
burn-in for \(10N\) generations
population split, with either:
Murillo Rodrigues
From Widespread selection and gene flow shape the genomic landscape during a radiation of monkeyflowers, Stankowski, Chase, Fuiten, Rodrigues, Ralph, and Streisfeld; PLoS Bio 2019.
Conclusions:
from Battey et al 2020
Funding: