Peter Ralph
Center for Genome Research and Biocomputing
Oregon State // 26 May 2021
What forces contribute to the variation in genetic diversity along the genome? (explaining variation in diversity)
Which locations along the genome have been the recent targets of positive natural selection? (identifying sweeps)
Where did this individual come from? (inferring location)
How do organisms disperse across the landscape? (dispersal maps)
Whole genomes, thousands of samples,
from millions of individuals.
Demography:
Geography:
History:
Natural selection:
Genomes:
by Ben Haller and Philipp Messer
Demography:
Geography:
History:
Natural selection:
Genomes:
Idea: if we record how everyone is related to everyone else,
we can put down neutral mutations after the simulation is over instead of carrying them along.
Since neutral mutations don’t affect demography,
this is equivalent to having kept track of them throughout.
For a set of sampled chromosomes, at each position along the genome there is a genealogical tree that says how they are related.
The succinct tree sequence
is a way to succinctly describe this, er, sequence of trees
and the resulting genome sequences.
Who inherits from whom.
Records: interval (left, right); parent node; child node.
The ancestors those happen in.
Records: time ago (of birth); ID (implicit).
When state changes along the tree.
Records: site it occurred at; node it occurred in; derived state.
Where mutations fall on the genome.
Records: genomic position; ancestral (root) state; ID (implicit).
The result: an encoding of the genomes and all the genealogical trees.
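A minimal sketch of the four kinds of records above, in plain Python rather than any particular library. The class names, the toy genealogy (three samples, two ancestors, a genome of length 10 with a tree change at position 5) and the helper `genotypes` are all made up for illustration. The point is only that the records are enough to recover both the tree at any position and the sampled genomes.

```python
from dataclasses import dataclass

@dataclass
class Node:            # ID is implicit: the index in the list
    time: float        # time ago of birth

@dataclass
class Edge:
    left: float        # interval [left, right) over which child inherits from parent
    right: float
    parent: int
    child: int

@dataclass
class Site:            # ID is implicit: the index in the list
    position: float
    ancestral_state: str

@dataclass
class Mutation:
    site: int
    node: int
    derived_state: str

# Hypothetical toy data: samples are nodes 0, 1, 2; ancestors are nodes 3, 4.
# On [0, 5) the tree is ((0,1)3,2)4; on [5, 10) it is (0,(1,2)3)4.
nodes = [Node(0.0), Node(0.0), Node(0.0), Node(1.0), Node(2.0)]
edges = [
    Edge(0, 5, 3, 0), Edge(0, 10, 3, 1), Edge(5, 10, 3, 2),
    Edge(0, 5, 4, 2), Edge(5, 10, 4, 0), Edge(0, 10, 4, 3),
]
sites = [Site(2.0, "A"), Site(7.0, "G")]
mutations = [Mutation(0, 3, "T"), Mutation(1, 3, "C")]

def genotypes(site_id, samples):
    """State of each sample at a site: walk up the local tree to the nearest mutation."""
    site = sites[site_id]
    # child -> parent map for the tree covering this position
    parent = {e.child: e.parent for e in edges
              if e.left <= site.position < e.right}
    derived = {m.node: m.derived_state for m in mutations if m.site == site_id}
    states = []
    for u in samples:
        state = site.ancestral_state
        while u is not None:
            if u in derived:
                state = derived[u]
                break
            u = parent.get(u)  # None once we pass the root
        states.append(state)
    return states

print(genotypes(0, [0, 1, 2]))  # ['T', 'T', 'A']
print(genotypes(1, [0, 1, 2]))  # ['G', 'C', 'C']
```

The same mutation record (on node 3) gives different sets of carriers at the two sites because the edges, and hence the tree, differ on either side of position 5.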
Genotype matrix:
\(N \times M\) things.
Tree sequence:
\(O(N + T + M)\) things.
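A rough back-of-the-envelope version of this comparison. It assumes \(N\) is the number of samples, \(M\) the number of sites, and \(T\) the number of distinct trees along the genome (the slide does not define \(T\), so that reading is an assumption). The numbers are hypothetical, and the constants hidden by the \(O(\cdot)\) are ignored.

```python
def genotype_matrix_entries(n_samples, n_sites):
    # one entry per (sample, site) pair
    return n_samples * n_sites

def tree_sequence_entries(n_samples, n_trees, n_sites):
    # order of magnitude only: constant factors are dropped
    return n_samples + n_trees + n_sites

# Hypothetical sizes, chosen only to illustrate the scaling.
N, T, M = 10_000, 100_000, 1_000_000

dense = genotype_matrix_entries(N, M)
sparse = tree_sequence_entries(N, T, M)
print(dense, sparse, round(dense / sparse))
```

With these made-up sizes the matrix has \(10^{10}\) entries against roughly \(10^{6}\) for the tree sequence. The gap widens as \(N\) grows because the first count is a product and the second is a sum.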
If we record the tree sequence that relates everyone to everyone else,
after the simulation is over we can put neutral mutations down on the trees.
Since neutral mutations don’t affect demography,
this is equivalent to having kept track of them throughout.
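A sketch of what "putting neutral mutations down on the trees" can look like, in plain Python. It assumes mutations fall as a Poisson process at rate `mu` per unit of genome per unit of time, independently on each edge. The function name, the edge tuples and the toy genealogy are illustrative and not taken from any simulator.

```python
import random

def add_neutral_mutations(node_times, edges, mu, seed=1):
    """Drop mutations on each edge after the fact.

    node_times[u] is the time ago of birth of node u; each edge is
    (left, right, parent, child). Returns sorted (position, node) pairs,
    where node is the child below the branch the mutation fell on.
    """
    rng = random.Random(seed)
    mutations = []
    for left, right, parent, child in edges:
        branch_length = node_times[parent] - node_times[child]
        rate = mu * branch_length  # expected mutations per unit of genome on this edge
        if rate <= 0:
            continue
        # Exponential gaps along the genome give a Poisson number of
        # mutations at uniformly distributed positions within [left, right).
        pos = left + rng.expovariate(rate)
        while pos < right:
            mutations.append((pos, child))
            pos += rng.expovariate(rate)
    return sorted(mutations)

# Hypothetical toy genealogy: samples 0, 1, 2 and ancestors 3, 4.
node_times = [0.0, 0.0, 0.0, 1.0, 2.0]
edges = [
    (0, 5, 3, 0), (0, 10, 3, 1), (5, 10, 3, 2),
    (0, 5, 4, 2), (5, 10, 4, 0), (0, 10, 4, 3),
]

muts = add_neutral_mutations(node_times, edges, mu=0.2)
for position, node in muts:
    print(f"{position:.2f} above node {node}")
```

Nothing in this step feeds back into who lived or reproduced, which is why it can be done once the simulation has finished.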
This means recording the entire genetic history of everyone in the population, ever.
It is not clear this is a good idea.
But, with a few tricks…
For example:
Runtime: 8 hours
Hudson 1994; Cutter & Payseur 2013; Corbett-Detig et al 2015
\(N=10,000\) diploids
burn-in for \(10N\) generations
population split, with either:
Conclusions:
Funding: