Gametic genealogy
A gametic genealogy is a convenient mathematical
formalization of the genealogy of a population, viewed from the perspective of
gametes. Mathematically, it is a quadruple
(Gam,Mate,Par,Fert)
with components
Gam,
the set of underlying gametes,
Mate,
the set of zygotes formed by the fusion of egg gametes and sperm
gametes,
Par,
a mapping from child gametes to parent zygotes, and
Fert,
a mapping from zygotes to fertilization time.
For convenience, given a gametic genealogy,
Gam_0
denotes the set of egg gametes,
Gam_1
denotes the set of sperm gametes, and
Mate^*
denotes the mapping from gametes to the zygotes they formed during
fertilization.
Formally, a gametic genealogy must satisfy the
following conditions.
1. Gam_0 ∪ Gam_1 = Gam and Gam_0 ∩ Gam_1 = ∅.
2. Mate ⊂ Gam_0 × Gam_1, and Mate forms a one-to-one correspondence between Gam_0 and Gam_1.
3. Par is a function C ↦ Mate, where C is a subset of Gam representing child gametes.
4. Fert is a function Mate ↦ ℝ such that, for all child gametes g ∈ dom Par, Fert(Mate^*(g)) > Fert(Par(g)).
Here dom Par
denotes the domain of Par,
that is, the set of child gametes.
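The conditions above can be checked mechanically on a finite example. The following is a minimal sketch, not part of the formalism itself: the function name `is_gametic_genealogy`, the dictionary encoding of Par and Fert, and the toy gamete labels are all assumptions made for illustration.

```python
def is_gametic_genealogy(gam0, gam1, mate, par, fert):
    """Return True if the inputs satisfy the conditions stated above.

    gam0, gam1 -- sets of egg and sperm gametes (Gam is taken as their union)
    mate       -- set of zygotes, each an (egg, sperm) tuple
    par        -- dict mapping each child gamete to its parent zygote
    fert       -- dict mapping each zygote to its fertilization time
    """
    # Eggs and sperm must be disjoint.
    if gam0 & gam1:
        return False
    # Every egg and every sperm must occur in exactly one zygote.
    eggs = sorted(e for e, _ in mate)
    sperms = sorted(s for _, s in mate)
    if eggs != sorted(gam0) or sperms != sorted(gam1):
        return False
    # Par must map a subset of Gam into Mate.
    if not (set(par) <= gam0 | gam1 and set(par.values()) <= mate):
        return False
    # Fert must be defined on all of Mate, and each child gamete must be
    # fertilized strictly after its parent zygote was.
    if set(fert) != set(mate):
        return False
    mate_star = {g: z for z in mate for g in z}  # Mate^*: gamete -> its zygote
    return all(fert[mate_star[g]] > fert[par[g]] for g in par)


# Toy genealogy: two founder zygotes produce one child gamete each.
z1, z2, z3 = ("e1", "s1"), ("e2", "s2"), ("e3", "s3")
gam0, gam1 = {"e1", "e2", "e3"}, {"s1", "s2", "s3"}
mate = {z1, z2, z3}
par = {"e3": z1, "s3": z2}
fert = {z1: 0.0, z2: 0.0, z3: 1.0}
valid = is_gametic_genealogy(gam0, gam1, mate, par, fert)
# Same genealogy, but the child zygote is no later than its parent zygotes.
invalid = is_gametic_genealogy(gam0, gam1, mate, par, {z1: 0.0, z2: 0.0, z3: 0.0})
```

In this toy example `valid` is `True`, while `invalid` is `False` because the time-ordering requirement on Fert fails.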
Gametic lineage space
A gametic lineage space is a mathematical
formalism representing the lines of transmission of genetic
information via gametes of a population over time. It is a triplet
(Loc,G,Lin)
where
Loc
is the set of all genomic locations,
G
is a gametic genealogy (Gam,Mate,Par,Fert),
and
Lin
is a function Loc×Gam↦2^Gam
mapping a genomic position in a gamete to the set of gametes that
transmitted genetic information to that position in that
gamete.
For every location ℓ∈Loc
and gamete g∈Gam,
Lin(ℓ,g)
is the lineage ending at gamete g
via locus ℓ.
It must satisfy the condition Lin(ℓ,g) = {g} ∪ Lin(ℓ,Par(g)_i) for either i=0 or i=1
when g∈dom Par,
and Lin(ℓ,g) = {g} otherwise.
Here Par(g)_0
and Par(g)_1
are the egg (maternal) and sperm (paternal) gametes, respectively, that fused to form
the parent zygote Par(g) of g.
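The recursive condition on Lin can be unrolled into a walk up the genealogy. The sketch below assumes a hypothetical table `choice` that records, for each location and child gamete, which index i (0 for egg, 1 for sperm) transmitted the material. That table is an illustrative device and not part of the definition, which only requires that some such i exists.

```python
def lineage(loc, g, par, choice):
    """Return Lin(loc, g) by following one parental gamete per generation.

    par    -- dict mapping each child gamete to its parent zygote (egg, sperm)
    choice -- hypothetical dict mapping (loc, child gamete) to 0 or 1, i.e.
              whether the egg (0) or sperm (1) of the parent zygote
              transmitted the material at loc
    """
    lin = {g}
    # In a valid genealogy fertilization times strictly decrease going back,
    # so this walk terminates at a gamete outside dom Par.
    while g in par:
        g = par[g][choice[(loc, g)]]
        lin.add(g)
    return lin


# Toy data: gamete "e3" is a recombinant of the egg and sperm of zygote z1.
z1, z2 = ("e1", "s1"), ("e2", "s2")
par = {"e3": z1, "s3": z2}
choice = {(0, "e3"): 0, (1, "e3"): 1, (0, "s3"): 0, (1, "s3"): 0}
lin_locus0 = lineage(0, "e3", par, choice)  # {"e3", "e1"}
lin_locus1 = lineage(1, "e3", par, choice)  # {"e3", "s1"}
```

The two lineages of `"e3"` differ between loci, which is how recombination appears in this representation.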
An embedded ancestral recombination graph
An ancestral recombination graph [1, 2, 3]
of a sampled population is embedded in a gametic lineage space. We
formally show the exact embedding using the gARG formalism [4].
We start by defining the genetic legacy of a
gamete g∈Gam
for a sample population S⊆Gam
to be Leg(g,S) := {(ℓ,d)∈Loc×S : g∈Lin(ℓ,d)}.
This genetic legacy is the genetic material in the
sample population S
that was originally copied from ancestral gamete g
(with or without mutations).
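As an illustration of the definition of Leg, the sketch below computes the genetic legacy on a small hand-written example. The function name `legacy`, the dictionary `lin` holding precomputed lineages, and the gamete labels are assumptions made for this example only.

```python
def legacy(g, sample, locs, lin):
    """Return Leg(g, S): the (location, sampled gamete) pairs whose lineage contains g.

    lin -- dict mapping (location, sampled gamete) to the lineage Lin(l, d)
    """
    return {(loc, d) for loc in locs for d in sample if g in lin[(loc, d)]}


locs = [0, 1]
sample = {"e3", "s3"}
# Hand-written lineages: "e3" is a recombinant of "e1" and "s1",
# while "s3" inherits both loci from "e2".
lin = {
    (0, "e3"): {"e3", "e1"},
    (1, "e3"): {"e3", "s1"},
    (0, "s3"): {"s3", "e2"},
    (1, "s3"): {"s3", "e2"},
}
leg_e1 = legacy("e1", sample, locs, lin)  # {(0, "e3")}
leg_e2 = legacy("e2", sample, locs, lin)  # {(0, "s3"), (1, "s3")}
leg_s2 = legacy("s2", sample, locs, lin)  # set(): nothing from "s2" reaches S
```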
QUESTIONS FOR FEEDBACK: Would "gametic legacy" be a
more useful wording than "genetic legacy"? Would some word
other than "legacy" be more clear?
Genetic legacy for a sample population
S
induces the following equivalence relation over pairs of gametes
g_1
and g_2
in Gam:
g_1 ≃_S g_2 := Leg(g_1,S) = Leg(g_2,S).
We denote the resulting equivalence class containing
g∈Gam
as [g]_S := {g′∈Gam : Leg(g′,S) = Leg(g,S)}.
In this equivalence relation, gametes are considered equivalent
if they have the same genetic legacy for the sample population
S.
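One way to obtain these equivalence classes on a finite example is to group gametes by their legacy. The sketch below is illustrative only: `legacy_classes`, the `lin` dictionary of precomputed lineages, and the toy labels are assumptions, not notation from the formalism.

```python
from collections import defaultdict


def legacy_classes(gam, sample, locs, lin):
    """Partition gam into classes of gametes with equal legacy Leg(g, S).

    lin -- dict mapping (location, sampled gamete) to the lineage Lin(l, d)
    """
    by_legacy = defaultdict(set)
    for g in gam:
        leg = frozenset(
            (loc, d) for loc in locs for d in sample if g in lin[(loc, d)]
        )
        by_legacy[leg].add(g)
    return {frozenset(members) for members in by_legacy.values()}


gam = {"e1", "s1", "e2", "s2", "e3", "s3"}
sample = {"e3", "s3"}
locs = [0, 1]
lin = {
    (0, "e3"): {"e3", "e1"},
    (1, "e3"): {"e3", "s1"},
    (0, "s3"): {"s3", "e2"},
    (1, "s3"): {"s3", "e2"},
}
classes = legacy_classes(gam, sample, locs, lin)
```

In this toy example, `"s3"` and its ancestor `"e2"` fall into the same class because every sampled position that traces back to one also traces back to the other. Any gametes with an empty legacy would likewise share a single class.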
A convenient choice for an embedded gARG
[4] is
to set the gARG nodes (vertices) to be the equivalence classes:
Nodes(S) := {[g]_S : g∈Gam}.
The (unannotated) graph edges of the gARG are chosen as
child-parent node pairs (C,P)∈Nodes(S)^2
where Par(g)_i∈P for some g∈C and some i∈{0,1}.
In the gARG, annotations are added for each graph edge (pair of
child and parent nodes). This annotation is the set of locations
through which genetic information has been copied from parent to
child. In the following interpretation, the only locations of interest
are those for which genetic information has been transmitted into the
sample population S.
With this interpretation, the annotation for edge
(C,P)
is {ℓ∈Loc : C∪P⊆Lin(ℓ,g) for some g∈S}.
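The edge and annotation rules can be applied literally to a small example. The following sketch is one possible reading under stated assumptions: the function name `garg_edges`, the dictionary encodings of Par and Lin, and the toy data are all hypothetical, and no filtering beyond the stated definitions is attempted.

```python
def garg_edges(gam, sample, locs, par, lin):
    """Return {(C, P): annotation} for the child-parent node pairs.

    par -- dict mapping each child gamete to its parent zygote (egg, sperm)
    lin -- dict mapping (location, sampled gamete) to the lineage Lin(l, d)
    """
    # Legacy of every gamete; gametes with equal legacies share a node.
    leg = {
        g: frozenset((loc, d) for loc in locs for d in sample if g in lin[(loc, d)])
        for g in gam
    }
    node = {g: frozenset(h for h in gam if leg[h] == leg[g]) for g in gam}
    edges = {}
    for g, zygote in par.items():
        for parent_gamete in zygote:  # Par(g)_0 and Par(g)_1
            c, p = node[g], node[parent_gamete]
            # Locations where some sampled lineage contains all of C and P.
            edges[(c, p)] = {
                loc for loc in locs
                if any(c | p <= lin[(loc, d)] for d in sample)
            }
    return edges


gam = {"e1", "s1", "e2", "s2", "e3", "s3"}
sample = {"e3", "s3"}
locs = [0, 1]
par = {"e3": ("e1", "s1"), "s3": ("e2", "s2")}
lin = {
    (0, "e3"): {"e3", "e1"},
    (1, "e3"): {"e3", "s1"},
    (0, "s3"): {"s3", "e2"},
    (1, "s3"): {"s3", "e2"},
}
edges = garg_edges(gam, sample, locs, par, lin)
```

In this toy run, the edge from the class of `"e3"` to the class of `"e1"` is annotated with location 0, and the edge to the class of `"s1"` with location 1. Read literally, the rules also pair the class containing `"s3"` and `"e2"` with itself, and give the edge towards `"s2"` an empty annotation.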
Acknowledgements
Thanks to Daria Shipilina and Nick Barton for sharing their
preprint
[5]
and discussing the conjecture in edition 0.1 of this document relating
to their preprint.
Changes from edition 0.1
References
1. Griffiths RC, Marjoram P (1997) An ancestral recombination graph. In: Friedman A, Miller W, Donnelly P, Tavaré S (eds) Progress in Population Genetics and Human Evolution. Springer New York, New York, NY, pp 257–270
2. Hein J, Schierup MH, Wiuf C (2005) Gene genealogies, variation and evolution: A primer in coalescent theory. Oxford University Press, Oxford; New York
3. Wakeley J (2009) Coalescent theory: An introduction. Roberts & Co. Publishers, Greenwood Village, Colo
4. Wong Y, Ignatieva A, Koskela J, et al (2022) A general and efficient representation of ancestral recombination graphs. https://archive.softwareheritage.org/swh:1:rev:7df4f1995028cc676a6c1b231e8d7a024666b5fc