.. _sec_diversity:

.. jupyter-execute::
    :hide-code:

    import matplotlib, matplotlib.pylab as plt
    plt.rcParams['legend.title_fontsize'] = 'xx-small'
    matplotlib.rc('xtick', labelsize=9)
    matplotlib.rc('ytick', labelsize=9)
    matplotlib.rc('axes', labelsize=12)
    matplotlib.rc('axes', titlesize=12)
    matplotlib.rc('legend', fontsize=10)

================================
Demography and genetic diversity
================================

.. todo:: This module has not been completed.

Intuition about how demography is expected to affect summary statistics helps when
proposing historical scenarios that could explain observed patterns of genetic
diversity, and when troubleshooting poor fits of models to data. It is also important
to understand how demographic parameters can be confounded, so that different
evolutionary scenarios can give rise to similar patterns of genetic diversity.

*****************************
Measures of genetic diversity
*****************************

Many of the common single-site diversity statistics in population genetics are
summaries of the site frequency spectrum (SFS).

For a single population, diversity within the population is very often reported as the
average heterozygosity (typically denoted :math:`\pi` or :math:`H`): the probability
that two genome copies (i.e. samples) differ in state at a given locus. Suppose our SFS
stores the distribution of allele frequencies over :math:`L` loci for :math:`n` samples,
so that :math:`\text{SFS}(i)` is the number of loci at which :math:`i` of the :math:`n`
copies carry the derived allele. Then the expected (or average) :math:`\pi` is found by
weighting each allele frequency bin of the SFS by the probability that two copies drawn
at random without replacement carry different alleles, summing over bins, and dividing
by :math:`L`:

.. math::

    \mathbf{E}[\pi] = \frac{1}{L} \sum_{i=1}^{n-1} 2\frac{i(n-i)}{n(n-1)} \text{SFS}(i)

Under the standard neutral model with a constant population size, expected diversity
equals the population-scaled mutation rate, :math:`\theta = 4 N_e u`:

.. jupyter-execute::

    import moments

    theta = 0.001 # the per-base scaled mutation rate, 4*Ne*u
    n = 30 # the haploid sample size
    fs = theta * moments.Demographics1D.snm([n])

    print("Theta:", theta)
    print("Diversity:", f"{fs.pi():0.4f}")

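The formula above can also be checked directly with NumPy, without going through
``moments``. In this sketch, ``pi_from_sfs`` is a hypothetical helper written only for
illustration: it assumes an unfolded SFS stored as an array of length :math:`n + 1`
that includes the two monomorphic bins (the layout ``moments`` uses), applies the
weights :math:`2i(n-i)/[n(n-1)]`, and divides by the number of loci :math:`L`. It is
evaluated on a small made-up spectrum and on the expected neutral spectrum, whose
entries are :math:`\theta/i`.

.. jupyter-execute::

    import numpy as np

    def pi_from_sfs(sfs, num_loci=1):
        """Average pairwise diversity from an unfolded SFS (hypothetical helper).

        sfs[i] is the number of loci at which i of the n sampled copies carry
        the derived allele, for i = 0, ..., n. The monomorphic bins (i = 0 and
        i = n) receive zero weight.
        """
        sfs = np.asarray(sfs, dtype=float)
        n = len(sfs) - 1
        i = np.arange(n + 1)
        # probability that two copies drawn without replacement differ in state
        weights = 2 * i * (n - i) / (n * (n - 1))
        return weights @ sfs / num_loci

    # made-up toy SFS: n = 4 copies at 10 loci, 6 of them monomorphic
    toy_sfs = [6, 2, 1, 1, 0]
    print(f"Toy pi: {pi_from_sfs(toy_sfs, num_loci=10):0.4f}")

    # expected neutral SFS, theta / i, with the same theta and n as the cell above
    theta_check, n_check = 0.001, 30
    neutral_sfs = np.zeros(n_check + 1)
    neutral_sfs[1:n_check] = theta_check / np.arange(1, n_check)
    print(f"Neutral pi: {pi_from_sfs(neutral_sfs):0.4f}")

For the toy spectrum the weights for :math:`i = 1, 2, 3` are 1/2, 2/3 and 1/2, so
:math:`\pi = (2 \cdot 1/2 + 2/3 + 1/2)/10 = 13/60 \approx 0.217`. For the neutral
spectrum the weighted sum collapses to :math:`\theta`, matching ``fs.pi()`` above.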

****************************
Single-population demography
****************************

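To see how the SFS responds to a change in population size, we start from the
equilibrium SFS, instantaneously double the population size, and store the numbers of
singletons, doubletons and tripletons, along with diversity, every four generations for
4,000 generations: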

.. jupyter-execute::

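    import numpy as np
    import matplotlib.pyplot as plt

    Ne = 1000  # reference effective size, used to convert generations to moments time
    step = 4  # number of generations between stored values

    singletons = []
    doubletons = []
    tripletons = []
    diversity = []

    # equilibrium SFS (theta = 1) for a haploid sample size of 20
    fs = moments.Demographics1D.snm([20])
    singletons.append(fs[1])
    doubletons.append(fs[2])
    tripletons.append(fs[3])
    diversity.append(fs.pi())

    for _ in range(Ne):
        # moments measures time in units of 2*Ne generations, so this integrates
        # `step` generations (in place) at twice the original population size
        fs.integrate([2], step / (2 * Ne))
        singletons.append(fs[1])
        doubletons.append(fs[2])
        tripletons.append(fs[3])
        diversity.append(fs.pi())

    # convert to arrays so each statistic can be scaled by its pre-expansion value
    singletons = np.array(singletons)
    doubletons = np.array(doubletons)
    tripletons = np.array(tripletons)
    diversity = np.array(diversity)

    fig = plt.figure(1)
    ax = plt.subplot(1, 1, 1)
    tt = [step * t for t in range(Ne + 1)]
    ax.plot(tt, singletons / singletons[0], label="Singletons")
    ax.plot(tt, doubletons / doubletons[0], label="Doubletons")
    ax.plot(tt, tripletons / tripletons[0], label="Tripletons")
    ax.plot(tt, diversity / diversity[0], label="Diversity (pi)")
    ax.set_xlabel("Generations after expansion")
    ax.set_ylabel("Value relative to pre-expansion")
    ax.legend(frameon=False)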

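The diversity curve can also be anticipated analytically. Assuming the infinite-sites
model and measuring time :math:`\tau` in units of :math:`2N_e` generations (as the
``moments`` cell above does), a pair of lineages coalesces at rate :math:`1/\nu` per
time unit when the population has relative size :math:`\nu`, while mutation adds
diversity at rate :math:`\theta`. Expected diversity therefore obeys
:math:`d\pi/d\tau = \theta - \pi/\nu`, and starting from equilibrium,
:math:`\pi(0) = \theta`, the solution is

.. math::

    \frac{\pi(\tau)}{\theta} = \nu + (1 - \nu)\, e^{-\tau/\nu}.

The sketch below evaluates this for :math:`\nu = 2` on the same grid of generations as
the plot; under these assumptions it should closely follow the "Diversity (pi)" curve.

.. jupyter-execute::

    import numpy as np

    nu = 2.0  # relative population size after the expansion
    Ne = 1000  # reference size, as in the cell above
    generations = np.arange(0, 4 * Ne + 1, 4)  # same grid as the plot
    tau = generations / (2 * Ne)  # time in units of 2*Ne generations

    # closed-form expected diversity, relative to its pre-expansion value theta
    pi_ratio = nu + (1 - nu) * np.exp(-tau / nu)

    print(f"pi ratio after {generations[-1]} generations: {pi_ratio[-1]:.3f}")

Diversity relaxes toward its new equilibrium (twice its initial value) on a timescale of
:math:`\nu` time units, i.e. :math:`2 \nu N_e = 4000` generations here, so at the end of
the plotted interval it has only reached :math:`2 - e^{-1} \approx 1.632` times its
starting value.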

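The first outline item below concerns Tajima's D, another summary of the SFS: it
contrasts :math:`\pi` with Watterson's estimator :math:`S/a_1`, where :math:`S` is the
number of segregating sites and :math:`a_1 = \sum_{i=1}^{n-1} 1/i`. As a starting
point, here is a minimal NumPy sketch; ``tajimas_d`` is a hypothetical helper, not part
of ``moments``, and it uses Tajima's standard variance constants. It is applied to
expected rather than observed spectra, with a made-up :math:`\theta = 10` and 20 added
singletons, so only the sign is meaningful: D is zero for the neutral spectrum and turns
negative when rare variants are in excess, as happens after an expansion.

.. jupyter-execute::

    import numpy as np

    def tajimas_d(sfs):
        """Tajima's D from an unfolded SFS of length n + 1 (hypothetical helper)."""
        sfs = np.asarray(sfs, dtype=float)
        n = len(sfs) - 1
        i = np.arange(1, n)
        seg = sfs[1:n]  # polymorphic bins only
        S = seg.sum()  # number of segregating sites
        pi = np.sum(2 * i * (n - i) / (n * (n - 1)) * seg)  # pairwise differences
        # standard constants for the variance of pi - S / a1
        a1 = np.sum(1 / i)
        a2 = np.sum(1 / i**2)
        b1 = (n + 1) / (3 * (n - 1))
        b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
        c1 = b1 - 1 / a1
        c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
        e1 = c1 / a1
        e2 = c2 / (a1**2 + a2)
        return (pi - S / a1) / np.sqrt(e1 * S + e2 * S * (S - 1))

    n_taj = 30
    sfs_neutral = np.zeros(n_taj + 1)
    sfs_neutral[1:n_taj] = 10.0 / np.arange(1, n_taj)  # expected neutral SFS, theta = 10

    sfs_excess = sfs_neutral.copy()
    sfs_excess[1] += 20  # made-up excess of singletons

    print(f"Neutral SFS:       D = {tajimas_d(sfs_neutral):.3f}")
    print(f"Excess singletons: D = {tajimas_d(sfs_excess):.3f}")
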
- Tajima's D and pi over time with size changes
- dynamics of allele frequency classes with size changes

********************
Multiple populations
********************

- Comparison to some classical result in an IM model?
- m-T confounding in heatmap of Fst
- Fst with small sizes vs large divergence
- pi over time in OOA model