API for demes functions

Computing statistics from demes

moments.Demes.SFS(g, sampled_demes=None, sample_sizes=None, sample_times=None, samples=None, unsampled_n=4, gamma=None, s=None, h=None, theta=None, u=None, reversible=False, L=1)[source]

Compute the SFS from a demes-specified demographic model. demes is a package for specifying demographic models in a user-friendly, human-readable YAML format. This function automatically parses the demographic model and returns the SFS for the specified populations, sample sizes, and (optionally) sampling times.

Selection and dominance can be specified as a single value for all populations, or on a per-deme basis using a dictionary mapping deme name to the coefficient (defaults can also be set if multiple demes have the same selection or dominance coefficient). The mutation rate can be given as either a per-base rate (possibly multiplied by the sequence length), or as a population size-scaled rate. If mutation rates are not given, the SFS is scaled by 4*N_e, so that multiplying the output SFS by u results in a properly scaled SFS.

Parameters:
  • g (demes.DemeGraph) – A demes DemeGraph from which to compute the SFS.

  • sampled_demes (list of strings) – A list of deme IDs to take samples from. We can repeat demes, as long as the sampling of repeated deme IDs occurs at distinct times.

  • sample_sizes (list of ints) – A list of the same length as sampled_demes, giving the sample sizes for each sampled deme.

  • sample_times (list of scalars, optional) – If None, assumes all sampling occurs at the end of the existence of the sampled deme. If there are ancient samples, sample_times must be a list of same length as sampled_demes, giving the sampling times for each sampled deme. Sampling times are given in time units of the original deme graph, so might not necessarily be generations (e.g. if g.time_units is years)

  • unsampled_n (int, optional) – The default sample size of unsampled demes, which must be greater than or equal to 4.

  • gamma (scalar or dict) – The scaled selection coefficient(s), 2*Ne*s. Defaults to None, which implies neutrality. Can be given as a scalar value, in which case all populations have the same selection coefficient. Alternatively, can be given as a dictionary, with keys given as population names in the input Demes model. Any population missing from this dictionary will be assigned a selection coefficient of zero. A non-zero default selection coefficient can be provided, using the key _default. See the Demes exension documentation for more details and examples.

  • h (scalar or dict) – The dominance coefficient(s). Defaults to additivity (or genic selection). Can be given as a scalar value, in which case all populations have the same dominance coefficient. Alternatively, can be given as a dictionary, with keys given as population names in the input Demes model. Any population missing from this dictionary will be assigned a dominance coefficient of 1/2 (additivity). A different default dominance coefficient can be provided, using the key _default. See the Demes exension documentation for more details and examples.

  • theta (scalar or list of length 2) – The scaled mutation rate(s), 4*Ne*u. When simulating under the infinite sites model (the default mutation model), theta should be given as a scalar value greater than zero. If it is not provided, it is computed using the input value of u as 4*Ne*u. If u is not provided, then the SFS is scaled by 4*Ne, and the user can recover a properly scaled SFS by multiplying it by u or u*L. When simulating under the reversible mutation model (with reversible=True), theta may be a list of length 2 and both the forward and backward scaled mutation rates must be less than 1.

  • u (scalar or list of length 2) – The per-base mutation rate. When simulating under the infinite sites model (the default mutation model), u should be a scalar. When simulating under the reversible mutation model (with reversible=True), u may be a list of length 2, and mutation rate(s) must be small enough so that the product of 4*Ne*u is less than 1.

  • L (scalar) – The effective sequence length, which may be used along with u to set the total mutation rate. Defaults to 1, and it must be 1 when using the reversible mutation model.

Returns:

A moments site frequency spectrum, with dimension equal to the length of sampled_demes, and shape equal to sample_sizes plus one in each dimension, indexing the allele frequency in each deme from 0 to n[i], where i is the deme index.

Return type:

moments.Spectrum

moments.Demes.LD(g, sampled_demes, sample_times=None, rho=None, theta=None, r=None, u=None)[source]

Compute LD statistics from a demes-specified demographic model. demes is a package for specifying demographic models in a user-friendly, human-readable YAML format. This function automatically parses the demographic model and returns LD statistics for the specified populations and (optionally) sampling times.

No mutation or recombination rates are required. If recombination rates are omitted, only single-locus allelic diversity statistics will be computed. If mutation rates are omitted, that is, neither theta nor u are provided, the per-base mutation rate is set to 1. Thus, to recover the correctly scaled LD statistics, multiply the two-locus statistics by \(u^2\) and the single-locus statistics by \(u\).

Parameters:
  • g (demes.DemeGraph) – A demes DemeGraph from which to compute the LD.

  • sampled_demes (list of strings) – A list of deme IDs to take samples from. We can repeat demes, as long as the sampling of repeated deme IDs occurs at distinct times.

  • sample_times (list of floats, optional) – If None, assumes all sampling occurs at the end of the existence of the sampled deme. If there are ancient samples, sample_times must be a list of same length as sampled_demes, giving the sampling times for each sampled deme. Sampling times are given in time units of the original deme graph, so might not necessarily be generations (e.g. if g.time_units is years)

  • rho – The population-size scaled recombination rate(s). Can be None, a non-negative float, or a list of values. Cannot be used with Ne.

  • theta – The population-size scaled mutation rate. Cannot be used with u.

  • r (scalar or list of scalars) – The unscaled recombination rate(s). Can be None, a non-negative float, or a list of values. Recombination rates are scaled by Ne to get rho=4*Ne*u, and Ne is determined by the root population size.

  • u (scalar) – The raw per-base mutation rate. The reference effective population size Ne is determined from the demograhic model, after which theta is set to 4*Ne*u.

Returns:

A moments.LD LD statistics object, with number of populations equal to the length of sampled_demes.

Return type:

moments.LD.LDstats

Running SFS inference using demes

moments.Demes.Inference.optimize(deme_graph, inference_options, data, maxiter=1000, perturb=0, verbose=0, uL=None, log=True, method='fmin', fit_ancestral_misid=False, misid_guess=None, output_stream=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, output=None, overwrite=False)[source]

Optimize demography given a deme graph of initial and fixed parameters and inference options that specify which parameters to fit, and bounds or constraints on those parameters.

Parameters:
  • deme_graph – A demographic model as a YAML file in demes format. This should be given as a string specifying the path and file name of the model.

  • inference_options – A second YAML file, specifying the parameters to be optimized, parameter bounds, and constraints between parameters. Please see the documentation at https://momentsld.github.io/moments/extensions/demes.html#the-options-file

  • data – The SFS to fit, which must have pop_ids specified. Can either be a Spectrum object or the file path to the stored frequency spectrum. The populations in the SFS (as given by sfs.pop_ids) need to be present in the demographic model and have matching IDs.

  • maxiter – The maximum number of iterations to run optimization. Defaults to 1000. Note: maxiter does not seem to work with the Powell method! This appears to be a bug within scipy.optimize.

  • perturb – The perturbation amount of the initial parameters. Defaults to zero, in which case we do not perturb the initial parameters in the input demes YAML. If perturb is greater than zero, the initial parameters in the input YAML are randomly perturbed by up to perturb-fold.

  • verbose – The frequency to print updates to output_stream if greater than 1.

  • uL – The mutuation rate scaled by number of callable sites, so that theta is the compound parameter 4*Ne*uL. If given, we optimize with multinom=False, and theta is determined by taking Ne as the ancestral or root population size, which can be fit. Defaults to None. If uL is not given, we use multinom=True and we likely do not want to fit the ancestral population size.

  • log – If True, optimize over log of the parameters.

  • method – The optimization method. Available methods are “fmin”, “powell”, and “lbfgsb”.

  • fit_ancestral_misid – If True, we fit the probability that the ancestral state of a given SNP is misidenitified, resulting in ancestral/derived labels being flipped. Note: this is only allowed with unfolded spectra. Defaults to False.

  • misid_guess – Defaults to 0.01.

  • output – If given, the filename for the output best-fit model YAML.

  • overwrite – If True, overwrites any existing file with the same output name.

Output_stream:

Defaults to standard output. Can be given an open file stream instead or other output stream.

Returns:

List of parameter names, optimal parameters, and LL

moments.Demes.Inference.uncerts(deme_graph, inference_options, data, bootstraps=None, uL=None, bootstraps_uL=None, log=False, eps=0.01, method='FIM', fit_ancestral_misid=False, misid_fit=None, verbose=0, output_stream=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, output=None, overwrite=False)[source]

Compute uncertainties for fitted parameters, using the output YAML from moments.Demes.Inference.optimize().

Parameters:
  • deme_graph (str) – The file path to the output demes graph from optimize().

  • inference_options (str) – The same options file used in the original optimization.

  • data (moments.Spectrum) – The same data SFS used in the origional optimization.

  • bootstraps (list) – A list of bootstrap replicates of the SFS. See documentation for examples of computing block-bootstrap replicate data.

  • uL (float) – The sequence length-scaled mutation rate. This must be provided if it was used in the original optimization.

  • bootstraps_uL (list) – If uL was used in the original optimization, this must be provided. It is a list of the same length as the bootstrap replicates containing the sequence length-scaled mutation rates for the corresponding bootstrap SFS replicates, matching the index.

  • log (bool) – Defaults to False. If True, we assume a log-normal distribution of parameters. Returned values are then the standard deviations of the logs of the parameter values, which can be interpreted as relative parameter uncertainties.

  • eps (float) – Fractional stepsize to use when taking finite-difference derivatives. Note that if eps*param is < 1e-6, then the step size for that parameter will simply be eps, to avoid numerical issues with small parameter perturbations. Defaults to 0.01.

  • method (str) – Choose between either “FIM” (Fisher Information) or “GIM” (Godambe Information). Defaults to FIM. If GIM is chosen, bootstrap replicates must be provided.

  • fit_ancestral_misid (bool) – If we fit the ancestral misidentification in the original optimization, set to True.

  • misid_fit (float) – If we fit the ancestral misidentification in the original optimization, provide the best fit value here.

  • verbose (int) – If greater than zero, print the progress of iterations needed to compute uncertainties. Prints every {verbose} number of iterations.

  • output_stream – Default is sys.stdout, but can be changed using this option.

  • output (str) – If given, write the output as a tab-delimited table with parameter names, best-fit values, and standard errors as columns.

  • overwrite (bool) – If True, overwrite the output table of uncertainties.

Running LD inference using demes

moments.Demes.Inference.optimize_LD(deme_graph, inference_options, means, varcovs, pop_ids=None, rs=None, statistics=None, normalization=None, maxiter=1000, perturb=0, verbose=0, log=True, method='fmin', output_stream=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, output=None, overwrite=False)[source]

Optimize demography given a deme graph of initial and fixed parameters and inference options that specify which parameters to fit, and bounds or constraints on those parameters.

Parameters:
  • deme_graph – A demographic model as a YAML file in demes format. This should be given as a string specifying the path and file name of the model.

  • inference_options – A second YAML file, specifying the parameters to be optimized, parameter bounds, and constraints between parameters. Please see the documentation at https://momentsld.github.io/moments/extensions/demes.html#the-options-file

  • means – The list of average normalized LD and H statistics, as produced by the parsing function moments.LD.Parsing.bootstrap_data(region_data).

  • varcovs – The list of variance-covariance matrices for data within each recombination bin, as produced by the parsing function moments.LD.Parsing.bootstrap_data(region_data).

  • pop_ids – The list of population names corresponding to the data.

  • rs – A list of recombination bin edges, defining the recombination distance bins.

  • statistics – A list of two lists, the first being the LD statistics present in the data, and the second the list of single-locus statistics present in the data. WARNING: This option should only be used if there are missing or masked statistics in your data, aside from the normalizing pi2 statistic! If there are no missing or removed statistics other than the normalization statistic, this option should not be used.

  • normalization – The name of the population that was used to normalize the data. See documentation for examples specifying each of these arguments.

  • maxiter – The maximum number of iterations to run optimization. Defaults to 1000. Note: maxiter does not seem to work with the Powell method! This appears to be a bug within scipy.optimize.

  • perturb – The perturbation amount of the initial parameters. Defaults to zero, in which case we do not perturb the initial parameters in the input demes YAML. If perturb is greater than zero, the initial parameters in the input YAML are randomly perturbed by up to perturb-fold.

  • verbose – The frequency to print updates to output_stream if greater than 1.

  • log – If True, optimize over log of the parameters.

  • method – The optimization method. Available methods are “fmin”, “powell”, and “lbfgsb”.

  • output – If given, the filename for the output best-fit model YAML.

  • overwrite – If True, overwrites any existing file with the same output name.

Output_stream:

Defaults to standard output. Can be given an open file stream instead or other output stream.

Returns:

List of parameter names, optimized parameter values, and LL

moments.Demes.Inference.uncerts_LD(deme_graph, inference_options, means, varcovs, bootstraps=[], pop_ids=None, rs=None, statistics=None, normalization=None, log=False, eps=0.01, method='FIM', verbose=0, output_stream=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, output=None, overwrite=False)[source]

Compute uncertainties for fitted parameters, using the output YAML from moments.Demes.Inference.optimize_LD().

moments.Demes.Inference.compute_bin_stats(g, sampled_demes, sample_times=None, rs=None)[source]

Given a list of per-base recombination rates defining recombination bin edges, computes expected LD statistics within each bin.

Parameters:
  • g – A demes-formatted demographic model.

  • sampled_demes – List of populations to sample.

  • sample_types – Optional list of sample times for each population.

  • rs – The list of bin edges, as an array of increasing values. Bins are defined using adjacent values, so that if rs has length n, there are n-1 bins.

Returns:

An LDstats object.

Manipulating demes models

moments.Demes.DemesUtil.slice(g, t)[source]

Slice a Demes model at a given time, returning the portion of the demographic model above the given time, and all model times are shifted to the specified slice time.

Parameters:
  • g (demes.Graph) – The input demes model.

  • t (scalar) – The time in the past at which to slice the graph.

Returns:

A new demes model with demography more recent than the specified time removed and time shifted to that time t is time “zero” in the new model.

Return type:

demes.Graph

moments.Demes.DemesUtil.swipe(g, t)[source]

Returns a new demes graph with demography above the given time removed. Demes that existed before that time are removed, and demes that overlap with that time are given a constant size equal to their size at that time extending into the past.

Parameters:
  • g (demes.Graph) – The input demes model.

  • t (scalar) – The time at which to erase preceding demographic events.

Returns:

A new demes model with demographic events above the specified time removed.

Return type:

demes.Graph

moments.Demes.DemesUtil.rescale(g, Q=1)[source]

Rescale a demes model by scaling factor Q. This rescaling is done so that compount parameters (e.g., Ne*m) remain constant. In the new model, population sizes are Ne/Q, model times are ‘T/Q’, and migration rates are m*Q.

For example, setting Q=10 will reduce all population sizes by a factor of \(1/10\), all times will be reduced by that same amount, and migration rates will increase by a factor of 10 so that the product 2*Ne*m remains constant.

When simulating with mutation and recombination rates, or with selection, those values should also be scaled by \(Q\) to ensure that compound parameters remain constant.

Note: As of version 1.2.3, the meaning of Q has been inverted to match standard convention for this scaling parameter.

Parameters:
  • g – A demes demographic model.

  • Q – The scaling factor. Population sizes and times are scaled by dividing values by Q. Migration rates are scaled by multiplying by Q. Admixture and ancestry proportions are unchanged, though the timing of those events are scaled. Generation times and units are unchanged.

Type:

demes.Graph

Type:

scalar

Returns:

A new, rescaled demes demographic model.

Return type:

demes.Graph