## Abstract

Polyploids are organisms whose genomes consist of more than two complete sets of chromosomes. Both autopolyploids and allopolyploids may display polysomic inheritance. A peculiarity of polysomic inheritance is multivalent formation during meiosis resulting in double-reduction, which occurs when sister chromatid fragments segregate into the same gamete. Double-reduction can result in gametes carrying identical-by-descent alleles and slightly increasing homozygosity. This will cause the genotypic frequencies to deviate from expected values and will thus bias the results of standard population genetic analytical methods used in molecular ecology and selective breeding. In this study, we extend existing double-reduction models to account for any even level of ploidy, and derive the symbolic expressions for genotypic frequencies via two methods. Inbreeding coefficients and heterozygosity under double-reduction and inbreeding are also calculated. Numerical solutions obtained by computer simulations are compared with analytical solutions predicted by the model to validate the model.

Polyploids are organisms whose genomes consist of more than two complete sets of chromosomes (Madlung 2013). They represent a significant proportion of plant species, with 30–80% of angiosperm species showing at least some polyploidy (Burow *et al.* 2001) and most lineages showing evidence of paleoploidy (Otto 2007). Polyploid plants can arise spontaneously in nature by several mechanisms, including meiotic or mitotic failures, and fusion of unreduced gametes (Comai 2005).

There are two distinct mechanisms of genome duplication that result in polyploidy: allopolyploidy and autopolyploidy. Autopolyploids are thought to usually arise within a species by the doubling of structurally similar homologous genomes, whereas allopolyploids arise via interspecific hybridization and subsequent doubling of non-homologous genomes (Parisod *et al.* 2010). Both autopolyploids and allopolyploids can be found among both wild and domesticated plant species. Although rare, polyploidy is also found in a few species of vertebrates such as some salmonid fish (Limborg *et al.* 2017), the weather loach (*Misgurnus anguillicaudatus*) (Zhou *et al.* 2016), the common carp (*Cyprinus carpio*) (David *et al.* 2003), and the African clawed frog (*Xenopus laevis*) (Session *et al.* 2016).

In autopolyploids, more than two homologous chromosomes can pair at meiosis, resulting in the formation of multivalents and polysomic inheritance (Rieger *et al.* 1968). A peculiarity of polysomic inheritance is the possibility that a gamete inherits a single gene copy twice, termed double-reduction (Butruille and Boiteux 2000). For example, an autotetraploid individual produces a gamete . In prophase I, crossovers can happen between the locus and the centromere, resulting in an exchange of chromatid fragments between pairing chromosomes. In a multivalent configuration, the separation of chromosomes can be either disjunctional or nondisjunctional (De Silva *et al.* 2005). For nondisjunctional separation, chromosomes that were previously paired in Prophase I segregate into a single secondary oocyte. Double-reduction occurs when fragments of sister chromatids further segregate into a single gamete.

Double-reduction arises from a combination of three major events during meiosis: crossing-over between non-sister chromatids, an appropriated pattern of disjunction, and the subsequent migration of the chromosomal segments carrying a pair of sister chromatids to the same gamete (Darlington 1929; Haldane 1930). Multivalent chromosome pairing during meiosis can also occur in allopolyploids, resulting in a mixed inheritance pattern across loci in the genome, termed segmental allopolyploidy (Stebbins 1950).

Geneticists have developed several mathematical models to simulate double-reduction. For instance, for tetrasomic inheritance, the rate of double-reduction *α* is assumed to have a minimum value of 0 under *random chromosome segregation* (RCS, Muller 1914). This increases to with *pure random chromatid segregation* (PRCS, Haldane 1930), and to with *complete equational segregation* (CES, Mather 1935). A diagram of these models can be found in Figure 1.

Double-reduction results in gametes carrying *identical-by-descent* (IBD) alleles and slightly increased homozygosity (Hardy 2016). Experiments aimed at estimating the frequency of double-reduction in autotetraploids have yielded values ranging from 0 to almost 0.30 (Wu *et al.* 2001). It has a similar effect as inbreeding, and can cause a deviation in genotypic frequencies from the Hardy-Weinberg equilibrium (Luo *et al.* 2006). Fisher (1943) first studied the Mid gene of *Lythrum salicaria*, and produced an analytical method to estimate genotypic frequencies for tetrasomic inheritance for a biallelic locus. Geiringer (1949) extended this work to a multiallelic locus for tetrasomic inheritance. However, the genotypic frequencies for polysomic inheritance for higher levels of ploidy and for a multiallelic locus have been neglected.

Genotypic frequencies are important in population genetics, molecular ecology and molecular breeding, with the *Hardy-Weinberg equilibrium* (HWE) being the basal standard for analytical methods in disomic inheritance. Several methods assume that loci conform to the HWE for the calculations of certain model parameters or statistics, for example, the likelihood of a genotype (population assignment, Bayesian clustering, parentage analysis), the statistics measuring the deviation from expectations (linkage disequilibrium test, differentiation test, heterozygote deficiency/excess test, bottle-neck effect test), and other parameters (*e.g.*, individual inbreeding coefficient, relatedness coefficient, kinship coefficient, QTL mapping). Although some analytical methods have been developed for organisms with polysomic inheritance (*e.g.*, Clark and Jasieniuk 2011; Hardy and Vekemans 2002; Huang *et al.* 2015; Meirmans and van Tienderen 2004; Pritchard *et al.* 2000), none of these methods accounts for double-reduction, and thus in the presence of double-reduction will produce misleading results. For example, in tetrasomic inheritance, may produce an offspring , and both parents are excluded as the true parents in parentage analysis.

Due to the genotypic ambiguity of polyploid species, the correct genotype cannot be obtained because some genotypes share a common electrophoretic band type in PCR-based co-dominant markers (*e.g.*, microsatellites) and the allelic dosage is unknown (Hardy 2016). For example, in autotetraploids, the three genotypes , and have a same electrophoretic band type of . We define the set of alleles within an individual as its phenotype.

Here, we extend an existing double-reduction model to incorporate the recombination rate into CES, and show two methods to derive the symbolic expressions for genotypic frequencies at equilibrium. To enable other researchers make similar calculations, we make available a C++ source code that enables the calculation of genotypic/phenotypic frequencies of gametes/zygotes at equilibrium to a maximum ploidy level of ten.

## Double-reduction models

In this paper, we assume that the population size is sufficiently large and there are no differences in selection between both zygotes and gametes. All zygotes will thus have an equal opportunity to mature and reproduce, and all gametes will have an equal opportunity to become fertilized. Therefore, the zygote frequencies are equal to the genotypic frequencies of reproducing individuals, and as a result, we use the zygote frequencies to indicate the genotypic frequencies of reproducing individuals. The double-reduction rates (*i.e.*, alpha) are assumed to be equal among reproducing individuals (*e.g.*, in females and males). Mutation and migration are not included in our model, whereas inbreeding is included for analyzing the influence of inbreeding and double-reduction on the inbreeding coefficient and for estimating heterozygosity. Organisms with odd levels of ploidy are not considered because of their inherent inability to produce euploid gametes — these organisms are sterile.

### Transitional probability from zygotes to gametes

The transitional probability from a zygote to a gamete is the probability that a zygote produces a gamete of a certain genotype, which can be used to find the double-reduction rates of PRCS and to derive the analytical expression of gamete frequencies in the first step of the non-linear method (see Non-linear method section). Such a probability can also be applied to some population genetics studies such as parentage analysis, and the calculation of segregation ratios in mapping populations (see Discussion).

If the rate of double-reduction is greater than zero, the gametes will carry *identical-by-double-reduction* (IBDR) alleles. For tetrasomic and hexasomic inheritance, there are only two and three allele copies within a gamete, respectively. Hence, it can be found at most one pair of IBDR alleles within a gamete, and a single parameter can be used to measure the rate of double-reduction.

In polysomic inheritance with a higher level of ploidy, there may be more than one pair of IBDR alleles in a gamete. Therefore, it is necessary to add some additional parameters in the segregation ratios. The transitional probability from zygotes to gametes for tetrasomic and hexasomic inheritances at a biallelic locus (the Mid gene in *Lythrum salicaria*) has been studied previously by Fisher and Mather (1943). Here, we extend the work of Fisher and Mather and use a general mathematical expression of this transitional probability. There are at most pairs of IBDR alleles within a gamete, where v is the ploidy level of the resulting zygote. Let *G* and g be the genotypes of a zygote and a gamete, respectively, and let be the probability that the gamete carries *i* pairs of IBDR alleles, then . Suppose that *h* is the number of alleles at target locus, then the probability that *G* produces g is the following weighted sum:(1)Where and are the numbers of copies of the allele (say ) in *G* and g, respectively, and is the number of IBDR allele pairs in g, , then , and .

The symbol in Equation (1) is a binary variable, which is used to limit the range of , so as to avoid some invalid situations. Note that the combination implies , and implies , or equivalently it implies and . Hence if , otherwise .

In Equation (1), for every *i* (), two events need consideration. First, is the event that g carries *i* pairs of IBDR alleles. Second, is the event that those *i* pairs of IBDR alleles are distributed into *h* kinds of alleles. Because both types of event are mutually exclusive, the products of the probabilities of the two events stated above are summed to obtain .

Each chromosome is duplicated during meiosis, so each allele in *G* becomes a pair of duplicated alleles. For the kind of allele, first we sample pairs from the pairs of duplicated alleles , then there are ways (where ), and so the first combination in the numerator of Equation (1) is obtained. We then sample the non-IBDR alleles to form g. The remaining is pairs of duplicated alleles , which still requires copies of . Then we sample pairs of duplicated alleles , and so there are ways (where ), which obtains the second combination in the numerator of Equation (1). Moreover, if we sample one copy of in each pair of the pairs of duplicated alleles, then there are altogether possible combinations. Because each sampling combination is independent for different kinds of alleles, the total number of combinations is the product .

In the following example, we consider the total number of combinations to produce a gamete consisting of *i* pairs of IBDR alleles without accounting for the specific genotypes *G* and g. Similar to the above process, we first sample *i* pairs from the v pairs of duplicated alleles, and these alleles will become the IBD alleles in g. We then sample pairs from the remaining pairs, and then one allele in each pair of the pairs is further sampled to form g. The total number of allele combinations is . Here the coefficient is eliminated in the fraction in Equation (1).

Equation (1) is a general expression of the transitional probability, and can be applied at multiallelic loci and at any even level of ploidy. As an example, the transitional probability from zygotes to gametes in octosomic inheritance at a biallelic locus is shown in Table 1. Another example is shown in Appendix A, which is the derivation of .

### Random chromosome segregation

The random chromosome segregation model ignores the crossover between the target locus and the centromere, and the rate of double-reduction (α) is equal to zero (Muller 1914). In the absence of crossing over, gametes may originate from any combination of homologous chromosomes, and two sister chromatids never sort into the same gamete (Parisod *et al.* 2010, Figure 1A). Therefore, genotypic frequencies concur with the HWE. Here, the HWE is expanded to account for polysomic inheritance, in which the alleles in a genotype are independent and randomly appear according to their frequencies. Thus, the genotypic frequency under the RCS can be expressed asWhere denotes the frequency of . For example, the frequency of a tetraploid genotype is .

When the conditions of the RCS are met, whenever , then Equation (1) can be simplified to

(2)### Pure random chromatid segregation

Pure random chromatid segregation accounts for crossing over and assumes the chromatids behave independently in meiotic anaphases, and randomly segregate into gametes (Haldane, 1930, Figure 1B). Then the probability that a zygote *G* produces a gamete *g* at a multiallelic locus can be derived by sampling chromatids (or allele copies) in a total number of chromatids (or allele copies). Because the number of chromatids is twice the number of chromosomes, the transitional probability can be modified from Equation (2) by duplicating the terms and v in those combinations as follows:(3)The value of alpha of PRCS can be derived by simulating the meiosis, first by sampling IBDR allele pairs out of v IBDR allele pairs, and then by segregating *i* IBDR allele pairs and non-IBDR alleles into the gamete. Therefore, the value of is given byThe values of alpha under PRCS, ranging from tetrasomic to dodecasomic inheritance, are shown in Table 2. The expected number of IBDR allele pairs in a gamete .

### Complete equational segregation

The *complete equational segregation* (CES) assumes that the whole arms of the two pairing chromatids are exchanged between pairing chromosomes (Mather 1935). In Figure 1C, the leftmost two chromosomes in the primary oocyte are paired in the long arms, as well as the rightmost two chromosomes. In Metaphase I, the chromosomes randomly segregate into the secondary oocytes. If the pairing chromosomes segregate in the same secondary oocyte, then the duplicated alleles may further segregate into a single gamete.

The production of double-reduction gametes requires the fulfillment of two conditions: (i) the chromosomes paired at target locus segregate into the same secondary oocyte, and (ii) the duplicated alleles segregate into the same gamete. In order to construct a mathematical model of CES, we simulate this meiosis, and the values of alpha will be derived by using a two-step process.

First, we model the probability, , that *j pairs of chromosomes paired at the target locus* (PCPs) segregate into a secondary oocyte. There are PCPs in the primary oocyte, so we have combinations to sample *j* pairs. Next, for the remaining PCPs, it still requires chromosomes that are not paired with each other at target locus during prophase I to form the secondary oocyte. We therefore (i) sample PCPs from the remaining PCPs (there are combinations), and (ii) sample one chromosome from each PCP in the previously sampled PCPs (there are chromosome combinations). The product of the above three numbers of combinations is divided by the total number of segregation modes in Metaphase I, *i.e.*, , to derive . The process described above can be summarized by the following expression:(4)Second, we model the probability, , that *i* pairs of duplicated alleles segregate into a single gamete. There are *j* PCPs in the second oocyte, and each has four alleles (*e.g.*, the alleles in both chromosomes are *A* and *B* in the top second oocyte of Figure 1C). There are four segregation modes for each PCP in Anaphase II, where two of them ( and ) produce gametes with IBDR alleles and the other two ( and ) pass non-IBDR alleles into the gamete. In order to produce a gamete consisting of *i* pairs of IBDR alleles, *i* PCPs are sampled and segregate IBDR alleles. Following the description above, we have combinations of sampling *i* PCPs that pass IBDR alleles into a gamete from *j* PCPs. Next, the remaining PCPs pass non-IBDR alleles, and so there are combinations. In each of the remaining chromosomes that are not paired with each other during prophase I, one of the two alleles segregates into the gamete, and hence there are allele combinations. The product of these numbers of combinations is divided by the total number of segregation modes in Metaphase II, *i.e.*, , to obtain the expression , *i.e.*, , which is the value of conditional on *j*, where . Because the events that the secondary oocyte contains different numbers of PCPs are mutually exclusive, their probabilities are weighted by to obtain the weighted sum, *i.e.*, , whose expression is as follows:(5)Using Equation (5), the values of alpha of the CES can be derived for different inheritance modes, and these are presented in Table 2. The example deriving the values of alpha under octosomic inheritance is given in Appendix B. The expected number of IBDR allele pairs in a gamete .

### Partial equational segregation

The *recombination fraction* (denoted by *r*) between two loci is defined as the ratio of the number of recombined gametes to the total number of gametes produced. The maximum recombination fraction for diploids is (Xu 2013). However, in autopolyploids, because there are PCPs, the maximum recombination fraction for an autopolyploid organism is , *e.g.*, for autotetraploids.

We used the *single chromatid recombination rate* (denoted by ) between the target locus and the centromere () to evaluate the degree of recombination in polysomic inheritance, which is the probability that an allele in a chromatid is exchanged with the pairing chromatid. Hence, *r* and can be converted to each other by the formula . Moreover, can be calculated by Haldane’s mapping function where *d* is the distance between the target locus and the centromere (the unit of length is the centimorgan).

In the CES model, the two alleles in a pair of pairing chromatids are assumed to be exchanged at a probability of 100%. This ‘ideal’ condition is unlikely to occur in nature. We thus usually consider that the maximum single chromatid recombination rate is %, and that the distance between the target locus and the centromere is extremely long. Although Mather (1935) incorporated the recombination rate into his model, assuming 100% recombination has been widely used in similar studies (*e.g.*, Butruille and Boiteux 2000; Wu *et al.* 2001).

We incorporated the single chromatid recombination rate into the CES to allow a greater generalization, which allows for the situation that only some fragments of the two pairing chromatids are exchanged between pairing chromosomes. Such a model is called the *partial equational segregation* (PES) model.

In the first step for CES, we model the probability that *j* PCPs segregate into a secondary oocyte, which remains unchanged in the PES.

Assuming that the alleles in each PCP are exchanged at a probability of between pairing chromatids and those *j* PCPs are exchanged independently, then the number of *exchanged PCPs* (EPCPs) is drawn from the binomial distribution, and so the probability that the second oocyte contains *k* EPCPs is(6)It is noteworthy that only the EPCPs are able to produce IBDR alleles under a PES model. Following the process of the second step for CES, we derive the alpha for PES by: (i) sampling *i* EPCPs from *k* EPCPs (there are ways to do this); (ii) sampling IBDR alleles from those *i* EPCPs (there are ways to do this); (iii) sampling non-IBDR alleles from the remaining EPCPs (there are ways to do this); (iv) sampling the remaining non-IBDR alleles from the remaining chromosomes (there are ways to do this). The product of the numbers of ways to do the above four steps is now divided by the total number of segregation modes in Metaphase II, *i.e.*, , to obtain the expression , *i.e.*, , which is the value of conditional on *k*, where . Moreover, the events that the secondary oocyte contains different numbers of EPCPs are mutually exclusive. Therefore, the weighted sum of their probabilities with as their weights are calculated to obtain under PES, whose expression is as follows:(7)where is given in Equation (6).

Using Equation (7), the values of alpha in the PES model for tetrasomic to dodecasomic inheritance can be derived, and the results are presented in Table 2. An example deriving the values of alpha under octosomic inheritance is also given in Appendix B. The expected number of IBDR allele pairs in a gamete is .

## The genotypic frequencies at equilibrium

Here we present two methods to derive the genotypic frequencies at a multiallelic locus for higher levels of ploidy. The first method, the non-linear method, is modified from Fisher’s (1943) method and uses the transitional probability from zygotes to gametes to account for the multiallelic loci. Because this method is computationally difficult for a ploidy level higher than six under current conditions, we develop an alternative method to derive the genotypic frequency up to and including decasomic inheritance. The second method, the linear method, uses the solution of the non-linear method at a biallelic locus as the initial solution, and at each step a novel allele is segregated from an existing allele and the genotypic frequencies are updated.

### Non-linear method

For this method, the total probability formula together with Equation (1) are used to derive the gamete frequencies and the multiplication theorem of probability is used to derive the zygote frequencies. If the genotypic frequencies reach equilibrium, they will not change subsequently, and the system of equations that we will construct can be solved by adding some constraints. In the following text, we denote GFG for *genotypic frequencies of gametes*, and GFZ for *genotypic frequencies of zygotes*. Our system of non-linear equations will be presented by the following two steps.

Simulating meiosis. Given a gamete g, if the genotypic frequencies reach equilibrium, then the GFG is determined by the total probability formula,

*i.e.*, the sum of the transition probabilities weighted by GFZ, symbolically(8)Here is taken from all zygote genotypes and can be obtained by Equation (1). Because there are gamete genotypes, Equation (8) represents equations.Simulating fertilization. The GFZ in the next generation is determined by the multiplication theorem of probability with GFG as the weight. Under equilibrium, the GFG and GFZ are constant across generations, then for a zygote

*G*, we can establish an equation as follows:(9)

where *G* is regarded as a multiset, is taken from all possible different subsets of *G* with elements and the symbol ∖ denotes the operation of set difference. Here, a *multiset* is a generalized set, whose elements can be repeated (*e.g.*, if ), it can be regarded as a multiset consisting of four elements, *i.e.*, , and its different subsets containing two elements are and . Moreover, if is regarded as , then . Because there are zygote genotypes, Equation (9) represents equations. If the double-reduction ratios in two parents are different, then the genotypic frequencies of egg and sperm should be treated separately in Equation (9), *e.g.*, and are replaced by the frequencies of an egg and a sperm , respectively, and their frequencies are calculated by substituting different double-reduction parameters into Equation (8).

The process that transforms the allele frequencies into the GFG can be described by a linear substitution in the sense that every allele frequency can be expressed as a linear combination of the GFG. By substituting the equations determined by Equation (9) into the equations determined by Equation (8), we obtain a system of non-linear equations with the GFG as unknowns. This system of equations together with those expressions of the linear combinations mentioned above still forms a system of non-linear equations with the GFG as unknowns and the allele frequencies as parameters. For a lower level of ploidy, the latter system of non-linear equations can be solved, whose solution is with parameters. An example is given in Appendix C, which shows how to find the solution of the latter system of non-linear equations (*i.e.*, how to derive the expressions of GFG with parameters) for tetrasomic inheritance at a triallelic locus.

In Appendix C, for the case of GFG, the whole gamete genotypes are and . We choose and as representatives of the genotype pattern, then each genotype can be classified into one of these patterns. The partial solution determined by the representatives is called the *generalized form* of the solution mentioned above, whose expression is given in Equation (A3). For the case of GFZ, the notion of generalized form can be similarly defined, whose expression is given in Equation (A4).

For each gamete with allele copies, the number of alleles *h* at the target locus should at least be to allow for all genotypic patterns to be displayed. For example, the full gamete heterozygote under octosomic inheritance can be observed at a tetra-allelic locus. Furthermore, if , the generalized form for all genotypic patterns of GFG can be obtained. Because the allele frequencies at each locus sum to unity, the number of degrees of freedom is equal to . The full gamete heterozygotes can therefore be expressed by allele frequencies.

Using the tetrasomic inheritance under RCS as an example, the frequency of a gamete can be solved at a biallelic locus, and the solution is . Moreover, its solution with the generalized form at a triallelic locus can also be obtained, with the answer being .

The number of equations determined by Equation (8) or Equation (9) increases exponentially with increasing levels of ploidy. For example, for the case of Equation (8), the number equals 6, 20, 70, 252 and 924 for tetrasomic inheritance to dodecasomic inheritance, while the corresponding sequence is 15, 84, 495, 3003 and 18564 for the case of Equation (9). The two numerical sequences increase by about 3.5 and 6.0 times for each level, respectively. Moreover, each expression also becomes more complex as the ploidy level increases. For the case of dodecasomic inheritance, it takes multiple gigabytes of space to store these expressions.

Therefore, the system of non-linear equations becomes computationally difficult for higher levels of ploidy. Using this method, we solved the GFG and GFZ for a maximum ploidy level of six on a workstation with two Intel Xeon E5 2696 V3 CPUs, which have 36 cores in total.

### Linear method

An alternative method to solve the genotypic frequencies at equilibrium is via the linear method. For a working example we focus on a protein-coding gene in a panmictic population at equilibrium with an infinite number of individuals. Suppose that there are three alleles with a unique DNA sequence, denoted by *A*, *B* and *C*, and let , and be their frequencies. Due to the degeneration of codons, two alleles (*e.g.*, *B* and *C*) may code for the same protein. If isozyme electrophoresis is used for genotyping, then these two alleles cannot be distinguished, so both of them will, for example, be typed as *B*. However, if DNA sequencing is used, both alleles can be distinguished.

Therefore, it can be inferred that the number of alleles of this gene depends on how it is typed, with both biallelic and triallelic states conforming to equilibrium assumptions (*i.e.*, large population size, random mating, no mutation, no selection and no migration) and genotypic frequencies should concur with the equilibrium.

This scenario can also be expressed as follows: (i) originally, the locus is biallelic and the distribution of genotypes is at equilibrium; (ii) at each step, one novel allele mutates from an existing allele, the mutation rate is constant among different copies of this allele in all genotypes, and the IBD alleles in a genotype mutate simultaneously; (iii) therefore, after the mutation at each step, the genotypic frequency will still concur with the equilibrium; (iv) the mutations are repeated until there are alleles to obtain the generalized form of GFG. The following deductions are therefore derived.

#### Deduction (i):

For both GFZ and GFG, the frequency of a genotype that contains only the unchanged alleles remains unchanged. In fact, the analytical expression of genotypic frequencies can be regarded as a function of the allele frequencies (refer to Equations (A3) and (A4) in Appendix C). Therefore, when a genotype contains only the unchanged alleles, the corresponding expression of the function is invariable in the process of mutation.

#### Deduction (ii):

For both GFZ and GFG, the summations of the frequencies before and after the mutation for each genotype that has the same composition of unchanged alleles are equal (*e.g.*, where the last digit in every subscript denotes the number of alleles observed). This fact can be interpreted as several genotypes being merged into one when the locus switches from a triallelic state to a biallelic state.

#### Deduction (iii):

For both GFZ and GFG, the ratio of frequencies of changed alleles in the genotypes that have the same composition of unchanged alleles is equal to the ratio of frequencies of changed alleles in the population. For example, if the changed alleles are *B* and *C*, we have and . Because the mutation rate is constant in different copies of *B* in all genotypes, a fixed proportion of *B* are mutated to *C*.

#### Deduction (iv):

For two genotypes with the same pattern, we can use the expression of one genotype frequency to derive that of the other. For example, if the analytical expression of the function is given, by replacing with , and with , it follows the expression of , that is .

According to Deduction (iv), if a generalized form for the GFG or GFZ is given, we can write down the whole expressions of the GFG or GFZ.

Using these deductions as stated above, the GFG can be solved step-by-step. At each step, one novel allele is separated from an existing allele. The procedure begins with a biallelic locus, which can be solved by the non-linear method. After the number of alleles reaches , the generalized form of GFG is obtained, and then the GFZ can be derived by using Equation (9).

Using hexasomic inheritance as an example, we assume that the GFG at a biallelic locus is solved by the non-linear method, *i.e.*, the expressions of , and are derived. If a new allele (*e.g.*, *C*) is segregated from *B*, we obtain via Deduction (i), and via Deduction (iii). Additionally, and can be derived from via Deduction (iv). Similarly, , , , and can be derived from . Furthermore, will be obtained from via Deduction (ii). Because the degrees of freedom are currently two, one more step is required. In this step, can be updated by Deduction (iii), and expressed by . Finally, and remain unchanged because of Deduction (i), and the remaining GFG will be obtained via Deduction (iv). The results of GFG and GFZ for hexasomic inheritance are presented in Appendix D.

The linear method can also be characterized by a system of linear equations. Using the first mutation in hexasomic inheritance as an example, the coefficient matrix A is established (see Table 3 for details, in which and *r* denote , and , respectively). The elements in b can be derived from the above four deductions. For example, the first seven equations of areThen the first seven elements in b are . For these seven equations, one is derived by Deduction (i), where one is number of unchanged genotypes; three are derived by Deductions (ii) and (iii), respectively, where 3 is the number of changed genotypes. Therefore, with the first three deductions, we are able to establish the core equations. Furthermore, by Deduction (iv) we can write down the remaining equations and the total number of equations is *h* times of the number of core equations.

The numbers of rows and columns of A are 21 and 10, respectively, which means that there are 21 equations and 10 unknowns (*i.e.*, the whole GFG after mutation, *e.g.*, and and so on.) in this system . There are thus many non-independent equations. The rank of A can be derived by symbolic calculation, whose value is 10, equal to the number of unknowns. Therefore, our system of linear equations has a unique solution: .

The numbers of rows and columns of A are respectively 39 and 28 for dodecasomic inheritance at a triallelic locus, and the rank of A is 27 by symbolic calculation. Unfortunately, because there are 28 unknowns, our system is underdetermined and has an infinite number of solutions.

Using this method, we are able to derive the GFZ and GFG from tetrasomic to decasomic inheritance. For the cases of tetrasomic and hexasomic inheritance, these results are identical to the solutions obtained from the non-linear method. Subsequent expressions become more complex with increasing levels of ploidy, so we only present the results for hexasomic inheritance in Appendix D.

As an alternative, these expressions are placed in a 660 KB C++ source code for a library, and the source code is available in our supplementary material. The phenotypic frequencies can then be calculated by taking the sum of the frequencies of all possible genotypes that yield this phenotype, which are also given in the source code.

### Inbreeding coefficient and heterozygosity

Double-reduction has a similar effect as inbreeding, both not only resulting in IBD alleles in the zygote but also slightly increasing homozygosity (Hardy 2016). Here, we assess the impact of inbreeding and double-reduction on the inbreeding coefficient. The concept of inbreeding is relative to a reference population, which by definition is without inbreeding or relatedness, with all alleles in the reference population defined as not being IBD (Lynch and Walsh 1998).

The *inbreeding coefficient* in polysomic inheritance is defined as the probability of sampling two IBD alleles from a genotype without replacement (Huang *et al.* 2015). In polyploids, gametes also have multiple alleles at a target locus, and the inbreeding coefficient and heterozygosity can also be applied to gametes to measure the degree of IBD or *identical-by-state* (IBS) allele pairs (Barone *et al.* 1995).

After fertilization, there are pairs of alleles in a zygote, where pairs of alleles are from one gamete, and pairs of alleles are from different gametes. The numbers of IBD allele pairs in both types of allele pairs are derived as follows.

The number of IBDR allele pairs in the allele pairs from the same gamete is the expected value λ of allele pairs, where . In the remaining allele pairs, the number of IBD allele pairs inherited from the parent is a proportion *F* of those remaining, where *F* is the inbreeding coefficient in zygotes of the previous generation. Therefore, there are pairs of IBD alleles in each gamete. Hence the inbreeding coefficient in the gamete is:(10)For the allele pairs from two different fertilizing gametes, the mating individuals in the presence of inbreeding may also share IBD alleles. The co-ancestry coefficient is used to measure the degree of kinship in mating individuals, where the *co-ancestry coefficient*, denoted by *θ*, is defined as the probability that two alleles, one randomly sampled from each individual, are IBD (Jacquard 1972). Then there is a value expected IBD allele pairs between fertilizing gametes.

From the above derivation, the number of pairs of IBD alleles in the zygote is in total . Therefore, the inbreeding coefficient for zygotes of the current generation is as follows:The formula is the transition function of inbreeding coefficients. If the genotypic frequencies reach equilibrium, these inbreeding coefficients will be constant across generations. By replacing with *F* in the transition function, the inbreeding coefficient *F* at equilibrium can be solved, whose expression is as follows:(11)For an outcrossed population, . Then, by substituting Equation (11) for Equation (10), the inbreeding coefficients *F* and *f* can be obtained, whose values at equilibrium for different double-reduction models are shown in the rightmost two columns of Table 2.

If the relationships between mating individuals can be classified into several types (*e.g.*, self: selfing, parent-offspring: backcross), then the co-ancestry coefficient *θ* can be derived by a weighted average of the co-ancestry coefficients of those types of relationships. An example to derive the expressions of these co-ancestry coefficients as well as the inbreeding coefficient *F* is given in Appendix E.

Following the modification of the definition of inbreeding coefficients, the *heterozygosity* in polysomic inheritance is defined as the probability of sampling two non-IBS alleles from a genotype without replacement (Hardy 2016). For the two alleles sampled, if they are IBD, then these alleles cannot be IBS; otherwise they are independent and are non-IBS alleles at a probability of . Let *H* be the heterozygosity of zygotes and *h* be that of gametes, then:(12)Where *F* is the inbreeding coefficient of zygotes and *f* is that of gametes. These expressions are identical to Nei’s (1977) estimator of inbreeding coefficients, *i.e.*, , where and are the observed and expected heterozygosities, respectively.

### Simulation

The numerical solutions of genotypic and phenotypic frequencies of gametes and zygotes calculated by computer simulations will be used to compare with the analytical solutions so as to validate our models.

A population is generated with its initial genotypic frequency drawn from the HWE. Because we do not store the specific genotype for each individual, but only the frequency of each genotype, the population size can be considered as infinite. To display all kinds of genotype patterns, the number of alleles *h* should be greater than or equal to the ploidy level. For example, the complete heterozygote in tetrasomic inheritance cannot be observed at a triallelic locus. For simplicity, suppose that the allele frequencies are uniformly distributed, *i.e.*, the frequency of any allele is equal to , so that the frequencies of genotypes or phenotypes with a same pattern are equal (*e.g.*, ).

The double-reduction models are simulated to calculate the gamete frequencies. First, for PRCS, the alleles in the zygotes are duplicated and then randomly segregate into four gametes. Second, for CES, the chromosomes are randomly paired and the alleles are exchanged between pairing chromosomes, and then the chromosomes randomly segregate into two secondary oocytes, finally the alleles within the same chromosomes randomly segregate into two gametes. Third, for PES, the chromosomes are randomly paired and the alleles are exchanged between paired chromosomes at a probability of , and the remaining steps are the same as CES.

For a zygote, there are many possible ways to generate gametes. We enumerate each possibility and calculate the weighted average of gamete frequencies according to the probability of each possibility. The frequencies of identical gametes produced from different zygotes are weighed again by the zygote frequencies so as to obtain the gamete frequencies in the population.

The gametes are randomly merged to simulate fertilization, then the zygote frequency in the next generation can be calculated. The population is reproduced for 100 generations to converge the genotypic frequencies. The simulation program can be found in the electronic supplementary materials.

There are many potential genotypes and phenotypes in a population, and it is thus impractical to show all of these frequencies in the manuscript. As an alternative, the phenotypic frequencies of zygote patterns are shown in Table 4, and the genotypic frequencies of gamete patterns are shown in Table 5. As a comparison, the corresponding analytical solutions of model predictions are also shown. It is clear from the data shown in Tables 4 and 5 that both of numerical and analytical solutions of phenotypic frequencies of zygotes are approximately equal and their differences are due to the rounding-off error, as are the genotypic frequencies of gametes obtained via different approaches.

## Data Availability

### Material S1

The appendices.

### Material S2

The simulation computer program to obtain the numerical solutions of genotypic and phenotypic frequencies of zygotes and the genotypic frequencies of gametes. The program can be run on Windows platforms and requires the .Net Framework V3.5 runtime library.

### Material S3

The C++ source code of the genotypic and phenotypic frequencies of polysomic inheritance under double-reduction. The prefix of functions is GFZ (genotypic frequencies of zygotes), GFG (genotypic frequencies of gametes), PFZ (phenotypic frequencies of zygotes), or PFG (phenotypic frequencies of gametes). An example that is the function in the second case (*i.e.*, ) in Equation (A4) is given by

`double GFZ4_iiij(double a1, double pi, double pj)`

`{`

`double pi2 = pi*pi;`

`return (8*(-1+a1)*pi2*(-3*a1+2*(-1+a1)*pi)*pj)/pow(2+a1,2);`

`}`

The authors state that all data necessary for confirming the conclusions presented in the article are represented fully within the article. Supplemental material available at Figshare: https://doi.org/10.25387/g3.7883150.

## Discussion

### Double-reduction models

Here, we describe existing double-reductions models and extend them to the systems that account for any even level of ploidy. However, some of these models assume ideal situations that are unrealistic in nature. The CES model assumes that the single chromatid recombination rate is one at any locus, but the maximum value of is 50% if the locus is located extremely far from the centromere. We incorporate the single chromatid recombination rate into CES, and define this as the partial equational segregation (PES).

Although PES probably reflects more closely to natural patterns of occurrence and may thus more accurately explain real genotypic frequencies, other models (RCS, PRCS and CES) are still useful. Researchers are able to apply them to calculate the genotypic frequencies in their applications without estimating alpha. In contrast, the PES needs an additional parameter to calculate the genotypic frequencies, and thus has one more degree of freedom if is estimated from the same data. Although PES may perform better in calculating the likelihood from genotypic/phenotypic data, it is also penalized more via the AIC (Akaike information criterion, Akaike 1974) or the BIC (Bayesian information criterion, Schwarz 1978).

In addition, all of these models (RCS, PRCS, CES and PES) assume that the chromosomes must form multivalents during prophase I. However, meiosis of polyploids can be much more complex. Some allopolyploid species display a mode of inheritance intermediate between disomic and polysomic at some loci (segmental allopolyploids, Stift *et al.* 2008), and some autopolyploid species also form bivalent, univalent and other types of valent during meiosis (Qu *et al.* 1998; Lloyd and Bomblies 2016). The formation of different types of valent may influence the sterility of the gametes or seeds (Swaminathan and Sulbha 1959; Crowley and Rees 1968; Solís Neffa and Fernández 2000). Such a model can hardly be established if we consider so many factors, *e.g.*, fertility, proportion and double-reduction ratio for each kind of valent.

Regardless of how complex the nature of meiosis, different models will yield the same result: the IBDR alleles are in the fertile gamete. Therefore, all of these models can be incorporated into a generalized framework, *i.e.*, using the values of alpha to express a model. Meanwhile, comparing with RCS, PRCS and CES models, the number of degrees of freedom of such a model increases only by .

### Genotypic frequencies, inbreeding coefficient and heterozygosity

Two methods are proposed to solve the genotypic equilibrium under double-reduction at a locus with an arbitrary number of alleles. The first method is the non-linear method, which can theoretically be applied to any even level of ploidy. However, it becomes computationally difficult for organisms with high levels of ploidy because these organisms have too many potential genotypes (*e.g.*, there are 19494 non-linear equations in dodecasomic inheritance). Although the system of non-linear equations for a higher level of ploidy is computationally difficult and we can currently only solve the genotypic frequencies for hexasomic inheritance, it will be possible with more powerful computers and more advanced algorithms.

The second method is the linear method, which requires more steps but less computational power. This uses the solution obtained by the non-linear method at a biallelic locus as the initial solution, and at each step it is assumed that one novel allele mutates from an existing allele. Some novel genotypes will be separated from existing genotypes, while the genotypic frequencies will still concur with the equilibrium. Using the four deductions, the equations of the genotypic frequencies of gametes are initially established by the total probability formula, and then the equations of the genotypic frequencies of zygotes are established by the multiplication theorem of probability. Unfortunately, the constraints are insufficient to obtain a unique solution for ploidy levels greater than or equal to 12. Although this is not perfect, our method is more than adequate to answer most current research questions on this subject.

Double-reduction results in gametes carrying IBD alleles and slightly increased homozygosity (Hardy 2016). It has similar effects as inbreeding, and can cause a deviation in genotypic frequencies from the Hardy-Weinberg equilibrium (Luo *et al.* 2006). By modifying the definition of the inbreeding coefficient and heterozygosity in polysomic inheritance, the symbolic expressions measuring how double-reduction and inbreeding influence the inbreeding coefficient and heterozygosity are derived. We found that double-reduction can cause a maximum inbreeding coefficient being 0.0769 (in CES) under tetrasomic inheritance in an outcrossed population. For the same double-reduction model, the double-reduction ratio is increased while the inbreeding coefficient is reduced with increasing levels of ploidy (Table 2).

In the presence of double-reduction and/or inbreeding, the theoretical observed heterozygosity can be calculated from the inbreeding coefficient, and is identical to Nei’s (1977) estimator of the inbreeding coefficient. This means that the inbreeding coefficient in polysomic inheritance can still be estimated by previous estimators, the same as for other *F*-statistics, although some modifications should be made to account for the finite sample size (*e.g.*, Robertson and Hill 1984; Weir and Cockerham 1984).

### Applications of zygote frequencies

Genotypic frequencies can be used for many applications, such as for population genetics, molecular ecology, molecular breeding, and so on. Some of the applications of zygote frequencies are listed as follows.

Estimating the allele frequency from phenotypes: the probability that observing a genotype conditional on a phenotype can be derived from the GFZ and Bayes formula. By counting the number of copies of each allele in each possible genotype and using the conditional probability as a weight, the allele frequencies can be updated from an initial solution with an Expectation Maximization algorithm (

*e.g.*, De Silva*et al.*2005; Kalinowski and Taper 2006).Based on the allele frequency, many subsequent analyses can be performed. For example, genetic diversity analysis (

*e.g.*, Rousset 2008), genetic distance analysis (*e.g.*, Nei 1972), population differentiation analysis (*e.g.*, Cockerham 1973), analysis of molecular variances (Excoffier and Lischer 2010), principal coordination analysis (*e.g.*, Peakall and Smouse 2012), and hierarchy clustering (*e.g.*, Odong*et al.*2011).Population assignment and Bayesian clustering: by calculating the product of multilocus genotypic frequencies, the probability of randomly sampling an individual with certain multilocus genotypes from a population can be obtained (

*i.e.*, likelihood). The individual is assigned to the population with maximum likelihood (*e.g.*, Peakall and Smouse 2012). In the non-admixture model of Bayesian clustering, the population origin of each individual is randomly drawn from the populations according to the posterior probability that the individual originated from each population, and is updated iteratively in a Markov Chain Monte Carlo (MCMC) algorithm (Pritchard*et al.*2000). The posterior probability is calculated from the multilocus genotypic frequencies of the individual from each population by the Bayes formula (Pritchard*et al.*2000).Equilibrium test: as a Hardy-Weinberg equilibrium test, this test is able to evaluate whether the distribution of genotypes/phenotypes concurs with the equilibrium state for a particular double-reduction model. The observed number of each genotype/phenotype can be obtained by counts from the data, and the expected number can be obtained by the GFZ or PFZ. A Chi-square goodness-of-fit test or a Fisher’s

*G*test can be used for this application (*e.g.*, Rousset 2008).Linkage disequilibrium test: the contingency table of a pair of loci is established, with each row denoting a genotype of the first locus with each column denoting that of the second locus. The expected number in each cell under the hypothesis that the two loci are under linkage equilibrium is the product of the GFZ at these two loci (

*e.g.*, Rousset 2008).Estimating individual inbreeding coefficient, relatedness coefficient or kinship coefficient from phenotypes: method-of-moment estimators (

*e.g.*, Hardy and Vekemans 2002; Huang*et al.*2014) or the maximum-likelihood estimator (*e.g.*, Huang*et al.*2015) can be used to estimate each of these coefficients between possible genotypes, with the estimates being weighted according to the conditional probability in application (i) above.

### Applications of gamete frequencies

There are also some potential applications of the transitional probability from zygotes to gametes and the gamete frequencies, in additional to solve the values of alpha for PRCS (Equation (3)) and to derive the genotypic frequencies at equilibrium (Equation (8)).

Parentage analysis: the transitional probability that a parent or a pair of parents produce an offspring is used in the calculation of likelihoods of the following two hypotheses: (a) the alleged father is the true father of the offspring and (b) the alleged father is unrelated to the offspring and is randomly sampled from the population (Marshall

*et al.*1998; Kalinowski*et al.*2007). This transitional probability is a summation of the products of two gamete frequencies that form the offspring’s genotype, where the frequency is equal to if the mother’s genotype is known, or equal to if the mother’s genotype is unknown, in which , and denote the genotypes of the offspring, alleged father and true mother, respectively.Deriving the segregation ratio in mapping populations, or deriving the genotypic frequencies for populations which are not at equilibrium: we use an F2 population in tetrasomic inheritance as an example. Assuming the genotypes of F1 individuals are , using Equation (1), the gamete ratio is . From the multiplication theorem, the segregation ratio of the F2 population is

Estimating the values of alpha from mapping populations: by equating the observed segregation ratio with the expected value, a system of non-linear equations can be established. The values of alpha can be solved by the least-squares method.

Perform an equilibrium test or a linkage disequilibrium test: the deviation of the observed gamete frequencies from expected can be measured by either a or a

*G*statistic by using either a Chi-square test or a Fisher’s*G*-test, respectively. With the cumulative distribution function of a Chi-squared distribution, the significance can be calculated.

## Acknowledgments

We would like to thank the two anonymous reviewers for their suggestions and comments. This study was funded by the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB31020302), the Natural Science Foundation of China (31770411, 31730104, 31572278 and 31770425), and the Young Elite Scientists Sponsorship Program by CAST (2017QNRC001), the National Key Program of Research and Development, Ministry of Science and Technology (2016YFC0503200), and the Natural Science Basic Research Plan in Shaanxi Province of China (2018JM3024). DWD is supported by a Shaanxi Province Talents 100 Fellowship.

## Footnotes

Supplemental material available at Figshare: https://doi.org/10.25387/g3.7883150.

*Communicating editor: Y. Kim*

- Received January 26, 2019.
- Accepted March 20, 2019.

- Copyright © 2019 Huang
*et al.*

This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.