Modeling X-Linked Ancestral Origins in Multiparental Populations

The models for the mosaic structure of an individual’s genome from multiparental populations have been developed primarily for autosomes, whereas X chromosomes receive very little attention. In this paper, we extend our previous approach to model ancestral origin processes along two X chromosomes in a mapping population, which is necessary for developing hidden Markov models in the reconstruction of ancestry blocks for X-linked quantitative trait locus mapping. The model accounts for the joint recombination pattern, the asymmetry between maternally and paternally derived X chromosomes, and the finiteness of population size. The model can be applied to various mapping populations such as the advanced intercross lines (AIL), the Collaborative Cross (CC), the heterogeneous stock (HS), the Diversity Outcross (DO), and the Drosophila synthetic population resource (DSPR). We further derive the map expansion, density (per Morgan) of recombination breakpoints, in advanced intercross populations with L inbred founders under the limit of an infinitely large population size. The analytic results show that for X chromosomes the genetic map expands linearly at a rate (per generation) of two-thirds times 1 – 10/(9L) for the AIL, and at a rate of two-thirds times 1 – 1/L for the DO and the HS, whereas for autosomes the map expands at a rate of 1 – 1/L for the AIL, the DO, and the HS.

Quantitative trait locus (QTL) mapping populations have recently been designed with either multiple parents, to increase the genetic diversity of the founder population, or many intercross generations, to improve the mapping resolution by accumulating historical recombination events. Some examples include the Collaborative Cross (CC) (Churchill et al. 2004), the advanced intercross lines (AIL) (Darvasi and Soller 1995), the heterogeneous stock (HS) (Mott et al. 2000), the Diversity Outcross (DO) (Svenson et al. 2012), and the Drosophila synthetic population resource (DSPR) (King et al. 2012).
The CC can be regarded as a set of eight-way recombinant inbred lines (RIL) by sibling mating, where eight founders of each line are permuted.
The genomes of individuals in QTL mapping populations are random mosaics of the founders' genomes. The QTL mapping generally necessitates the reconstruction of these genome blocks along two homologous chromosomes of a sampled individual from available genotype data. Such reconstruction is often performed under a hidden Markov model (HMM) with the latent state being the pair of ancestral origins at a locus, where the transition probability of ancestral origins between two loci, or the two-locus diplotype (two-haplotype) probabilities are required.
Modeling ancestral origins along a pair of autosomal chromosomes has been well developed recently. Broman (2012a) extended the approach of Haldane and Waddington (1931) from the two-way to four- and eight-way RIL by sibling mating and provided recipes for calculating autosomal two-locus diplotype probabilities numerically. Johannes and Colome-Tatche (2011) derived autosomal two-locus diplotype probabilities for the two-way RIL by selfing. Zheng et al. (2014) described a general modeling framework for ancestral origins that can be applied to autosomes in various mapping populations such as the RIL by selfing or sibling mating and the AIL.
A special treatment is required for modeling ancestral origins along a pair of X chromosomes. Haldane and Waddington (1931) derived the recurrence relations of the X-linked two-locus diplotype probabilities for the two-way RIL by sibling mating and the bi-parental repeated parent-offspring mating, and their closed form solutions for the final homozygous lines. Broman (2005) extended the solutions to the two- and three-locus haplotype probabilities for the two-, four-, or eight-way RIL by sibling mating. Broman (2012b) derived the X-linked two-locus haplotype probabilities in advanced intercross populations including the AIL, the HS, and the DO, assuming an infinitely large population size.
In this paper, we extend our previous work (Zheng et al. 2014) to model the ancestral origins along a pair of X chromosomes in a finite mapping population. This extension also builds on the theory of junctions in inbreeding (Fisher 1949, 1954). A junction is defined as a boundary point of genome blocks on chromosomes where two distinct ancestral origins meet, and the boundary points that occur at the same location along multiple chromosomes are counted as a single junction. The map expansion is the expected junction density (per Morgan) on a maternally or paternally derived X chromosome, denoted by $R^m$ or $R^p$, respectively. We denote by $\rho^{mp}$ the overall junction density along the XX chromosomes of a female, and it can be used as a measure of X-linked QTL mapping resolution (Darvasi and Soller 1995; Weller and Soller 2004).
The key feature of this extension is to account for the asymmetry between maternally and paternally derived X chromosomes, because the latter do not experience any crossover events with Y chromosomes. We first present a model framework for X-linked ancestral origins, where the recurrence relations are derived for various junction densities including the map expansions $R^m$ and $R^p$. Then, we derive the closed form solutions for these expected densities in mapping populations including the RIL by sibling mating, the AIL, the HS, the DO, and the DSPR; they are evaluated by forward simulation studies. Lastly, we discuss the model assumptions and the implications of the analytic results for haplotype reconstruction and breeding designs.
A MODEL FOR X-LINKED ANCESTRAL ORIGINS

Assumptions and notation
Consider a dioecious population with two separate sexes: homogametic females with sex chromosomes XX and heterogametic males with sex chromosomes XY. There are no recombination events between X and Y, and thus we ignore the pseudoautosomal regions on the XY chromosomes. As in most mammals and some insects (Drosophila), some flowering species, such as white campion (Silene latifolia), papaya (Carica papaya), and asparagus (Asparagus officinalis), have the XY sex determination system (Ming and Moore 2007). The dioecious population was founded in generation 0, and it has nonoverlapping generations. There has been no natural or artificial selection since the founder population. The mating schemes for producing the next generation are random, and they may vary from one generation to the next. The assignments of offspring genders are assumed to be independent of mating schemes.
The ancestral origins along two homologous autosomes have been modeled as a continuous time Markov chain (CTMC) (Zheng et al. 2014). We extend the approach to account for the asymmetry of XX chromosomes, using superscript m (p) for maternally (paternally) derived genes or chromosomes. See Supporting Information, Table S1 for a list of symbols used in this paper. Let $O(x) = (O^m(x), O^p(x))$ be the ordered pair of the ancestral origins at location $x$ along the two X chromosomes of a randomly sampled female. The ancestral origin process $O(x)$ is assumed to follow a CTMC, where $x$ is the time parameter in units of Morgan. We assign a unique ancestral origin to the X chromosomes of each inbred founder, or to each X chromosome of each outbred founder. Multiple genes, within or between loci, are identical by descent (IBD) if they have the same ancestral origins. Let $L$ be the number of possible ancestral origins that $O^m(x)$ or $O^p(x)$ may take. $L$ may be less than the number of inbred founders if some male founders did not produce daughters to pass down their X chromosomes. For example, $L = 3$ for the four-way RIL by sibling mating since one of the founder mating pairs produces only one son (Figure 1A).
The $L$ possible ancestral origins are assumed to be exchangeable, so that we focus on the changes of ancestral origins. See Figure 1B and the relevant part of Discussion on the exchangeability assumption. The initial distribution of $O(0)$ at the leftmost locus $x = 0$ is specified by $\alpha^{mp}(11)$, the probability that the two ancestral origins are the same (IBD) at a locus. Let $\alpha^{mp}(12) = 1 - \alpha^{mp}(11)$ be the non-IBD probability. Given either IBD or non-IBD at the locus, the ancestral origin pair $O(0)$ takes each of the possible combinations with equal probability.
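The initial distribution described above can be written out explicitly. The following is a minimal sketch under the exchangeability assumption; the function name `initial_distribution` and the values $L = 3$ and $\alpha^{mp}(11) = 1/4$ are illustrative choices, not quantities taken from the paper.

```python
from fractions import Fraction
from itertools import product


def initial_distribution(L, alpha11):
    """Distribution of the ordered pair O(0) = (O^m(0), O^p(0)).

    Assumes exchangeable origins: the L IBD pairs (i, i) share alpha11
    equally, and the L*(L-1) ordered non-IBD pairs (i, j), i != j, share
    alpha12 = 1 - alpha11 equally.
    """
    if L < 2:
        raise ValueError("need at least two ancestral origins")
    alpha11 = Fraction(alpha11)
    alpha12 = 1 - alpha11
    dist = {}
    for i, j in product(range(1, L + 1), repeat=2):
        if i == j:
            dist[(i, j)] = alpha11 / L
        else:
            dist[(i, j)] = alpha12 / (L * (L - 1))
    return dist


# Illustrative values only (hypothetical, not from the paper).
dist = initial_distribution(L=3, alpha11=Fraction(1, 4))
```

The dictionary has $L^2$ entries, which is the size of the latent state space that an HMM for the XX chromosomes of a female would use.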
The transition rate matrix of the CTMC can be constructed from the expected densities $J^{mp}(abcd)$ of all the junction types $(abcd)$ along the two X chromosomes of a female. The junction type $(abcd)$ denotes the four-gene IBD configuration $(abcd)$ on both sides of a junction, where $ab$ ($cd$) is on the left-hand (right-hand) side, haplotype $ac$ ($bd$) is on the first (second) chromosome, and the same integers denote IBD. Figure 1C illustrates the seven types of junctions: (1112), (1121), (1122), (1211), (1213), (1222), and (1232) for $L \ge 3$, where the two types (1213) and (1232) do not exist for $L = 2$. We do not define junction types for the eight two-locus configurations (1111), (1123), (1212), (1221), (1223), (1231), (1233), and (1234), because there are either zero or at least two junctions between the two loci. Figure 1D shows the transition rate matrix of the CTMC in the four-way RIL by sibling mating. Figure 1E shows the relationships between the expected densities $J^{mp}(abcd)$ and the transition rates, and they are derived based on the interpretation that $J^{mp}(abcd)\,\Delta d$ is the two-locus diplotype probability, in the limit that the genetic distance $\Delta d$ (in Morgan) between two loci goes to zero.
The map expansions $R^m$ and $R^p$ and the overall expected junction density $\rho^{mp}$ are given by
$R^m = J^{mp}(1121) + J^{mp}(1122) + J^{mp}(1222) + J^{mp}(1232)$, (1)
$R^p = J^{mp}(1112) + J^{mp}(1122) + J^{mp}(1211) + J^{mp}(1213)$, (2)
$\rho^{mp} = R^m + R^p - J^{mp}(1122)$, (3)
similar to those for autosomes (Zheng et al. 2014) except that $R^m \ne R^p$ for X chromosomes. Equation (1) collects the junction types with a transition on haplotype $ac$, equation (2) those with a transition on haplotype $bd$, and equation (3) counts the co-located type (1122) only once. We have $J^{mp}(1112) = J^{mp}(1211)$ and $J^{mp}(1121) = J^{mp}(1222)$, since the junction densities do not depend on the direction of chromosomes. In contrast to the single-locus two-gene non-IBD probability $\alpha^{mp}(12)$, the ordering of the superscripts in $J^{mp}(abcd)$ generally does matter, that is, $J^{mp}(abcd) \ne J^{pm}(abcd)$ except for the junction type (1122). In addition, we have $J^{mp}(1213) = J^{pm}(1232)$ (see Figure 1C). Thus, the CTMC of X-linked ancestral origins can be described by one non-IBD probability $\alpha^{mp}(12)$ and the five expected junction densities $R^m$, $R^p$, $J^{mp}(1122)$, $J^{mp}(1232)$, and $J^{pm}(1232)$, under the exchangeability assumption of the $L$ possible ancestral origins.
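The bookkeeping behind the map expansions can be checked mechanically from the junction-type labels. The sketch below assumes only the labeling convention stated above (haplotype $ac$ on the maternally derived chromosome, haplotype $bd$ on the paternally derived one); the helper names and the placeholder densities are hypothetical.

```python
# The seven junction types (abcd) defined for L >= 3.
JUNCTION_TYPES = ["1112", "1121", "1122", "1211", "1213", "1222", "1232"]


def changes_on(junction, chromosome):
    """True if the ancestral origin changes on the given chromosome.

    'm' is the first chromosome (haplotype ac), 'p' the second (haplotype bd).
    """
    a, b, c, d = junction
    if chromosome == "m":
        return a != c
    if chromosome == "p":
        return b != d
    raise ValueError("chromosome must be 'm' or 'p'")


def summarize(J):
    """Return (R^m, R^p, rho^mp) from a dict of junction densities."""
    Rm = sum(J[j] for j in JUNCTION_TYPES if changes_on(j, "m"))
    Rp = sum(J[j] for j in JUNCTION_TYPES if changes_on(j, "p"))
    rho = sum(J[j] for j in JUNCTION_TYPES)  # each junction counted once
    return Rm, Rp, rho


# Placeholder densities (hypothetical): every type set to 1.
J_demo = {j: 1 for j in JUNCTION_TYPES}
Rm_demo, Rp_demo, rho_demo = summarize(J_demo)
```

Type (1122) is the only one that changes on both chromosomes, which is why the overall density equals $R^m + R^p$ minus the density of that type.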

Single-locus non-IBD probabilities
The calculation of the expected junction densities necessitates the introduction of the probabilities for the two- and three-gene IBD configurations at a single locus. All the following derivations of the recurrence relations for these probabilities are based on the Mendelian inheritance of X-linked genes: a paternally derived gene must be a copy of the maternally derived gene in a male of the previous generation, and a maternally derived gene has equal probability of being a copy of either the maternally derived gene or the paternally derived gene in a female of the previous generation. In a dioecious mapping population, the single-locus two-gene probabilities of IBD configuration $(ab)$ depend on whether or not the two homologous genes are in a single individual. Thus, we denote by $\beta^{mm}(ab)$, $\beta^{mp}(ab)$, and $\beta^{pp}(ab)$ the two-gene probability of IBD configuration $(ab)$, given that the two homologous genes are in two distinct individuals in generation $t$ and have parental origins mm, mp, and pp, respectively (Figure 2A); it holds that $\beta^{pm}(ab) = \beta^{mp}(ab)$.
The recurrence relations of the two-gene non-IBD probabilities are derived by tracing the parental origins of two homologous genes from generation $t \ge 1$ into the previous generation, and they are given by equations (4a-4d), where equation (4d) holds immediately after one generation of random mating, although it may not hold in the founder population at $t = 0$. In equation (4a), the first term on the right-hand side refers to the scenario that the two genes with parental origins mm in generation $t$ come from a single female of the previous generation with probability $s^m_t$, and with probability $1/2$ they come from different genes of the female. In equation (4b), the two genes with parental origins mp cannot merge because they must come from one male and one female of the previous generation. In equation (4c), the two genes with parental origins pp in generation $t$ come from a single male of the previous generation with probability $s^p_t$; if so, they must merge because there is only one X chromosome in a male.
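The displayed equations (4a-4d) are not reproduced in this text, so the following numerical sketch rests on an assumed reading of the verbal description: $\beta^{mm}_t(12) = \tfrac{1}{2}s^m_t\,\alpha^{mp}_{t-1}(12) + (1 - s^m_t)[\tfrac{1}{4}\beta^{mm}_{t-1}(12) + \tfrac{1}{2}\beta^{mp}_{t-1}(12) + \tfrac{1}{4}\beta^{pp}_{t-1}(12)]$, $\beta^{mp}_t(12) = \tfrac{1}{2}\beta^{mm}_{t-1}(12) + \tfrac{1}{2}\beta^{mp}_{t-1}(12)$, and $\alpha^{mp}_t(12) = \beta^{mp}_t(12)$. It specializes them to sibling mating from a two-way cross, where a single mother implies $s^m_t = 1$; the function name and starting values are illustrative.

```python
import math


def sibling_mating_non_ibd(generations):
    """alpha^{mp}_t(12) for t = 1..generations under sibling mating.

    Assumed F1 state for an inbred female x inbred male cross:
    alpha^{mp}_1 = 1, beta^{mm}_1 = 0, beta^{mp}_1 = 1.
    """
    alpha_mp, beta_mm, beta_mp = 1.0, 0.0, 1.0
    history = [alpha_mp]
    for _ in range(generations - 1):
        new_beta_mm = 0.5 * alpha_mp                 # assumed (4a) with s^m = 1
        new_beta_mp = 0.5 * beta_mm + 0.5 * beta_mp  # assumed (4b)
        beta_mm, beta_mp = new_beta_mm, new_beta_mp
        alpha_mp = beta_mp                           # assumed (4d)
        history.append(alpha_mp)
    return history


history = sibling_mating_non_ibd(60)
decay_ratio = history[-1] / history[-2]
golden_rate = (1 + math.sqrt(5)) / 4
```

Under these assumptions the per-generation decay ratio of the non-IBD probability settles at $(1 + \sqrt{5})/4 \approx 0.809$, the leading eigenvalue quoted with Table 1 for the RIL by sibling mating.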
We introduce the single-locus three-gene probabilities of IBD configuration $(abc)$. Let $\beta^{mmm}(abc)$, $\beta^{mmp}(abc)$, $\beta^{mpp}(abc)$, and $\beta^{ppp}(abc)$ be the probabilities of IBD configuration $(abc)$, given that the three homologous genes are in three distinct individuals in generation $t$ and have parental origins mmm, mmp, mpp, and ppp, respectively (Figure 2B). Similarly, we define $\alpha^{mmp}(abc)$ and $\alpha^{mpp}(abc)$ for three homologous genes in two distinct individuals. The ordering of the superscripts does not matter for these three-gene probabilities. The recurrence relations of the three-gene non-IBD probabilities are derived by tracing the parental origins of three homologous genes from generation $t \ge 1$ into the previous generation, and they are given by equations (5a-5f), where $q^m_t$ is the coalescence probability of three maternally derived genes in generation $t$ that a particular pair of genes come from a single female of the previous generation and the third comes from another female of the previous generation, and similarly $q^p_t$ for three paternally derived genes. Equations (5e, 5f) hold immediately after one generation of random mating, although they may not hold in the founder population at $t = 0$.
Figure 1. The continuous time Markov chain (CTMC) of X-linked ancestry blocks in the four-way recombinant inbred lines (RILs) by sibling mating. (A) One realization of ancestry blocks in the four-way RIL with generations up to $F_3$. The sex chromosomes of the four inbred founders are represented by different colors and labeled as A, B, C, and D. The short bars denote Y chromosomes. The ancestral origin D is impossible in the X chromosomes of generation $t \ge 1$. (B) Evaluation of the exchangeability assumption by one-locus genotype probabilities. The gray dashed line refers to the average genotype probability for one particular non-IBD genotype AB, AC, or BC; the black dashed line is for one particular IBD genotype AA, BB, or CC. Note that the ancestral origins A and B are exchangeable, but the ancestral origin C is not exchangeable with either A or B. (C) Schematics of the seven junction types along the maternally (left) and paternally (right) derived X chromosomes. (D) The rate matrix of the CTMC for the four-way RIL in (A). The diagonal elements are given so that row sums are zero. The rate matrix is determined by the seven basic rates, each corresponding to one of the seven junction types. The subscripts of the basic rates denote the IBD (1) or non-IBD (0) states on the left- and right-hand sides of the junctions, and the rates with superscript * refer to the transitions on the paternally derived chromosome. (E) The general relationships between the basic rates and the expected densities for the seven types of junctions, with $L = 3$ for the four-way RIL in (A).
The derivations of the recurrence equations (5a-5d) for the three-gene non-IBD probabilities are similar to equations (4a-4c) for the two-gene non-IBD probabilities. In equation (5a), the pre-factor 3 denotes that each of the three possible pairs of genes may come from a single female of the previous generation; the term $(1 - s^m_t - 2q^m_t)$ is the probability that the three maternally derived genes in generation $t$ come from three distinct females of the previous generation, and it is obtained by the probability $1 - s^m_t$ that one pair of genes come from two distinct females minus the probability $2q^m_t$ that the third gene and either gene of the pair come from a single female of the previous generation. Similarly, the term $(1 - s^p_t - 2q^p_t)$ in equation (5d) is the probability that the three paternally derived genes in generation $t$ come from three distinct males of the previous generation.
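The identity behind the term $(1 - s^m_t - 2q^m_t)$ can be verified by brute-force enumeration. This is a minimal sketch assuming that each of three maternally derived genes picks its mother independently and uniformly among `n_mothers` females, which is one simple sampling model and not necessarily any particular mating scheme of Table S2; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def three_gene_coalescence(n_mothers):
    """Exact (s, q, p_distinct) for three genes choosing mothers at random.

    s: a particular pair (genes 1 and 2) share a mother.
    q: that pair shares a mother and gene 3 has a different mother.
    p_distinct: all three genes come from distinct mothers.
    """
    total = n_mothers ** 3
    s = q = distinct = 0
    for g1, g2, g3 in product(range(n_mothers), repeat=3):
        if g1 == g2:
            s += 1
            if g3 != g1:
                q += 1
        if len({g1, g2, g3}) == 3:
            distinct += 1
    return Fraction(s, total), Fraction(q, total), Fraction(distinct, total)


# Illustrative case: L = 8 founders, so L/2 = 4 female founders.
s, q, p_distinct = three_gene_coalescence(4)
```

With four mothers the enumeration gives $s = 2/L$ and $q = (2/L)(1 - 2/L)$ for $L = 8$, and the probability of three distinct mothers equals $1 - s - 2q$.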

Expected junction densities
We derive the recurrence relations for $R^m$, $R^p$, $J^{mp}(1122)$, $J^{mp}(1232)$, and $J^{pm}(1232)$. The recurrence relation for $R^m$ follows from the theory of junctions (Fisher 1954): a new junction is formed whenever a recombination event occurs between two X chromosomes that are non-IBD at the location of a crossover. The recurrence relations for the map expansions $R^m$ and $R^p$ are given by equations (6a, 6b), where equation (6b) follows directly from no recombination events occurring between the XY chromosomes in a male of the previous generation.
To measure differential map expansions between maternally and paternally derived chromosomes, we define the combined map expansion $R^X_t$ and the differential map expansion $R^-_t$; their recurrence relations, equations (7a, 7b), follow from the recurrence equations (6a, 6b). If there are equal numbers of males and females in the population, a randomly chosen X chromosome is maternally derived with probability $2/3$, and it is paternally derived with probability $1/3$. Thus $R^X_t$ can be interpreted as the map expansion on a randomly chosen X chromosome.
Figure 2. Schematics of (A) the probabilities of the two-gene IBD configurations, (B) the probabilities of the three-gene IBD configurations, and (C) the expected junction densities. Circles denote females, and dashed rectangles denote males or females. Black vertical lines denote the maternally derived X chromosomes, and gray vertical lines the paternally derived ones. Dots denote genes on chromosomes.
For comparison, we denote by $R^A_t$ the map expansion on a randomly chosen autosome; its recurrence relation is given by equation (8) (MacLeod et al. 2005; Zheng et al. 2014), where $\alpha^{AA}_t(12)$ refers to the non-IBD probability between two homologous autosomal genes in an individual. Equations (7a, 8) show that the map expansion $R^X_t$ for an X chromosome is two-thirds of $R^A_t$ for an autosome if the non-IBD probability $\alpha^{AA}_t(12)$ for autosomes is the same as $\alpha^{mp}_t(12)$ for XX chromosomes, and the sex ratio is 1.
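The two-thirds relation can be illustrated numerically. The displayed equations (6a-8) are not reproduced in this text, so the sketch assumes the following forms, which are our reading of the junction argument above: $R^m_t = \tfrac{1}{2}(R^m_{t-1} + R^p_{t-1}) + \alpha^{mp}_{t-1}(12)$ and $R^p_t = R^m_{t-1}$, with $R^X_t = (2R^m_t + R^p_t)/3$ and $R^-_t = R^m_t - R^p_t$. The non-IBD probability is held constant purely for illustration, and all names are hypothetical.

```python
from fractions import Fraction


def iterate_map_expansion(alpha, generations):
    """Rows (R^m, R^p, R^X, R^-) for t = 0..generations, starting from zero.

    Uses the assumed recurrences R^m_t = (R^m_{t-1} + R^p_{t-1})/2 + alpha
    and R^p_t = R^m_{t-1}, with a constant non-IBD probability alpha.
    """
    Rm = Rp = Fraction(0)
    rows = []
    for _ in range(generations + 1):
        RX = (2 * Rm + Rp) / 3          # weights 2/3 (maternal), 1/3 (paternal)
        rows.append((Rm, Rp, RX, Rm - Rp))
        Rm, Rp = (Rm + Rp) / 2 + alpha, Rm
    return rows


L = 8
alpha = 1 - Fraction(1, L)              # illustrative constant value
rows = iterate_map_expansion(alpha, 20)
increments = [rows[t][2] - rows[t - 1][2] for t in range(1, 21)]
```

Under these assumptions $R^X_t$ grows by exactly $2\alpha/3$ per generation, two-thirds of the autosomal increment $\alpha$, while $R^-_t$ oscillates with alternating sign and converges to $2\alpha/3$.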
In addition to $J^{mp}_t(abcd)$ and $J^{pm}_t(abcd)$, we define $K^{mm}_t(abcd)$, $K^{mp}_t(abcd)$, $K^{pm}_t(abcd)$, and $K^{pp}_t(abcd)$ for haplotypes $ac$ and $bd$ that are in two distinct individuals and have parental origins mm, mp, pm, and pp, respectively (Figure 2C). The contributions to the junctions in the current generation come from either the existing junctions in the previous generation, or a new junction via a crossover event. In the following, we focus on the formation of a new junction, because the contributions of the existing junctions in the previous generation are similar to those for the two-gene non-IBD probabilities in the recurrence equations (4a-4c).
The schematics of the recurrence relations for junction types (1232) and (1122) are shown in Figure S1. The ancestry transitions of type (1122) occur on both haplotypes $ac$ and $bd$ at exactly the same location, and thus a new junction of type (1122) can be formed only by duplicating a chromosome segment. It holds that $J^{mp}_t(1122) = J^{pm}_t(1122)$ and $K^{mp}_t(1122) = K^{pm}_t(1122)$ because of the symmetry of type (1122). The recurrence relations are given by equations (9a-9d) for $t \ge 1$, where equation (9d) may not hold in the founder population at $t = 0$, the first term on the right-hand side of equation (9a) refers to the scenario that both haplotypes $ac$ and $bd$ come from a single female of the previous generation, and the first term on the right-hand side of equation (9c) refers to the scenario that both haplotypes are the duplicated copies of the maternally derived X chromosome in a male of the previous generation (Figure S1A). According to equations (6a, 6b) and equations (9a-9d), the overall expected density $\rho^{mp}$ in equation (3) does not depend on the three-gene non-IBD probabilities.
The ancestry transition of type (1232) occurs on haplotype $ac$. A new junction of type (1232) is formed whenever the two parental chromosomes of haplotype $ac$ and the parental chromosome of haplotype $bd$ are distinct and have the IBD configuration (123) at the location of the crossover. The recurrence relations are given by equations (10a, 10b) and the equations that follow them; the differential densities, including $R^-_t$, measure the asymmetry between maternally and paternally derived X chromosomes.

Model evaluation by simulations
To evaluate the theoretical predictions of non-IBD probabilities and expected junction densities, we perform simulation studies with the same model assumptions: random mating with discrete generations, no natural selection, and no genetic interference, except that the ancestral origins along chromosomes do not follow Markov assumptions. Instead, the genome ancestral origins are simulated forward in time by first generating a pedigree according to a given breeding design, and then dropping on the pedigree the distinct founder genome labels (ancestral origins) that are assigned to the whole X chromosomes of each completely inbred founder or to each X chromosome of each outbred founder. The X chromosomes of each descendant gamete are specified as a list of the labeled segments determined by chromosomal crossovers.
For a mapping population with a particular breeding design, the realized junction densities and IBD probabilities are saved for all individuals in each generation in each simulation replicate, and they are averaged over a total of $2 \times 10^4$ replicates. Various mating schemes are used in simulating breeding pedigrees. We denote by RM1 the random mating where each sampling of two randomly chosen individuals with opposite genders produces one offspring, and by RM2 the random mating where each sampling of two randomly chosen individuals with opposite genders produces two offspring. We combine these mating schemes with -NE if each parent contributes a Poisson distributed number of gametes to the next generation, and -E if each parent contributes exactly two gametes. Thus, we have four random mating schemes, RM1-NE, RM1-E, RM2-NE, and RM2-E. Sibling mating belongs to RM2-E with population size 2, and the exclusive pairing in $2^n$-way ($n \ge 1$) crosses can be regarded as a special case of random mating without inbreeding. The genders are assigned randomly, independent of mating schemes.
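The segment-list representation used in the forward simulations can be sketched as follows. This is an illustrative reconstruction and not the authors' simulation code: a haplotype is assumed to be a list of `(start, origin)` pairs sorted by `start`, crossover positions are supplied by the caller (for example from a Poisson process under no interference), and the function names are hypothetical.

```python
import bisect


def slice_segments(hap, lo, hi):
    """Segments of hap restricted to the interval [lo, hi)."""
    starts = [start for start, _ in hap]
    i = bisect.bisect_right(starts, lo) - 1   # segment that contains lo
    out = [(lo, hap[i][1])]
    for start, origin in hap[i + 1:]:
        if start >= hi:
            break
        out.append((start, origin))
    return out


def recombine(hap_a, hap_b, crossovers, length, start_with_a=True):
    """Maternal gamete built by alternating templates at each crossover.

    Crossover positions are assumed to lie strictly inside (0, length).
    """
    templates = (hap_a, hap_b) if start_with_a else (hap_b, hap_a)
    bounds = [0.0] + sorted(crossovers) + [length]
    gamete = []
    for k in range(len(bounds) - 1):
        for seg in slice_segments(templates[k % 2], bounds[k], bounds[k + 1]):
            if gamete and gamete[-1][1] == seg[1]:
                continue                      # same origin: no junction formed
            gamete.append(seg)
    return gamete


non_ibd = recombine([(0.0, "A")], [(0.0, "B")], [0.3, 0.8], 1.0)
ibd = recombine([(0.0, "A")], [(0.0, "A")], [0.3, 0.8], 1.0)
mixed = recombine([(0.0, "A"), (0.5, "B")], [(0.0, "C")], [0.7], 1.0)
```

The number of junctions on a gamete is the number of segments minus one, so a crossover between IBD chromosomes adds none. Under the model, a paternally derived X chromosome would be passed on as an unmodified copy of the father's maternally derived chromosome, with no call to `recombine`.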

Multistage populations
For mapping populations with stage-wise constant mating schemes, we derive analytic expressions of the non-IBD probabilities and the expected junction densities for constructing the CTMC of X-linked ancestral origins, according to the recurrence relations. The closed form solutions are obtained by linking results of each subsequent stage via the initial conditions. The general results for a population with constant random mating are derived in Appendix A, where three scenarios are considered: a finite population of size $\ge 6$, a sibling-mating population of size 2, and a large population of size $\gg 6$. Table S2 gives the coalescence probabilities of X chromosomes for various mating schemes, similar to Table 1 of Zheng et al. (2014) for autosomes. Table S3 summarizes the results for X chromosomes in a sibling-mating population, and Table S4 for autosomes; they are necessary for dioecious breeding populations with a stage of inbreeding by sibling mating such as the CC and the DSPR. We use the superscript A to denote the quantities for autosomes.
We derive the analytic expressions of $\alpha^{mp}_t(12)$, $R^X_t$, $R^m_t$, $J^{mp}_t(1122)$, $J^{mp+}_t(1232)$, and $J^{mp-}_t(1232)$ in the mapping populations of the RIL, the AIL, and the DO, and they are given in Table 1, Table 2, and Table 3, respectively. These results are necessary for constructing the CTMC of ancestral origins along the XX chromosomes of a female; only the expression of $R^m_t$ is needed for the maternally derived X chromosome of a male. For comparison, the autosomal results for $\alpha^{AA}_t(12)$, $R^A_t$, $J^{AA}_t(1122)$, and $J^{AA}_t(1232)$ are included. The results for the AIL, the DO, and the DSPR are derived under the assumption of a large population size in the intercross stage. We evaluate this assumption in the DSPR, because the evaluation results hold similarly for the AIL and the DO. In addition, the map expansions $R^X_t$ and $R^A_t$ are given explicitly under the assumption of an infinitely large intercross population size, which may be used as a simple measure of QTL mapping resolution.
Many breeding populations can be divided into three stages: mixing, intercross, and inbreeding, such as the RIL by sibling mating, the CC, and the DSPR. There is no inbreeding stage for the AIL, the HS, and the DO. We denote by $U$ the number of intercross generations, $V$ the number of inbreeding generations, and $N$ the intercross population size. Let $M_F$ and $M_I$ denote the random mating schemes for mixing and intercross stages, respectively. We choose the mixing stage to consist of one generation of random mating, so that the non-IBD probabilities and the expected junction densities in the $F_1$ population do not depend on whether genes or haplotypes are in distinct individuals.
The general derivation procedure is as follows. First, we derive the initial conditions in the $F_1$ population for the intercross stage, according to the genetic compositions of the founder population $F_0$. Second, we substitute the obtained initial conditions into the theorems of Appendix A3 under the assumption of a large intercross population size. Alternatively, the theorems of Appendix A1 may be used for a finite intercross population. Lastly, if there is a stage of inbreeding by sibling mating, we substitute analytic expressions in the $F_{U+1}$ population into the theorems of Appendix A2 to obtain the results in the last generation $g = U + V + 1$.
Table 1. Results for X chromosomes in the $2^n$-way RIL by sibling mating in the last generation. [Table body not recoverable; the surviving column header is "Theoretical Prediction" and the first panel is "(A) 2 ways sibling".] Note: The eigenvalues are $\lambda_1 = (1 + \sqrt{5})/4$ and $\lambda_2 = (1 - \sqrt{5})/4$. The map expansions $R^p_{U+1}$ and $R^m_{U+1}$ are given by equations (11a, 11b). The conjugate is given by replacing $\sqrt{5}$ with $-\sqrt{5}$ in the terms involving $\lambda_1$. For example, the conjugate term for $R^X_g$ in (A) is given by $2(20 - 8\sqrt{\cdots}$ ...

RIL
The $2^n$-way RIL by sibling mating can be regarded as a three-stage mapping population without the intercross stage for $n \le 2$. All the founders are fully inbred, and the intercross mating scheme is exclusive pairing so that inbreeding is completely avoided. Thus the map expansions $R^p_{U+1}$ and $R^m_{U+1}$ are given by equations (11a, 11b), and the initial conditions involve the indicator $\delta_{n \ge 2}$, which equals 1 if $n \ge 2$ and 0 otherwise, since the two maternally derived genes at $t = 1$ must come from the inbred female founder for the two-way RIL.
Substituting the initial conditions in the $F_{U+1}$ population into Table S3, we obtain the results for the RIL in the last generation $t = U + V + 1$ shown in Table 1. The non-IBD probabilities $\alpha^{mp}_t(12)$ for X chromosomes are the same as those for autosomes (Table 2 of Zheng et al. 2014). Thus, we show analytically that the map expansion $R^X$ for the X chromosome is two-thirds that of the autosome for the $2^n$-way ($n \ge 1$) RIL, according to equations (7a, 8). Broman (2012a) has verified this two-thirds rule via Maxima for the $2^n$-way RIL up to $n = 98$. Figure 3 shows that these theoretical predictions fit very well with the forward simulation results for the two- and eight-way RIL by sibling mating. The differential densities $R^-_t$ and $J^{mp-}_t(1232)$ decay very fast with generation $t$ and show some oscillations in the early generations. The overall expected junction density $\rho^{mp}_t$ reaches its maximum in the same generation as for autosomes.

AIL
We consider a multiparental AIL population that is founded by $L/2$ inbred females and $L/2$ inbred males. A unique ancestral origin is assigned to each inbred founder's genomes, which determines the two-gene non-IBD probabilities in the founder population.
Table 2. Results for the AIL in the last generation $g = U + 1$. [Table body not recoverable.]
The $F_1$ population of size $N$ is produced by mating scheme $M_F$ = RM1-NE or RM2-NE. According to Table S2, the coalescence probabilities are $s^m_1 = s^p_1 = 2/L$ and $q^m_1 = q^p_1 = (2/L)(1 - 2/L)$ for mating scheme RM1-NE, and they hold approximately for RM2-NE with large population size $N \gg 6$. Thus, the two-gene non-IBD probabilities at $t = 1$ are given by $\beta^{mm}_1(12) = \beta^{pp}_1(12) = 1 - 2/L$ and $\alpha^{mp}_1(12) = \beta^{mp}_1(12) = 1$ according to the recurrence equations (4a-4d), and the three-gene non-IBD probabilities at $t = 1$ are given by $\beta^{mmm} \ldots = 1 - 2/L$ according to the recurrence equations (5a-5f). In addition, no junctions can be formed from inbred founders, so that the expected junction densities, including $R^m_1$, are zero at $t = 1$.
The $F_1$ population is maintained for $U$ generations with constant size $N$ and sex ratio 1. Assuming that the intercross population size is large ($N \gg 6$), all the two- and three-gene coalescence probabilities at $t \ge 2$ are approximately equal and are denoted by $s$, and they are determined by the intercross mating scheme $M_I$ according to Table S2. Substituting the initial conditions in the $F_1$ population into the theorems of Appendix A3, we obtain in Table 2 the results for X chromosomes in the AIL in the last generation $t = U + 1$. Table 2 also shows the results for autosomes, which are derived according to Zheng et al. (2014).
As shown in Table 2, the non-IBD probabilities $\alpha^{mp}_t(12)$ for X chromosomes are unequal to those for autosomes, and thus the map expansions generally do not satisfy the two-thirds rule. According to the map expansions $R^X_t$ and $R^A_t$ in Table 2, we derive their approximations, equations (12a, 12b), under the limit of an infinitely large population size ($N \to \infty$) so that the coalescence probability goes to zero ($s \to 0$), where the last two terms for $R^X_t$ in Table 2 are small and thus ignored. Equations (12a, 12b) show that the two-thirds rule is approximately valid for a large number $L$ of founder lines. The map expansion of equation (12b) for $L = 2$ is consistent with the previous results (Darvasi and Soller 1995; Liu et al. 1996; Winkler et al. 2003; Broman 2012b).
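The factor $1 - 10/(9L)$ that governs the X-linked map expansion rate of the AIL can be recovered numerically from the $F_1$ initial conditions. The sketch below assumes the $s \to 0$ limit of the two-gene recurrences in the form $\beta^{mm}_t = \tfrac{1}{4}\beta^{mm}_{t-1} + \tfrac{1}{2}\beta^{mp}_{t-1} + \tfrac{1}{4}\beta^{pp}_{t-1}$, $\beta^{mp}_t = \tfrac{1}{2}\beta^{mm}_{t-1} + \tfrac{1}{2}\beta^{mp}_{t-1}$, and $\beta^{pp}_t = \beta^{mm}_{t-1}$, which is our reading of equations (4a-4c) and not a reproduction of them; the function name is hypothetical.

```python
from fractions import Fraction


def ail_two_gene_limit(L, generations=60):
    """Iterate the assumed s -> 0 recurrences from the AIL F1 state.

    Returns (beta_mm, beta_mp, beta_pp) after the given number of
    generations; alpha^{mp}_t(12) equals beta_mp after random mating.
    """
    beta_mm = beta_pp = 1 - Fraction(2, L)   # F1 values stated in the text
    beta_mp = Fraction(1)
    for _ in range(generations):
        beta_mm, beta_mp, beta_pp = (
            beta_mm / 4 + beta_mp / 2 + beta_pp / 4,
            beta_mm / 2 + beta_mp / 2,
            beta_mm,
        )
    return beta_mm, beta_mp, beta_pp


L = 8
beta_mm, beta_mp, beta_pp = ail_two_gene_limit(L)
target = 1 - Fraction(10, 9 * L)
conserved = (4 * beta_mm + 4 * beta_mp + beta_pp) / 9
```

Under these assumptions the weighted average $(4\beta^{mm} + 4\beta^{mp} + \beta^{pp})/9$ is conserved exactly, and all three probabilities converge to $1 - 10/(9L)$ with a damped oscillation, in line with the rate of two-thirds times $1 - 10/(9L)$ for the AIL.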
The left panels of Figure 4 show for the AIL that the theoretical predictions fit very well with the forward simulation results, where $M_F$ = RM1-NE, $M_I$ = RM1-E, $L = 8$, and $N = 100$. Within $U = 20$ intercross generations, the non-IBD probability $\alpha^{mp}_t(12)$ decreases slowly with generation $t$, the differential map expansion $R^-_t$ remains almost constant after a few generations of oscillations, and the map expansions in equations (12a, 12b), shown as thick red lines in Figure 4, are very good approximations.
Table 3. Results for the DO in the last generation $g = U + 1$. [Table body not recoverable; the surviving column headers are "Quantity" and "Theoretical Prediction".] Note: The eigenvalues are $\lambda_1 = 1 - s/3$ and $\lambda_4 = 1 - s$ for X chromosomes, and $\lambda^A_1 = 1 - s^A/2$ and $\lambda^A_4 = 1 - 3s^A/2$ for autosomes. DO, Diversity Outcross.

HS and DO
The HS and the DO differ from the AIL only in the genetic compositions of the founder population. The $N$ progenitors of the DO at $t = 0$ were sampled independently from pre-CC lines at a variety of different generations. Each pre-CC line is produced by the RIL by sibling mating from $L = 8$ randomly permuted founder strains. Let $q_k$ denote the proportion of the pre-CC progenitors that were in generation $k$. Thus, for a random progenitor, it holds that $\alpha^{mp}_0(12) = \ldots = (1 - 1/L)(1 - 2/L)$, and because recombination crossovers are independent among different pre-CC lines, the between-individual expected junction densities at $t = 0$ are given by $K^{mm} \ldots (1 - 2/L)$, where $1 - 2/L$ refers to the probability that the third ancestral origin on haplotype $bd$ is different from the two ancestral origins on haplotype $ac$ where the ancestry transition occurs. The within-individual expected junction densities at $t = 0$ are not required in the following derivations.
The $F_1$ population of size $N$ is produced by random mating with equal sex ratio. Assuming that the population size $N \gg 6$, the coalescence probabilities at $t = 1$ are approximated to be zero. According to the recurrence equations for the two- and three-gene non-IBD probabilities, the between-individual probabilities do not change and the within-individual non-IBD probabilities at $t = 1$ equal the corresponding between-individual probabilities. In addition, we have $R^m \ldots$ Similar to the intercross stage of the AIL, we obtain in Table 3 the results for X chromosomes in the DO in the last generation $t = U + 1$ by substituting the initial conditions in the $F_1$ population into the theorems of Appendix A3. Table 3 also shows the results for autosomes, which are derived according to Zheng et al. (2014). Under the limit of an infinitely large population size ($N \to \infty$), we obtain from Table 3 the approximations in equations (13a, 13b), showing that the two-thirds rule is valid under such an approximation since $\alpha^{mp}_0(12) = \alpha^{AA}_0(12)$ and $R^X_0 = 2R^A_0/3$ for progenitors drawn from the RIL (Table 1). The map expansion in equation (13b) for $L = 8$ is the same as the one obtained by Broman (2012b).
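The per-generation rates for the DO and the HS can be read off directly, as a worked step. This assumes that recurrences (7a) and (8) take the forms $R^X_t = R^X_{t-1} + \tfrac{2}{3}\alpha^{mp}_{t-1}(12)$ and $R^A_t = R^A_{t-1} + \alpha^{AA}_{t-1}(12)$ (our reading, since the displayed equations are not reproduced here), and that both non-IBD probabilities stay close to $1 - 1/L$ once coalescence is negligible, which is what the rates quoted in the abstract imply.

```latex
\begin{align}
R^X_t - R^X_{t-1} &= \tfrac{2}{3}\,\alpha^{mp}_{t-1}(12)
  \approx \tfrac{2}{3}\Bigl(1 - \tfrac{1}{L}\Bigr), \\
R^A_t - R^A_{t-1} &= \alpha^{AA}_{t-1}(12)
  \approx 1 - \tfrac{1}{L}.
\end{align}
```

For $L = 8$ these increments are $7/12$ and $7/8$ per generation, so their ratio is exactly two-thirds under the stated assumptions.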
The right panels of Figure 4 show for the HS that the theoretical predictions fit very well with the forward simulation results, where $M_F = M_I$ = RM1-E and the $N = 100$ individuals in the $F_0$ population were sampled independently from CC funnels at the same generation $t = 3$. The results are similar to those for the AIL with the same L shown in the left panels of Figure 4. For X chromosomes, the non-IBD probabilities in the DO are larger than those in the AIL, and thus the map expands at a higher rate in the DO than in the AIL; see equations (12a, 13a).
Figure 3 Results of the $2^n$-way recombinant inbred lines (RILs) by sibling mating for $n = 1$ (left panels) and $n = 3$ (right panels). The filled symbols refer to the results for X chromosomes, the empty symbols to the results for autosomes, and the lines to the theoretical predictions in Table 1. The non-IBD probabilities $\alpha^{mp}_t(12)$ for X chromosomes and autosomes overlap with each other. The brown filled diamonds refer to $J^{mp-}_t(1232)$ in (C) and (D) and $\rho$ …

DSPR
The DSPR RILs were derived from two synthetic populations, each created independently by following the multiparental AIL with an inbreeding stage by sibling mating (King et al. 2012). As an example, we derive the analytic expressions of the map expansions in one synthetic population with L founder strains. We assume that $\beta^{mm}_{U+1}(12) = \beta^{mp}_{U+1}(12)$, which holds in a non-inbreeding population and approximately in a large population (e.g., $N \ge 100$) with a large number of intercross generations (e.g., $U \ge 6$). According to the map expansions in Table S3, we have …, and $R^X_{U+1}$ and $\alpha^{mp}_{U+1}(12)$ are given in Table 1, Table 2, or Table 3 if the $F_{U+1}$ population is the last generation of the RIL, the AIL, or the DO, respectively.
We evaluate the large size assumption for various random mating schemes by simulation studies of the DSPR. Figure 5 shows the fit of the theoretical predictions to the forward simulation results for the intercross size $N = 20$, 50, and 100, where the mating schemes are $M_F$ = RM1-NE and $M_I$ = RM1-E (RM1-NE) for the left (right) panels. The theoretical predictions are obtained by combining the results for the AIL (Table 2) with those for the sibling-mating population (Table S3), assuming the large size ($N \gg 6$). The relatively worse fit for the differential densities $R^{-}_t$ and $J^{mp-}_t(1232)$ is probably attributable to the limited number ($2 \times 10^4$) of simulation replicates. The fit improves with increasing size N, and it is very good for $N = 100$ within the range of $U = 20$ intercross generations. The fit for RM1-E is better than that for RM1-NE because in the former case the two-gene coalescence probabilities are always equal to the three-gene probabilities (Table S2), independent of the size N. Figure S2 shows similar results for the random mating scheme RM2, except that the expected junction densities are slightly smaller. Figure S3 and Figure S4 show that the results for autosomes are less sensitive to the large size assumption, and the fits are very good even for $N = 20$.

DISCUSSION
We have extended our previous framework of modeling ancestral origin processes from autosomes to X chromosomes, and thus the same assumptions, such as exchangeability of ancestral origins, Markov properties, and random mating, also apply (Zheng et al. 2014). The deviations from Markov properties result in larger variances in the IBD-tract length and the junction densities, which have been shown to be acceptable (Chapman and Thompson 2003; Martin and Hospital 2011). The random mating assumption implies that our approach does not apply to breeding populations under marker-assisted selection.
Figure 4 Results of the AIL (left panels) and the HS (right panels) with $L = 8$ and $N = 100$. The random mating schemes are $M_F$ = RM1-NE for the AIL and RM1-E for the HS, and $M_I$ = RM1-E for both populations. The symbols and lines are the same as those in Figure 3. The theoretical predictions refer to Table 2 for the AIL and Table 3 for the DO. The additional red lines denote the map expansions under the large size assumption, given by equations (12a, 12b) for the AIL and equations (13a, 13b) for the HS.
In contrast to the previous approaches (Haldane and Waddington 1931; Broman 2012a), the exchangeability assumption of ancestral origins greatly reduces model complexity, because the number of possible junction types does not depend on the number of founders for $L \ge 3$, whereas the number of diplotype states increases very fast with L. The assumption affects the rate matrix of the Markov model, but not the expected junction densities, where only changes of ancestral origins matter. The exchangeability is a good approximation for the AIL- or the multiparent advanced generation intercross (MAGIC)-type populations with random mating, but it does not hold for the multiway RIL by sibling mating.
However, the exchangeability assumption is not critical for the application of our results to haplotype reconstructions from genotype data. The genomes of the individuals collected in the last generation have been well mixed by random chromosomal segregations over many generations. This is demonstrated in Figure 1A for the four-way RIL by sibling mating, where a female A and a male B were crossed, a female C and a male D were crossed, and then a daughter from A × B and a son from C × D were crossed. The X chromosome of the founder D is lost in $F_1$. The genotype probabilities for AB and AC are different and are given in Table 2 of Broman (2012a), although the sum of the genotype probabilities for AB, AC, and BC is equal to $\alpha^{mp}_t(12)$ in Table 1. Figure 1B shows that the genotype probability for AB or AC becomes close to the average probability $\alpha^{mp}_t(12)/3$ as generation t increases. Furthermore, in the early generations, when the asymmetry among ancestral origins is large, there are fewer recombination breakpoints, and thus more marker data per genome block are available to estimate ancestral origins. As a result, a priori equal weights of ancestral origins have little effect.
An HMM is under development for reconstructing ancestral origins for both autosomes and X chromosomes from marker data, using the present model and the previous one (Zheng et al. 2014) as the prior distribution. The previously implemented HMM methods, such as GAIN (Liu et al. 2010) and HAPPY (Mott et al. 2000), were developed for autosomes, and they do not account for the asymmetry between maternally and paternally derived X chromosomes.
The closed form expressions for non-IBD probabilities and various expected junction densities have been derived for stage-wise mapping populations. They provide not only the complete information for constructing the CTMC along two X chromosomes but also guidance for designing a new population in terms of X-linked QTL mapping resolution. For advanced intercross populations such as the AIL, the HS, and the DO under the assumption of a large intercross size, the map expands linearly with the number of generations, at a rate that falls short of its limiting value by a term proportional to the inverse of the number L of inbred founders; this rate is robust to intercross mating schemes. For the RIL and the inbreeding stage of the DSPR, the map expansion slows down with increasing level of inbreeding. The overall junction density $\rho^{mp}$ for the DSPR decreases after one generation of the inbreeding stage by sibling mating, whereas for the RIL it reaches the maximum in the middle of inbreeding by sibling mating. These conclusions can also be applied to autosomes. Thus the most effective way of improving mapping resolution is to increase the number U of intercross generations in a large population ($N \ge 5U$, empirically).
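The large-population map expansion rates quoted in the abstract can be tabulated with a few lines of Python. This is only an illustrative sketch: the function name `map_expansion_rate` and its arguments are hypothetical, and the formulas are copied from the abstract (two-thirds times 1 - 10/(9L) for X chromosomes in the AIL, two-thirds times 1 - 1/L for X chromosomes in the DO and the HS, and 1 - 1/L for autosomes).

```python
from fractions import Fraction


def map_expansion_rate(L, population, chromosome="X"):
    """Per-generation map expansion rate in the infinite-population limit.

    L          : number of inbred founders
    population : "AIL", "DO", or "HS"
    chromosome : "X" or "A" (autosome)
    """
    if population not in ("AIL", "DO", "HS"):
        raise ValueError("population must be 'AIL', 'DO', or 'HS'")
    if chromosome == "A":
        # Autosomes: 1 - 1/L for the AIL, the DO, and the HS.
        return 1 - Fraction(1, L)
    if chromosome != "X":
        raise ValueError("chromosome must be 'X' or 'A'")
    if population == "AIL":
        # X chromosomes in the AIL: (2/3) * (1 - 10/(9L)).
        return Fraction(2, 3) * (1 - Fraction(10, 9 * L))
    # X chromosomes in the DO and the HS: (2/3) * (1 - 1/L).
    return Fraction(2, 3) * (1 - Fraction(1, L))


for L in (2, 4, 8):
    print(L,
          map_expansion_rate(L, "AIL"),
          map_expansion_rate(L, "DO"),
          map_expansion_rate(L, "DO", chromosome="A"))
```

For L = 8 the X-linked rate is 31/54 for the AIL and 7/12 for the DO and the HS, so the DO value is slightly higher, and the X-linked DO rate is exactly two-thirds of the autosomal rate 7/8.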

ACKNOWLEDGMENTS
I thank George O. Agogo, Rianne Jacobs, Martin P. Boer, Fred A. van Eeuwijk, and the two anonymous reviewers for their helpful comments. This research was supported by the Stichting Technische Wetenschappen (STW) - Technology Foundation, which is part of the Nederlandse Organisatie voor Wetenschappelijk Onderzoek - Netherlands Organization for Scientific Research, and which is partly funded by the Ministry of Economic Affairs. The specific grant number was STW-Rijk Zwaan project 12425.

RESULTS FOR CONSTANT RANDOM MATING POPULATIONS
We introduce some matrix-vector notations to facilitate the derivations. Denote by $A \odot B$ the element-by-element multiplication of the two matrices A and B, and by $A \oslash B$ the element-by-element division of the two matrices. Denote by $x^t_{i\text{--}j} = (x^t_i, x^t_{i+1}, \ldots, x^t_j)$ the element-wise power, where the subscripts are natural numbers with $i \le j$, and by default $x_{i\text{--}j} = (x_i, x_{i+1}, \ldots, x_j)$. Let $\mathbf{1}$ be a vector with appropriate length and all the elements being 1. Let $\Lambda(x)$ be the diagonal matrix with the diagonal elements being the vector x. Denote by $[x; \ldots; y]$ the matrix with row vectors x, ..., y of equal length. Denote by superscript T the transpose of a vector or matrix.
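For readers who prefer code to notation, the conventions above map directly onto NumPy array operations. The snippet below is a minimal sketch; all variable names and numerical values are arbitrary and serve only to illustrate each notation.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[2.0, 4.0], [6.0, 8.0]])

# Element-by-element multiplication and division of two matrices.
hadamard_product = A * B
hadamard_quotient = A / B

# Element-wise power of the sub-vector (x_i, ..., x_j), with 1-based i <= j.
x = np.array([0.9, 0.5, -0.3, 0.2])   # arbitrary values standing in for x_1..x_4
i, j, t = 2, 3, 4
x_range_power = x[i - 1:j] ** t       # (x_2^t, x_3^t)

# Vector of ones, diagonal matrix built from a vector, row stacking, transpose.
ones = np.ones(3)
Lambda_x = np.diag(x)
stacked = np.vstack([x, 2 * x])       # matrix whose rows are x and 2x
transposed = stacked.T

print(hadamard_product)
print(x_range_power)
```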
The closed form expressions for the two- and three-gene non-IBD probabilities and the expected junction densities are derived for populations with constant size and random mating schemes. The coalescence probabilities are thus constant, and we set $s^m_t = s^m$, $s^p_t = s^p$, $q^m_t = q^m$, and $q^p_t = q^p$. We first consider a finite population with number of males $N_m \ge 1$ and number of females $N_f \ge 3$, so that all the two- and three-gene non-IBD probabilities exist. We then consider an example of small population size, a sibling-mating population with one male and one female ($N_f = N_m = 1$), where the non-IBD probabilities $\beta^{pp}(12)$, $\beta^{mmm}(123)$, $\beta^{mmp}(123)$, $\beta^{mpp}(123)$, and $\beta^{ppp}(123)$ do not exist. Lastly, we consider a large population under the limit that the size $N_f = N_m \gg 3$.
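Because the coalescence probabilities are constant, the two-gene recurrences form a linear system with constant coefficients, which is why closed forms in terms of eigenvalues are available. The sketch below illustrates this numerically. Note the assumptions: the recurrence equations (4a-4c) are not reproduced in this appendix, so the matrix in `t12_matrix` is a hypothetical reconstruction from standard X-linked inheritance (a paternal X is the father's maternal X; a maternal X is either of the mother's two X chromosomes with probability 1/2), with the within-individual probability taken equal to the between-individual one. Its second and third rows are consistent with equations (A1.7) and (A1.8); the first row, the values of `s_m` and `s_p`, and the initial vector are illustrative choices only.

```python
import numpy as np


def t12_matrix(s_m, s_p):
    """Assumed transition matrix for (beta_mm, beta_mp, beta_pp); see lead-in."""
    return np.array([
        # Same mother (s_m): different copies with prob 1/2, then beta_mp;
        # different mothers (1 - s_m): mix of mm, mp, pp pairs.
        [(1 - s_m) / 4, s_m / 2 + (1 - s_m) / 2, (1 - s_m) / 4],
        # Maternal X of one individual vs. paternal X of another.
        [0.5, 0.5, 0.0],
        # Two paternal X chromosomes trace back to two fathers' maternal X.
        [1 - s_p, 0.0, 0.0],
    ])


def iterate(T, b0, t):
    """Apply the recurrence t times."""
    b = np.asarray(b0, dtype=float)
    for _ in range(t):
        b = T @ b
    return b


def closed_form(T, b0, t):
    """Sum over eigenvalues: sum_k c_k * lambda_k**t * v_k."""
    lam, V = np.linalg.eig(T)
    coeff = np.linalg.solve(V, np.asarray(b0, dtype=float))
    return np.real((V * lam ** t) @ coeff)


s_m, s_p = 0.1, 0.1            # arbitrary coalescence probabilities
T = t12_matrix(s_m, s_p)
b0 = [1.0, 1.0, 1.0]           # hypothetical start: all pairs non-IBD

print(iterate(T, b0, 10))
print(closed_form(T, b0, 10))

# Each eigenvector (B_k, C_k, D_k) satisfies the relations (A1.7) and (A1.8).
lam, V = np.linalg.eig(T)
for k in range(3):
    B_k, C_k, D_k = V[:, k]
    print(np.isclose(B_k, (2 * lam[k] - 1) * C_k),
          np.isclose(D_k, (1 - s_p) * B_k / lam[k]))
```

The two printed vectors agree, and the eigenvector checks hold for any first row of the matrix, since (A1.7) and (A1.8) depend only on the second and third recurrences.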

A1 Finite population
Definition A1.1. The finite population refers to a population of constant number $N_m \ge 1$ of males and number $N_f \ge 3$ of females, maintained by random mating, and the initial population satisfies $\alpha$ …
Premise A1.3. The eigenvalues of $T_{12}$ in a finite population, denoted by $\lambda_k$, $k \in \{1, 2, 3\}$, in the decreasing order of their absolute values, are distinct with multiplicity 1, and none of them is 0, 1, or $-1/2$.
Theorem A1.4. The two-gene non-IBD probability $\beta_t(12)$ in a finite population is given by
…
where the constant coefficients $B_{1\text{--}3}$, $C_{1\text{--}3}$, and $D_{1\text{--}3}$ are to be solved. Substituting $\beta^{mm}_t(12)$ and $\beta^{mp}_t(12)$ of equations (A1.6a, A1.6b) into the recurrence equation (4b), we obtain
$B_k = (2\lambda_k - 1) C_k, \quad k \in \{1, 2, 3\}.$ (A1.7)
Substituting $\beta^{mm}_t(12)$ and $\beta^{pp}_t(12)$ of equations (A1.6a, A1.6c) into the recurrence equation (4c), we obtain
$D_k = (1 - s^p) B_k / \lambda_k, \quad k \in \{1, 2, 3\}.$ (A1.8)
Substituting $B_k$ and $D_k$ of equations (A1.7, A1.8) into equations (A1.6a-A1.6c), we obtain
…
where P is given by equation (A1.4), and the constant coefficient $C_{1\text{--}3}$ is determined by the initial condition $\beta_0(12)$ and is given by equation ( …
… $\beta^{ppp}_t(123))^T$ the three-gene non-IBD probability in a finite population. According to equations (5a-5d), it holds
…
Premise A1.6. The eigenvalues of $T_{123}$ in a finite population, denoted by $\lambda_k$, $k \in \{4, 5, 6, 7\}$, in the decreasing order of their absolute values, are distinct with multiplicity 1, and none of them is 0, 1, $-1/2$, or $\lambda_k$, $k \in \{1, 2, 3\}$.
Theorem A1.7. The three-gene non-IBD probability $\beta_t(123)$ in a finite population is given by
…
and for $k \in \{4, 5, 6, 7\}$
…
… $(123)$ of equations (A1.16a-A1.16c) into the recurrence equation (5b), and substituting $B_k$ of equation (A1.18), we obtain
$A_k = a_k C_k, \quad k \in \{4, 5, 6, 7\},$ (A1.19)
where $a_k$ is given by equation (A1.15). Substituting $A_k$, $B_k$, and $D_k$ of equations (A1.17-A1.19) into equations (A1.16a-A1.16d), we obtain
…
where Q is given by equation (A1.13), and the constant coefficient $C_{4\text{--}7}$ is determined by the initial condition $\beta_0(123)$ and is given by equation (A1.14).
Theorem A1.8. The map expansions in a finite population are given by
…
where $C_{1\text{--}3}$ is given by equation (A1.5), and
…
Proof. According to equation (7a), $R^X_t$ can be obtained from the accumulative summation of the non-IBD probability $\beta^{mp}_t$ of equation (A1.6b), and we have
…
which is equivalent to equation (A1.21) with the stationary map expansion $C_8$ being given by equation (A1.24). The eigenvalues for the transition matrix of the linear recurrence equations (4a-4c) and equations (6a, 6b) are 1, $-1/2$, and $\lambda_{1\text{--}3}$, and thus the map expansions $R^m$ and $R^p$ can be expressed in the forms of equations (A1.22, A1.23), where the constant coefficients $C_{10\text{--}12}$ are determined by calculating … from equations (A1.22, A1.23) and comparing the result with equation (A1.21), and $C_9$ is determined by the initial condition $R^m_0$. □
Theorem A1.9. Denote by $K_t(1122) = (K^{mm}_t(1122), K^{mp}_t(1122), K^{pp}_t(1122))^T$ the expected density of junction type (1122) in a finite population, and it holds
…
where $C_{8\text{--}12}$ are given by equations (A1.24-A1.26), P is given by equation (A1.4),
…
Proof. The eigenvalues for the transition matrix of the linear recurrence equations (4a-4c), equations (6a, 6b), and equations (9a-9c) are 1, $-1/2$, and duplicated $\lambda_{1\text{--}3}$. It holds
…
where the constant coefficients $B_{13\text{--}20}$, $C_{13\text{--}20}$, and $D_{13\text{--}20}$ are to be solved. Substituting $K^{mm}_t(1122)$ and $K^{mp}_t(1122)$ of equations (A1.33a, A1.33b) into the recurrence equation (9b), we obtain
…
Substituting $K^{mm}_t(1122)$ and $K^{pp}$ …
$D_{13} = C_{13} + s^p (C_8 - C_{13}),$ (A1.36a)
…
The constant coefficient $C_{15\text{--}17}$ is determined by the initial condition $K_0(1122)$.
Definition A1.10. Denote by
…
where P is given by equation (A1.4), and for $k \in \{4, 5, 6, 7\}$
…
where $f_{12}(x)$ is the characteristic polynomial of the transition matrix $T_{12}$ of equation (A1.2).
Proof. The corollary follows by substituting $R^m_t$ and $R$ …

A2 Sibling-mating population
Definition A2.1. The sibling-mating population refers to a population of constant size 2, one male and one female, maintained by sibling mating, and the initial population satisfies $\alpha^{mp}_0(12) = \beta^{mp}_0(12)$. In such a population, the coalescence probability $s^m = 1$, and the coalescence probabilities $s^p$, $q^m$, and $q^p$ are set to zero.
Definition A2.2. In a sibling-mating population, the non-IBD probabilities $\beta$ …
… (A2.1)
and its eigenvalues are given by
…
Definition A2.4. In a sibling-mating population, the conjugate is obtained by replacing $\sqrt{5}$ with $-\sqrt{5}$ in the terms involving $\lambda_1$.
Theorem A2.5. The two-gene non-IBD probability $\beta_t(12)$ in a sibling-mating population is given by
…
Proof. Similar to Theorem A1.4, it holds
…
The theorem follows by substituting P and $C_{1\text{--}2}$ into equation (A2.5). □
Theorem A2.6. The three-gene non-IBD probability in a sibling-mating population is given by
…
Theorem A2.7. The map expansions in a sibling-mating population are given by
… $(\lambda_1)^t$ + conjugate (A2.14)
where $C_8$ and $C_9$ are given by
… (A2.16)
… (A2.17)
Proof. Similar to Theorem A1.8, the map expansions in a sibling-mating population are given by
…
The theorem follows by substituting $\lambda_{1\text{--}2}$ of equation (A2.2) and $C_{1\text{--}2}$ of equation (A2.7) into the aforementioned equations.
Theorem A2.8. The expected density $K_t(1122) = (K^{mm}_t(1122), K^{mp}_t(1122))^T$ in a sibling-mating population is given by
…
where $C_8$ and $C_9$ are given by equations (A2.16, A2.17), respectively.
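The role of $\sqrt{5}$ and of the conjugate in the sibling-mating results can be seen in a small numerical sketch. It rests on an assumption: the matrix `T_sib` below is a hypothetical two-gene transition matrix for $(\beta^{mm}, \beta^{mp})$, obtained from standard X-linked sibling-mating reasoning with $s^m = 1$ and the within-individual probability equal to the between-individual one; it is not copied from equation (A2.1), which is not reproduced here. Its eigenvalues are $(1 \pm \sqrt{5})/4$, the classical values for sibling mating, and they form the conjugate pair described in Definition A2.4. The starting vector is also an illustrative choice.

```python
import math

import numpy as np

# Assumed recurrences (see lead-in):
#   beta_mm[t] = beta_mp[t-1] / 2                    (same mother, s_m = 1)
#   beta_mp[t] = beta_mm[t-1] / 2 + beta_mp[t-1] / 2
T_sib = np.array([[0.0, 0.5],
                  [0.5, 0.5]])

eigenvalues = np.sort(np.linalg.eigvals(T_sib))[::-1]
lambda_1 = (1 + math.sqrt(5)) / 4        # leading eigenvalue, about 0.809
lambda_1_conj = (1 - math.sqrt(5)) / 4   # conjugate: sqrt(5) -> -sqrt(5)

# Iterate from a hypothetical start in which both pairs are non-IBD.
b = np.array([1.0, 1.0])
history = [b]
for _ in range(40):
    b = T_sib @ b
    history.append(b)

# The per-generation decay ratio approaches the leading eigenvalue.
ratio = history[-1][1] / history[-2][1]
print(eigenvalues, lambda_1, lambda_1_conj, ratio)
```

Under this assumed recurrence the non-IBD probability shrinks by a factor of about 0.809 per generation in the long run, and the conjugate term decays much faster.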