# Haplotype Probabilities in Advanced Intercross Populations

- Karl W. Broman^{1}

- Department of Biostatistics and Medical Informatics, University of Wisconsin–Madison, Madison, Wisconsin 53706

- ^{1}Corresponding author: Department of Biostatistics and Medical Informatics, University of Wisconsin–Madison, 1300 University Avenue, Room 4710 MSC, Madison, WI 53706. E-mail: kbroman{at}biostat.wisc.edu

## Abstract

Advanced intercross populations, in which multiple inbred strains are mated at random for many generations, have the advantage of greater precision of genetic mapping because of the accumulation of recombination events across the multiple generations. Related designs include heterogeneous stock and the diversity outcross population. In this article, I derive the two-locus haplotype probabilities on the autosome and X chromosome for these designs. These haplotype probabilities provide the key quantities for developing hidden Markov models for the treatment of missing genotype information. I further derive the map expansion in these populations, which is the frequency of recombination breakpoints on a random chromosome.

- advanced intercross lines
- heterogeneous stock
- diversity outcross
- map expansion
- Collaborative Cross
- Mouse Genetic Resource

Advanced intercross populations, in which multiple inbred strains are mated at random for many generations, have the advantage of greater precision of genetic mapping because of the accumulation of recombination events across the multiple generations. The most commonly used form, which begins with two inbred strains, was formally introduced by Darvasi and Soller (1995) and called advanced intercross lines (AIL). A closely related design is that of heterogeneous stock (HS; see Mott *et al.* 2000), in which eight inbred strains are randomly mated for many generations. Svenson *et al.* (2012) developed the diversity outcross population (DO), which was formed with progenitors that were partially inbred individuals drawn from intermediate generations in the development of the Collaborative Cross (so-called pre-CC mice; see Aylor *et al.* 2011).

The mapping of quantitative trait loci in such populations, whether by interval mapping (Lander and Botstein 1989) or Haley-Knott regression (Haley and Knott 1992), generally requires conditional genotype probabilities at putative quantitative trait loci, given the available marker genotype data. Such probabilities are often calculated using a hidden Markov model (HMM; see Broman and Sen 2009, App. D). An HMM for this purpose formally requires the calculation of two-locus diplotype probabilities; however, if the population is formed with a large number of mating pairs, the two haplotypes within an individual are independent, and so it is sufficient to calculate two-locus haplotype probabilities.

Darvasi and Soller (1995) derived the two-locus haplotype probabilities for the autosome in AIL. I am not aware of any work considering the X-chromosome. In this article, I derive the two-locus haplotype probabilities for the autosome and X-chromosome in AIL, HS, and the DO. The calculations for the DO rely on recent results on haplotype probabilities in pre-CC mice (Broman 2012). Throughout, I assume an effectively infinite set of mating pairs at each generation, no sex difference in recombination, and no selection or mutation.

Let us first revisit the two-locus autosomal haplotype probabilities in AIL, as they serve as a simple example of the technique used in these calculations (see also Bulmer 1980, Ch. 3). Let $p_s$ denote the frequency of the *AA* haplotype at generation F_{s}. Then $p_2 = (1-r)/2$ (an F_{1} individual carries one *AA* and one *BB* haplotype, so only its nonrecombinant gametes can be *AA*), and we have the recurrence relation

$$p_{s+1} = (1-r)\,p_s + \frac{r}{4}, \tag{1}$$

where *r* is the recombination fraction (in one meiosis) between the two loci. Equation (1) is derived by noting that an *AA* haplotype drawn from generation F_{s+1} is either an intact *AA* haplotype at generation F_{s}, transmitted without recombination, or it is a recombinant haplotype bringing two independent *A* alleles together. Note that the frequency of the *A* allele is 1/2 at every generation.

The solution of this recurrence relation (see Graham *et al.* 1994) is, for *s* ≥ 2,

$$p_s = \frac{1}{4} + \frac{(1-2r)(1-r)^{s-2}}{4}. \tag{2}$$

The frequency of recombinant haplotypes at generation F_{s} is $1 - 2p_s$.
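As a quick numerical check, the recurrence and its closed-form solution can be compared in a few lines of Python. This is a minimal sketch with hypothetical function names; the iteration starts from $p_2 = (1-r)/2$, because the recurrence only applies once the two haplotypes within an individual are independent.

```python
def aa_freq_recursive(r: float, s: int) -> float:
    """AA haplotype frequency at F_s (s >= 2) in a two-strain AIL, by iterating the recurrence."""
    p = (1 - r) / 2  # F_2: an F_1 parent carries one AA and one BB haplotype
    for _ in range(2, s):
        # intact AA haplotype, or a recombinant joining two independent A alleles (1/2 * 1/2)
        p = (1 - r) * p + r / 4
    return p


def aa_freq_closed(r: float, s: int) -> float:
    """Closed-form solution, valid for s >= 2."""
    return 0.25 + (1 - 2 * r) * (1 - r) ** (s - 2) / 4


if __name__ == "__main__":
    r = 0.05
    for s in (2, 6, 12):
        p = aa_freq_closed(r, s)
        print(f"F_{s}: P(AA) = {p:.6f}, recombinant = {1 - 2 * p:.6f}")
```

Iterating the recurrence is the more general tool (it is what remains available when no closed form exists, as for the unbalanced X chromosome); the closed form is convenient for plotting and for taking derivatives.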

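For the X-chromosome in AIL, I will first consider a balanced case, begun with equal proportions of F_{1} individuals from reciprocal crosses, *A* × *B* and *B* × *A*, so that the F_{1} males are equally likely to be hemizygous *A* or *B*. Let $m_s$ and $f_s$ denote the frequency of the *AA* haplotype in males and females, respectively, at generation F_{s}. Then $m_2 = (1-r)/2$ and $f_2 = (2-r)/4$, and we have

$$\begin{aligned} m_{s+1} &= (1-r)\,f_s + \frac{r}{4},\\ f_{s+1} &= \frac{1}{2}\,m_s + \frac{1}{2}\left[(1-r)\,f_s + \frac{r}{4}\right]. \end{aligned} \tag{3}$$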

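This recurrence relation is derived in a similar way to that for the autosome, noting that the male haplotype was drawn from his mother, with a chance for recombination, and a random female haplotype is equally likely to have been drawn from her father, without recombination, or from her mother, with the potential for recombination. I again make use of the fact that the frequency of the *A* allele is 1/2 in both males and females at every generation. The solution to this relation is, for *s* ≥ 2,

$$\begin{aligned} m_s &= \frac{1}{4} + \frac{\left[(1-2r)z + (1-r)(3-2r)\right]w^{s-2} + \left[(1-2r)z - (1-r)(3-2r)\right]y^{s-2}}{8z},\\ f_s &= \frac{1}{4} + \frac{\left[(1-r)z + (3-6r+r^2)\right]w^{s-2} + \left[(1-r)z - (3-6r+r^2)\right]y^{s-2}}{8z}, \end{aligned} \tag{4}$$

where $z = \sqrt{(1-r)(9-r)}$, $w = (1-r+z)/4$, and $y = (1-r-z)/4$; *w* and *y* are the two roots of $\lambda^2 - \tfrac{1-r}{2}\lambda - \tfrac{1-r}{2} = 0$, the eigenvalues of the linear system obtained by subtracting 1/4 from $m_s$ and $f_s$. Note that the frequencies of recombinant haplotypes in males and females are $1 - 2m_s$ and $1 - 2f_s$, respectively, and that the overall frequency is $1 - (2m_s + 4f_s)/3$.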
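The closed-form solution for the balanced X chromosome can be checked against the recurrence numerically. The sketch below (hypothetical function names, plain floating point) iterates the male and female recurrences from $m_2 = (1-r)/2$ and $f_2 = (2-r)/4$ and evaluates the closed form with $z = \sqrt{(1-r)(9-r)}$, $w = (1-r+z)/4$, and $y = (1-r-z)/4$; the two should agree to rounding error.

```python
import math


def x_balanced_recursive(r: float, s: int) -> tuple[float, float]:
    """(m_s, f_s) on the X in a balanced AIL, iterating the recurrence from F_2."""
    m, f = (1 - r) / 2, (2 - r) / 4  # male and female AA frequencies at F_2
    for _ in range(2, s):
        m_next = (1 - r) * f + r / 4  # a male's X comes from his mother
        f_next = 0.5 * m + 0.5 * m_next  # a female's X: from her father or from her mother
        m, f = m_next, f_next
    return m, f


def x_balanced_closed(r: float, s: int) -> tuple[float, float]:
    """(m_s, f_s) from the closed form, valid for s >= 2."""
    z = math.sqrt((1 - r) * (9 - r))
    w, y = (1 - r + z) / 4, (1 - r - z) / 4
    a, b = (1 - 2 * r) * z, (1 - r) * (3 - 2 * r)
    c, d = (1 - r) * z, 3 - 6 * r + r * r
    m = 0.25 + ((a + b) * w ** (s - 2) + (a - b) * y ** (s - 2)) / (8 * z)
    f = 0.25 + ((c + d) * w ** (s - 2) + (c - d) * y ** (s - 2)) / (8 * z)
    return m, f


if __name__ == "__main__":
    m, f = x_balanced_closed(0.05, 12)
    print("overall recombinant frequency:", 1 - (2 * m + 4 * f) / 3)
```

Because *y* is negative, the second term alternates in sign from one generation to the next, which is the familiar oscillation of X-linked quantities between the sexes.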

Now I turn to the unbalanced case for the X-chromosome, in which all F_{1} individuals are derived from the cross female *A* × male *B*, so that all F_{1} males are hemizygous *A*. This appears to be widely used in practice (*e.g.,* Norgard *et al.* 2008; Kelly *et al.* 2010). The calculations are more difficult, because the allele frequencies are different in males and females and across generations.

I first calculate the single-locus allele frequencies. Let $q_s$ be the frequency of the *A* allele in females at generation F_{s}. Note that the frequency in males at F_{s} is $q_{s-1}$. The initial values are $q_0 = 1$ and $q_1 = 1/2$, and we have the recurrence relation $q_{s+1} = (q_s + q_{s-1})/2$, which comes from the fact that a random allele drawn from the female at generation F_{s+1} is equally likely to be an allele from the female or male at generation F_{s}, and the allele in the male at F_{s} is a random allele from the female at F_{s−1}. The solution of the recurrence relation is $q_s = \frac{2}{3} + \frac{1}{3}\left(-\frac{1}{2}\right)^{s}$, for *s* ≥ 0.
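A short sketch of the single-locus calculation: iterate $q_{s+1} = (q_s + q_{s-1})/2$ from $q_0 = 1$ and $q_1 = 1/2$ and compare with the closed form $q_s = 2/3 + (1/3)(-1/2)^s$. The function names are illustrative only.

```python
def q_recursive(s: int) -> float:
    """Female A-allele frequency at F_s in the unbalanced AIL, by iteration."""
    q_prev, q = 1.0, 0.5  # q_0 (the A female founder) and q_1 (F_1 females)
    if s == 0:
        return q_prev
    for _ in range(1, s):
        # a female allele comes from the previous female or the previous male,
        # and the male's allele came from the female one generation earlier
        q_prev, q = q, (q + q_prev) / 2
    return q


def q_closed(s: int) -> float:
    """Closed form q_s = 2/3 + (1/3)(-1/2)^s, for s >= 0."""
    return 2 / 3 + (-0.5) ** s / 3


if __name__ == "__main__":
    for s in range(1, 7):
        # female frequency q_s and male frequency q_{s-1}
        print(s, round(q_closed(s), 6), round(q_closed(s - 1), 6))
```

The frequencies oscillate around, and converge geometrically to, 2/3, the overall proportion of *A* X chromosomes in the founding cross (two *A* X chromosomes in the female, one *B* X chromosome in the male).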

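I now turn to the two-locus haplotype probabilities. Let $\tilde m_s$ and $\tilde f_s$ denote the frequencies of the *AA* haplotype on the X chromosome in males and females at generation F_{s} in an unbalanced AIL, and note that $\tilde m_1 = 1$ and $\tilde f_1 = 1/2$, so that $\tilde m_2 = (1-r)/2$ and $\tilde f_2 = (3-r)/4$. For *s* ≥ 2, the haplotype probabilities satisfy a recurrence relation similar to that in equation (3):

$$\begin{aligned} \tilde m_{s+1} &= (1-r)\,\tilde f_s + r\,q_{s-1}\,q_{s-2},\\ \tilde f_{s+1} &= \frac{1}{2}\,\tilde m_s + \frac{1}{2}\left[(1-r)\,\tilde f_s + r\,q_{s-1}\,q_{s-2}\right]. \end{aligned} \tag{5}$$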

Note the distinction between equations (3) and (5): if a recombinant haplotype is transmitted from the F_{s} female, the chance that it brings two *A* alleles together depends on the frequency of the *A* allele in males and females in the F_{s−1} generation ($q_{s-2}$ and $q_{s-1}$, respectively). In the balanced case, these are each 1/2; in the unbalanced case, they are different from each other and vary across generations.

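I have been unable to obtain closed-form solutions for these male and female haplotype frequencies. However, the values can be quickly calculated numerically, using equation (5). Note that, because the two loci have the same allele frequencies, the frequency of recombinant haplotypes in each sex is twice the difference between the *A* allele frequency in that sex ($q_{s-1}$ in males and $q_s$ in females at F_{s}) and the corresponding *AA* haplotype frequency.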
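Because there is no closed form here, a numerical sketch is the natural companion. The code below is illustrative (function names are hypothetical). It iterates the unbalanced recurrence $\tilde m_{s+1} = (1-r)\tilde f_s + r\,q_{s-1}q_{s-2}$, $\tilde f_{s+1} = (\tilde m_s + \tilde m_{s+1})/2$ from the F_{2} values $\tilde m_2 = (1-r)/2$ and $\tilde f_2 = (3-r)/4$, and reports the recombinant frequencies $2(q_{s-1} - \tilde m_s)$ in males and $2(q_s - \tilde f_s)$ in females.

```python
def q(s: int) -> float:
    """Female A-allele frequency at F_s in the unbalanced AIL."""
    return 2 / 3 + (-0.5) ** s / 3


def x_unbalanced(r: float, s_max: int) -> dict[int, tuple[float, float]]:
    """Map s -> (male, female) AA haplotype frequency for 2 <= s <= s_max."""
    m, f = (1 - r) / 2, (3 - r) / 4  # F_2 values
    out = {2: (m, f)}
    for s in range(2, s_max):
        # a recombinant from the F_s female joins alleles from the F_{s-1} female and male
        m_next = (1 - r) * f + r * q(s - 1) * q(s - 2)
        f_next = 0.5 * m + 0.5 * m_next
        m, f = m_next, f_next
        out[s + 1] = (m, f)
    return out


if __name__ == "__main__":
    for s, (m, f) in x_unbalanced(0.05, 8).items():
        rec_m, rec_f = 2 * (q(s - 1) - m), 2 * (q(s) - f)
        print(f"F_{s}: AA male={m:.5f} female={f:.5f}; "
              f"recombinant male={rec_m:.5f} female={rec_f:.5f}")
```

Two sanity checks are built into the model: with $r = 0$ the *AA* frequencies equal the single-locus frequencies $q_{s-1}$ and $q_s$, and for $r > 0$ both converge to $(2/3)^2 = 4/9$ as the loci approach linkage equilibrium.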

Haplotype probabilities in the DO are calculated similarly. The progenitors for the DO were pre-CC mice. I assume a large number of progenitors, that they were drawn from independent lines, and that the order of the crosses that generated the different lines was random, giving complete balance across the eight alleles.

In a potential abuse of notation, I will redefine the *q*, *p*, *m*, and *f* variables used previously. Let $q_k$ denote the frequency of the *AA* haplotype at generation G_{2}:F_{k} in the pre-CC; this is times the haplotype probability in Table 4 of Broman (2012). Let $p_s$ be the probability of the *AA* haplotype at generation *s* of the diversity outcross.

The pre-CC progenitors of the DO were drawn from independent lines at a variety of different generations along the course to inbreeding. Let $\alpha_k$ denote the proportion of the pre-CC progenitors that were at generation G_{2}:F_{k}, and note that a pre-CC progenitor at generation G_{2}:F_{k} will transmit the *AA* haplotype with frequency $q_{k+1}$ (that is, the frequency of the *AA* haplotype at generation G_{2}:F_{k+1}). Thus, the frequency of the *AA* haplotype at the first generation of the DO is $p_1 = \sum_k \alpha_k\, q_{k+1}$.

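The recurrence relation for the $p_s$ is like that in equation (1), except that each of the eight founder alleles has frequency 1/8, so that a recombinant haplotype brings two *A* alleles together with probability 1/64: $p_{s+1} = (1-r)\,p_s + r/64$. The solution is

$$p_s = \frac{1}{64} + \left(p_1 - \frac{1}{64}\right)(1-r)^{s-1}. \tag{6}$$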
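A minimal sketch of the DO autosome calculation: $p_1 = \sum_k \alpha_k q_{k+1}$ followed by the closed-form solution, with the recursion $p_{s+1} = (1-r)p_s + r/64$ included only as a consistency check. The pre-CC frequencies $q_{k+1}$ would come from Broman (2012, Table 4); the weights and frequencies in the usage lines are made-up placeholders, not data, and the function names are hypothetical.

```python
def do_p1(alpha: dict[int, float], q_next: dict[int, float]) -> float:
    """p_1 = sum_k alpha_k * q_{k+1}; q_next[k] holds q_{k+1} for progenitors at G2:F_k."""
    return sum(a * q_next[k] for k, a in alpha.items())


def do_ps(p1: float, r: float, s: int) -> float:
    """Closed form: AA haplotype frequency at DO generation s >= 1."""
    return 1 / 64 + (p1 - 1 / 64) * (1 - r) ** (s - 1)


def do_ps_recursive(p1: float, r: float, s: int) -> float:
    """Same quantity by iterating p_{s+1} = (1 - r) p_s + r / 64."""
    p = p1
    for _ in range(1, s):
        p = (1 - r) * p + r / 64  # 1/64 = (1/8)^2 for two independent founder alleles
    return p


if __name__ == "__main__":
    # placeholder inputs for illustration only (not the Svenson et al. distribution)
    p1 = do_p1({4: 0.5, 8: 0.5}, {4: 0.09, 8: 0.08})
    ps = do_ps(p1, r=0.05, s=5)
    print("P(AA) =", ps, "; each recombinant type =", (1 - 8 * ps) / 56)
```

The last line uses the fact, noted below, that with random cross orders the 56 recombinant haplotypes are equally likely.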

Note that the recombinant haplotypes are all equally likely, due to the random order of the initial crosses, and so each has probability $(1 - 8p_s)/56$.

HS corresponds to the DO with α_{1} = 1 (that is, *k* ≡ 1), in which case $p_1 = q_2 = (1-r)^3/8$ and so $p_s = \left[1 + (7 - 24r + 24r^2 - 8r^3)(1-r)^{s-1}\right]/64$.
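For HS the closed form can be evaluated directly. This sketch (hypothetical function names) checks that the HS expression reduces to $p_1 = (1-r)^3/8$ at *s* = 1 and agrees with the general DO solution; at *r* = 0.01 and *s* = 10 it gives a recombinant probability of about 0.103, the value compared with Mott *et al.* (2000) later in the text.

```python
def hs_aa(r: float, s: int) -> float:
    """AA haplotype frequency at HS generation s (alpha_1 = 1, so p_1 = (1 - r)^3 / 8)."""
    return (1 + (7 - 24 * r + 24 * r**2 - 8 * r**3) * (1 - r) ** (s - 1)) / 64


def hs_recombinant(r: float, s: int) -> float:
    """Probability that a random HS haplotype is recombinant: 1 - 8 p_s."""
    return 1 - 8 * hs_aa(r, s)


if __name__ == "__main__":
    print(round(hs_recombinant(0.01, 10), 3))  # about 0.103
```

Note that $7 - 24r + 24r^2 - 8r^3 = 8(1-r)^3 - 1$, which is just $64\,(p_1 - 1/64)$ written out.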

I now turn to the X-chromosome. Let $m_s$ and $f_s$ denote the frequency of the *AA* haplotype on the X chromosome in males and females in the DO at generation *s*. Assuming random orders of crosses to generate the pre-CC progenitors, the initial female frequency $f_1$ is given by equation (7), expressed in terms of the frequencies of the *AA* and *CC* haplotypes on the X-chromosome in females at generation G_{1}:F_{k+1} in the construction of four-way RIL by sibling mating (see Broman 2012, Table 4).

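The value $m_1$ is calculated in the same way. The recurrence relations are much like equation (3):

$$\begin{aligned} m_{s+1} &= (1-r)\,f_s + \frac{r}{64},\\ f_{s+1} &= \frac{1}{2}\,m_s + \frac{1}{2}\left[(1-r)\,f_s + \frac{r}{64}\right]. \end{aligned} \tag{8}$$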

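The solutions are the following, for *s* ≥ 1:

$$\begin{aligned} m_s &= \frac{1}{64} + \frac{2}{z}\left[w\,(u + 2yv)\,y^{s-1} - y\,(u + 2wv)\,w^{s-1}\right],\\ f_s &= \frac{1}{64} + \frac{1}{z}\left[(u + 2wv)\,w^{s-1} - (u + 2yv)\,y^{s-1}\right], \end{aligned} \tag{9}$$

where $u = m_1 - 1/64$, $v = f_1 - 1/64$, and *w*, *y*, and *z* are as in equation (4).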

Again, HS corresponds to the DO with α_{1} = 1, in which case *f*_{1} = (4 − 5*r* + *r*^{2})/32 and *m*_{1} = (2 − 3*r* + *r*^{2})/16.
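The X-chromosome solution for the DO can be checked the same way. This sketch iterates the recurrences $m_{s+1} = (1-r)f_s + r/64$ and $f_{s+1} = (m_s + m_{s+1})/2$ and evaluates the closed form with $u = m_1 - 1/64$ and $v = f_1 - 1/64$. The HS starting values quoted above are used purely as example inputs; any $m_1$, $f_1$ could be substituted. Function names are illustrative.

```python
import math


def do_x_recursive(m1: float, f1: float, r: float, s: int) -> tuple[float, float]:
    """(m_s, f_s) on the DO X chromosome by iterating the recurrence from generation 1."""
    m, f = m1, f1
    for _ in range(1, s):
        m_next = (1 - r) * f + r / 64  # male X from his mother
        f_next = 0.5 * m + 0.5 * m_next  # female X from her father or her mother
        m, f = m_next, f_next
    return m, f


def do_x_closed(m1: float, f1: float, r: float, s: int) -> tuple[float, float]:
    """(m_s, f_s) from the closed form, for s >= 1 and 0 <= r <= 1/2."""
    z = math.sqrt((1 - r) * (9 - r))
    w, y = (1 - r + z) / 4, (1 - r - z) / 4
    u, v = m1 - 1 / 64, f1 - 1 / 64
    m = 1 / 64 + 2 * (w * (u + 2 * y * v) * y ** (s - 1) - y * (u + 2 * w * v) * w ** (s - 1)) / z
    f = 1 / 64 + ((u + 2 * w * v) * w ** (s - 1) - (u + 2 * y * v) * y ** (s - 1)) / z
    return m, f


if __name__ == "__main__":
    r = 0.05
    m1, f1 = (2 - 3 * r + r * r) / 16, (4 - 5 * r + r * r) / 32  # HS example inputs
    m, f = do_x_closed(m1, f1, r, 5)
    print("recombinant: male", 1 - 8 * m, "female", 1 - 8 * f)
```

The closed form is the balanced-AIL solution with 1/4 replaced by 1/64 and the starting generation shifted from F_{2} to DO generation 1; the eigenvalues *w* and *y* are unchanged because the linear part of the recurrence is the same.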

In Figure 1, the probabilities of recombinant two-locus haplotypes are displayed for the different populations. For the DO, I used the distribution of *k* as in Figure 1 of Svenson *et al.* (2012) and *s* = 5. For HS and AIL, I used *s* = 10 and 12, respectively, to match the total number of generations with recombination; the average *k* in Svenson *et al.* (2012) was six. Recombinant haplotypes are more frequent on the autosome, and are more frequent in HS than in the DO; inbreeding in the pre-CC progenitors of the DO is accompanied by a loss of recombinants.

It is particularly interesting to consider the map expansion in these populations, which is the frequency of recombination breakpoints on a random chromosome. Let *R* denote the probability of a recombinant haplotype; then the map expansion is $\left.dR/dr\right|_{r=0}$ (see Teuscher and Broman 2007). The map expansion on an autosome in AIL is *s*/2. For the DO, on an autosome, the map expansion satisfies $M_s = M_1 + 7(s-1)/8$, where $M_1$ is the weighted average (with weights α_{k}) of the map expansion in the pre-CC at generation G_{2}:F_{k+1} (see Broman 2012, Table 4). For the particular progenitors detailed in Svenson *et al.* (2012, Figure 1), this is approximately (7*s* + 37)/8. For HS, we have $M_1 = 3$ and $M_s = (7s + 17)/8$.
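Since the map expansion is the derivative of *R* at *r* = 0, it can be checked numerically with a central difference. A minimal sketch, assuming the AIL autosome closed form $p_s = 1/4 + (1-2r)(1-r)^{s-2}/4$ and the HS expression with $p_1 = (1-r)^3/8$; the step size *h* is an arbitrary choice, and both formulas are polynomials in *r*, so evaluating them at $-h$ is harmless.

```python
def map_expansion(R, h: float = 1e-5) -> float:
    """Central-difference approximation to dR/dr at r = 0."""
    return (R(h) - R(-h)) / (2 * h)


def ail_R(r: float, s: int) -> float:
    # autosome, AIL: R = 1 - 2 p_s
    return 1 - 2 * (0.25 + (1 - 2 * r) * (1 - r) ** (s - 2) / 4)


def hs_R(r: float, s: int) -> float:
    # autosome, HS: R = 1 - 8 p_s with p_1 = (1 - r)^3 / 8
    p = (1 + (7 - 24 * r + 24 * r**2 - 8 * r**3) * (1 - r) ** (s - 1)) / 64
    return 1 - 8 * p


if __name__ == "__main__":
    for s in (2, 10, 12):
        print(s, map_expansion(lambda r: ail_R(r, s)), s / 2,
              map_expansion(lambda r: hs_R(r, s)), (7 * s + 17) / 8)
```

The numerical derivatives should match *s*/2 for AIL and (7*s* + 17)/8 for HS to about six decimal places.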

For the X-chromosome in balanced AIL, HS, and DO, the map expansion is two-thirds that of the autosome. For the case of the X-chromosome in unbalanced AIL, in which all F_{1} males are hemizygous *A*, I cannot derive a closed-form solution, but taking the derivatives of the recurrence relations in equation (5), I can derive a simple recurrence relation for the map expansion. (Note that the overall map expansion on the X-chromosome can be obtained as the average of the sex-specific map expansions, with weight 2/3 given to the female, since two-thirds of the X-chromosomes are in females.) Let $M^m_s$ and $M^f_s$ denote the male and female map expansions at F_{s} (the derivatives, at *r* = 0, of the male and female recombinant-haplotype frequencies), and again let $q_s$ be the frequency of the *A* allele in females at F_{s}. Then, for *s* ≥ 2, we have

$$\begin{aligned} M^m_{s+1} &= M^f_s + 2\left(q_s - q_{s-1}\,q_{s-2}\right),\\ M^f_{s+1} &= \tfrac{1}{2}\left(M^m_s + M^m_{s+1}\right), \end{aligned} \tag{10}$$

with the initial conditions $M^m_2 = 1$ and $M^f_2 = 1/2$. Although I have not been able to derive a closed-form solution for $M^m_s$ and $M^f_s$, they are easily calculated numerically.
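The sex-specific map expansion recurrence for the unbalanced X, $M^m_{s+1} = M^f_s + 2(q_s - q_{s-1}q_{s-2})$ and $M^f_{s+1} = (M^m_s + M^m_{s+1})/2$ with $M^m_2 = 1$ and $M^f_2 = 1/2$, is easy to iterate, and it can be cross-checked against a numerical derivative of the haplotype frequencies from equation (5). The sketch below uses illustrative names and starts the haplotype recursion from $\tilde m_2 = (1-r)/2$ and $\tilde f_2 = (3-r)/4$.

```python
def q(s: int) -> float:
    """Female A-allele frequency at F_s in the unbalanced AIL."""
    return 2 / 3 + (-0.5) ** s / 3


def map_expansion_x_unbalanced(s_max: int) -> dict[int, tuple[float, float]]:
    """s -> (male, female) X map expansion for 2 <= s <= s_max, iterating the recurrence."""
    mm, mf = 1.0, 0.5  # F_2 values
    out = {2: (mm, mf)}
    for s in range(2, s_max):
        mm_next = mf + 2 * (q(s) - q(s - 1) * q(s - 2))
        mf_next = 0.5 * (mm + mm_next)
        mm, mf = mm_next, mf_next
        out[s + 1] = (mm, mf)
    return out


def aa_x_unbalanced(r: float, s: int) -> tuple[float, float]:
    """(male, female) AA frequency at F_s from equation (5), used only for the check."""
    m, f = (1 - r) / 2, (3 - r) / 4
    for t in range(2, s):
        m_next = (1 - r) * f + r * q(t - 1) * q(t - 2)
        m, f = m_next, 0.5 * m + 0.5 * m_next  # right side uses the old m
    return m, f


def numeric_map_expansion(s: int, h: float = 1e-5) -> tuple[float, float]:
    """Central differences of the recombinant frequencies 2(q_{s-1} - m_s) and 2(q_s - f_s)."""
    (mp, fp), (mn, fn) = aa_x_unbalanced(h, s), aa_x_unbalanced(-h, s)
    return -2 * (mp - mn) / (2 * h), -2 * (fp - fn) / (2 * h)


if __name__ == "__main__":
    for s, (mm, mf) in map_expansion_x_unbalanced(10).items():
        print(f"F_{s}: male {mm:.4f}, female {mf:.4f}, overall {(mm + 2 * mf) / 3:.4f}")
```

The overall value in the last column applies the 2/3 female weighting described in the text.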

The aforementioned haplotype probabilities provide the key quantities for developing HMMs for advanced intercross populations. However, it should be noted that there are other approaches to handling such data. For example, Besnier *et al.* (2011) used a variance components model to analyze outbred chicken AIL data, with identity-by-descent probabilities calculated using a modified version of the method of Pong-Wong *et al.* (2001), for general pedigree data.

The aforementioned result for HS, $M_s = (7s + 17)/8$, differs from the map expansion assumed in Mott *et al.* (2000) and incorporated into the HAPPY software. In the first three generations with recombination, individuals are fully heterozygous, and so all recombination events can be seen; in the subsequent *s* − 1 generations, there is a 1/8 chance of homozygosity, and so only 7/8 of recombination events can be seen.

Mott *et al.* (2000) further assumed that the transition probabilities along an HS chromosome are a function of genetic distance, but that requires knowledge of the map function. It is more direct to express the transition probabilities in terms of the recombination fraction at meiosis.

The green curve in Figure 1 displays the probability of a recombinant haplotype assumed in Mott *et al.* (2000) for HS with *s* = 10 when the map function corresponding to the gamma model with the level of crossover interference estimated for the mouse in Broman *et al.* (2002) is used. The probability is slightly smaller than that from my calculations; at *r* = 0.01, the equation in Mott *et al.* (2000) gives 0.099, whereas I obtain 0.103.

I have assumed an effectively infinite number of mating pairs at each generation. In practice, with a finite number of mating pairs, there will be some inbreeding and so an increased frequency of homozygosity and a decreased frequency of recombination. In addition, the individuals at the final generation will include siblings, and the relationships among individuals might be used to improve the genotype reconstruction. In practice, for computational efficiency, both the inbreeding and the relationships among individuals would probably be ignored in the genotype reconstruction, and with dense genotype data, there will be little loss of information.

## Acknowledgments

James Crow generously provided comments for improvement of the manuscript. This work was supported in part by National Institutes of Health grant GM074244.

- Received September 16, 2011.
- Accepted November 10, 2011.

- Copyright © 2012 Broman

This is an open-access article distributed under the terms of the Creative Commons Attribution Unported License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.