Abstract

Advanced intercross populations, in which multiple inbred strains are mated at random for many generations, have the advantage of greater precision of genetic mapping because of the accumulation of recombination events across the multiple generations. Related designs include heterogeneous stock and the diversity outcross population. In this article, I derive the two-locus haplotype probabilities on the autosome and X chromosome with these designs. These haplotype probabilities provide the key quantities for developing hidden Markov models for the treatment of missing genotype information. I further derive the map expansion in these populations, which is the frequency of recombination breakpoints on a random chromosome.

Advanced intercross populations, in which multiple inbred strains are mated at random for many generations, have the advantage of greater precision of genetic mapping because of the accumulation of recombination events across the multiple generations. The most commonly used form, which begins with two inbred strains, was formally introduced by Darvasi and Soller (1995) and called advanced intercross lines (AIL). A closely related design is that of heterogeneous stock (HS; see Mott et al. 2000), in which eight inbred strains are randomly mated for many generations. Svenson et al. (2012) developed the diversity outcross population (DO), which was formed with progenitors that were partially inbred individuals drawn from intermediate generations in the development of the Collaborative Cross (so-called pre-CC mice; see Aylor et al. 2011).

The mapping of quantitative trait loci in such populations, whether by interval mapping (Lander and Botstein 1989) or Haley-Knott regression (Haley and Knott 1992), generally requires conditional genotype probabilities at putative quantitative trait loci, given the available marker genotype data. Such probabilities are often calculated using a hidden Markov model (HMM; see Broman and Sen 2009, App. D). An HMM for this purpose formally requires the calculation of two-locus diplotype probabilities, although if the populations are formed with a large number of mating pairs, the two haplotypes within an individual are independent, and so it is sufficient to calculate two-locus haplotype probabilities.

Darvasi and Soller (1995) derived the two-locus haplotype probabilities for the autosome in AIL. I am not aware of any work considering the X-chromosome. In this article, I derive the two-locus haplotype probabilities for the autosome and X-chromosome in AIL, HS, and the DO. The calculations for the DO rely on recent results on haplotype probabilities in pre-CC mice (Broman 2012). Throughout, I assume an effectively infinite set of mating pairs at each generation, no sex difference in recombination, and no selection or mutation.

Let us first revisit the two-locus autosomal haplotype probabilities in AIL, as they serve as a simple example of the technique used in these calculations (see also Bulmer 1980, Ch. 3). Let $p_s$ denote the frequency of the AA haplotype at generation $F_s$. Then $p_1 = \tfrac{1}{2}$ and we have the recurrence relation
$$p_{s+1} = (1 - r)\,p_s + r \cdot \tfrac{1}{2} \cdot \tfrac{1}{2} \tag{1}$$
where $r$ is the recombination fraction (in one meiosis) between the two loci. Equation (1) is derived by noting that an AA haplotype drawn from generation $F_{s+1}$ is either an intact AA haplotype at generation $F_s$, transmitted without recombination, or it is a recombinant haplotype bringing two independent A alleles together. Note that the frequency of the A allele is $\tfrac{1}{2}$ at every generation.
The solution of this recurrence relation (see Graham et al. 1994) is, for $s \ge 2$,
$$p_s = \tfrac{1}{4}\left[1 + (1 - 2r)(1 - r)^{s-2}\right]. \tag{2}$$
The frequency of recombinant haplotypes at generation $F_s$ is $1 - 2p_s$.
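As a minimal numerical sketch of the autosomal case, the recurrence in equation (1) can be iterated and compared with the closed form in equation (2). The function names are hypothetical, and the iteration is started at F2 with the value that equation (2) gives at s = 2, namely (1 − r)/2; that starting point is an assumption of this sketch, chosen because the closed form is stated only for s ≥ 2.

```python
def ail_autosome_recurrence(r, s):
    """Iterate equation (1) from F2 up to generation s (requires s >= 2)."""
    if s < 2:
        raise ValueError("this sketch starts at F2, so s must be >= 2")
    p = (1 - r) / 2  # assumed F2 value: equation (2) evaluated at s = 2
    for _ in range(s - 2):
        # intact AA haplotype, or a recombinant joining two independent A alleles
        p = (1 - r) * p + r * 0.5 * 0.5
    return p


def ail_autosome_closed_form(r, s):
    """Equation (2): AA haplotype frequency at generation F_s, s >= 2."""
    return 0.25 * (1 + (1 - 2 * r) * (1 - r) ** (s - 2))


if __name__ == "__main__":
    r = 0.01
    for s in (2, 5, 12):
        p_iter = ail_autosome_recurrence(r, s)
        p_closed = ail_autosome_closed_form(r, s)
        # last column is the recombinant haplotype frequency 1 - 2 p_s
        print(s, p_iter, p_closed, 1 - 2 * p_closed)
```

The two columns of haplotype frequencies should agree to floating-point precision for any r in [0, 1/2].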
For the X-chromosome in AIL, I will first consider a balanced case, begun with equal proportions of $F_1$ individuals from reciprocal crosses, A × B and B × A, so that the $F_1$ males are equally likely to be hemizygous A or B. Let $m_s$ and $f_s$ denote the frequency of the AA haplotype in males and females, respectively, at generation $F_s$. Then $m_1 = f_1 = \tfrac{1}{2}$ and we have
$$\begin{aligned}
m_{s+1} &= (1 - r)\,f_s + \frac{r}{4} \\
f_{s+1} &= \tfrac{1}{2}\,m_s + \frac{1 - r}{2}\,f_s + \frac{r}{8}
\end{aligned} \tag{3}$$
This recurrence relation is derived in a similar way to that for the autosome, noting that the male haplotype was drawn from his mother, with a chance for recombination, and a random female haplotype is equally likely to have been drawn from her father, without recombination, or from her mother, with the potential for recombination. I again make use of the fact that the frequency of the A allele is $\tfrac{1}{2}$ in both males and females at every generation. The solution to this relation is, for $s \ge 2$,
$$\begin{aligned}
m_s &= \tfrac{1}{8}\left[2 + (1 - 2r)\left(w^{s-2} + y^{s-2}\right) + \frac{3 - 5r + 2r^2}{z}\left(w^{s-2} - y^{s-2}\right)\right] \\
f_s &= \tfrac{1}{8}\left[2 + (1 - r)\left(w^{s-2} + y^{s-2}\right) + \frac{3 - 6r + r^2}{z}\left(w^{s-2} - y^{s-2}\right)\right]
\end{aligned} \tag{4}$$
where $z = \sqrt{(1 - r)(9 - r)}$, $w = (1 - r + z)/4$, and $y = (1 - r - z)/4$. Note that the frequencies of recombinant haplotypes in males and females are $1 - 2m_s$ and $1 - 2f_s$, respectively, and that the overall frequency is $1 - (2m_s + 4f_s)/3$.
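The balanced X-chromosome case can be checked numerically in the same spirit. The sketch below iterates equation (3) and compares the result with a closed form of the shape of equation (4). Two things are assumptions of this sketch rather than statements from the text: the F2 starting values m_2 = (1 − r)/2 and f_2 = (2 − r)/4, obtained by tracing one meiosis from F1, and the coefficient (1 − r) on the (w + y) term of the female expression, which is what the recurrence requires given those F2 values. Function names are hypothetical.

```python
import math


def ail_x_balanced_recurrence(r, s):
    """Iterate equation (3) from F2; returns (m_s, f_s) for s >= 2."""
    if s < 2:
        raise ValueError("this sketch starts at F2, so s must be >= 2")
    m = (1 - r) / 2  # assumed: F2 male X is a gamete from a heterozygous F1 mother
    f = (2 - r) / 4  # assumed: half intact paternal X, half maternal gamete
    for _ in range(s - 2):
        m, f = (1 - r) * f + r / 4, m / 2 + (1 - r) * f / 2 + r / 8
    return m, f


def ail_x_balanced_closed_form(r, s):
    """Closed form in the shape of equation (4), for s >= 2."""
    z = math.sqrt((1 - r) * (9 - r))
    w = (1 - r + z) / 4
    y = (1 - r - z) / 4
    plus = w ** (s - 2) + y ** (s - 2)
    minus = w ** (s - 2) - y ** (s - 2)
    m = (2 + (1 - 2 * r) * plus + (3 - 5 * r + 2 * r**2) / z * minus) / 8
    f = (2 + (1 - r) * plus + (3 - 6 * r + r**2) / z * minus) / 8
    return m, f


if __name__ == "__main__":
    r, s = 0.01, 12
    m, f = ail_x_balanced_closed_form(r, s)
    # recombinant frequencies: male, female, and overall (2/3 of X's are in females)
    print(1 - 2 * m, 1 - 2 * f, 1 - (2 * m + 4 * f) / 3)
    print(ail_x_balanced_recurrence(r, s))
```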

Now I turn to the unbalanced case for the X-chromosome, in which all $F_1$ individuals are derived from the cross female A × male B, so that all $F_1$ males are hemizygous A. This appears to be widely used in practice (e.g., Norgard et al. 2008; Kelly et al. 2010). The calculations are more difficult, because the allele frequencies are different in males and females and across generations.

I first calculate the single-locus allele frequencies. Let $q_s$ be the frequency of the A allele in females at generation $F_s$. Note that the frequency in males at $F_s$ is $q_{s-1}$. The initial values are $q_0 = 1$ and $q_1 = \tfrac{1}{2}$, and we have the recurrence relation $q_{s+1} = \tfrac{1}{2}q_s + \tfrac{1}{2}q_{s-1}$, which comes from the fact that a random allele drawn from the female at generation $F_{s+1}$ is equally likely to be an allele from the female or male at generation $F_s$, and the allele in the male at $F_s$ is a random allele from the female at $F_{s-1}$. The solution of the recurrence relation is $q_s = \tfrac{2}{3} + \tfrac{1}{3}\left(-\tfrac{1}{2}\right)^s$, for $s \ge 0$.
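A short sketch can confirm that the stated closed form for the female allele frequency satisfies the recurrence with the initial values q_0 = 1 and q_1 = 1/2. The function names are hypothetical.

```python
def allele_freq_recurrence(s):
    """Iterate q_{s+1} = q_s / 2 + q_{s-1} / 2 from q_0 = 1, q_1 = 1/2."""
    q_prev, q = 1.0, 0.5
    if s == 0:
        return q_prev
    for _ in range(s - 1):
        q_prev, q = q, 0.5 * q + 0.5 * q_prev
    return q


def allele_freq_closed_form(s):
    """Closed form q_s = 2/3 + (1/3) (-1/2)^s."""
    return 2 / 3 + (1 / 3) * (-0.5) ** s


if __name__ == "__main__":
    for s in range(6):
        # female frequency at F_s by recurrence and by closed form
        print(s, allele_freq_recurrence(s), allele_freq_closed_form(s))
```

The frequency oscillates around 2/3 with an amplitude that halves each generation.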

I now turn to the two-locus haplotype probabilities. Let $m_s$ and $f_s$ denote the frequencies of the AA haplotype on the X chromosome in males and females at generation $F_s$ in an unbalanced AIL, and note that $m_1 = 1$ and $f_1 = \tfrac{1}{2}$. The haplotype probabilities satisfy a recurrence relation similar to that in equation (3):
$$\begin{aligned}
m_{s+1} &= (1 - r)\,f_s + r\,q_{s-1}\,q_{s-2} \\
f_{s+1} &= \tfrac{1}{2}\,m_s + \frac{1 - r}{2}\,f_s + \frac{r}{2}\,q_{s-1}\,q_{s-2}
\end{aligned} \tag{5}$$

Note the distinction between equations (3) and (5): if a recombinant haplotype is transmitted from the $F_s$ female, the chance that it brings two A alleles together depends on the frequency of the A allele in males and females in the $F_{s-1}$ generation. In the balanced case, these are each $\tfrac{1}{2}$; in the unbalanced case, they are different from each other and vary across generations.

I have been unable to obtain closed-form solutions for $m_s$ and $f_s$. However, the values can be quickly calculated numerically, using equation (5). Note that $\lim_{s \to \infty} f_s = \lim_{s \to \infty} m_s = \tfrac{4}{9}$.
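The numerical calculation mentioned above might look like the following sketch, which iterates equation (5) from m_1 = 1 and f_1 = 1/2. The function names are hypothetical. One assumption is labeled in the code: the closed form for the allele frequency is evaluated at s = −1 for the first step, where it gives 0, which matches an F0 male of strain B.

```python
def allele_freq_female(s):
    """A-allele frequency in F_s females: q_s = 2/3 + (1/3) (-1/2)^s.

    Assumption of this sketch: the formula is also used at s = -1, where it
    evaluates to 0 (the F0 male carries only the B allele).
    """
    return 2 / 3 + (1 / 3) * (-0.5) ** s


def ail_x_unbalanced(r, s):
    """Iterate equation (5) from F1; returns (m_s, f_s) for s >= 1."""
    m, f = 1.0, 0.5  # m_1 and f_1
    for gen in range(1, s):
        # chance that a recombinant from an F_gen female joins two A alleles:
        # product of the A frequencies in F_{gen-1} females and males
        qq = allele_freq_female(gen - 1) * allele_freq_female(gen - 2)
        m, f = (1 - r) * f + r * qq, m / 2 + (1 - r) * f / 2 + (r / 2) * qq
    return m, f


if __name__ == "__main__":
    r = 0.01
    for s in (2, 5, 12, 2000):
        # for r > 0 both values approach 4/9 as s grows
        print(s, ail_x_unbalanced(r, s))
```

With r = 0 the haplotype frequencies reduce to the single-locus allele frequencies, m_s = q_{s−1} and f_s = q_s, which gives a quick sanity check on the iteration.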

Haplotype probabilities in the DO are calculated similarly. The progenitors for the DO were pre-CC mice. I assume a large number of progenitors, that they were drawn from independent lines, and that the order of the crosses that generated the different lines was random, giving complete balance across the eight alleles.

In a potential abuse of notation, I will redefine the $q$, $p$, $m$, and $f$ variables used previously. Let $q_k$ denote the frequency of the AA haplotype at generation $G_2{:}F_k$ in the pre-CC; this is $\frac{1 - r}{2}$ times the haplotype probability in Table 4 of Broman (2012). Let $p_s$ be the probability of the AA haplotype at generation $s$ of the diversity outcross.

The pre-CC progenitors of the DO were drawn from independent lines at a variety of different generations along the course to inbreeding. Let $\alpha_k$ denote the proportion of the pre-CC progenitors that were at generation $G_2{:}F_k$, and note that a pre-CC progenitor at generation $G_2{:}F_k$ will transmit the AA haplotype with frequency $q_{k+1}$ (that is, the frequency of the AA haplotype at generation $G_2{:}F_{k+1}$). Thus, the frequency of the AA haplotype at the first generation of the DO is $p_1 = \sum_k \alpha_k\,q_{k+1}$.

The recurrence relation for the $p_s$ is like that in equation (1): $p_{s+1} = (1 - r)\,p_s + r/64$. The solution is
$$p_s = \frac{1}{64} + (1 - r)^{s-1}\left(p_1 - \frac{1}{64}\right) \tag{6}$$

Note that the recombinant haplotypes are all equally likely, due to the random order of the initial crosses, and so each has probability $(1 - 8p_s)/56$.

HS corresponds to the DO with $\alpha_1 = 1$ (that is, $k \equiv 1$), in which case $p_1 = q_2 = (1 - r)^3/8$, so that $p_1 - \tfrac{1}{64} = (7 - 24r + 24r^2 - 8r^3)/64$.
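To make equation (6) concrete, the sketch below evaluates it for HS and checks it against direct iteration of the recurrence. The function names are hypothetical, and the HS starting value p_1 = (1 − r)³/8 is an assumption of this sketch: it corresponds to three meioses in fully heterozygous individuals, which agrees with the HS map expansion M_1 = 3 and with the recombinant frequency of 0.103 at r = 0.01 and s = 10 quoted later in the text.

```python
def do_autosome_closed_form(r, s, p1):
    """Equation (6): AA haplotype frequency at DO generation s, given p_1."""
    return 1 / 64 + (1 - r) ** (s - 1) * (p1 - 1 / 64)


def do_autosome_recurrence(r, s, p1):
    """Iterate p_{s+1} = (1 - r) p_s + r / 64 starting from p_1."""
    p = p1
    for _ in range(s - 1):
        p = (1 - r) * p + r / 64
    return p


def hs_p1(r):
    """Assumed HS starting value: three meioses without recombination, 8 strains."""
    return (1 - r) ** 3 / 8


def recombinant_freq(p):
    """Total recombinant frequency, 1 - 8 p_s (eight equally likely strains)."""
    return 1 - 8 * p


if __name__ == "__main__":
    r, s = 0.01, 10
    p = do_autosome_closed_form(r, s, hs_p1(r))
    print(recombinant_freq(p))        # total recombinant frequency
    print(recombinant_freq(p) / 56)   # each of the 56 recombinant haplotypes
```

For the DO proper, p_1 would instead be the weighted sum over the pre-CC generations, which requires the values from Table 4 of Broman (2012) and is not reproduced here.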

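I now turn to the X-chromosome. Let $m_s$ and $f_s$ denote the frequency of the AA haplotype on the X chromosome in males and females in the DO at generation $s$. Assuming random orders of crosses to generate the pre-CC progenitors,
$$f_1 = \sum_k \alpha_k \cdot \tfrac{1}{8}\left[(2 - r)\,h_{k+1}^{AA} + (1 - r)\,h_{k+1}^{CC}\right] \tag{7}$$
where $h_{k+1}^{AA}$ and $h_{k+1}^{CC}$ are the frequencies of the AA and CC haplotypes, respectively, on the X-chromosome in females at generation $G_1{:}F_{k+1}$ in the construction of four-way RIL by sibling mating (see Broman 2012, Table 4). $m_1$ is calculated in the same way. The recurrence relations are much like equation (3):
$$\begin{aligned}
m_{s+1} &= (1 - r)\,f_s + \frac{r}{64} \\
f_{s+1} &= \tfrac{1}{2}\,m_s + \frac{1 - r}{2}\,f_s + \frac{r}{128}
\end{aligned} \tag{8}$$
The solutions are the following:
$$\begin{aligned}
m_s &= \tfrac{1}{128}\left\{2 + \frac{(64m_1 - 256f_1 + 3)(1 - r)}{z}\left(y^{s-1} - w^{s-1}\right) - (1 - 64m_1)\left(w^{s-1} + y^{s-1}\right)\right\} \\
f_s &= \tfrac{1}{128}\left\{2 + \frac{-64f_1(1 - r) - 128m_1 + 3 - r}{z}\left(y^{s-1} - w^{s-1}\right) - (1 - 64f_1)\left(w^{s-1} + y^{s-1}\right)\right\}
\end{aligned} \tag{9}$$
where $w$, $y$, and $z$ are as in equation (4).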

Again, HS corresponds to DO with $\alpha_1 = 1$, in which case $f_1 = (4 - 5r + r^2)/32$ and $m_1 = (2 - 3r + r^2)/16$.
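The X-chromosome solution can be checked against its recurrence in the same way as for the autosome. The sketch below iterates equation (8) and evaluates a closed form of the shape of equation (9), using the HS starting values stated in the text. Function names are hypothetical, and the signs inside the closed form are those required for agreement with the recurrence, which is an assumption of this sketch.

```python
import math


def do_x_recurrence(r, s, m1, f1):
    """Iterate equation (8) from generation 1; returns (m_s, f_s)."""
    m, f = m1, f1
    for _ in range(s - 1):
        m, f = (1 - r) * f + r / 64, m / 2 + (1 - r) * f / 2 + r / 128
    return m, f


def do_x_closed_form(r, s, m1, f1):
    """Closed form in the shape of equation (9)."""
    z = math.sqrt((1 - r) * (9 - r))
    w = (1 - r + z) / 4
    y = (1 - r - z) / 4
    plus = w ** (s - 1) + y ** (s - 1)
    minus = y ** (s - 1) - w ** (s - 1)
    m = (2 + (64 * m1 - 256 * f1 + 3) * (1 - r) / z * minus
         - (1 - 64 * m1) * plus) / 128
    f = (2 + (-64 * f1 * (1 - r) - 128 * m1 + 3 - r) / z * minus
         - (1 - 64 * f1) * plus) / 128
    return m, f


def hs_x_initial(r):
    """HS starting values (m_1, f_1) as stated in the text."""
    f1 = (4 - 5 * r + r**2) / 32
    m1 = (2 - 3 * r + r**2) / 16
    return m1, f1


if __name__ == "__main__":
    r, s = 0.01, 10
    m1, f1 = hs_x_initial(r)
    print(do_x_recurrence(r, s, m1, f1))
    print(do_x_closed_form(r, s, m1, f1))
```

At s = 1 the closed form returns the starting values, and for r > 0 both frequencies tend to 1/64 as s grows.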

In Figure 1, the probabilities of recombinant two-locus haplotypes are displayed for the different populations. For the DO, I used the distribution of k as in Figure 1 of Svenson et al. (2012) and s = 5. For HS and AIL, I used s = 10 and 12, respectively, to match the total number of generations with recombination—the average k in Svenson et al. (2012) was six. Recombinant haplotypes are more frequent on the autosome, and are more frequent in HS than in the DO; inbreeding in the pre-CC progenitors of the DO is accompanied by a loss of recombinants.

Figure 1 

Frequency of a two-locus haplotype being recombinant, as a function of the recombination fraction at meiosis, for the diversity outcross population at s = 5 (solid curves), heterogeneous stock at s = 10 (dashed curves), and balanced AIL at s = 12 (dotted curves), for the autosome (black), male X (blue), and female X (red). The green dashed curve is the recombinant frequency for HS at s = 10 assumed in Mott et al. (2000).

It is particularly interesting to consider the map expansion in these populations, which is the frequency of recombination breakpoints on a random chromosome. Let $R$ denote the probability of a recombinant haplotype; then the map expansion is $\left.\frac{dR}{dr}\right|_{r=0}$ (see Teuscher and Broman 2007). The map expansion on an autosome in AIL is $s/2$. For the DO, on an autosome, the map expansion satisfies $M_s = \tfrac{7}{8}(s - 1) + M_1$, where $M_1$ is the weighted average (with weights $\alpha_k$) of the map expansion in the pre-CC at generation $G_2{:}F_{k+1}$ (see Broman 2012, Table 4). For the particular progenitors detailed in Svenson et al. (2012, Figure 1), this is approximately $(7s + 37)/8$. For HS, we have $M_1 = 3$ and $M_s = (7s + 17)/8$.
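The definition of map expansion as the derivative of the recombinant frequency at r = 0 can be illustrated with a finite-difference sketch. It uses the autosomal closed forms for AIL (equation 2) and for HS (equation 6), and compares the numerical slopes with s/2 and (7s + 17)/8. The function names, the step size h, and the HS starting value p_1 = (1 − r)³/8 are assumptions of this sketch.

```python
def recomb_ail(r, s):
    """Recombinant frequency 1 - 2 p_s for autosomal AIL, from equation (2)."""
    p = 0.25 * (1 + (1 - 2 * r) * (1 - r) ** (s - 2))
    return 1 - 2 * p


def recomb_hs(r, s):
    """Recombinant frequency 1 - 8 p_s for autosomal HS, from equation (6)."""
    p1 = (1 - r) ** 3 / 8  # assumed HS starting value
    p = 1 / 64 + (1 - r) ** (s - 1) * (p1 - 1 / 64)
    return 1 - 8 * p


def map_expansion(recomb, s, h=1e-6):
    """Forward-difference approximation to dR/dr at r = 0."""
    return (recomb(h, s) - recomb(0.0, s)) / h


if __name__ == "__main__":
    # numerical slope next to the closed-form map expansion
    print(map_expansion(recomb_ail, 12), 12 / 2)
    print(map_expansion(recomb_hs, 10), (7 * 10 + 17) / 8)
```

A forward difference is used because r < 0 has no genetic meaning; the approximation error is of order h, which is negligible at the scale of these values.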

For the X-chromosome in balanced AIL, HS and DO, the map expansion is $\tfrac{2}{3}$ that of the autosome. For the case of the X-chromosome in unbalanced AIL, in which all $F_1$ males are hemizygous A, I cannot derive a closed-form solution, but taking the derivatives of the recurrence relations in equation (5), I can derive a simple recurrence relation for the map expansion. (Note that the overall map expansion on the X-chromosome can be obtained as the average of the sex-specific map expansions, with $\tfrac{2}{3}$ weight given to the female, since two-thirds of the X-chromosomes are in females.) Let $M_s$ denote the map expansion at $F_s$, and again let $q_s$ be the frequency of the A allele in females at $F_s$. Then we have
$$M_{s+1} = M_s + \tfrac{4}{3}\left(q_s - q_{s-1}\,q_{s-2}\right) \tag{10}$$
with the initial conditions $M_1 = 0$ and $M_2 = \tfrac{2}{3}$. Although I have not been able to derive a closed-form solution for $M_s$, it is easily calculated numerically.
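A minimal sketch of that numerical calculation follows, iterating equation (10) from M_1 = 0. The function names are hypothetical. As an assumption of this sketch, the closed form for the allele frequency is evaluated at s = −1 in the first step, where it gives 0, so that the first increment reproduces M_2 = 2/3.

```python
def allele_freq_female(s):
    """A-allele frequency in F_s females: q_s = 2/3 + (1/3) (-1/2)^s.

    Assumption of this sketch: also used at s = -1, where it evaluates to 0.
    """
    return 2 / 3 + (1 / 3) * (-0.5) ** s


def map_expansion_x_unbalanced(s):
    """Iterate equation (10) from M_1 = 0 up to generation F_s (s >= 1)."""
    M = 0.0  # M_1: no recombinant haplotypes in F1
    for gen in range(1, s):
        q_now = allele_freq_female(gen)
        qq = allele_freq_female(gen - 1) * allele_freq_female(gen - 2)
        M += (4 / 3) * (q_now - qq)
    return M


if __name__ == "__main__":
    for s in (1, 2, 3, 5, 12):
        print(s, map_expansion_x_unbalanced(s))
```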

The aforementioned haplotype probabilities provide the key quantities for developing HMMs for advanced intercross populations. However, it should be noted that there are other approaches to handling such data. For example, Besnier et al. (2011) used a variance components model to analyze outbred chicken AIL data, with identity-by-descent probabilities calculated using a modified version of the method of Pong-Wong et al. (2001), for general pedigree data.

The aforementioned result for HS differs from that in Mott et al. (2000), which is incorporated into the HAPPY software. They had assumed that the map expansion in HS was $\tfrac{7}{8}(s + 2)$, whereas I show it to be $\tfrac{7}{8}(s - 1) + 3$. In the first three generations with recombination, individuals are fully heterozygous, and so all recombination events can be seen; in the subsequent $s - 1$ generations, there is a 1/8 chance of homozygosity and so only 7/8 of recombination events can be seen.

Mott et al. (2000) further assumed that the transition probabilities along an HS chromosome are a function of genetic distance, but that requires knowledge of the map function. It is more direct to express the transition probabilities in terms of the recombination fraction at meiosis.

The green curve in Figure 1 displays the probability of a recombinant haplotype assumed in Mott et al. (2000) for HS with s = 10 when the map function corresponding to the gamma model with the level of crossover interference estimated for the mouse in Broman et al. (2002) is used. The probability is slightly smaller than that from my calculations; at r = 0.01, the equation in Mott et al. (2000) gives 0.099, whereas I obtain 0.103.

I have assumed an effectively infinite number of mating pairs at each generation. In practice, with a finite number of mating pairs, there will be some inbreeding and so an increased frequency of homozygosity and a decreased frequency of recombination. In addition, the individuals at the final generation will include siblings, and the relationships among individuals might be used to improve the genotype reconstruction. In practice, for computational efficiency, both the inbreeding and the relationships among individuals would probably be ignored in the genotype reconstruction, and with dense genotype data, there will be little loss of information.

Acknowledgments

James Crow generously provided comments for improvement of the manuscript. This work was supported in part by National Institutes of Health grant GM074244.

Literature Cited

Aylor D L, Valdar W, Foulds-Mathes W, Buus R J, Verdugo R A et al., 2011 Genetic analysis of complex traits in the emerging Collaborative Cross. Genome Res. 21: 1213–1222.

Besnier F, Wahlberg P, Rönnegård L, Ek W, Andersson L et al., 2011 Fine mapping and replication of QTL in outbred chicken advanced intercross lines. Genet. Sel. Evol. 43: 3.

Broman K W, 2012 Genotype probabilities at intermediate generations in the construction of recombinant inbred lines. Genetics 190: 403–412.

Broman K W, Rowe L B, Churchill G A, Paigen K, 2002 Crossover interference in the mouse. Genetics 160: 1123–1131.

Broman K W, Sen S, 2009 A Guide to QTL Mapping With R/qtl. Springer, New York.

Bulmer M G, 1980 The Mathematical Theory of Quantitative Genetics. Clarendon Press, Gloucestershire.

Darvasi A, Soller M, 1995 Advanced intercross lines, an experimental population for fine genetic mapping. Genetics 141: 1199–1207.

Graham R L, Knuth D E, Patashnik O, 1994 Concrete Mathematics, Ed. 2. Addison-Wesley, Boston.

Haley C S, Knott S A, 1992 A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69: 315–324.

Kelly S A, Nehrenberg D L, Peirce J L, Hua K, Steffy B M et al., 2010 Genetic architecture of voluntary exercise in an advanced intercross line of mice. Physiol. Genomics 42: 190–200.

Lander E S, Botstein D, 1989 Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121: 185–199.

Mott R, Talbot C J, Turri M G, Collins A C, Flint J, 2000 A method for fine mapping quantitative trait loci in outbred animal stocks. Proc. Natl. Acad. Sci. USA 97: 12649–12654.

Norgard E A, Roseman C C, Fawcett G L, Pavlic M, Morgan C D et al., 2008 Identification of quantitative trait loci affecting murine long bone length in a two-generation intercross of LG/J and SM/J mice. J. Bone Miner. Res. 23: 887–895.

Pong-Wong R, George A W, Woolliams J A, Haley C S, 2001 A simple and rapid method for calculating identity-by-descent matrices using multiple markers. Genet. Sel. Evol. 33: 453–471.

Svenson K L, Gatti D M, Valdar W, Welsh C E, Cheng R et al., 2012 High resolution genetic mapping using the Mouse Diversity Outbred Population. Genetics 190: 437–447.

Teuscher F, Broman K W, 2007 Haplotype probabilities for multiple-strain recombinant inbred lines. Genetics 175: 1267–1274.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)