Abstract

Multiparent populations (MPP) have become popular resources for complex trait mapping because of their wider allelic diversity and larger population size compared with traditional two-way recombinant inbred (RI) strains. In mice, the collaborative cross (CC) is one of the most popular MPP and is derived from eight genetically diverse inbred founder strains. The strategy of generating RI intercrosses (RIX) from MPP in general and from the CC in particular can produce a large number of completely reproducible heterozygote genomes that better represent the (outbred) human population. Since both maternal and paternal haplotypes of each RIX are readily available, RIX is a powerful resource for studying both standing genetic and epigenetic variations of complex traits, in particular, the parent-of-origin (PoO) effects, which are important contributors to many complex traits. Furthermore, most complex traits are affected by >1 genes, where multiple quantitative trait locus mapping could be more advantageous. In this paper, for MPP-RIX data but taking CC-RIX as a working example, we propose a general Bayesian variable selection procedure to simultaneously search for multiple genes with founder allelic effects and PoO effects. The proposed model respects the complex relationship among RIX samples, and the performance of the proposed method is examined by extensive simulations.

Most human genes have functional mouse counterparts, and genomes of both organisms are organized similarly. Thus, mouse serves as a good model organism for complex human diseases. Recombinant inbred (RI) mice are among the major mouse resources in biomedical and genetic research. However, traditional mouse RI lines are derived from only two inbred parental strains, with limited a number of lines available, resulting in a low percentage (15%) of genetic variation across all mouse inbred strains (Yuan et al. 2011) and extensive blind spots where a sizable proportion of the genome is identical by descent. These limitations make traditional RI insufficient for studying complex traits. With the need for more powerful resources, multiparent populations (MPP) (de Koning and McIntyre 2017), a set of inbred lines using multiple lines as founders, can overcome the limitations of traditional RI lines and have become an innovative tool for fine quantitative trait locus (QTL) mapping. In the past 15 yr, different kinds of MPP have been established in plants and animals, such as nested association mapping design (Yu et al. 2008) and multiparent advanced generation intercross (MAGIC) populations (Cavanagh et al. 2008; Kover et al. 2009; Huang et al. 2015; Ladejobi et al. 2016) in plants, and the Drosophila Synthetic Population Resource (King et al. 2012) in animals. The mouse MPP include the collaborative cross (CC) (Complex Trait Consortium 2004), in which a genetically diverse set of eight inbred strains were selected as breeding founders (Iraqi et al. 2008) to generate a large number of RI lines. The eight founder strains were predicted to represent ∼90% of the genetic variation presented in laboratory mice (Threadgill and Churchill 2012). The CC thus greatly overcomes the limitations of the traditional mouse RI lines, and is the only mammalian resource with high genome-wide genetic variation that is uniformly distributed across a large, heterogeneous, and infinitely reproducible population (Threadgill et al. 2011).

Through the generation of RI intercrosses (RIX) of RI lines, a large number of potential “outbred” RIX samples can be generated. That is, given L RI lines, L(L1) or L(L1)/2 genetically distinct reciprocal or nonreciprocal F1 individuals, or RIX, can be produced. An example of using CC-RIX to illustrate a distinct response to Ebola virus infection, by Rasmussen et al. (2014), was summarized by the editor: “the CC-RIX mice could prove valuable for preliminary screens of candidate therapeutics and vaccines.” Since all parental RI lines are isogenic at each locus, genotypes of RIX can be imputed in advance from those of their parental RI lines. Additional advantages of RIX can be found in Threadgill and Churchill (2012), in particular, its power in support of analysis of parent-of-origin (PoO) effects, where effects of certain alleles are different depending on whether those alleles are inherited maternally or paternally. PoO makes a significant contribution to the heritability of most complex traits (Mott et al. 2014). In addition, genomic PoO effects provide a great model to study epigenetic regulation of gene expression (Barlow 2011).

During the past 20 yr, QTL mapping methods, including analysis of variance (ANOVA), interval mapping (Lander and Botstein 1989), composite interval mapping (Jansen and Stam 1994; Zeng 1994), and multiple interval mapping (Kao and Zeng 1997; Kao et al. 1999), have been well developed using experimental crosses, such as backcross, F2, and RI (see Broman (2001) for reviews), and many excellent open software packages, such as QTLCart (Basten et al. 1999), MapManager (Manly and Olson 1999), and R/qtl (Broman et al. 2003), are freely available online. For diallel data obtained from the CC founder strains, Lenarcic et al. (2012) employed a general Bayesian model for decomposing phenotypic variance into biologically intuitive components. Crowley et al. (2014) also applied a Bayesian method to a diallel cross of the eight founder strains to estimate genome-wide genetic and PoO effects (not QTL mapping) on responses to haloperidol, an antipsychotic drug. Zhang et al. (2014) proposed a general single QTL Bayesian framework for MAGIC data to coherently estimate haplotype and diplotype effects of founder alleles. Similarly, for MAGIC data, Wei and Xu (2016) developed a single QTL mapping method with a random-effects model by treating the founder allelic effects of each locus as random, and scanning the entire genome one locus at a time using a likelihood ratio test. For RIX mice, a mixed-effects model was developed for modeling the unbalanced relatedness among them (Tsaih et al. 2005; Zou et al. 2005). Gong and Zou (2012) proposed a more flexible, nonparametric single QTL mapping method for detecting QTL with time-varying coefficients. Hallin et al. (2016) applied a random-effects model to map single QTL by partitioning the variance of growth traits across different environments of yeast strains into additive, dominance, and pairwise epistatic components. However, many complex traits are affected by >1 gene, and multiple QTL mapping may be more powerful than locus-by-locus analysis. For CC-RIX data, Yuan et al. (2011) constructed a mixed-effects model to simultaneously map multiple QTL, but treated the eight CC founder alleles as standard biallelic single-nucleotide polymorphisms (SNPs). For many complex traits, it is arguable that modeling the effects of the eight founder alleles could lead to improved mapping power (Collaborative Cross Consortium 2012; Vered et al. 2014). In addition, for complex traits where PoO effects are suspected, modeling such effects may further improve QTL mapping power (Lawson et al. 2013) and understanding of etiologies of complex traits (Threadgill and Churchill 2012).

In this paper, for MPP-RIX data, we develop a Bayesian variable selection procedure to simultaneously map multiple genes with founder allelic effects and PoO effects. We demonstrate our method with CC-RIX data, but the method is general enough for other MPP-RIX populations. We place parameter-expanded Gaussian (PeG) priors on both the random founder allelic effects and PoO effects for variable selection.

The paper is organized as follows. In the Statistical Method section, we first introduce the CC-RIX experiment, then propose a random-effects model and describe a Bayesian variable selection procedure. In the Simulation Study section, we perform extensive simulation to examine the proposed method. Summary comments are given in the Discussion section.

Statistical Method

The CC-RIX panel is the RI intercross of CC lines. For L CC RI lines, a total of L(L1) reciprocal CC-RIX and L(L1)/2 nonreciprocal CC-RIX can be generated [see Figure 1 of Zou et al. (2005), Yuan et al. (2011)]. Let n be the total number of CC-RIX samples and p be the total number of genetic markers. Further, let the phenotype of individual i (the dependent variable) be yi (i=1,,n).

Figure 1

ROC curves. (A–C) for cases 1, 2, and 3, respectively, and (D–F) for cases 1*,2*, and 3*, respectively. The legends “POE”, “Mixed”, “Yuan” and “LMM” represent our proposed model (2), the mixed model (3), Yuan’s model (5), and the LMM model (4), respectively.

Model

In order to account for the unbalanced relatedness between the CC-RIX mice and model the eight founder alleles and PoO effects, we extend the mixed additive random-effects model of Gong and Zou (2012) and Yuan et al. (2011) as follows:
yi=μ+j=1pγQjxijTβj+j=1pγPjzijTξj+l=1Lailαl+ei
(1)
where γQj and γPj are binary {0, 1} variables used to decide whether the jth QTL and PoO should be included in or excluded from the model and µ is the overall mean; βj=(βj1,,βj8)T and the kth element of xijT equals 2, 1, or 0, depending on whether CC-RIXi inherits 2, 1, or 0 copies of the kth founder allele (k = 1, ⋯, 8) at the jth candidate locus; ail=# of parents of CC-RIXi that are equal to CC-RIl. Let A be an n × L matrix whose (i,l)th element equals ail. Clearly, l=1Lail2 for all i = 1, 2, ⋯, n, since each CC-RIX has two and only two parents. Therefore, βjk(j=1,,p) represents the kth founder allelic effect of locus j, and αl(l=1,,L) represents the random polygenic effect of founder strain l. We let αl follow N(0,σa2)(l=1,2,,L), and the random error ei follow N(0,σe2)(i=1,2,,n).

At a given locus, PoO effects can only be estimated from individuals with heterogeneous genotypes, since for those with homozygote genotypes, the parental contributions cannot be distinguished. Label the eight-founder alleles A, B, C, D, E, F, G, and H by numbers 1–8, and let the kth element of zij=(zij1,,zij8)T be equal to 1 or −1 if the maternal or paternal allele of the ith subject at the jth locus equals the kth founder allele, and 0 otherwise. For those CC-RIX samples with a homozygous genotype at a given locus, such as AA or BB, we let zij=01×8. Therefore, ξj=(ξj1,,ξj8)T represents the PoO effects of the jth locus.

Model (1) can be expressed by the following matrix form:
y=μ+j=1pγQjxjβj+j=1pγPjzjξj++e
(2)
where y=(y1,,yn)T,μ=μ1n,xj=(x1j,,xnj)T,zj=(z1j,,znj)T,α=(α1,,αL)T,e=(e1,,en)T, and 1n=(1,,1)T.

To enable selection of QTL and PoO effects, we reexpress the eight-dimensional vectors βj and ξj as the expanded parameters βj and ξj for j=1,,p, respectively, such that βj=γQjβj and ξj=γPjξj. This parameter-expansion idea was first proposed by Kuo and Mallick (1998) and has an advantage in selecting important variables and shrinking coefficients.

In the above model (2), the observed data are y, marker genotypes x=(x1,,xp), PoO information matrix z=(z1,,zp), and parental CC-RI information matrix A. The unobserved variables include μ, β=(β1T,,βpT)T,ξ=(ξ1T,,ξpT),α,σe2,σa2, and the indicators γQ={γQ1,,γQp} and γP={γP1,,γPp}. We assign the following conjugate Gaussian priors to the jth QTL and jth PoO effects as:
βjN8(0,σQj2I8)andξjN8(0,σPj2I8)forj=1,,p.
For convenience, we name the above hierarchical priors for QTL and PoO coefficients as PeG priors in the sequel. We assign conjugate noninformation hyper-priors to the variance parameters
P(σQj2)1σQj2,P(σPj2)1σPj2,P(σa2)(σa2)δ1andP(σe2)1σe2
where δ(0<δ1/2) is used to ensure that the posterior distribution is proper (ter Braak et al. 2005). The above PeG priors are equivalent to “spike and slab” point mass mixture Gaussian priors (Mitchell and Beauchamp 1988; George and McCulloch 1993) for the expanded parameters
βjγQjδ0+(1γQj)N8(0,σQj2I8)forj=1,,p.
and
ξjγPjδ0+(1γPj)N8(0,σPj2I8)forj=1,,p.
It is well known that these priors can achieve variable selection. However, compared with mixture priors, the PeG priors can employ a block Gibbs sampler to update the two blocks of parameters, one for βs and ξs (corresponding to the selected predictors) and one for βu and ξu (corresponding to the unselected predictors) in turn, and therefore can dramatically reduce computational time, especially for high-dimensional data with large p.
For indicator variables γQj and γPj, we specify Bernoulli priors with inclusion probability 0<ηQj<1 and 0<ηPj<1 for j=1,,p, respectively. To be more flexible, we further apply hierarchical uniform priors to ηQj and ηPj:
ηQjU[0,1],andηPjU[0,1],forj=1,,p.
Lastly, we specify a flat prior on μ as P(μ)1.

Block Gibbs sampling algorithm for posterior computation

The specific priors above result in known marginal conditional distributions for all variables. The blockwise Gibbs sampling algorithm that we employ can be summarized as follows:

First, we initiate σa2,σe2,σQ2={σQ12,,σQp2},σP2={σP12,,σPp2},ηQ={ηQ1,,ηQp}, and ηP={ηP1,,ηPp} from uniform distribution U(0,1), then sample other parameters β,ξ,α, and indicators γQ and γP from their priors. We then perform the following block Gibbs sampling procedures. Superscripts (t) and (t+1) signify the Markov chain Monte Carlo (MCMC) iterations, and t=0 refers to the initial iteration.

  • Step 1. Updating μ: μ(t+1) is drawn from the normal distribution.
    N(1n1n(yj=1pγQj(t)xjβj(t)j=1pγPj(t)zjξj(t)Aα(t)),σe2(t)n)
  • Step 2. Updating β: we divide the long vector β(t) into two blocks βs(t)=((βs1(t)),,(βsm1(t))) and βu(t)=((βu1(t)),,(βum0(t))) corresponding to the selected (γQj(t)=1) and unselected (γQj(t)=0) predictors, respectively. We then sample βuh(t+1)(h=1,,m0) from their priors, i.e., the eight-dimensional multivariate normal distribution with zero mean and covariance matrix σQ,uh2(t)I8, and sample βsh(t+1)(h=1,,m1) from the eight-dimensional multivariate normal distribution with mean 1/σe2(t)Σβsh(t)(xsh)(yμ(t+1)1nj<hγQ,sj(t)xsjβsj(t+1)j>hγQ,sj(t)xsjβsj(t)j=1pγPj(t)zjξj(t)Aα(t)) and covariance matrix Σβsh(t)=σQ,sh2(t)(σQ,sh2(t)/σe2(t)(xsh)xsh+I8)1.

  • Step 3. Updating ξ: similar to Step 2, we divide ξ(t) into two parts, ξs(t)=((ξs1(t)),,(ξsm1(t))) and ξu(t)=((ξu1(t)),,(ξum0(t))) corresponding to the selected (γPj(t)=1) and unselected (γPj(t)=0) predictors, respectively. Then ξuh(t+1)(h=1,,m0) is sampled from their priors, i.e., the eight-dimensional multivariate normal distribution with zero mean and covariance matrix σP,uh2(t)I8, and ξsh(t+1)(h=1,,m1) is sampled from the eight-dimensional multivariate normal distribution with mean1/σe2(t)Σξsh(t)(zsh)(yμ(t+1)1nj=1pγQj(t)xjβj(t+1)j<hγP,sj(t)zsjξsj(t+1)j>hγP,sj(t)zsjξsj(t)Aα(t)) and covariance matrix Σξsh(t)=σP,sh2(t)(σP,sh2(t)/σe2(t)(zsh)zsh+I8)1.

  • Step 4. Updating α:α(t+1) is drawn from the multivariate normal distribution with mean 1/σe2(t)Σα(t)A(yμ(t+1)1nj=1pγQj(t)xjβj(t+1)j=1pγPj(t)zjξj(t+1)) and covariance matrix Σα(t)=σa2(t)(σa2(t)/σe2(t)AA+IL)1.

  • Step 5. Updating σQj2(1jp):σQj2(t+1) is sampled from the scale-inverted χ2 distribution, βj(t+1)2/χ82.

  • Step 6. Updating σPj2(1jp):σPj2(t+1) is sampled from the scale-inverted χ2 distribution, ξj(t+1)2/χ82.

  • Step 7. Updating σa2: the random-effects variance σa2(t+1) is sampled from the scale-inverted χ2 distribution, α(t+1)2/χL2δ2.

  • Step 8. Updating σe2: the residual variance σe2(t+1) is sampled from the scale-inverted χ2 distribution, yμ(t+1)1nj=1pγQj(t)xjβj(t+1)j=1pγPj(t)zjξj(t+1)Aα(t+1)2/χn2

  • Step 9. Updating γQj(1jp):γQj(t+1) is sampled from the Bernoulli distribution with success probability pjQ(t)=ηQj(t)/(ηQj(t)+(1ηQj(t))pj0Q(t)), where pj0Q(t)=exp[1/2σe2(t+1)(2(KjQ(t))xjβj(t+1)(βj(t+1))(xjxj)βj(t+1))], and KjQ(t)=yμ(t+1)1nk<jγQk(t+1)xkβk(t+1)k>jγQk(t)xkβk(t+1)j=1pγPj(t)zjξj(t+1)Aα(t+1).

  • Step 10. Updating γPj(1jp):γPj(t+1) is sampled from the Bernoulli distribution with success probability pjP(t)=ηPj(t)/(ηPj(t)+(1ηPj(t))pj0P(t)), where pj0P(t)=exp[1/2σe2(t+1)(2(KjP(t))zjξj(t+1)(ξj(t+1))(zjzj)ξj(t+1))], and KjP(t)=yμ(t+1)1nj=1pγQj(t+1)xjβj(t+1)k<jγPk(t+1)zkξk(t+1)k>jγPk(t)zkξk(t+1)Aα(t+1).

  • Step 11. Updating ηQj(1jp):ηQj(t+1) is sampled from the beta distribution Beta(1+γQj(t+1),2γQj(t+1)).

  • Step 12. Updating ηPj(1jp):ηPj(t+1) is sampled from the beta distribution Beta(1+γPj(t+1),2γPj(t+1)).

Data availability

Supplemental Material, File S1 contains five supplemental figures. The MATLAB code used to analyze the simulated data is provided in File S2.

Simulation Study

In this section, we run extensive simulations to evaluate the performance of the proposed Bayesian method. We apply the loop design in Zou et al. (2005) and Yuan et al. (2011) by ordering the L CC-RI lines randomly and forming them into a circle, and then mating each CC-RI line (clockwise) with the next three CC-RI lines after it (i.e., CC-RI1 mated with CC-RI2, CC-RI3, and CC-RI4; CC-RI2mated with CC-RI3, CC-RI4, and CC-RI; and so on). For L=100, in this way, we can generate a total of n=300 CC-RIX samples. Parental CC-RI genotypes are simulated in R/qtl (Broman et al. 2003). All the simulation results are based on total 100 replications for each simulation setup.

In model (2), we set the overall mean μ = 1, the variance of random errors σe2=1, the polygenic effect variance σa2=1, and δ=103. Nineteen chromosomes each with a length of 70 cM are simulated, on which total p = 133, 266, and 1330 evenly spaced markers (resulting in 10-, 5-, and 1-cM intervals between nearby markers on each chromosome, named cases 1–3) are generated, corresponding to three marker density cases. Among the total p simulated markers, we randomly select five markers, and let the first three markers have QTL effects the last three markers have PoO effects. That is, the first two genes only have QTL effects, and the last two only have PoO effects. However, the middle-selected marker has both QTL and PoO effects. The corresponding variances of QTL or PoO effects of the selected markers are all set to 1.

To better assess the performance of our method in situations where multiple nearby SNPs jointly affect the outcome, for cases 1–3, we now let the number of causal SNPs in each of the first two QTL be 2 instead of 1. Specifically, we randomly select two nearby SNPs for each of the first two simulated QTL and denote the alleles of the two SNPs A and a. Based on the haplotype frequencies fAA,fAa,faA, and faa (without loss of generality, we assume that fAAfAafaAfaa), we create three haplotype allelic groups, where haplotype AA is group 1, Aa is group 2, and the other two are group 3, and set the genetic effects of the three groups to 1, 2, and 3, respectively. The three new simulation setups are labeled as cases 1 to 3.

For each simulated data point, we generate a single long chain with 20,000 cycles, of which the first 10,000 cycles are discarded as burn-in, resulting in a total of 10,000 samples for post-MCMC analysis. All the analysis is done in MATLAB, and the MATLAB source code is submitted as supplemental material (File S2).

For comparison, we fit each simulated data point with the following three models.

The same model as (2), but without the PoO terms

y=μ+j=1pγQjxjβj++e
(3)

All the priors are set to be the same as their counterparts in model (2). For convenience, we subsequently call model (3) the “mixed model”.

Linear mixed-effects model (LMM) for single gene mapping

y=xjβj+zjξj++e,1jp.
(4)
Here, random effects αN(0,σa2IL) and random errors eN(0,σe2In), and the random effects α are also assumed to be independent of the random errors e as before. Moreover, βj=(βj1,,βj8) and ξj=(ξj1,,ξj8) are the fixed QTL and PoO effects, respectively, of the jth tested locus. Given the constraints that k=18xijk=2 and k=18zijk=0 for i=1,,n,j=1,,p, we force the intercept μ in model (4) to be 0 to overcome the identifiability problem, and jointly test the effects of the jth locus as:
H0j:βj1=βj2==βj8,& ξj1=ξj2==ξj8.
In contrast to H1j, some of the equations in H0j are not satisfied. After obtaining the maximum likelihood estimates of the parameters in model (4), we perform a test with the following log-odds ratio (LOD):
LODj=log10L1jL0j
which is equivalent to the log-likelihood ratio test, where L0j and L1j are the likelihood of model (4) under null hypothesis H0j and alternative hypothesis H1j, respectively. Since the above hypothesis test is performed p times, it is necessary to find an appropriate significance threshold to control the multiple testing, which can be obtained, for example, by modified permutation procedures (Zou et al. 2005). However, in our comparison, we evaluate the receiver operating characteristic (ROC) curve, which only requires the use of LOD scores.

Yuan’s Bayesian method

yi=μ+j=1pzijaj+l=1Lailαl+ei
(5)

Here, aj is the effect of the jth putative QTL with zij=2mij, where mij(i=1,,n,j=1,,p) equals −1, 0, or 1 if the putative QTL genotype is aa,Aa, or AA, respectively. The other parameters are set the same as those in the mixed model. The prior of aj is set to N(0,σj2), 1 ≤ jp in Yuan et al. (2011).

To compare the methods, ROC curves (Fawcett 2006), where true positive rates (also known as sensitivity) are plotted against false positive rates (also known as 1-specificity) evaluated at various threshold cut-offs. To estimate the sensitivity (the proportion of positives that are correctly identified as such) and specificity (the proportion of negatives that are correctly identified as such), we define false and true positive findings as follows. If a detected locus falls no more than 10 cM apart from any simulated genes, we call it a true positive finding, otherwise a false positive finding. Then, combining the outputs of each model for the 100 data sets, we can calculate the corresponding sensitivity and specificity. Specifically, for the jth marker, we record the LOD scores, LODj, for the LMM model; the maximum posterior frequency between γQj and γPj for our proposed method; the posterior frequency γQj for the mixed model; and the posterior mean of σj2 for Yuan’s model. The corresponding area under the ROC curve (AUC) is also calculated for each of the four models, and the results are presented in Table 1. Generally speaking, a model with a higher AUC value indicates on average a better performance compared with those with lower AUC scores (Fawcett 2006).

Simulation settings and AUC values

Table 1
Simulation settings and AUC values
CaseISapAUCPbAUCMcAUCYdAUCLe
1101330.96420.84990.51550.9436
252660.97220.90220.50290.9463
3113300.99150.94890.50000.9525
1*101330.94990.68850.50850.9264
2*52660.96240.90560.49980.9238
3*113300.99080.93930.50400.9482
CaseISapAUCPbAUCMcAUCYdAUCLe
1101330.96420.84990.51550.9436
252660.97220.90220.50290.9463
3113300.99150.94890.50000.9525
1*101330.94990.68850.50850.9264
2*52660.96240.90560.49980.9238
3*113300.99080.93930.50400.9482
a

Interval space (IS; cM) between nearby markers.

b

AUC values of our proposed model (2).

c

AUC values of the mixed model (3).

d

AUC values of Yuan’s model (5).

e

AUC values of the LMM model (4).

Table 1
Simulation settings and AUC values
CaseISapAUCPbAUCMcAUCYdAUCLe
1101330.96420.84990.51550.9436
252660.97220.90220.50290.9463
3113300.99150.94890.50000.9525
1*101330.94990.68850.50850.9264
2*52660.96240.90560.49980.9238
3*113300.99080.93930.50400.9482
CaseISapAUCPbAUCMcAUCYdAUCLe
1101330.96420.84990.51550.9436
252660.97220.90220.50290.9463
3113300.99150.94890.50000.9525
1*101330.94990.68850.50850.9264
2*52660.96240.90560.49980.9238
3*113300.99080.93930.50400.9482
a

Interval space (IS; cM) between nearby markers.

b

AUC values of our proposed model (2).

c

AUC values of the mixed model (3).

d

AUC values of Yuan’s model (5).

e

AUC values of the LMM model (4).

From Table 1, it is clear that our proposed method outperforms the other three methods for all cases, regardless of whether there is a single causal SNP or multiple causal SNPs in each QTL. Yuan’s method fails in all the simulated cases, which is expected as the method only models biallelic SNP effects instead of the founder allelic effects that we simulate. The LMM model outperforms the mixed model, in particular when the marker density is sparse. This phenomenon is further confirmed by Figure 1, where the ROC curves of the proposed method are always higher than the ROC curves of the other three methods; the ROC curves of Yuan’s model fall along the 45-degree line, indicating its low power for mapping genes with founder allelic effects. The ROC curves of the LMM model fall between those of our proposed model and the mixed model in cases 1 and 2, and cases 1* and 2*, but cross with the ROC curves of the mixed model in cases 3 and 3*. Figure 1 and Table 1 show that the mapping precisions of all the methods increase as the number of markers increases, as expected.

Figure 2 presents the Manhattan plots of the four models based on the average estimate of a total of 100 replications across the whole genome for case 1. Clearly, our proposed method is the winner in mapping genes with founder allelic effects (marked by an asterisk) and PoO effects (marked by a circle). The Manhattan plots for cases 2 and 3, and cases 1*, 2*, and 3*, are included in Figures S1–S5 in File S1, respectively, and show similar patterns.

Figure 2

Estimate plot for case 1. X-axis: * indicates QTL location, and o indicates PoO location. Y-axis: (A) max{γQj,γPj} (1 ≤ jp) in our proposed model (2); (B) γQj (1 ≤ jp) in the mixed model (3); (C) σj2 (1 ≤ jp) in Yuan’s model (5); and (D) LOD scores for linear mixed-effects model (LMM) (4).

Discussion

The CC (Threadgill and Churchill 2012), a panel of newly emerged multiparent RI mouse strains, was developed, similar to other MPPs, to provide greater genetic diversity than the traditional RI populations and thereby to improve our power of understanding of complex traits. CC-RIX, F1 crosses that are generated from parental CC RI lines (Zou et al. 2005) can serve as excellent mouse models for mapping genes with both traditional genetic effects and epigenetic effects, such as imprinting. This study extends the model of Gong and Zou (2012) and Yuan et al. (2011) and, to the best of our knowledge, it is the first use of Bayesian variable selection methods to jointly map multiple QTL with both founder allelic effects and PoO effects for MPP-RIX data, in particular, CC-RIX data.

In this article, it is assumed that the RI lines are equally distanced from each other in terms of genetic distance which is a sensible assumption given the funnel design used for generating CC RI lines. However, we do observe that some RI lines share more or fewer founder alleles than expected. Our limited simulation shows that such genetic unbalance has a negligible effect on mapping genes, but this issue deserves further investigation. Alternatively, as genotypes of the parental RI lines are available, we could modify the design matrices xj,zj, and A in model (1) and replace them with local and genome-wide similarity matrices as done in sequence kernel association tests (SKAT) (Ionita-Laza et al. 2013) and genome-wide complex trait analysis (GCTA) (Yang et al. 2011). In addition, in our model we assume additive founder allelic effects. This assumption can be easily relaxed to allow genes with both additive and dominance effects (Woods et al. 2012; Zhang et al. 2014). Because dominant genetic effects are orthogonal to PoO effects, missing the dominant genetic effects does not cause any bias in the PoO effects estimate.

One of the drawbacks of Bayesian methods is their computational cost, especially for data with large numbers of samples and markers. Our model employs a block Gibbs sampler, which dramatically reduces computational time. However, further improvements may be possible. For example, instead of modeling each marker separately, we could jointly model multiple nearby markers and reduce the magnitude of p. Similar ideas have been proposed for genome-wide association study data (Wu et al. 2010; Lu et al. 2015). This approach is also biologically meaningful, since for some complex traits multiple causal SNPs may be located in a single region, and our simulations (cases 1*–3*) have shown that the proposed method is powerful in mapping regions with multiple causal SNPs. However, grouping nearby markers or SNPs may offer further help in improving mapping power, which deserves more thorough investigation.

Acknowledgments

We thank two anonymous referees for their valuable comments on our manuscript. This research was partially supported by NIH grant R01 GM105785, and by the scholarship from China Scholarship Council (CSC) under the Grant CSC No.201506270042.

Footnotes

Supplemental material is available online at www.g3journal.org/lookup/suppl/doi:10.1534/g3.117.300483/-/DC1.

Communicating editor: D. J. de Koning

Literature Cited

Barlow
D P
,
2011
Genomic imprinting: a mammalian epigenetic discovery model.
Annu. Rev. Genet.
45
:
379
403
.

Basten
C J
,
Weir
B S
,
Zeng
Z-B
,
1999
QTL Cartographer: A Reference Manual and Tutorial for QTL Mapping.
Department of Statistics, North Carolina State University
,
Raleigh, NC
.

Broman
K W
,
2001
Review of statistical methods for QTL mapping in experimental crosses.
Lab Anim. (NY)
30
:
44
52
.

Broman
K W
,
Wu
H
,
Sen
Ś
,
Churchill
G A
,
2003
R/qtl: QTL mapping in experimental crosses.
Bioinformatics
19
:
889
890
.

Cavanagh
C
,
Morell
M
,
Mackay
I
,
Powell
W
,
2008
From mutations to MAGIC: resources for gene discovery, validation and delivery in crop plants.
Curr. Opin. Plant Biol.
11
:
215
221
.

Collaborative Cross Consortium
,
2012
The genome architecture of the collaborative cross mouse genetic reference population.
Genetics
190
:
389
401
.

Complex Trait Consortium
,
2004
The collaborative cross, a community resource for the genetic analysis of complex traits.
Nat. Genet.
36
:
1133
1137
.

Crowley
J J
,
Kim
Y
,
Lenarcic
A B
,
Quackenbush
C R
,
Barrick
C J
et al. ,
2014
Genetics of adverse reactions to haloperidol in a mouse diallel: a drug–placebo experiment and Bayesian causal analysis.
Genetics
196
:
321
347
.

de Koning
D-J
,
McIntyre
L M
,
2017
Back to the future: multiparent populations provide the key to unlocking the genetic basis of complex traits.
Genetics
206
:
527
529
.

Fawcett
T
,
2006
An introduction to ROC analysis.
Pattern Recognit. Lett.
27
:
861
874
.

George
E I
,
McCulloch
R E
,
1993
Variable selection via Gibbs sampling.
J. Am. Stat. Assoc.
88
:
881
889
.

Gong
Y
,
Zou
F
,
2012
Varying coefficient models for mapping quantitative trait loci using recombinant inbred intercrosses.
Genetics
190
:
475
486
.

Hallin
J
,
Märtens
K
,
Young
A I
,
Zackrisson
M
,
Salinas
F
et al. ,
2016
Powerful decomposition of complex traits in a diploid model.
Nat. Commun.
7
:
13311
.

Huang
B E
,
Verbyla
K L
,
Verbyla
A P
,
Raghavan
C
,
Singh
V K
et al. ,
2015
MAGIC populations in crops: current status and future prospects.
Theor. Appl. Genet.
128
:
999
1017
.

Ionita-Laza
I
,
Lee
S
,
Makarov
V
,
Buxbaum
J D
,
Lin
X
,
2013
Sequence kernel association tests for the combined effect of rare and common variants.
Am. J. Hum. Genet.
92
:
841
853
.

Iraqi
F A
,
Churchill
G
,
Mott
R
,
2008
The collaborative cross, developing a resource for mammalian systems genetics: a status report of the Wellcome trust cohort.
Mamm. Genome
19
:
379
381
.

Jansen
R C
,
Stam
P
,
1994
High resolution of quantitative traits into multiple loci via interval mapping.
Genetics
136
:
1447
1455
.

Kao
C-H
,
Zeng
Z-B
,
1997
General formulas for obtaining the MLEs and the asymptotic variance-covariance matrix in mapping quantitative trait loci when using the EM algorithm.
Biometrics
53
:
653
665
.

Kao
C-H
,
Zeng
Z-B
,
Teasdale
R D
,
1999
Multiple interval mapping for quantitative trait loci.
Genetics
152
:
1203
1216
.

King
E G
,
Merkes
C M
,
McNeil
C L
,
Hoofer
S R
,
Sen
S
et al. ,
2012
Genetic dissection of a model complex trait using the Drosophila synthetic population resource.
Genome Res.
22
:
1558
1566
.

Kover
P X
,
Valdar
W
,
Trakalo
J
,
Scarcelli
N
,
Ehrenreich
I M
et al. ,
2009
A multiparent advanced generation inter-cross to fine-map quantitative traits in Arabidopsis thaliana.
PLoS Genet.
5
:
e1000551
.

Kuo
L
,
Mallick
B
,
1998
Variable selection for regression models.
Sankhya Ser. B
60
:
65
81
.

Ladejobi
O
,
Elderfield
J
,
Gardner
K A
,
Gaynor
R C
,
Hickey
J
et al. ,
2016
Maximizing the potential of multi-parental crop populations.
Appl. Transl. Genomics
11
:
9
17
.

Lander
E S
,
Botstein
D
,
1989
Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps.
Genetics
121
:
185
199
.

Lawson
H A
,
Cheverud
J M
,
Wolf
J B
,
2013
Genomic imprinting and parent-of-origin effects on complex traits.
Nat. Rev. Genet.
14
:
609
617
.

Lenarcic
A B
,
Svenson
K L
,
Churchill
G A
,
Valdar
W
,
2012
A general Bayesian approach to analyzing diallel crosses of inbred strains.
Genetics
190
:
413
435
.

Lu
Z
,
Zhu
H
,
Knickmeyer
R C
,
Sullivan
P F
,
Stephanie
W N
et al. ,
2015
Multiple SNP-sets analysis for genome-wide association studies through Bayesian latent variable selection.
Genet. Epidemiol.
39
:
664
677
.

Manly
K F
,
Olson
J M
,
1999
Overview of QTL mapping software and introduction to map manager QT.
Mamm. Genome
10
:
327
334
.

Mitchell
T J
,
Beauchamp
J J
,
1988
Bayesian variable selection in linear regression.
J. Am. Stat. Assoc.
83
:
1023
1032
.

Mott
R
,
Yuan
W
,
Kaisaki
P
,
Gan
X
,
Cleak
J
et al. ,
2014
The architecture of parent-of-origin effects in mice.
Cell
156
:
332
342
.

Rasmussen
A L
,
Okumura
A
,
Ferris
M T
,
Green
R
,
Feldmann
F
et al. ,
2014
Host genetic diversity enables Ebola hemorrhagic fever pathogenesis and resistance.
Science
346
:
987
991
.

ter Braak
C J F
,
Boer
M P
,
Bink
M C A M
,
2005
Extending Xu’s Bayesian model for estimating polygenic effects using markers of the entire genome.
Genetics
170
:
1435
1438
.

Threadgill
D W
,
Churchill
G A
,
2012
Ten years of the collaborative cross.
Genetics
190
:
291
294
.

Threadgill
D W
,
Miller
D R
,
Churchill
G A
,
de Villena
F P-M
,
2011
The collaborative cross: a recombinant inbred mouse population for the systems genetic era.
ILAR J.
52
:
24
31
.

Tsaih
S-W
,
Lu
L
,
Airey
D C
,
Williams
R W
,
Churchill
G A
,
2005
Quantitative trait mapping in a diallel cross of recombinant inbred lines.
Mamm. Genome
16
:
344
355
.

Vered
K
,
Durrant
C
,
Mott
R
,
Iraqi
F A
,
2014
Susceptibility to Klebsiella pneumonaie infection in collaborative cross mice is a complex trait controlled by at least three loci acting at different time points.
BMC Genomics
15
:
865
.

Wei
J
,
Xu
S
,
2016
A random-model approach to QTL mapping in multiparent advanced generation intercross (MAGIC) populations.
Genetics
202
:
471
486
.

Woods
L C S
,
Holl
K L
,
Oreper
D
,
Xie
Y
,
Tsaih
S-W
et al. ,
2012
Fine-mapping diabetes-related traits, including insulin resistance, in heterogeneous stock rats.
Physiol. Genomics
44
:
1013
1026
.

Wu
M C
,
Kraft
P
,
Epstein
M P
,
Taylor
D M
,
Chanock
S J
et al. ,
2010
Powerful SNP-set analysis for case-control genome-wide association studies.
Am. J. Hum. Genet.
86
:
929
942
.

Yang
J
,
Lee
S H
,
Goddard
M E
,
Visscher
P M
,
2011
GCTA: a tool for genome-wide complex trait analysis.
Am. J. Hum. Genet.
88
:
76
82
.

Yu
J
,
Holland
J B
,
McMullen
M D
,
Buckler
E S
,
2008
Genetic design and statistical power of nested association mapping in maize.
Genetics
178
:
539
551
.

Yuan
Z
,
Zou
F
,
Liu
Y
,
2011
Bayesian multiple quantitative trait loci mapping for recombinant inbred intercrosses.
Genetics
188
:
189
195
.

Zeng
Z-B
,
1994
Precision mapping of quantitative trait loci.
Genetics
136
:
1457
1468
.

Zhang
Z
,
Wang
W
,
Valdar
W
,
2014
Bayesian modeling of haplotype effects in multiparent populations.
Genetics
198
:
139
156
.

Zou
F
,
Gelfond
J A L
,
Airey
D C
,
Lu
L
,
Manly
K F
et al. ,
2005
Quantitative trait locus analysis using recombinant inbred intercrosses: theoretical and empirical considerations.
Genetics
170
:
1299
1311
.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)