Abstract
Genomic selection (GS) offers the possibility to estimate the effects of genome-wide molecular markers, which can be used to calculate genomic estimated breeding values (GEBVs) for individuals without phenotypes. GEBVs can serve as a selection criterion in recurrent GS, maximizing single-cycle but not necessarily long-term genetic gain. As simple genome-wide sums, GEBVs do not take into account other genomic information, such as the map positions of loci and linkage phases of alleles. Therefore, we herein propose a novel selection criterion called expected maximum haploid breeding value (EMBV). EMBV predicts the expected performance of the best among a limited number of gametes that a candidate contributes to the next generation, if selected. We used simulations to examine the performance of EMBV in comparison with GEBV as well as the recently proposed criterion optimal haploid value (OHV) and weighted GS. We considered different population sizes, numbers of selected candidates, chromosome numbers and levels of dominant gene action. Criterion EMBV outperformed GEBV after about 5 selection cycles, achieved higher long-term genetic gain and maintained higher diversity in the population. The other selection criteria showed the potential to surpass both GEBV and EMBV in advanced cycles of the breeding program, but yielded substantially lower genetic gain in early to intermediate cycles, which makes them unattractive for practical breeding. Moreover, they were largely inferior in scenarios with dominant gene action. Overall, EMBV shows high potential to be a promising alternative selection criterion to GEBV for recurrent genomic selection.
- genetic gain
- doubled haploid
- optimal haploid value
- expected maximum haploid breeding value
- GenPred
- Shared Data Resources
- Genomic Selection
The identification, selection and propagation of superior individuals builds the foundation of all breeding efforts. The breeding potential of a candidate is classically determined by its breeding value (BV), the sum of all additive effects at quantitative trait loci (QTL) affecting a complex trait (Lynch and Walsh 1998). While BVs have traditionally been estimated in progeny tests, Meuwissen et al. (2001) proposed genomic selection (GS) to predict BVs prior to phenotypic evaluation. The principle is to use genome-wide marker data and phenotypes of training individuals to calculate locus-specific allele substitution effects. Genomic estimated breeding values (GEBVs) are then calculated as predictors of BVs. Selecting individuals ranked according to their BVs maximizes the population mean of the next cycle when they are recombined, but a repeated application of this selection strategy does not necessarily maximize long-term genetic gain over several generations (Wray and Goddard 1994; Liu et al. 2015). GEBVs, as predictors of BVs, are subject to the same constraints. This suboptimal behavior can be explained by the fact that GEBVs are simple genome-wide sums of estimates of allele substitution effects, which can conceal the contribution of favorable alleles with small effects. The latter are less relevant for short-term gain and can be easily lost, especially if their frequency is low, but can play an important role for long-term gain by maintaining useful genetic variance (Jannink 2010; Liu et al. 2015).
To prevent the loss of rare favorable alleles, Goddard (2009) proposed a modified GEBV that weights estimated allele substitution effects using the frequencies of favorable alleles, such that rare alleles receive a higher weight. This criterion does not take into account the magnitude of the effects, based on the premise that, for optimal long-term genetic gain, all favorable alleles should ultimately be fixed. Later, Jannink (2010) suggested a modification called weighted GS, herein referred to as weighted GEBV (wGEBV). This criterion also considers the magnitude of the effects because, especially for QTL with small effects, determining which allele is the favorable one is problematic. wGEBV proved to be superior to GEBVs in terms of long-term genetic gain (19 cycles) in spring barley (Hordeum vulgare L.; Jannink 2010).
Differential weighting of substitution effects does not take into account other important information often available at no extra cost, such as genetic map positions of loci and linkage phases of alleles at different loci. New selection criteria utilizing this information could be defined based on the prospects that candidates produce superior gametes with favorable combinations of haplotypes. While the average performance of progeny of such candidates might be inferior to that of individuals selected on GEBVs alone, the top-performing individuals in the progeny are expected to be superior, which boosts the genetic gain achievable in future generations.
In this vein, Daetwyler et al. (2015) recently proposed a criterion called optimal haploid value (OHV), which aims at predicting the theoretically optimal combination of haplotypes in a gamete produced from a heterozygous candidate. The criterion was tested in simulations of a bread wheat breeding program and showed increased genetic gain compared to selection on GEBVs. In their study, genetic progress was measured as the performance of the best doubled haploid (DH) line (generated by chromosome doubling of a gamete) produced from selected individuals. By definition, OHV does not take into account the finite size of the breeding population; hence, it merely considers the possibility of a superior gamete, disregarding its probability (Han et al. 2017). Moreover, OHV requires the genome to be partitioned into haplotypes, and it is yet an unsolved problem how this should be optimally accomplished.
In view of these limitations, we herein propose a novel selection criterion called expected maximum haploid breeding value (EMBV). It characterizes the breeding potential of a candidate in terms of the performance of the top gametes it is able to produce. If a candidate is selected for recombination, it will contribute a certain number of gametes to the next generation. This number can be directly ascertained under controlled matings or easily estimated under random mating conditions. The EMBV is then defined as the expected GEBV of the best among all DH lines derived from these gametes. Hence, EMBV takes the finite population size into account and it is not necessary to partition the genome into haplotypes.
The objectives of our study were (i) to evaluate in silico the potential of EMBV as an alternative selection criterion in a generic recurrent selection (RS) program and (ii) compare it to the criteria OHV, wGEBV and GEBV with respect to genetic gain and genetic diversity across 50 selection cycles. The performance of OHV was assessed under optimal conditions with respect to the partitioning of the genome into haplotypes. In order to evaluate the effect of gene action on the comparison of the selection criteria, we compared purely additive gene action with completely dominant gene action at all loci. Furthermore, we considered the effect of population size, the number of selected individuals and the number of chromosomes on the relative performance of the different selection criteria.
Material and Methods
Genetic model
We considered a quantitative trait with additive and dominant gene action at all $L$ loci. Each locus $l$ was bi-allelic with alleles $B_l$ and $b_l$ and possible unordered genotypes $B_lB_l$, $B_lb_l$ and $b_lb_l$. We assumed that the locations of loci and of alleles on homologous chromosomes are known (phased genotypic data). For each locus of a diploid, heterozygous individual $i$, let $x_{ijl} \in \{0, 1\}$ be an indicator variable indicating absence or presence of the $B_l$ allele at the $l$-th locus on the $j$-th haploid genome ($j \in \{1, 2\}$ referring to the maternal and paternal genome). Then $z_{il} = x_{i1l} + x_{i2l}$ is a genotypic score counting the number of $B_l$ alleles. Following the genetic model of Lynch and Walsh (1998), we assumed that the genotypic value of an individual $i$ with genotype $z_{il}$ at locus $l$ is given by
$$g_{il} = \begin{cases} 0 & \text{if } z_{il} = 0 \\ (1 + k_l)\,a_l & \text{if } z_{il} = 1 \\ 2a_l & \text{if } z_{il} = 2 \end{cases} \qquad (1)$$
where $a_l$ is the homozygous effect (half the difference between the two homozygous genotypes) and $k_l$ the dominance coefficient (deviation of the heterozygote from the mean of the homozygotes in units of $a_l$). Effects $a_l$ were independently drawn from a gamma distribution following Meuwissen et al. (2001), such that $B_l$ was implicitly defined as the favorable allele. We assumed two extreme scenarios where gene action at all QTL was either purely additive, $k_l = 0$, or completely dominant/recessive, where $k_l$ was either 1 or $-1$ with equal probability. Dominance coefficients $k_l$ were assumed to be stochastically independent of additive effects, following Zeng et al. (2013). The genome-wide genotypic value of an individual $i$ was computed as $g_i = \sum_{l=1}^{L} g_{il}$. The average effect of an allelic substitution $\alpha_l$ at a single locus $l$ was computed as
$$\alpha_l = a_l\,[1 + k_l(1 - 2p_l)]$$
where $p_l$ is the frequency of allele $B_l$ at the respective locus. The BV of individual $i$ was computed as
$$\mathrm{BV}_i = \sum_{l=1}^{L} (z_{il} - 2p_l)\,\alpha_l$$
(Vitezica et al. 2013). Following Daetwyler et al. (2015), we assumed that QTL genotypes and effects are known without error, i.e., marker loci are identical to QTL and their associated allele substitution effects are identical to the simulated substitution effects for QTL. This was done in order to assess the performance of the investigated selection criteria under optimal conditions. Consequences in the practical case where the trait genetic architecture is unknown and (marker) allele substitution effects can be only estimated with some degree of precision are addressed in the discussion.
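The genetic model above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' simulation code: it assumes the Lynch and Walsh parametrization with genotypic values 0, (1 + k)a and 2a for 0, 1 and 2 copies of the favorable allele, and the function names (`genotypic_value`, `substitution_effect`, `breeding_values`) are hypothetical.

```python
def genotypic_value(z, a, k):
    """Single-locus genotypic value for z = 0, 1 or 2 favorable alleles.

    Assumed parametrization: 0, (1 + k) * a and 2 * a.
    """
    return a * z + k * a * z * (2 - z)


def substitution_effect(a, k, p):
    """Average effect of an allelic substitution, alpha = a * (1 + k * (1 - 2p)),
    where p is the frequency of the favorable allele."""
    return a * (1.0 + k * (1.0 - 2.0 * p))


def breeding_values(scores, a, k):
    """Breeding values of all individuals in a population.

    scores[i][l] is the number of favorable alleles (0, 1 or 2) carried by
    individual i at locus l; a and k hold the per-locus effects.
    """
    n_ind, n_loci = len(scores), len(a)
    # Sample frequency of the favorable allele at each locus.
    p = [sum(row[l] for row in scores) / (2.0 * n_ind) for l in range(n_loci)]
    alpha = [substitution_effect(a[l], k[l], p[l]) for l in range(n_loci)]
    return [
        sum((row[l] - 2.0 * p[l]) * alpha[l] for l in range(n_loci))
        for row in scores
    ]


# Toy population: four individuals, two loci (made-up values).
scores = [[2, 0], [1, 1], [0, 2], [1, 1]]
bv = breeding_values(scores, a=[1.0, 0.5], k=[0.0, 1.0])
```

Because the scores are centered by twice the allele frequency, the breeding values in such a sketch average to zero within the population in which the frequencies were computed.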
Selection criteria
GEBV:
The GEBV of individual $i$ is canonically computed as
$$\mathrm{GEBV}_i = \sum_{j=1}^{2} \sum_{l=1}^{L} x_{ijl}\,\alpha_l \qquad (2)$$
which is the genome-wide sum of all substitution effects $\alpha_l$ for the respective alleles ($x_{ijl} = 1$ if the favorable allele is present at locus $l$ on haploid genome $j$, and 0 otherwise).
EMBV:
The EMBV measures the breeding potential of a candidate in terms of the expected GEBV of the best out of $N_G$ DH lines produced by it (visualized in Figure 1), where $N_G$ denotes the number of gametes the candidate is expected to contribute to the next generation, if it is selected. If $D_i$ denotes the GEBV of a random DH line produced by candidate $i$, the EMBV is formally defined as
$$\mathrm{EMBV}_i = \mathrm{E}\left[D_{i,(N_G)}\right]$$
where $D_{i,(N_G)}$ is the largest order statistic (maximum) of a random sample of size $N_G$. An alternative formulation of EMBV using a normal approximation for the distribution of GEBVs of DH lines produced by $i$ is provided in File S2 and discussed below.
Illustration of the computation of EMBV for a heterozygous selection candidate. A (conceptually) infinite population of gametes is generated in silico from the candidate by simulating meiosis events. The corresponding doubled haploid (DH) lines are evaluated for their GEBVs, yielding a distribution of GEBVs (blue curve). The candidate’s GEBV corresponds to the mean GEBV of the DH lines. The EMBV is defined as the expected value of the maximum GEBV of a random sample of DH lines of size $N_G$, where $N_G$ is the expected number of gametes the candidate will contribute to the next generation.
OHV:
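For the computation of OHV, the entire set of loci $\{1, \ldots, L\}$, ordered along the genome, is partitioned into $N$ disjoint non-empty subsets $H_1, \ldots, H_N$ (corresponding to haplotypes), such that $\bigcup_{h=1}^{N} H_h = \{1, \ldots, L\}$ and $H_h \cap H_{h'} = \emptyset$ for all $h$ and all $h' \neq h$. According to Daetwyler et al. (2015), the OHV of a selection candidate $i$ is computed as
$$\mathrm{OHV}_i = 2 \sum_{h=1}^{N} \max_{j \in \{1, 2\}} \sum_{l \in H_h} x_{ijl}\,\alpha_l \qquad (3)$$
i.e., for each haplotype, the maximum breeding value over all haploid genomes $j$ (with allele indicators $x_{ijl}$ and substitution effects $\alpha_l$) is determined and twice the sum of these values is taken as OHV.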
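As a minimal sketch of how GEBV and OHV relate, the following Python functions compute both from phased genotypes. The names `gebv`, `ohv` and `equal_blocks` are hypothetical, and `equal_blocks` splits loci by index rather than by map distance in cM, which is a simplification made only for illustration.

```python
def gebv(hap1, hap2, alpha):
    """Genome-wide sum of substitution effects over both haploid genomes."""
    return sum((x1 + x2) * a for x1, x2, a in zip(hap1, hap2, alpha))


def ohv(hap1, hap2, alpha, blocks):
    """Optimal haploid value: twice the sum, over haplotype blocks, of the
    better of the two haplotype values within each block.

    blocks is a list of lists of locus indices forming a partition of the loci.
    """
    total = 0.0
    for block in blocks:
        value1 = sum(hap1[l] * alpha[l] for l in block)
        value2 = sum(hap2[l] * alpha[l] for l in block)
        total += max(value1, value2)
    return 2.0 * total


def equal_blocks(n_loci, n_blocks):
    """Split locus indices 0..n_loci-1 into n_blocks contiguous blocks of
    near-equal size (assumes n_blocks <= n_loci so that no block is empty)."""
    bounds = [(b * n_loci) // n_blocks for b in range(n_blocks + 1)]
    return [list(range(bounds[b], bounds[b + 1])) for b in range(n_blocks)]


# Toy heterozygous candidate with four loci (made-up effects).
hap1, hap2 = [1, 0, 1, 0], [0, 1, 0, 1]
alpha = [1.0, 2.0, 0.5, 0.5]
ohv_by_blocks = {n: ohv(hap1, hap2, alpha, equal_blocks(4, n)) for n in (1, 2, 4)}
```

In this toy example the OHV grows as the genome is cut into more blocks, which illustrates why the choice of partition matters; for a fully homozygous candidate, OHV and GEBV coincide for any partition.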
wGEBV:
In selection criterion wGEBV, marker effects are weighted by a coefficient that depends on the frequency of the favorable allele, $p_l$.
The associated locus weights were computed according to Goddard (2009) as
and wGEBVs were calculated with the modification proposed by Jannink (2010) as
$$\mathrm{wGEBV}_i = \sum_{j=1}^{2} \sum_{l=1}^{L} x_{ijl}\,\alpha_l\,w_l \qquad (4)$$
where $x_{ijl}$ indicates the presence of the favorable allele at locus $l$ on haploid genome $j$ of individual $i$, $\alpha_l$ is the allele substitution effect and $w_l$ the locus weight. For all criteria, allele frequency $p_l$ was freshly computed as the sample frequency of the favorable allele in each cycle of the breeding program; accordingly, allele substitution effects $\alpha_l$ (and locus weights $w_l$ for wGEBV) varied between cycles. It is important to note that the EMBV and OHV of a completely homozygous individual are identical to its GEBV; hence, these selection criteria only differ for heterozygous genotypes. The computation of GEBV and wGEBV only requires genome-wide co-dominant bi-allelic markers with effect estimates, whereas both EMBV and OHV additionally require a genetic map and phased marker genotypes of the candidates. Criterion EMBV further requires software for simulating meiosis events (e.g., Müller and Broman 2017).
Simulation of the base population and genome structure
We considered a diploid species with a constant genome length of cM. The genome was subdivided into
segregating chromosomes with equal length (i.e., 400, 100 and 50 cM, respectively). Bi-allelic QTL were uniformly distributed along the genome with a density of 2 QTL per cM, corresponding to a total of
QTL. The simulation of the base population was conducted as in Müller et al. (2017). Briefly, a historical population of
diploid individuals was subject to random mating for
generations. A population bottleneck was simulated by arbitrarily selecting 40 individuals that were further randomly mated for 15 generations to build up extensive linkage disequilibrium, as often observed in elite germplasm in plant breeding (e.g., Van Inghelandt et al. 2011). The population was then expanded to 5,000 individuals and randomly mated for three more generations to remove close family relationships and establish the base population. Finally, all monomorphic loci were removed from the genotypic data. The base population was simulated only once for each value of
The distribution of allele frequencies (data not shown) and linkage disequilibrium (Figure S1 in File S3) in terms of
(Hill and Robertson 1968) was similar for different
Breeding program
From the base population, $N_c$ individuals were randomly sampled without replacement and constituted the candidates in cycle $t = 0$. We considered four distinct breeding programs, starting from the same set of individuals in cycle $t = 0$, that only differed in the selection criterion, namely the use of GEBV, wGEBV, OHV or EMBV to select $N_{sel}$ candidates for establishing the next generation. For criterion EMBV, the number of gametes $N_G$ contributed, on average, by one selected individual to the next generation of $N_c$ new candidates was estimated as $2N_c/N_{sel}$, rounded to the nearest integer. In a given cycle $t$, all candidates were evaluated and ranked for the applied selection criterion and the best $N_{sel}$ candidates were selected. For creating cycle $t + 1$, the selected individuals were randomly mated, i.e., both parents of each future individual were randomly drawn with replacement from the selected candidates, allowing for self-fertilization (father = mother). One gamete per parent was produced and both gametes united to form the new progeny. In cycle $t = 0$, the population mean (average of all genotypic values) and the standard deviation of BVs ($\sigma_0$) were calculated. In each later cycle $t > 0$, all individuals were genotyped and the difference between the population mean in cycle $t$ and cycle 0 was computed. This difference was then divided by $\sigma_0$ and the result was recorded as the genetic gain ($R$), analogous to Jannink (2010). Hence, $R$ is measured as the progress of the population mean in units of $\sigma_0$ relative to cycle 0. Note that $\sigma_0$ can vary between the different scenarios and among samples of founder individuals from the base population within scenarios. Scaling by $\sigma_0$ aims to correct for this difference in the initially available additive variance, but does not affect comparisons between the four selection criteria. In each selection cycle $t$, genetic diversity was calculated as the variance of the BVs of all candidates ($\sigma_t^2$), divided by $\sigma_0^2$. The breeding program was continued for a total of 50 selection cycles. The factors investigated in our simulations (Table 1) were (i) the number of candidates in each cycle, $N_c$, (ii) the number of selected individuals as parents for the next generation, $N_{sel}$, (iii) the number of chromosomes, $n_{chr}$, and (iv) the level of dominance, $k_l = 0$ (no dominance) or $|k_l| = 1$ (complete dominance). The breeding program was replicated at least 600 times for each scenario, starting with sampling the initial candidates in cycle 0 from the base population and the simulation of homozygous effects and sampling of the signs of dominance coefficients. The homozygous effects were always scaled to achieve unit additive genetic variance in the base population. Summary statistics are generally reported as arithmetic means across all replicates.
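The bookkeeping quantities of the breeding program can be sketched as follows. This is an illustrative sketch under assumptions: the number of gametes per parent is taken as twice the number of new candidates divided by the number of selected parents, the sample variance is used for diversity (the source does not say whether sample or population variance was applied), and all function names are hypothetical.

```python
import math
import statistics


def gametes_per_parent(n_candidates, n_selected):
    """Average number of gametes one selected parent contributes when
    n_candidates progeny (two gametes each) come from n_selected parents,
    rounded to the nearest integer (ties rounded up)."""
    return math.floor(2.0 * n_candidates / n_selected + 0.5)


def genetic_gain(mean_t, mean_0, sd_bv_0):
    """Progress of the population mean since cycle 0 in units of the
    standard deviation of breeding values in cycle 0."""
    return (mean_t - mean_0) / sd_bv_0


def relative_diversity(bv_t, bv_0):
    """Variance of breeding values in cycle t relative to cycle 0
    (sample variance; an assumption of this sketch)."""
    return statistics.variance(bv_t) / statistics.variance(bv_0)
```

For example, with 50 candidates per cycle, `gametes_per_parent` returns 100, 33 and 10 for 1, 3 and 10 selected parents, respectively.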
Computation of OHV and estimation of EMBV
The estimation of OHVs requires the specification of haplotypes. The most straightforward way, which we pursued, is to agree on a number of equidistant breakpoints that partition each chromosome into
haplotypes of equal length (Daetwyler et al. 2015). We explored different values for
starting from 1 (i.e., entire chromosomes) and following the geometric sequence
as long as the haplotypes had a length
cM. An overview of
and segment lengths for different
is shown in Table S1-1 in File S1 and results for R obtained with criterion OHV are described in File S1. In the following, we only show those results for criterion OHV where
was found to yield maximum R after 50 cycles of selection.
While GEBVs, OHVs and wGEBVs can be directly computed from genotypic data and allele substitution effects, the estimation of EMBVs is computationally demanding, because an overall large number of DHs has to be generated per individual. We estimated EMBVs by repeatedly producing gametes, determining the maximum GEBV among them as described above, and taking the arithmetic mean of the maxima over all replicates. The number of replicates was dynamically adapted such that the empirical standard error was smaller than 0.01 (but at least 10 replicates were taken). This strategy was chosen to balance estimation accuracy and computation time, but in practical applications, computation time is not a bottleneck. We developed a C++ routine for the fast estimation of EMBVs, which is publicly available via a wrapper R package embvr (Müller 2017). A possible alternative approach for rapid analytical computation of EMBVs is described in File S2.
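The Monte Carlo estimation described above can be sketched in plain Python. This is not the embvr implementation: the meiosis model is an assumption of this sketch (Haldane mapping function without crossover interference, free recombination between chromosomes), and the names `simulate_gamete` and `estimate_embv` as well as the `max_reps` safeguard are hypothetical.

```python
import math
import random


def simulate_gamete(hap1, hap2, pos_cm, chrom, rng):
    """Draw one gamete; switch between parental haplotypes with the Haldane
    recombination frequency between adjacent loci (0.5 across chromosomes)."""
    current = rng.randrange(2)
    gamete = []
    for l in range(len(hap1)):
        if l > 0:
            if chrom[l] != chrom[l - 1]:
                r = 0.5
            else:
                d_morgan = (pos_cm[l] - pos_cm[l - 1]) / 100.0
                r = 0.5 * (1.0 - math.exp(-2.0 * d_morgan))
            if rng.random() < r:
                current = 1 - current
        gamete.append(hap1[l] if current == 0 else hap2[l])
    return gamete


def estimate_embv(hap1, hap2, alpha, pos_cm, chrom, n_gametes,
                  se_tol=0.01, min_reps=10, max_reps=100000, seed=None):
    """Mean over replicates of the maximum DH-line GEBV among n_gametes
    gametes; replicates are added until the standard error drops below se_tol."""
    rng = random.Random(seed)
    n, total, total_sq = 0, 0.0, 0.0
    while True:
        # GEBV of a DH line is twice the haploid value of its gamete.
        best = max(
            2.0 * sum(x * a for x, a in zip(
                simulate_gamete(hap1, hap2, pos_cm, chrom, rng), alpha))
            for _ in range(n_gametes)
        )
        n += 1
        total += best
        total_sq += best * best
        if n >= max(min_reps, 2):
            mean = total / n
            var = max((total_sq - n * mean * mean) / (n - 1), 0.0)
            if math.sqrt(var / n) < se_tol or n >= max_reps:
                return mean
```

For a completely homozygous candidate every simulated DH line is identical, so the estimate equals the GEBV after the minimum number of replicates; for heterozygous candidates the number of replicates grows roughly with the inverse square of `se_tol`.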
Data availability
Datasets and source code used in our simulations are publicly available from https://doi.org/10.5281/zenodo.1161723. File supplental_figures contains supplementary figures. File supplement_1 contains results on the optimal number of haplotypes for selection criterion OHV. File supplement_2 presents an approximation of EMBV using the normal distribution.
Results
The genetic gain R generally approached a plateau (selection limit) for all selection criteria as the breeding program proceeded (Figure 2a). During selection, an increasing number of causal polymorphisms became fixed, such that in late stages of the breeding program, individuals were nearly homozygous and the genetic variance was depleted (Figure 2c). An exception was selection criterion wGEBV, where still considerable genetic progress was achieved after 50 cycles of selection. The rate of genetic progress and the selection limit depended on the selection pressure via the number of selected individuals $N_{sel}$. If only a single candidate was selected ($N_{sel} = 1$), which corresponds to recurrent selfing, genetic progress was initially very fast, but R quickly reached a low selection limit after about 10 cycles. Conversely, under mild selection pressure with $N_{sel} = 10$, genetic progress was slow at the beginning, but endured over the entire breeding program and R generally did not fully reach the selection limit, even after 50 cycles.
(A) Genetic gain (R), (B) relative genetic gain and (C) additive variance () for selection criteria genomic-estimated breeding value (GEBV), expected maximum haploid breeding value (EMBV), optimal haploid value (OHV) and weighted GEBV (wGEBV) under recurrent selection. Results refer to
and
number of chromosomes;
number of selection candidates;
number of selected individuals.
Genetic gain
Additive gene action:
Selection criterion EMBV was, in advanced selection cycles, clearly superior to GEBV in terms of genetic gain (Figure 2a), but minimally weaker in the first cycles (until about cycle 5). After this point, surpassed and strictly increased relative to
during selection. After 50 cycles,
reached a genetic gain of
(
),
(
) and
(
) larger than
With selection criterion OHV,
increased at a lower rate than
in early cycles. However,
generally caught up to
and eventually surpassed it. The larger
the more cycles it took for
to surpass
(9 cycles for
compared to 38 cycles for
). After 50 cycles,
was
higher than
for
but
and
higher for
and 10, respectively, exceeding the performance of EMBV. Criterion wGEBV showed a unique behavior. In general,
increased slower than
in the first few cycles, similar to OHV, and plateaued for
and 3 at a level
and
respectively, below
However, for
although
also initially slowly increased, it surpassed
after 25 cycles and eventually reached a value
larger than
after 50 cycles, also surmounting all other criteria.
Dominant gene action:
If gene action at all loci was completely dominant, both the overall level of R (Figure 2a), as well as the advantage of the alternative selection criteria over GEBV (Figure 2b) were reduced, but the extent depended on the criterion. While EMBV appeared to be robust to dominant gene action for different values of
and
were severely reduced for
reaching only
(
) and
(
) more than
after 50 cycles.
Number of candidates and chromosomes:
Reducing the number of selection candidates $N_c$ from 50 (standard scenario) to 30 led to a reduction in the overall level of R for all selection criteria (Figure 3). The larger population size with $N_c = 50$ caused a slightly higher allelic diversity in cycle 0, calculated as the average number of alleles per QTL, of 1.97 compared to 1.94 for $N_c = 30$. This increases the probability that rare favorable alleles in the base population are also present in the breeding population, and hence benefits long-term genetic gain. The ranking between different selection criteria for $N_c = 30$ was similar to that for $N_c = 50$. Comparing OHV with EMBV, genetic gain with OHV tended to decrease relative to that with EMBV when $N_c$ was lowered from 50 to 30 individuals.
Genetic gain (R) in cycle for selection criteria genomic-estimated breeding value (GEBV), expected maximum haploid breeding value (EMBV), optimal haploid value (OHV) and weighted GEBV (wGEBV) under recurrent selection with purely additive gene action. Boxes and whiskers indicate standard errors and standard deviations across replicates, respectively.
number of chromosomes;
number of selection candidates;
number of selected individuals.
Larger slightly elevated the overall level of R for all selection criteria (Figure S2 in File S3). With a constant genome size of
cM assumed in our study, increasing
increased the overall number of recombinations between loci, which benefited long-term genetic gain. The relative differences in R between the selection criteria were hardly influenced by
However, it must be taken into account that for OHV, we considered only the optimal number of haplotypes
For instance, choosing
per chromosome yielded optimal R only for
but not for
(Figure S1-3 in File S1).
Genetic diversity
The criteria EMBV and OHV generally showed the ability to maintain higher genetic diversity in terms of in the population than GEBV, while criterion wGEBV only showed larger
than GEBV for
(Figure 2c). The rate of decline of
became more pronounced when
was reduced from 10 to 1. Across all cycles,
was always larger for criterion OHV compared to GEBV. After 50 cycles,
was entirely depleted with EMBV and GEBV, but not with OHV and wGEBV for
Here,
(OHV) and
(wGEBV) of
was left. For
wGEBV showed a higher
of
in cycle 50 than for
(Figure S6 in File S3). Remnant
explains why the selection limit was not fully reached in the case of OHV and wGEBV (Figure 2a). This indicates that the final genetic gain of OHV and wGEBV would have been higher if selection was continued for more than 50 cycles. Generally,
and
had only small effects on
for the different selection criteria (Figure S6 in File S3). Trends were similar under completely dominant gene action (Figure 2c, Figure S7 in File S3).
Discussion
Genomic selection allows for predicting GEBVs of unphenotyped individuals and has been proposed for RS to increase genetic gain per unit time (Windhausen et al. 2012; Gorjanc et al. 2016). A first empirical study on GS in a multi-parental population produced from 18 tropical maize lines showed promising results, reporting 2% genetic gain in grain yield per year (Zhang et al. 2017). However, selection on GEBVs is expected to maximize single-cycle genetic gain, but not genetic gain over several cycles. In this study, we propose a novel selection criterion called expected maximum haploid breeding value (EMBV) as an alternative to the use of GEBVs for RS. EMBV takes into account information about genetic map positions of loci, linkage phases between alleles and the population size to improve long-term genetic gain. We used extensive computer simulations to compare EMBV to two other alternative selection criteria, wGEBV and OHV (Goddard 2009; Jannink 2010; Daetwyler et al. 2015), in a generic RS program.
RS was pioneered in maize (Zea mays L.) breeding (Jenkins 1940; Hull 1945; Comstock et al. 1949) and two basic types of selection strategies have been developed, intra- and inter-population improvement, where the latter is also called reciprocal RS. RS had only a limited yet significant impact on the development of improved inbred lines in commercial hybrid breeding. Most notably, the Iowa Stiff Stalk Synthetic produced many successful inbred lines and its traces are present in a large proportion of today’s elite germplasm (Mikel and Dudley 2006; Hallauer and Carena 2012). Because of the historically limited success of RS, Hallauer and Carena (2012) recommend tightly integrating the development of elite inbred lines with germplasm enhancement programs driven by RS. This is particularly facilitated by the DH technology, which allows for rapid development of fully homozygous lines ready for testcross evaluation. While RS (either intra- or inter-population) can be used to steadily improve the germplasm, DH lines can be simultaneously created and tested as spin-offs from top parents. We expect EMBV to be also highly suitable for the selection of such DH parents, because by its very definition, it enables the identification of parents that most likely produce top performing DH lines. Genetic progress is then not measured in terms of population mean performance, but in terms of the performance of the best DH that can be achieved for line development, similar to Daetwyler et al. (2015). If EMBV is deployed for both RS and spin-off DH production, the parents used for DH line development do not need to be recruited from the individuals selected for intercrossing, but can constitute a separate set. This is because the ranking of the candidates in both applications will likely differ due to (i) differences in the number of gametes or DH lines produced per candidate ($N_G$) and (ii) differences in allele substitution effect estimates, which occur if different testers are used and gene action is not purely additive.
For intra-population RS, the tester is the (current) population (e.g., evaluation of half-sibs), whereas for inter-population RS, the tester stems from the opposite heterotic group. In both cases, the selection of DH parents requires substitution effects being estimated from testcrosses. EMBV might also be successfully applied independently of RS in advanced hybrid breeding programs, where new lines are commonly developed from bi-parental crosses between recycled elite lines. However, these extensions require further investigation.
EMBV
The EMBV is an independent property of each selection candidate and is derived from the distribution of its virtual DH progeny. By this approach, the ultimate goal of using EMBVs is not to maximize genetic gain in the subsequent generation, but to improve gain in later stages of the breeding program. This is underlined by our result that selection on EMBVs needed around 5 cycles to outperform GEBV (Figures S4 and S8 in File S3), even though the initial penalty of using EMBVs was minimal. By selection on EMBVs, only individuals that are expected to produce the best gametes in the next generation are advanced. If such top gametes eventually unite, a superior individual is created, which, if selected for further breeding, can increase the population mean of future selection cycles. Due to the linearity of expectations, the EMBV can also be expressed as
$$\mathrm{EMBV}_i = \mathrm{GEBV}_i + \sigma_i\, e_{N_G} \qquad (5)$$
where $\mathrm{GEBV}_i$ is the GEBV of candidate $i$, $\sigma_i$ is the standard deviation of the GEBVs of the DH lines derived from $i$, and $e_{N_G}$ the expected value of the largest order statistic of $N_G$ random variables from $N(0, 1)$, assuming the GEBVs of the virtual DH progeny are normally distributed. This is described in greater detail in File S2. When expressed in this way, the EMBV can be immediately interpreted as a compromise between the candidate’s GEBV (current breeding potential) and its segregation variance (indicative of future breeding potential). Increasing the number of contributed gametes $N_G$ increases $e_{N_G}$ (Figure S2-1 in File S2) and hence the importance of $\sigma_i$. Hence, the ranking of candidates can vary, depending on $N_G$ (see Figure S2-3 in File S2 for an example). Candidates with intermediate GEBVs showed larger variation in $\sigma_i$ compared to candidates with low or high GEBVs. Therefore, selection on EMBV often chooses candidates with suboptimal GEBV, but in return larger $\sigma_i$.
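To make the trade-off between GEBV and segregation variance concrete, the normal approximation can be sketched numerically. The sketch below is an illustration under assumptions, not the procedure of File S2: the expected maximum of a sample of standard normal variables is obtained by trapezoidal integration of its density, and the function names and integration settings are hypothetical.

```python
import math


def expected_max_std_normal(n, lower=-10.0, upper=10.0, steps=20000):
    """Expected value of the maximum of n independent N(0, 1) variables,
    i.e. the integral of x * n * phi(x) * Phi(x)**(n - 1), by the trapezoid rule."""
    h = (upper - lower) / steps
    total = 0.0
    for s in range(steps + 1):
        x = lower + s * h
        pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
        f = x * n * pdf * cdf ** (n - 1)
        total += f if 0 < s < steps else 0.5 * f
    return total * h


def embv_normal_approx(gebv, sd_dh, n_gametes):
    """EMBV under normality: GEBV plus the DH-line standard deviation times
    the expected maximum of n_gametes standard normal variables."""
    return gebv + sd_dh * expected_max_std_normal(n_gametes)
```

Under this approximation, a candidate with a lower GEBV but larger standard deviation among its DH lines can overtake a candidate with a higher GEBV once the number of contributed gametes is large enough, which mirrors the ranking changes described above.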
OHV
Application of criterion OHV requires the definition of haplotypes, from which the optimal combination of haplotype values is calculated. OHV conceptually fits into the framework of EMBV in that, for $N_G \to \infty$, the EMBV converges to the OHV, given complete linkage among loci within a haplotype but free recombination between haplotypes. The need for an explicit specification of haplotypes could be considered as a disadvantage of OHV. Our results demonstrate that OHV has a large potential to boost long-term genetic gain. However, these results might be overly optimistic, because we only used optimal values for the number of haplotypes per chromosome ($n_H$). As Daetwyler et al. (2015) pointed out, decreasing $n_H$ (increasing haplotype lengths) shifts the breeding goal of maximizing genetic gain into the future, which is underlined by our results (Figure S1-1 in File S1). The reason is that a gamete exhibiting the OHV (or being at least close to it) can only be produced through the accumulation of favorable recombination events close to the haplotype borders. By definition, OHV only considers the possibility of the optimal gamete combining only the best haplotypes, not taking into account its probability of occurrence (Han et al. 2017). If $n_H$ is chosen such that genetic gain is maximized at an earlier stage of the breeding program, gain in cycle 50 is compromised (results not shown). As a consequence, OHV needs to be tuned according to the length of the breeding program. We observed substantial losses of genetic gain for OHV relative to GEBV in early selection cycles, in accordance with a simulation study by Goiffon et al. (2017). This was not found by Daetwyler et al. (2015), likely because they evaluated genetic progress in terms of the performance of only the best DH produced from all selected individuals. Hence, if a (nearly) optimal gamete is eventually produced, it will directly and exclusively enter into the measurement of genetic gain. Conversely, we measured genetic progress as the average genotypic value of the entire breeding population.
wGEBV
Criterion wGEBV was unique because it performed poorly for small $N_{sel}$ but clearly outperformed all other criteria for $N_{sel} = 10$ in terms of long-term genetic gain. In the latter case, remnant additive variance suggests that if selection was continued further, the difference would have been even larger. We suspect that wGEBV is not competitive for small $N_{sel}$ because of strong genetic drift. This will rapidly result in a loss of many highly favorable but low-frequency alleles from the population. Only a very limited number of recombination events occur before individuals ultimately become homozygous; hence, there is not enough opportunity for combinations of favorable alleles to appear. Below, we explain why we expect that the superiority of wGEBV for $N_{sel} = 10$ is likely overestimated.
Recently, Liu et al. (2015) proposed a further modification of the original approach to wGEBV. In their study, the effect weights are not only determined by the favorable allele frequency and change due to shifts in the latter, but also by a parameter regulating the initial weight at the beginning of selection and by the number of remaining generations until the end of the breeding program (“time horizon”). The closer the breeding program comes to its end, the lower the weight on effects with low favorable allele frequencies. They showed that their modified approach can improve on wGEBV in terms of long-term genetic gain. However, similar to wGEBV in our study, their method showed a clear performance penalty during the first cycles.
Genetic diversity
The genetic diversity maintained in the breeding population was substantially higher for EMBV than GEBV. Selection based on GEBVs puts rare favorable alleles at a high risk of becoming lost. This is because such alleles will occur only in a small number of candidates. If they coincide with many unfavorable alleles, their positive effect is concealed. In other words, if rare favorable alleles are only present in candidates with an otherwise low GEBV, they will likely be lost. On the other hand, criterion EMBV allows rare favorable alleles to recombine and be joined with other favorable alleles into a high-performing gamete, reducing negative selection pressure on them. Moreover, the interpretation of the EMBV as a compromise between the GEBV and the segregation variance makes it evident that EMBV positively weights and maintains diversity. Criterion OHV maintained the highest diversity. Because it is computed as the sum of only the favorable haplotypes, it allows rare favorable alleles, similar to EMBV, to be separated from unfavorable alleles on other haplotypes and joined with favorable ones. Similar to OHV, criterion wGEBV was able to maintain relatively high genetic diversity, but only for This is because the differential weighting leads to a strong selection of rare favorable alleles (Jannink 2010). This effect was canceled by genetic drift if
was small.
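To make the contrast between the criteria concrete, the following minimal Python sketch computes GEBV, OHV and a Monte Carlo approximation of EMBV for a single phased candidate. It is an illustration under simplifying assumptions, not the implementation used in this study: alleles are coded 0/1 with the effect attached to allele 1, OHV is taken as the sum over fixed-length segments of the better doubled haplotype value, crossovers occur independently between adjacent loci with probability rec_prob on a single chromosome, and all function and parameter names are hypothetical.

```python
import random


def dh_value(gamete, effects):
    """Genotypic value of the DH line obtained by doubling a gamete."""
    return 2.0 * sum(g * e for g, e in zip(gamete, effects))


def gebv(hap1, hap2, effects):
    """Genome-wide sum of effects over both haplotypes of the candidate."""
    return sum((a + b) * e for a, b, e in zip(hap1, hap2, effects))


def ohv(hap1, hap2, effects, segment_len):
    """Sum over segments of the better of the two doubled haplotype values."""
    total = 0.0
    for start in range(0, len(effects), segment_len):
        end = start + segment_len
        v1 = dh_value(hap1[start:end], effects[start:end])
        v2 = dh_value(hap2[start:end], effects[start:end])
        total += max(v1, v2)
    return total


def sample_gamete(hap1, hap2, rec_prob, rng):
    """Draw one gamete; switch parental strand with probability rec_prob
    between adjacent loci (assumed recombination model)."""
    strands = (hap1, hap2)
    current = rng.randrange(2)
    gamete = []
    for i in range(len(hap1)):
        if i > 0 and rng.random() < rec_prob:
            current = 1 - current
        gamete.append(strands[current][i])
    return gamete


def embv(hap1, hap2, effects, n_gametes, rec_prob, n_rep=2000, seed=1):
    """Monte Carlo estimate of the expected value of the best DH line
    among n_gametes gametes produced by the candidate."""
    rng = random.Random(seed)
    total = 0.0
    for _rep in range(n_rep):
        best = float("-inf")
        for _g in range(n_gametes):
            value = dh_value(sample_gamete(hap1, hap2, rec_prob, rng), effects)
            best = max(best, value)
        total += best
    return total / n_rep
```

For a candidate carrying favorable alleles in repulsion (hap1 = [1, 1, 0, 0], hap2 = [0, 0, 1, 1], unit effects, segments of two loci), GEBV is 4 and OHV is 8. The simulated EMBV for ten gametes and rec_prob = 0.1 falls between the two, because the best DH progeny combines both favorable segments only if a recombination occurs between them.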
Effect of dominant gene action
In inter-population RS, breeders usually apply the same or closely related testers from the opposite heterotic group for several selection cycles. In this case, the genetic model for testcross performance behaves (in the absence of epistasis) like a model with only additive gene action (Melchinger et al. 1998), so that our simulations assuming additive gene action closely reflect this situation. However, for intra-population RS, the current cycle usually serves as a tester, and therefore allele frequencies are variable. Thus, in the presence of dominant gene action () the allele substitution effects will change with changes in the allele frequencies across selection cycles. Moreover, in reality dominant gene action appears to be the rule rather than the exception (Crnokrak and Roff 1995; Hill et al. 2008). For these reasons, we investigated the extreme case of completely dominant gene action at all loci to assess the potential impact on the comparison of the selection criteria. Our results showed that EMBV and, to a lesser extent, OHV are robust with respect to dominance. On the other hand, the performance of wGEBV was severely affected under complete dominance. An explanation for the good performance of EMBV and OHV could be that these criteria are based on the assessment of homozygous individuals (DH lines). This removes the masking of recessive alleles in heterozygotes, which affects criteria GEBV and wGEBV. Moreover, wGEBV was specifically proposed as a criterion for long-term population improvement, so it is implicitly assumed that substitution effects have a long-lasting significance, which does not hold under dominant gene action if allele frequencies change.
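The dependence of substitution effects on allele frequencies can be illustrated with the textbook single-locus expression alpha = a + d(q - p), where a and d are the additive and dominance effects and p and q = 1 - p are the frequencies of the favorable and unfavorable allele in the tester population. The sketch below assumes this standard parameterization under random mating; the function name is hypothetical and the code is not taken from our simulations.

```python
def allele_substitution_effect(a, d, p):
    """Average effect of an allele substitution at one biallelic locus.

    a: additive effect (half the difference between the two homozygotes)
    d: dominance deviation of the heterozygote (d == a: complete dominance)
    p: frequency of the favorable allele in the tester population
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    q = 1.0 - p
    return a + d * (q - p)
```

With purely additive gene action (d = 0), the substitution effect equals a at every allele frequency. Under complete dominance (d = a = 1), it is 1.8 at p = 0.1 but only 0.2 at p = 0.9, so an effect calculated in an early cycle overstates the value of a favorable allele once that allele has become frequent.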
Further research
Estimation of allele substitution effects:
In our study, we assumed that both the loci and the effect sizes of QTL are perfectly known. In practice, QTL are generally unknown, markers are used as proxies, and their allele substitution effects have to be estimated from a training set using one of several available analytical methods (cf. de los Campos et al. 2013). However, a high degree of collinearity among markers, especially in high-density marker panels, entails that the effect of a QTL is distributed among surrounding markers in a complex manner, reducing the statistical power to accurately estimate their effects (Liu et al. 2015; Chang et al. 2018). The accuracy of estimated marker effects is further impaired for traits with low heritability or if insufficient phenotypic data are available.
We expect that the estimation of allele substitution effects of markers will affect the selection criteria differently. In wGEBV, individual effects are weighted by the frequency of the favorable allele. However, the less accurately marker effects are estimated, the lower the significance of their effects and the higher the chance that the wrong allele is considered favorable, potentially causing selection in the wrong direction. We therefore surmise that the potential of wGEBV is largely overestimated by the assumption of known substitution effects. Conversely, the criteria OHV and EMBV do not rely on individual marker effects, but consider entire haplotypes. This is done explicitly in OHV and implicitly in EMBV, because the virtual DH progeny of a selection candidate reflect collinearity among markers due to cosegregation. Hence, we expect that these criteria are less susceptible to inaccuracies in effect estimates, but this warrants further research.
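The risk of selecting in the wrong direction can be made tangible with a small Python sketch of a frequency-weighted criterion. It assumes one common form of the weighting, in which each effect is divided by the square root of the frequency of the allele labelled favorable; the exact weighting function, the function name and the genotype coding (0, 1 or 2 copies of a reference allele) are assumptions for illustration and may differ from the implementation evaluated here.

```python
import math


def wgebv(genotype, est_effects, ref_freq):
    """Frequency-weighted genomic value of one candidate (illustrative).

    genotype:    copies (0, 1, 2) of the reference allele per marker
    est_effects: estimated substitution effect of the reference allele
    ref_freq:    population frequency of the reference allele
    """
    total = 0.0
    for x, eff, p in zip(genotype, est_effects, ref_freq):
        # The sign of the estimate decides which allele is labelled favorable.
        if eff >= 0:
            fav_count, p_fav = x, p
        else:
            fav_count, p_fav = 2 - x, 1.0 - p
        if p_fav <= 0.0:
            continue  # labelled-favorable allele is absent from the population
        # Assumed weight: inverse square root of the favorable allele frequency.
        total += abs(eff) * fav_count / math.sqrt(p_fav)
    return total
```

Consider a marker whose reference allele is truly favorable and common (frequency 0.96). With the correct sign of the estimate (0.1), a candidate homozygous for this allele scores about 0.20. If the estimate has the wrong sign (-0.1), the rare alternative allele (frequency 0.04) is labelled favorable and receives a weight of 5, so that a candidate homozygous for the truly unfavorable allele scores about 1.0. The weighting thus amplifies the consequence of a sign error precisely for rare alleles.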
Furthermore, as the predictive information of effect estimates erodes over multiple selection cycles due to changes in linkage disequilibrium between QTL and markers (e.g., Muir 2007; Müller et al. 2017), periodic re-training of the prediction model is required, for instance every third cycle (Jannink 2010). Substitution effects should also be recalculated in every generation based on the allele frequencies of the respective tester, which is especially relevant if only a small fraction of the candidates is advanced.
Phasing:
The application of selection criteria EMBV and OHV requires the availability of phased genotypic data. If selection candidates are F1 crosses from homozygous inbred lines, all linkage phases are known. Otherwise, genotypes have to be phased before the candidates can be evaluated. In recent years, numerous software tools have been developed for this task, e.g., PHASE (Stephens et al. 2001), its successor fastPHASE (Scheet and Stephens 2006) or BEAGLE (Browning and Browning 2007). However, as phasing is still associated with a certain error rate, additional investigations are required to assess the influence of phasing errors on the performance of EMBV and OHV. While selection criteria GEBV and wGEBV remain unaffected by phasing errors, we expect that EMBV and OHV could both show a slightly reduced performance.
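Why phases are known for F1 candidates but not in general can be sketched in a few lines of Python. The example assumes biallelic markers coded 0/1 and fully homozygous parents; the function names are hypothetical, and the count of phase configurations is the generic combinatorial bound for an unphased genotype, not a property of any particular phasing software.

```python
def f1_haplotypes(parent_a, parent_b):
    """Haplotypes of an F1 from two fully homozygous inbred lines.

    Each parent is given by the allele (0 or 1) it carries at every locus.
    The F1 inherits one intact haplotype from each parent, so its linkage
    phases are known without statistical phasing.
    """
    return list(parent_a), list(parent_b)


def unphased_genotype(hap1, hap2):
    """Allele counts (0, 1, 2) per locus; phase information is discarded."""
    return [a + b for a, b in zip(hap1, hap2)]


def n_phase_configurations(genotype):
    """Number of distinct haplotype pairs compatible with an unphased
    genotype: 2 ** (k - 1) for k >= 1 heterozygous loci, otherwise 1."""
    n_het = sum(1 for g in genotype if g == 1)
    return 1 if n_het == 0 else 2 ** (n_het - 1)
```

For parents [0, 1, 1] and [1, 1, 0], the F1 haplotypes are exactly these two vectors. The corresponding unphased genotype [1, 2, 1] is compatible with two haplotype pairs, and the number of candidate configurations doubles with every additional heterozygous locus, which is why haplotype-based criteria depend on accurate phasing whereas genome-wide sums do not.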
Conclusions
In a proof-of-concept study, we showed that our novel selection criterion EMBV has the potential to yield higher long-term genetic gain than GEBV without jeopardizing short-term gain. Although criterion OHV performed well in the long run, it was not competitive with GEBV in early cycles, which makes it unattractive for practical breeding programs. Criterion wGEBV also showed promising long-term results for quantitative traits with purely additive gene action, but it incurred a performance penalty in early cycles and was, moreover, sensitive to deviations from additive gene action. EMBV could also be a promising approach for the selection of parents for producing DH lines in hybrid breeding programs, which is a subject of future research.
Acknowledgments
We cordially thank Pedro Correa Brauner, Willem Molenaar, Tobias Schrag, and Matthias Westhues for reviewing the manuscript and providing valuable suggestions for its improvement. We declare no conflict of interest associated with this study. We declare that ethical standards were met, and all the experiments comply with the current laws of the country in which they were performed.
DM developed selection criterion EMBV, conceptualized the simulation study, conducted the simulations, analyzed the data and wrote the manuscript. PS supported in conceptualizing the simulation study and reviewed the manuscript. AEM reviewed the manuscript. All authors read and approved the final version of the manuscript.
Footnotes
Communicating editor: D. J. de Koning
Supplemental Material is available online at www.g3journal.org/lookup/suppl/doi:10.1534/g3.118.200091/-/DC1
- Received July 7, 2017.
- Accepted January 31, 2018.
- Copyright © 2018 Muller et al.
This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.