A Genomic Selection Index Applied to Simulated and Real Data

A genomic selection index (GSI) is a linear combination of genomic estimated breeding values that uses genomic markers to predict the net genetic merit and select parents from a nonphenotyped testing population. Some authors have proposed a GSI; however, they have not used simulated or real data to validate the GSI theory and have not explained how to estimate the GSI selection response and the GSI expected genetic gain per selection cycle for the unobserved traits after the first selection cycle to obtain information about the genetic gains in each subsequent selection cycle. In this paper, we develop the theory of a GSI and apply it to two simulated and four real data sets with four traits. Also, we numerically compare its efficiency with that of the phenotypic selection index (PSI) by using the ratio of the GSI response over the PSI response, and the PSI and GSI expected genetic gain per selection cycle for observed and unobserved traits, respectively. In addition, we used the Technow inequality to compare GSI vs. PSI efficiency. Results from the simulated data were confirmed by the real data, indicating that GSI was more efficient than PSI per unit of time.

ABSTRACT A genomic selection index (GSI) is a linear combination of genomic estimated breeding values that uses genomic markers to predict the net genetic merit and select parents from a nonphenotyped testing population. Some authors have proposed a GSI; however, they have not used simulated or real data to validate the GSI theory and have not explained how to estimate the GSI selection response and the GSI expected genetic gain per selection cycle for the unobserved traits after the first selection cycle to obtain information about the genetic gains in each subsequent selection cycle. In this paper, we develop the theory of a GSI and apply it to two simulated and four real data sets with four traits. Also, we numerically compare its efficiency with that of the phenotypic selection index (PSI) by using the ratio of the GSI response over the PSI response, and the PSI and GSI expected genetic gain per selection cycle for observed and unobserved traits, respectively. In addition, we used the Technow inequality to compare GSI vs. PSI efficiency. Results from the simulated data were confirmed by the real data, indicating that GSI was more efficient than PSI per unit of time. KEYWORDS genomic estimated breeding value net genetic merit selection index selection response genomic selection GenPred shared data resource In genomic selection (GS), phenotypic and marker data from the training population are fitted in a statistical model to estimate all available marker effects. These estimates can then be used in subsequent selection cycles to obtain genomic estimated breeding values (GEBVs) that are predictors of the breeding values in the testing population (candidates for selection) for which there is only marker information (Meuwissen et al. 2001;Heffner et al. 2009;Lorenz et al. 2011;Nakaya and Isobe 2012). In GS, GEBVs are tools for ranking and selecting candidates for selection. Bernardo and Yu (2007) and Heffner et al. (2011) have shown that selection based on genomic predictions can lead to greater genetic gains per unit of time for complex traits. Technow et al. (2013) derived an inequality that depends on GS accuracy and the square root of the heritability of the unobserved trait, which is useful to compare the genomic selection efficiency with the phenotypic efficiency in terms of time. The standard method for predicting marker effects and breeding values is the ridge-regression best linear unbiased predictor, or its equivalent, the genomic best linear unbiased predictor, which assumes that the effects of all markers have a multivariate normal distribution with mean zero and constant variance (Meuwissen et al. 2001;VanRaden 2008). The difference among the various Bayesian regression methods lies in how they specify the prior distribution of the parameters of interest (de los Campos et al. 2013;Gianola 2013). Methods such as Bayes A and Bayes B assume that the variance of marker effects has an a priori inverse x 2 distribution (Meuwissen et al. 2001) that produces shrinkage as well as variable selection. Nevertheless, when the true marker effects have a multivariate normal distribution and the size of the training population and the number of markers is large, all methods produce GEBVs that are highly correlated with the true breeding values of the candidates for selection (Hayes et al. 2009;Verbyla et al. 2010).
In the context of molecular marker-assisted selection, Lande and Thompson (1990) proposed a selection index that combines marker information with phenotypic information, whereas Dekkers (2007) proposed a selection index that combines GEBVs with phenotypic information. Both selection indices were evaluated using simulated data and in both studies the authors found that the estimated selection response was greater than when only phenotypic information was used to estimate it. In the context of GS, Togashi et al. (2011) proposed four selection indices similar to Dekkers' index based on the best linear unbiased predictor theory; however, their results are hypothetical because the authors did not use any data (either simulated or real) to validate these indices. The indices of Togashi are a direct application of the phenotypic selection index (PSI) (Smith 1936), but they do not explain how to estimate the GS response and the genomic selection index (GSI) expected genetic gain per selection cycle for unobserved traits after the first selection cycle, which is important, because they give information on the genetic gains in the next selection cycle and are the base criteria to compare the efficiency of two or more linear selection indices (Bulmer 1980;Moreau et al. 1998).
This study had three main objectives: (1) to apply the GSI to two simulated and four real data sets that only use GEBVs for selecting nonphenotyped candidates for selection; (2) to propose a method to estimate the GSI selection response and the GSI expected genetic gain per selection cycle for unobserved traits after the first selection cycle; and (3) to compare GSI efficiency vs. PSI efficiency using simulated and real data.

PSIs and GSIs
The objective of any linear selection index, whether phenotypic or genomic, is to predict the net genetic merit H ¼ w9a, where a9 ¼ ½ a 1 a 2 ::: a t (t ¼number of traits) is a vector of true breeding values for an individual and w9 ¼ ½ w 1 w 2 ::: w t is a vector of economic weights. According to Kempthorne and Nordskog (1959), the selection response of any linear selection index can be written as where k is the standardized selection differential (or selection intensity), s H;I is the covariance between H and any linear index I, s 2 I is the variance of I, s H is the standard deviation of H, r H;I is the correlation between H and any linear index I, and L denotes the time required to collect information to evaluate I and complete one selection cycle. The second part of Equation (1) ( k L s H r H;I ) indicates that the genetic change due to selection is proportional to r H;I and to k, which is the selection differential in the index in standard deviation units (Kempthorne and Nordskog 1959). If k, s H , and L are fixed, R will be maximized when r H;I is maximized and the final form of Equation (1) will depend on the particular linear selection index used to select individuals, e.g., PSI or GSI. Figure 1 Schematic illustration of the steps followed to generate data sets 1 and 2 for the selection process using the phenotypic selection index and the genomic selection index. Dotted lines indicate the process used to simulate the phenotypic data.
PSI and its selection response: Let p9 ¼ ½ p 1 p 2 . . . p t be a vector of phenotypic trait values; the PSI (Smith 1936) can be written as PSI ¼ b9p and its maximized selection response is where L PSI denotes the time required for PSI to complete one selection cycle, b ¼ P 21 Cw, P 21 is the inverse of the phenotypic covariance matrix (P), and C is the covariance matrix of true breeding values a; k and w were defined previously.
GSI and its selection response: The GSI can be written as where g9 ¼ ½ g 1 g 2 ::: g t is a 1 · t vector of genomic breeding values for one individual; it can be shown that the maximized GSI selection response is where L GSI denotes the time required for GSI to complete one selection cycle; k and w were defined previously; G ¼ fs gqq9 g (q; q9 ¼ 1; 2; :::; t) is a covariance matrix of additive genomic breeding values g.
Note that in each selection cycle, matrices P, C, and G change their values as a result of many individuals being eliminated by the selection process.
Estimating the parameters of the PSI: In each selection cycle, we used the restricted maximum likelihood method (Patterson and Thompson 1971) to estimate the covariance matrix of true breeding values (C) and of the residuals (R), which were denoted asĈ andR, respectively, from where matrixP ¼Ĉ þR was an estimator of the phenotypic variance-covariance matrix (P). We estimated b ¼ P 21 Cw and R PSI ¼ k LPSI ffiffiffiffiffiffiffiffiffiffi b9Pb p asb ¼P 21Ĉ w and Estimating the GEBV and the GSI in the l th selection cycle: Letû be the estimator of the vector of marker effects u9 ¼ ½ u9 1 u9 2 . . . u9 t for t traits (Appendix). We obtained the q th GEBVs (q ¼ 1; 2; . . . ; t) in the l th selection cycle (l ¼ 1; 2; :::; number of cycles) aŝ whereû q is the vector of size m · 1 of the marker effects of the q th trait in the base population and X l is a matrix of size g · m of the coded values of marker genotypes in the l th selection cycle (Goddard 2009). The estimated GSI (GSI E ) values in this cycle were where w q is the q th economic weight andĝ ql was defined in Equation (5). Note that Equation (6) is a vector of size g · 1 (g ¼number of genotypes). In practice, GSI E values are ranked to select individual genotypes with optimum GEBV values.
Estimating the Gmatrix: Suppose that g q and g q9 have multivariate normal distribution jointly, with mean 1m g q and 1m g q9 , respectively, and covariance matrix Gs g qq9 , where 1 is a g · 1 vector of 1s and G ¼ XX9=c is the additive genomic relationship matrix (Appendix). Then G ¼ fs g qq9 g can be estimated aŝ whereŝ g qq9 ¼ 1 g ðĝ ql 2 1m g ql Þ9G 2 1 l ðĝ q9l 2 1m g q9l Þ is the estimated covariance between g q and g q9 in the l th selection cycle; g is the number of genotypes;ĝ ql was defined in Equation (5);m g ql andm g q9l are the estimated arithmetic means of the values ofĝ ql andĝ q9l ; 1 is a g · 1 vector of 1s and G l ¼ c 21 X l X9 l is the additive genomic relationship matrix in the l th selection cycle (l ¼ 1; 2; :::; number of cycles). From Equations (4) and (7), the estimated GSI response iŝ Criteria for comparing GSI efficiency vs. PSI efficiency Assuming that k is the same in both indices, to compare GSI efficiency vs. PSI efficiency in the l th selection cycle, we used the ratio which was proposed by Bulmer (1980) and Moreau et al. (1998) as a criterion for comparing the efficiency of linear selection indices. In Equation (8),R PSI andR GSI are estimators of Equations (2) and (4), respectively, andr H;GSI andr H;PSI are the maximized estimated correlation (or accuracy) between H and GSI, and between H and PSI, respectively. Using this criterion, if l . 1, GSI efficiency will be greater than PSI efficiency, if l ¼ 1, the efficiency of both selection indices will be equal, and if l , 1, PSI will be more efficient than GSI.
PSI and GSI expected genetic gain per selection cycle Besides Equation (8) for comparing the efficiency of PSI vs. GSI, we used the estimated values of the following two equations: where E PSI and E GSI are the expected genetic gain per selection cycle for each trait in the PSI (Lin 1978) and in the GSI (Togashi et al. 2011), respectively. All the terms in Equations (9) and (10) were defined and estimated according to Equations (2) and (4), respectively.

Simulated and real data sets
Simulated data sets (data sets 1 and 2): Figure 1 presents a schematic illustration of the steps followed to generate the simulated data sets. For the simulation, the performance of the F 2 or S n families was evaluated using the selfing generation (F 3 or S n+1 ) of the F 2 or S n families, whereas in practice, the F 2 or S n families would be evaluated by crossing them to a tester (or testers). We simulated eight phenotypic selection cycles [cycle 0 (C0)2 cycle 7 (C7)] for PSI (data set 1), and seven GS cycles (C12C7) for GSI (data set 2), each with four traits (T1, T2, T3, and T4), 500 genotypes and four replicates for each genotype under one possible scenario: 5% of quantitative trait loci (QTL) were in linkage equilibrium.
C0 was the GSI training population, which contained phenotypic and genotypic data; it is the population where we estimated the molecular marker effects (Appendix). In all selection cycles, we selected and intermated the top 10% of individuals (k ¼ 1:75). The economic weights used in PSI and GSI for T1, T2, T3, and T4 were 1, 21, 1, and 1, respectively. Selections were based on PSI and GSI values that incorporated all four trait (T1, T2, T3, and T4) means in each selection cycle to predict and select the net genetic merit (H ¼ w9a) of each individual.
Simulated data were generated using QU-GENE software (Podlich and Cooper 1998;Wang et al. 2003). Three hundred fifteen QTL and 2500 molecular markers were distributed uniformly across 10 chromosomes to simulate two maize (Zea mays L.) populations. Each QTL and molecular marker was biallelic and the QTL additive values ranged from 0 to 0.5. The 315 QTL were randomly allocated over the 10 chromosomes. Because QU-GENE uses recombination fraction rather than map distance to calculate the probability of crossover events, recombination between adjacent pairs of markers was set at 0.0906, those between a QTL and its flanking markers set at 0.0 and 0.0906, and that between two adjacent QTL set at 0.0. The recombination fraction between 15 random QTL and their flanking markers was set at 0.5, i.e., complete independence (Haldane 1919), to simulate linkage equilibrium between 5% of the QTL and their flanking markers.
The genotypic value of each plant was generated based on its haplotypes and the QTL effects for each trait. For each trait, the phenotypic value for each of four replications of each plant was obtained from QU-GENE software by setting the per-plot heritability of T1, T2, T3, and T4 at 0.4, 0.6, 0.6, and 0.8, respectively.
In cycle C0 (the training population), 500 F 2 plants were generated from a cross of two inbred parents. The haplotypes of these parents were randomly generated, but the two parents shared no common alleles. In subsequent cycles (i.e., C12C7), 500 plants were generated from a random intercross of the selected 10% of lines from the previous cycle using the PSI and GSI methods. In C0, only PSI was applied. In C1, two selection methods were applied: PSI (data set 1) and GSI (data set 2); the 10% of individuals selected with each method were advanced to the next selection cycle.
For each data set, C0 contained genotypic data and four phenotypic traits: grain yield (GY, t/ha), plant height (PHT, cm), ear height (EHT, cm), and anthesis days (AD, d), as well as three sets of markers corresponding to C0 (training population), C1, and C2. The numbers of individuals and molecular markers in each population are shown in Table 1. Assuming that the breeding objective was to increase GY while decreasing PHT, EHT, and AD, the vectors of economic weights in C0, C1, and C2 for GY, PHT, EHT, and AD, were w9 ¼ ½ 5 2 0:3 2 0:3 2 1 for both indices and the four data sets.
In our study, the PSI was applied only in C0 because there were no phenotypic data after that cycle, whereas GSI was applied in C0, C1, and C2. Note that GSI was used in C0 only with the purpose of comparing GSI efficiency vs. PSI efficiency. The top 10% (k = 1.75) was selected in all cycles of the four data sets.
We analyzed the simulated and real data results for all traits in each selection cycle, by using three different criteria: the estimated GSI and PSI selection responses, the estimated expected genetic gain per selection cycle for each trait in the PSI and in the GSI, and the estimated Technow et al. (2013) inequality (see Supporting Information, File S1 for the last criteria).
Data repository: The simulated phenotypic selection cycles (C02C7) for PSI (data set 1), and GS cycles (C12C7) for GSI (data set 2), as well as the real data sets (data sets 326) including the phenotype and haplotype data are deposited at http://hdl.handle.net/11529/10199. This repository also has the File S1 cited several times in the text of the paper.

Data availability
The data repository has the following data: Real_Data_Sets_GSI, Sim-ulated_Data_GSI, and a manuscript: Supplementary Material-2.doc that are described below File Real_Data_Sets_GSI contains four file data sets: DATA_SET-3, 4, 5 and 6. In addition, each four file data sets contains four excel data. For example, the four excel data for file DATA_SET-3 are: DATA_SET-3_Markers_Cycle-0, 1, 2, and DATA_SET-3_Phenotypic_Cycle-0. The first three excel data contains the marker coded values for cycles 0, 1 and 2, while the excel data DATA_SET-3_Phenotypic_Cycle-0 contains the phenotypic information of cycle 0 (Training population). These four data sets were used to make selection, to estimate the selection response and the genetic expected gains; the results were presented in Table 4.
The other three file data sets: DATA_SET-4, 5 and 6 contains similar information that file DATA_SET-3, but this information correspond to data set 4, 5 and 6 used to make selection, to estimate the selection response and the genetic expected gains; the results were presented in Table 4.
Finally, the manuscript Supplementary Material-2.doc contain a complete description related with the form we adapted the Technow inequality to the genomic and phenotypic selection index to comparer its efficiency in terms of time. In addition, this manuscript contains Table S1 and Table S2; the first one contains the results of the simulated data and the second one the results of real data.

Simulated data
Correlations between GEBV and the trait true breeding values: Figure 2 shows the correlation between the GEBV and the individual trait true breeding values obtained by the Pearson correlation coefficient. The genomic relationship (G) is not incorporated in the correlations. In Figure 2, each selection cycle contains four columns: the first column (from left to the right) corresponds to the correlation between the GEBV and the T1 true breeding values; the second column corresponds to the correlation between the GEBV and the T2 true breeding values; etc. In this figure, all correlation values tend to decrease after the first selection cycle. In C7, the correlation values between the GEBV and the trait true breeding values were 0.30, 0.21, 0.38, and 0.34, for each of the four traits, respectively, whereas in cycle one (C1) these correlations were 0.40, 0.53, 0.63, and 0.73, for each of the four traits, respectively. In terms of proportions, the correlation values of C7 were only 76%, 40%, 60%, and 46% of the correlation values of C1. That is, the correlation between the GEBV and the trait true breeding values decreased more for traits 2 and 4 than for traits 1 and 3. This can be  (1)) andĜ was obtained according to Equation (7). In this case, r GSI;H incorporated the genomic relationship (G) information. Figure 3 contains only two columns for each selection cycle: the first column (blue) corresponds to the correlation between the GSI and the true values of H, whereas the second column (red) denotes the correlation between PSI estimated values and H. As expected, the correlation between GSI and H tended to decrease more than the correlation between PSI and H after the third selection cycle. The reason was that the PSI estimated values in each selection cycle were obtained using all phenotypic information of the newly generated population, whereas the GSI estimated values in each selection cycle incorporated only the marker information of the newly generated population. The correlation between GSI and H was 0.71 in C7 and 0.95 in C1, whereas the correlation between PSI and H was 0.83 in C7 and 0.91 in C1.
Estimated and true selection response of PSI and GSI when their generation interval is ignored: The first part of Table 2 shows the GSI estimated (R GSI ) and true (R GSI ), and the PSI estimated (R PSI ) and true (R PSI ) selection responses and their ratios:R GSI /R GSI andR PSI /R PSI , when their generation interval was ignored, for simulated data sets 1 and 2, respectively, for four traits (T1, T2, T3, and T4) and seven GSI and PSI selection cycles. In all selection cycles,R GSI ,R PSI and R GSI , R PSI . In addition, results indicated that, in general,R GSI , R GSI andR PSI , R PSI , i.e.,R GSI andR PSI , underestimated the R GSI and R PSI values in all selection cycles.
The average values for all selection cycles of ratiosR GSI /R GSI and R PSI /R PSI were equal to 0.83 and 0.86, respectively, which indicated that R GSI explained 83% of R GSI performance, whereasR PSI explained 86% of R PSI performance. Then, in terms of mean values, the results indicated thatR GSI andR PSI were good estimators of R GSI and R PSI performance, respectively. The main results here show that for each selection cycle, the estimated PSI response to selection was always higher than the true and estimated GSI response when the generation interval was not incorporated in the estimated selection response.
Estimated and true selection response of PSI and GSI when their generation interval is included: The second part of Table 2 shows the GSI estimated (R GSI ) and true (R GSI ), and the PSI estimated (R PSI ) and true (R PSI ) selection response values and their ratios:R GSI /R PSI and R GSI /R PSI , when their generation interval was included, for simulated data sets 1 and 2 for four traits (T1, T2, T3, and T4) for seven GSI and PSI selection cycles. In this case, the time required to complete one GSI selection cycle was L GSI ¼ 1:5 years, whereas for one PSI selection cycle it was L PSI ¼ 4 years. According to the ratio valuesR GSI /R PSI and R GSI /R PSI , in all selection cycles GSI was more than twice as efficient as PSI. Then, when the generation interval of the estimated GSI and PSI selection response was included in the estimate response to selection, GSI was more efficient than the PSI in all selection cycles for the estimated and true selection responses. Table 3 show the PSI estimated expected genetic gains for each trait per selection cycle for the observed traits (Equation 9) and columns 10217 show the estimated GSI expected genetic gain for each trait per selection cycle for the unobserved traits (Equation 10). Note that the PSI estimated expected genetic gains of columns 225 were not divided by 4 (the time required to collect information to evaluate PSI and complete one selection cycle). Similarly, the GSI estimated expected genetic gains of columns 9213 were not divided by 1.5 (the time required to collect information to evaluate GSI and complete one selection cycle). When the generation interval was not considered, the expected value of PSI for traits in each cycle was always higher than the expected values of the GSI for those traits. However, per unit of time, the expected genetic gains of GSI (columns 14-17 of Table 3) for each cycle and for each trait were always higher than the expected genetic gains of PSI (columns 629 of Table 3).

Real data (F2 maize populations)
Estimated expected genetic gains and selection responses for PSI and GSI with generation interval included: Table 4 shows the estimated PSI and GSI expected genetic gains, and the estimated PSI and GSI selection responses (Equations 2 and 4, respectively) for one PSI selection cycle (C0) and three GSI cycles (C0, C1, and C2) of four maize (Zea mays L.) F 2 populations and four traits (GY, PHT, EHT, and AD), when their generation interval was included. We aimed to increase GY while decreasing PHT, EHT, and AD by using three sets of markers (Table 1). As for the simulated data, for the PSI and GSI response, the time required to complete one selection cycle was L GSI ¼ 1:5 and L PSI ¼ 4 years for GSI and PSI, respectively. n Estimated (R GSI ) and true (R GSI ) GSI responses; estimated (R PSI ) and true (R PSI ) PSI responses, and the ratios:R GSI /R PSI and R GSI /R PSI , when their generation intervals were included in simulated data sets 1 and 2, respectively, for four traits (T1, T2, T3, and T4). We conducted eight selection cycles (including cycle 0) with PSI and seven (from cycle 127) with GSI. The average responses and ratio values from cycle 1 to 7 are shown in the last line of each sub-table. GSI, genomic selection index; PSI, phenotypic selection index Estimated expected genetic gains for PSI and GSI: In this case, the GSI estimated expected genetic gain for each trait per selection cycle for the unobserved traits in C0 (or training population) were greater than the PSI estimated expected genetic gains for each trait per selection cycle for the observed traits. These results showed a similar tendency to the simulated results when the generation interval was included. That is, in C0, the estimated GSI expected genetic gains were greater than the estimated PSI expected genetic gains. In C1 and C2, it was not possible to compare GSI vs. PSI because there were no phenotypic data in those cycles.
Estimated PSI and GSI selection response: The numbers of individuals and markers used in the four real data sets were lower (Table 1) than those used in the simulated data; for this reason, the estimated selection values observed in the real data sets (Table 4) were lower than those in the simulated results shown in Table 4. However, in general, the decrease in estimated GSI responses after C0 was similar to the decrease in estimated GSI selection responses after C1 in the simulated data (Table 2). For the real data sets, in C0, the estimated GSI selection response was higher than the estimated PSI selection response, whereas in C1 and C2, it was not possible to compare GSI vs. PSI because there were no phenotypic data in those cycles.
Additional criteria for comparing PSI vs. GSI Besides Equations (8), (9), and (10), we used the Technow et al. (2013) inequality adapted to the context of PSI and GSI (Supplemental Materials, Equation (S1)) as additional criteria to compare the efficiency of GSI vs. PSI in terms of time. This last criterion corroborated the results obtained with Equations (8), (9), and (10). Results of the last criterion are given in Table S1 and Table S2 for simulated and real data, respectively.

Simulated data
Our results showed that GSI is more efficient than PSI per unit of time but not in terms of cycle. The average of the PSI and GSI selection responses values for all cycles, and the average of the PSI and GSI expected genetic gains per selection cycle for all cycles for observed and unobserved traits, respectively, were very similar when their generation interval was ignored because in the simulation process 95% of the QTL were in linkage disequilibrium with markers. After C3, the correlation between true and estimated PSI and GSI values was greater for PSI than for GSI. In our simulation, if instead of using 95% of the QTL in linkage disequilibrium, we had used 100% of the QTL in linkage disequilibrium n Table 3 Estimated expected genetic gains obtained using the PSI and the GSI for simulated data sets 1 and 2, respectively, for four traits (T1, T2, T3, and T4), when the generation interval is ignored and when it is included a Generation Interval Ignored  Generation Interval Included  Generation Interval Ignored  Generation Interval Included   T1  T2  T3  T4  T1  T2  T3  T4  T1  T2  T3  T4  T1  T2  T3  T4   1  We conducted eight selection cycles (including cycle 0) with PSI and seven (from cycle 1 to 7) with GSI. The average responses and genetic gains from cycle 1 to 7 are shown in the last line of the table. PSI, phenotypic selection index; GSI, genomic selection index. a For PSI, the time required to complete one selection cycle is 4 years; for GSI, the time required to complete one selection cycle is 1.5 years.

Cycles PSI Estimated Expected Genetic Gains GSI Estimated Expected Genetic Gains
n Table 4 Expected genetic gains per selection cycle for the PSI and GSI for cycle 0 and cycles 0, 1, and 2, respectively, for four traits (GY, EHT, PHT, and AD) in four maize (Zea mays) F 2 populations when the generation interval was included The last line of each subtable shows the estimated PSI (cycle 0) selection response, and the estimated GSI (cycles 0, 1, and 2) selection responses. PSI, phenotypic selection index; GSI, genomic selection index; GY, grain yield; EHT, ear height; PHT, plant height, AD, anthesis days.
with markers, we would expect the PSI and GSI results to be practically equal under the assumption of a very large number of markers. The importance of this result is that when the generation interval was ignored, PSI efficiency . GSI efficiency, but on average across all cycles, they were similar. When the interval length was used in the PSI and GSI selection responses and in the PSI and GSI expected genetic gain per selection cycle, GSI was always more efficient than PSI in maize population selection for relatively dense molecular markers in an F 2 population. We compared the PSI response with the GSI response considering the time (years) needed for each method to complete a selection cycle assuming that selection intensity is the same in both selection indices. Then, the ratio of the GSI selection response over the PSI selection response (Equation 8) was a good criterion for comparing PSI efficiency vs. GSI efficiency because each selection response included all the information on the genetic gains for each selection index in each selection cycle. In the case of the maize populations, GSI led to greater rates of genetic gain/year than PSI because PSI requires about 4 years to complete each selection cycle, whereas GSI requires about 1.5 years (Beyene et al. 2015). Thus GSI efficiency was greater than PSI efficiency because the interval of time between selection cycles in GSI is shorter than in PSI. If this factor is not taken into account, the average PSI response for the simulated data were 14% greater than the average GSI response.

Real data
In the real data sets, the trend of GSI responses was very similar to those observed in the simulated data when their generation interval was not ignored. That is, GSI responses were higher than PSI responses in C0 for all four data sets (Table 4). One reason for these results may be that markers were in linkage disequilibrium with many QTL of the trait. In that case, GSI was very effective. As shown by Beyene et al. (2015), in eight biparental populations, a good genetic gain is expected from rapid cycling of GS in an F 2 population with maximum linkage disequilibrium. Note that the estimated selection response for GSI decreased in a manner similar to that of the simulated data after cycle 0. This is because in the real data sets, the estimated selection response depends on the additive genomic variance-covariance matrix (G), whose covariance components decreased in each selection cycle.
The importance of the estimation of matrix G in simulated and real data and its effect on GSI correlations, GSI response, and GSI expected genetic gains We proposed one way of estimating matrix G ¼ fs g qq9 g (Equation 7). This method significantly affected (1) the correlation between GSI and the net genetic merit (H ¼ w9a), (2) the estimated GSI response, and (3) the estimated GSI expected genetic gains. The elements of G were estimated asŝ g qq9 ¼ 1 g ðĝ ql 2 1m g ql Þ9G 2 1 l ðĝ q9l 2 1m g q9l Þ. Another form of s g qq9 estimate isŝ g qq9 ¼ 1 g ðĝ ql 2 1m g ql Þ9ðĝ q9l 2 1m g q9l Þ, where matrix G 2 1 l is omitted. In that case, the correlation between the GSI and H would tend to be smaller (data not shown) than that shown in Figure 3. In addition, we could also expect that the estimated GSI selection responses and the estimated GSI expected genetic gains per selection cycle would be smaller than those shown in Table 2, Table 3, and  Table 4.
These results indicate the importance of matrix G 2 1 l in the estimation of the GSI response, the GSI expected genetic gains per selection cycle, and in the correlation between GSI and H because the use of the genomic relationship matrix increases the accuracy of parameter estimation.
PSI vs. GSI PSI and GSI are predictors of H and both have optimal statistical properties. However, while PSI is a phenotypic predictor of H, GSI is a genomic predictor of H. Based on trait hereditability and genetic architecture, PSI is expected to be more accurate and have a greater selection response per selection cycle than GSI. However, in terms of genetic gain per unit of time, GSI needs one-third of the time required by PSI or less (Lorenz et al. 2011). Thus, GSI will be more efficient than PSI in GS programs. We have shown (in File S1) that the Technow inequality corroborated this last argument.
In simulation and empirical studies, GEBVs based solely on individual genotypes have been remarkably accurate. These accuracies depend on the characteristics of the population under selection (Lorenz et al. 2011). According to Equations (3) and (6), GSI is a linear combination of indices because GEBVs are indices per se (Robinson 1991;Togashi et al. 2011) and its main function is to predict the net genetic merit (H ¼ w9a) of the candidate for selection. According to classical best linear unbiased predictor theory (McLean et al. 1991;Robinson 1991): (a) GSI is the best linear predictor of H; (b) the correlation between GSI and H is maximum; (c) the GEBVs are unique; and (d) EðH=GSI E Þ ¼ GSI E , i.e., the expectation of H given GSI E is GSI E . PSI was constructed with trait phenotypic means to predict and select H; however, Henderson (1963) showed that all four points are also true for PSI when matrices P and C are known.
For the selection objective, GSI requires only the genomic best linear unbiased predictor obtained in the training population (in this case, C0) and the population markers of each selection cycle that are used to obtain the GEBV in each selection cycle. Then, for selection proposes, we only need to construct the estimated GSI as GSI E ¼ P t q¼1 w qĝql (Equation 6) and the GSI E values are then ranked to select individual genotypes with optimum GEBV values. However, in the present paper, we used the PSI theory originally developed by Smith (1936) to obtain the GSI selection response and the GSI expected genetic gains per selection cycle. Selection response and expected genetic gains give information on the genetic gains in the next selection cycle and are the base criteria for comparing the efficiency of two or more selection indices (Bulmer 1980;Moreau et al. 1998).
PSI vs. GSI when the generation interval is equal in both indices Some of the results shown in Tables 2 and Table 3 and Table S1 and Table S2 occurred when the PSI generation interval (L PSI ) was greater than the GSI generation interval (L PSI ). What would happen if L PSI = L GSI ? In this case, if the number of markers is very small, then Equation (4) will give lower values than Equation (2) and PSI efficiency will be greater than GSI efficiency. However, if the number of markers is very large, the PSI and GSI responses will be very similar. This argument also holds true for the Technow et al. (2013) inequality and the PSI and GSI expected genetic gain per selection cycle for observed and unobserved traits. For example, note that in Table S1, we have assumed that L GSI ¼ 1:5 and L PSI ¼ 4:0. Suppose now that L PSI ¼ L GSI ¼ 4:0. In this case, the Technow et al. (2013) inequality will not hold true because in all selection cycles L GSI . r H;GSI hPSI L PSI . That is, the Technow et al. (2013) inequality will change its direction. Finally, it is evident that if L PSI , L GSI , PSI will be more efficient than GSI even in the hypothetical case when the number of molecular marker is infinite. In conclusion, GSI will be more efficient than PSI in terms of unit of time only if L PSI . L GSI ; in this case, the Technow et al. (2013) inequality is true. In all other cases, PSI will be more efficient than GSI.
In this study, we applied the theory of GSI to simulated and real data and compared its efficiency with PSI efficiency by using three different criteria: the ratio of the GSI response over the PSI response, the PSI and GSI expected genetic gain per selection cycle for observed and unobserved traits, respectively, and the Technow inequality. In all three cases, for simulated and real data, GSI efficiency was higher than PSI efficiency per unit of time in all selection cycles. We thus concluded that GSI is an efficient choice when the purpose of a breeding program is to select individuals using GS.

Multivariate prediction of molecular marker effects
In the univariate context, VanRaden (2008) showed that marker effects in the training population could be estimated aŝ u q ¼ c 21 X9½G þ lI g 21 y q (A1) where G ¼ c 21 XX9 is the additive genomic relationship matrix, X is a matrix with coded marker values on the base population (e.g., 1, 0, and 21 for genotypes AA, Aa, and aa, respectively); p j is the allelic frequency of a; c ¼ P N j¼1 2p j ð1 2 p j Þ is a proportional constant (Habier et al. 2007); l ¼ s 2 eq s 2 aq ; s 2 aq and s 2 eq are the additive and residual variances, respectively, associated with the q th trait; y q~N MV(0, V q ) is a vector of observations, where NMV stands for the multivariate normal distribution, V q ¼ Gs 2 aq þ I g s 2 eq ; and I g is an identity matrix of order g · g. In the multivariate context, to estimate the vector u9 ¼ ½ u9 1 u9 2 . . . u9 t , Equation (A1) can be written aŝ where Z t ¼ I t 5X, "5" denotes the direct product, X was defined in Equation (A1); L ¼ RC 21 , R is the residual covariance matrix, and C was defined in the text (Equation 1); y9 ¼ ½ y9 1 y9 2 . . . y9 t ~NMV(0, V) is a vector of size 1 · tg, and V ¼ C5G þ R5I g ; I t is an identity matrix of size t · t and I g was defined in Equation (A1). In this case, the estimator of the vector g9 ¼ ½ g 1 g 2 ::: g t isĝ ¼ Z tû .