## Abstract

A combined multistage linear genomic selection index (CMLGSI) is a linear combination of phenotypic and genomic estimated breeding values useful for predicting the individual net genetic merit, which in turn is a linear combination of the true unobservable breeding values of the traits weighted by their respective economic values. The CMLGSI is a cost-saving strategy for improving multiple traits because the breeder does not need to measure all traits at each stage. The *optimum* (OCMLGSI) and *decorrelated* (DCMLGSI) indices are the main CMLGSIs. Whereas the OCMLGSI takes into consideration the index correlation values among stages, the DCMLGSI imposes the restriction that the index correlation values among stages be zero. Using real and simulated datasets, we compared the efficiency of both indices in a two-stage context. The criteria we applied to compare the efficiency of both indices were that the total selection response of each index must be lower than or equal to the single-stage combined linear genomic selection index (CLGSI) response and that the correlation of each index with the net genetic merit should be maximum. Using four different total proportions for the real dataset, the estimated total OCMLGSI and DCMLGSI responses explained 97.5% and 90%, respectively, of the estimated single-stage CLGSI selection response. In addition, at stage two, the estimated correlations of the OCMLGSI and the DCMLGSI with the net genetic merit were 0.84 and 0.63, respectively. We found similar results for the simulated datasets. Thus, we recommend using the OCMLGSI when performing multistage selection.

- Genomic estimated breeding value
- Molecular marker effects
- Multistage selection
- Total selection response
- Genomic Prediction GenPred
- Shared Data Resources

The linear selection index can be a linear combination of phenotypic values (Smith 1936; Hazel and Lush 1942; Hazel 1943), genomic estimated breeding values (GEBV) (Ceron-Rojas *et al.* 2015; Cerón-Rojas and Crossa 2019), or of phenotypic values and GEBV (Dekkers 2007) jointly. In addition, it can also be a linear combination of phenotypic values and marker scores (Lande and Thompson 1990). All of these linear selection indices can be single-stage or multistage and are useful for selecting parents for the next generation and for predicting the individual net genetic merit, which, in turn, is a linear combination of the true unobservable breeding values of traits weighted by their respective economic values. The main aims of linear indices are to (1) predict the net genetic merit values of the candidates for selection, (2) maximize the selection response, and (3) provide the breeder with an objective rule for evaluating and selecting several traits simultaneously.

The selection response and the correlation between the index and the net genetic merit are the main index parameters; they are also the criteria used to compare the efficiency of any linear index to predict the net genetic merit. When the mean of the original population is zero, the selection response is the expected net genetic merit of the selected individuals (Smith 1936; Cochran 1951; Cerón-Rojas and Crossa 2018; Chapter 2). Both parameters give breeders an objective basis on which to validate the success of the adopted selection method.

Smith (1936) was the first to describe the single-stage linear phenotypic selection index (LPSI) theory under two assumptions: that the genotypic values that make up the net genetic merit are composed entirely of the additive effects of genes, and that the LPSI and the net genetic merit have bivariate normal distribution. When the phenotypic and genotypic covariance matrices of the individual traits under selection are known, the LPSI is the best linear predictor of the net genetic merit and is called optimum LPSI. The major advantages of the LPSI are that it assigns higher weights to traits whose differences are genetic and that it is relatively simple to analyze. Its disadvantages are that it requires large amounts of information, that the economic weights are difficult to assign and that the sampling error could be large. All linear selection indices associated with the LPSI theory have the same advantages and disadvantages (Cerón-Rojas and Crossa 2018, Chapters 2 to 6).

Cochran (1951) was the first to propose a two-stage LPSI, and Young (1964) generalized the Cochran (1951) approach to the multistage LPSI context. Young (1964) combined the independent culling selection method (Hazel and Lush 1942) and the LPSI theory to develop the optimum multistage LPSI (OMLPSI) for selecting several traits. One stage is the individual age (the length of time that a plant or animal has lived) at which the breeder measures and selects the individual based on the traits of interest into one specific selection cycle. Thus, suppose two vectors of individual traits, **x** and **y**, become evident at different animal or plant ages. In the single-stage context, both vectors of information are selected jointly using LPSI, whereas in the two-stage selection context, first **x** is selected at stage 1, and, at stage 2, we select **x** and **y** using OMLPSI. Young (1964) called the OMLPSI the “*part and whole index selection method*”, whereas Xu and Muir (1992) called it “*selection index updating*” because as traits become available, each subsequent index contains all traits available up to that stage.

Breeders apply the OMLPSI mainly in animal and tree breeding, where the target traits become evident at different individual ages. The OMLPSI is a cost-saving strategy for improving multiple traits because the breeder does not need to measure all traits at each stage. That is, the main advantage of the OMLPSI over the single-stage LPSI is that the breeder does not need to carry a large population of individuals throughout the multi-trait selection process. If the breeder selects with the LPSI, the same individuals for each trait of interest must be measured, and then the number of traits measured per mature individual is the same as for an immature individual. When traits have a developmental sequence in ontogeny, or when there are large differences in the costs of measuring several traits, the efficiency of OMLPSI over LPSI, in terms of cost saving, can be substantial (Xu and Muir 1991, 1992; Xu *et al.* 1995; Xie *et al.* 1997).

Some problems associated with the OMLPSI are as follow. First, after the first selection stage, the OMLPSI values could be non-normally distributed. Second, for more than two stages, the OMLPSI requires multiple integration techniques to derive selection intensities. Third, there are problems of convergence when the traits and the index values of successive stages are highly correlated, and finally, the computational time could be unacceptable if the number of selection stages becomes too high (Börner and Reinsch 2012). For these reasons, Xu and Muir (1992) developed s*election index updating* or the *decorrelated* multistage linear phenotypic selection index (DMLPSI).

In a similar manner as Cochran (1951) and Young (1964) developed the OMLPSI, Xu and Muir (1992) developed the DMLPSI combining the independent culling selection method and the LPSI theory for selecting several traits in the multistage context. However, while the OMLPSI theory takes into account the correlation among the OMLPSI values at different stages when predicting the net genetic merit, the DMLPSI theory imposes the restriction that the correlation between the DMLPSI values at different stages be zero; hence the name *decorrelated multistage index* (Börner and Reinsch 2012). Under this restriction, exact truncation points and selection intensities can be determined for a fixed selection proportion before selection is carried out, and the selected individual index values after the first selection stage could be normally distributed; in addition, it is not necessary to use multiple integration techniques to derive selection intensities. Xu and Muir (1992) derived a set of nonlinear equations in the DMLPSI context to obtain truncation points and selection intensities, and indicated that these equations may be solved iteratively using a multidimensional Newton method. Xu *et al.* (1995), however, found that the Newton method is sensitive to the initial values and frequently converges with solutions at a local maximum. Another problem associated with DMLPSI is that its selection responses and correlation with the net genetic merit are lower than the OMLPSI selection response and accuracy after the first selection stage (Börner and Reinsch 2012; Cerón-Rojas *et al.* 2019a, b).

In the marker-assisted selection (MAS) context, Lande and Thompson (1990) proposed a linear marker selection index (LMSI) which uses phenotypic and marker score values jointly to predict the net genetic merit. This index exploits the linkage disequilibrium between markers and quantitative trait loci (QTL) that occurs when inbred lines are crossed (Lange and Whittaker 2001). The LMSI requires regressing phenotypic values on marker coded values and, with these estimates, it constructs the marker score for each individual trait of the candidate for selection. Subsequently, the marker scores are combined with phenotypic information using the LMSI to predict the net genetic merit. Lande and Thompson (1990) assumed that the average effects on phenotype and the favorable alleles are known; however, this assumption is valid for major gene traits but not for quantitative traits that are affected by the environment, as well as many QTL with small effects that could interact with the environment and among themselves (Heffner *et al.* 2009). Several authors (Lange and Whittaker 2001; Meuwissen *et al.* 2001; Dekkers 2007) have criticized the LMSI approach because it makes inefficient use of the available data, as one would rather use all the available data in a single step to achieve maximally accurate estimates of marker effects. In addition, because the LMSI is based on only a few large QTL effects, it violates the selection index assumptions of multivariate normality and small changes in allele frequencies (Heffner *et al.* 2009).

Dekkers (2007) proposed a slightly modified version of the Lande and Thompson (1990) index. Instead of using marker scores, the Dekkers (2007) index uses the genomic estimated breeding values (GEBV) jointly with the phenotypic values to predict the net genetic merit. Céron-Rojas and Crossa (2018, Chapter 5) called this index the combined linear genomic selection index (CLGSI), and because it uses GEBV instead of marker scores, it is free of the problems (indicated earlier) that the LMSI presents. In the CLGSI context, all marker effects and GEBV of the genotyped individuals in the training population are estimated using marker and phenotypic data, and then the GEBVs are combined with the phenotypic values in a CLGSI to predict the net genetic merit and select parents for the next generation.

Xie and Xu (1998) extended the DMLPSI to the MAS context for developing a decorrelated multistage LMSI similar to the index of Lande and Thompson (1990). That is, the Xie and Xu (1998) index is an LMSI used to predict the net genetic merit in the multistage selection context. The main objective of Xie and Xu (1998) was to increase the efficiency of MAS in a two-stage breeding selection scheme. For this reason, they decided to select immature individuals (embryos) or seedlings at stage one based on a linear combination of trait molecular scores only, and, at stage two, to select mature individual traits based on a linear combination of trait molecular scores and phenotypic values jointly. According to Xie and Xu (1998), this selection method was implied in the paper by Lande and Thompson (1990) and the problem of these last two authors was how to find the selection intensities associated with a two-stage breeding scheme. For this reason, Xie and Xu (1998) adapted the DMLPSI theory to the MAS breeding context. This approach, however, has the same problems as those associated with the LMSI, which we indicated earlier.

In this work, we adapted the Dekkers (2007) index (which is an optimum index) to the multistage selection context. This index uses GEBV instead of marker scores; thus, it is free of the problems associated with the Xie and Xu (1998) index. We applied the proposed index in the two-stage context as follows. In stage one, we selected immature seedlings and embryos based on a linear combination of GEBV only, and, in stage two, we selected individual traits based on a linear combination of GEBV and phenotypic values jointly. This is the Xie and Xu (1998) idea but in the genomic selection context.

We validated the results of the proposed index using the optimum and decorrelated selection index theory in a two-stage breeding selection scheme (this approach can be extended to any number of stages). The optimum index was named *optimum combined multistage linear genomic selection index* (OCMLGSI), while the decorrelated index was called *decorrelated combined multistage linear genomic selection index* (DCMLGSI) because, at stage two, both indices use GEBV and phenotypic information jointly to predict the net genetic merit. While the OCMLGSI was based on the Dekkers (2007) and Young (1964) index theory, the DCMLGSI was based on the Xie and Xu (1998) and Xu and Muir (1992) index theory. We obtained the theoretical results of both indices under the assumption that the indices and the net genetic merit values have bivariate normal distribution. Under this assumption, the regression of the net genetic merit on any linear function of the phenotypic or GEBV values is linear (Kempthorne and Nordskog 1959) and the total selection response for two or more stages is the sum of each response obtained at each stage (Cochran 1951; Young 1964; Cerón-Rojas *et al.* 2019a).

We compared the relative efficiency of OCMLGSI and DCMLGSI using real and simulated datasets. The criteria used to compare the relative efficiency of both indices were that the total selection response of each index must be lower than, or equal to, the single-stage CLGSI (Dekkers 2007) selection response (Young 1964; Saxton 1983) and that the correlation of each index with the net genetic merit should be the maximum possible. The results of this study are the first ones comparing (with real and simulated data) the relative efficiency of the OCMLGSI with DCMLGSI efficiency using the total selection response and the maximized correlation with the net genetic merit as the main criteria to compare the efficiency of both indices.

## Material and Methods

We completed this section with three supplementary materials (Supplementary material 1, 2 and 3) that are located at http://hdl.handle.net/11529/10548356.

### Objectives of the combined multistage linear selection indices

Two objectives of the OCMLGSI and DCMLGSI are to maximize the selection response and predict the net genetic merit (, where and are vectors of economic weights and true unobservable breeding values, respectively, and number of traits). Additional OCMLGSI and DCMLGSI objectives are to select individuals with the highest H values as parents of the next generation, and provide the breeder with a rule for evaluating and selecting several traits simultaneously.

### The OCMLGSI and DCMLGSI at stage one

At stage 1, the OCMLGSI and DCMLGSI use only GEBV to predict the net genetic merit and select individual traits. The index to predict the individual net genetic merit at stage 1 is(1)where number of traits, and (Appendix 1, Equation A1 for details) are vectors of economic weights and genomic breeding values, respectively. At stage 1, Equation (1) is the same for both indices.

### The OCMLGSI and DCMLGSI parameters at stage one

The maximized OCMLGSI and DCMLGSI selection responses are(2)respectively, where and (the selection intensities of each index) are the only difference between and . In Equation (2), is the covariance matrix of genomic breeding values (Appendix 1, Equation A2 for details), whereas w was defined earlier. The maximized correlation between and is , where is the standard deviation of the variance of , is the standard deviation of the variance of , and is the covariance matrix of **g**.

### The combined multistage linear selection index and the net genetic merit at stage two

At stage 2, the OCMLGSI and DCMLGSI use genomic and phenotypic information jointly to predict the individual net genetic merit and can be written as(3)where is a vector of trait ( number of traits) phenotypic values, and z was defined earlier; and are vectors of coefficients of genomic and phenotypic weight values, respectively; and . The only difference between OCMLGSI and DCMLGSI is the way vector is obtained.

The net genetic merit () can be written as(4)where g, w and z have been defined above; is a null vector associated with vector z, whereas and .

### The OCMLGSI and DCMLGSI covariance matrices at stage two

Let , and be the phenotypic, genotypic and genomic covariance matrices, respectively; then(5)are block covariance matrices associated with OCMLGSI and DCMLGSI at stage 2. Equation (5) indicates that the covariance matrix between g and z is (Appendix 1, Equation 2). In Appendix 1 (Equations A6 to A8) we describe how we estimated matrices Γ, P, C.

### The OCMLGSI parameters at stage two

In Supplementary material 1 (see http://hdl.handle.net/11529/10548356), we extend the OCMLGSI theory to the multistage context. Here we present only the main results for the two stages. Since the OCMLGSI theory is based on LPSI theory, the OCMLGSI vector of coefficients (β) that maximizes the OCMLGSI response (and the correlation between the OCMLGSI and the net genetic merit values) is(6)where and . By Equation (6), the maximized OCMLGSI selection response is(7)where is the OCMLGSI intensity at stage 2. The total selection response for stages 1 and 2 is . The maximized correlation between H (Equation 4) and the OCMLGSI is(8)In Appendix 1 (Equation A9), we indicated how to estimate . Additional details of the parameters associated with Equations (7) and (8) are given in Supplementary material 3 (see http://hdl.handle.net/11529/10548356).

### The DCMLGSI parameters at stage 2

In Supplementary material 2 (see http://hdl.handle.net/11529/10548356), we extended the DCMLGSI theory to the multistage context, and we showed that the DCMLGSI vector of coefficients at stage 2 is(9)where , and (Equation 6), whereas I is an identity matrix of the same size as matrix T; S is the matrix of constraints (Supplementary material 2, Equation S8, see http://hdl.handle.net/11529/10548356) that makes the covariances of the DCMLGSI values among stages null. Matrix K transforms the OCMLGSI vector of coefficients into the DCMLGSI vector of coefficients and is the only difference between Equations (6) and (9). At stage 1, (no constraints), and , the OCMLGSI vector of coefficients.

The maximized DCMLGSI selection response for stage two is(10)where is the DCMLGSI selection intensity at stage 2. The total selection response for stages 1 and 2 is . In Appendix 1 (Equation A10), we indicate how to estimate .

The maximized correlation between H (Equation 4) and DCMLGSI for stage 2 is

(11)Note that the only difference between Equations (10) and (7), and Equations (11) and (8) is the vector of coefficients of each index.

### The OCMLGSI and DCMLGSI selection intensities for stages 1 and 2

The OCMLGSI selection intensities for stages 1 and 2 ( and , respectively) are those values associated with the maximum value of , which were obtained with the method described in Appendix 2 (Equations A11 and A12). We obtained the DCMLGSI selection intensities for stages 1 and 2 ( and , respectively) using the Xu and Muir (1992) method. The value of and should maximize the total DCMLGSI selection response .

### The genomic estimated breeding values for the OCMLGSI and DCMLGSI

Several authors (Ceron-Rojas *et al.* 2015; Isik *et al.* 2017; Cerón-Rojas and Crossa 2018, 2019) have given detailed descriptions of how to obtain genomic estimated breeding values (GEBV) that are predictors of unobservable individual trait breeding values. In the OCMLGSI and DCMLGSI context, we fitted, in a statistical model, phenotypic and marker data from the training population to estimate all available marker effects and obtain the GEBV (Appendix 1, Equations A3 and A4 for additional details).

### Testing the OCMLGSI and DCMLGSI normality assumption

Several authors (Shapiro and Wilk 1965; Mohd-Razali and Bee-Wah 2011; Rani Das and Rahmatullah-Imon 2016) have given details of how to perform a normality test procedure on a dataset, and many statistical packages provide graphs and normality tests (Crawley 2015).

For the real dataset, we corroborated the OCMLGSI and DCMLGSI normality assumption at stage 2 using graphical methods (histograms and normal Quantile-Quantile plots) and analytical test procedures (the Shapiro-Wilk and Kolmogorov-Smirnov normality tests). The corroboration procedure was as follows. In a two-stage context, let be the fixed total proportion retained, where and denote the proportion selected at stages 1 and 2, respectively, and let n be the size of the dataset at stage 1; then will be the size of the selected individuals at stage 1. We used the individual genotypes and traits at stage 2 to construct graphs and statistical tests to corroborate the OCMLGSI and DCMLGSI normality assumption.

### Criteria for comparing OCMLGSI efficiency *vs.* DCMLGSI efficiency

The criteria to compare OCMLGSI efficiency *vs.* DCMLGSI efficiency were that the total OCMLGSI and DCMLGSI selection responses ( and , respectively) should be lower than, or equal to, the single-stage CLGSI selection response (R) described by Dekkers (2007) and Céron-Rojas and Crossa (2018, Chapter 5). In addition, the maximized correlation between the net genetic merit and the OCMLGSI and DCMLGSI (Equations 9 and 11, respectively) should be a maximum at each stage. Thus, the greater the OCMLGSI and DCMLGSI correlation with the net genetic merit, the more effective they are at predicting the net genetic merit and the selection response at each stage.

### Real data

We used a real maize (*Zea mays* L.) F2 population with 247 genotypes, 195 markers and 4 phenotypic traits: grain yield (GY, t/ha), plant height (PHT, cm), ear height (EHT, cm), and anthesis days (AD, d) to compare OCMLGSI efficiency *vs.* DCMLGSI efficiency to predict the net genetic merit. Beyene *et al.* (2015) described this dataset and denoted it as JMpop1 DTMA Mexico optimum environment. We assumed that the breeding objective was to increase GY while decreasing PHT, EHT, and AD. The vector of economic weights for those traits was , while the total proportions () of retained values were 0.05, 0.10, 0.20 and 0.30 for both indices.

The estimated matrices of P, C, and Γ for all four traits were, respectively. At stage 1 we used matrix and the vector of economic weights w to estimate the OCMLGSI and DCMLGSI selection responses (**Appendix 1**, Equation A9 and A10, respectively). At stage 2 we used all three matrices (, , and ) to estimate matrices T and Ψ (Equation 5) as and , respectively, from where we estimated the OCMLGSI and DCMLGSI selection responses (**Appendix 1**, Equations A9 and A10, respectively).

### Simulated datasets

The datasets were simulated with QU-GENE software (Podlich and Cooper 1998) by Ceron-Rojas *et al.* (2015) using 2500 molecular markers and 315 QTL for eight phenotypic selection cycles (C0 to C7), each with four traits (, , , and ), 500 genotypes and 4 replicates of each genotype. The authors distributed the markers uniformly across 10 chromosomes and the QTL randomly across the 10 chromosomes to simulate maize (*Zea mays* L.) populations. A different number of QTL affected each of the four traits: 300, 100, 60, and 40, respectively. The common QTL affecting the traits generated genotypic correlations of -0.5, 0.4, 0.3, -0.3, -0.2, and 0.1 between and , and , and , and , and , and , respectively. The economic weights for , , , and were 1, -1, 1, and 1, respectively, in all selection cycles. To illustrate the efficiency of the OCMLGSI and DCMLGSI to predict the net genetic merit, in this work we used six selection cycles (C1 to C6) with 0.05, 0.10 and 0.20 in each cycle. We selected all four traits in each selection cycle.

### Data availability

The real and simulated datasets are available in the *Application of a Genomics Selection Index to Real and Simulated Data* repository, at http://hdl.handle.net/11529/10199. The real dataset used in this work is the folder named “File Real_Data_Sets_GSI” which contains four folders named “DATA_SET-3, 4, 5, and 6”. Each of the four folders contains four Excel data files. The four Excel data files within the folder DATA_SET-3 are as follows: DATA_SET-3_Markers_Cycle-0, 1, 2, and DATA_SET-3_Phenotypic_Cycle-0. The first three Excel files contain the marker-coded values for cycles 0, 1, and 2, while the Excel file DATA_SET-3_Phenotypic_Cycle-0 contains the phenotypic information of cycle 0 (training population). The Excel data files of the other folders were described in a similar manner as for folder 3. In this work, we used dataset 3 for cycle 0 to make selections and to estimate the selection response and the correlation of the OCMLGSI and DCMLGSI with the net genetic merit. The results are presented in Table 1.

Folder Simulated_Data_GSI contains two folders: Data_Phenotypes_April-26-15 and Haplotypes_GSI_April-26-15. In turn, folder Data_Phenotypes_April-26-15 also contains two folders: GSI_Phenotypes-05 and PSI_Phenotypes-05. Within folder GSI_Phenotypes-05, there are six Excel data files, each denoted as C2_GSI_05_Pheno, C3_GSI_05_Pheno, C4_GSI_05_Pheno, C5_GSI_05_Pheno and C6_GSI_05_Pheno, corresponding to the phenotypic simulated information for the genomic selection index for cycles 2-7. In addition, folder GSI_Phenotypes-05 contains eight Excel datasets denoted as C0_Pheno_05, C1_PSI_05_Pheno, C2_PSI_05_Pheno, C3_PSI_05_Pheno, C4_PSI_05_Pheno, C5_PSI_05_Pheno, C6_PSI_05_Pheno, and C7_PSI_05_Pheno corresponding to the phenotypic simulated information for the phenotypic selection index for cycles 0-7. File Haplotypes_GSI_April-26-15 contains the haplotypes of the markers for cycles 0-7 of GSI. We present the results of the simulated datasets in Tables 2 and 3 for cycles 1 to 6.

### Matching phenotypic and genomic real data

To estimate the OCMLGSI and DCMLGSI parameters and make selections, we use the following two Excel files: (1) “*DATA_SET-3_Phenotypic_Cycle-0*” (which contains the raw phenotypic data) and (2) the “*DATA_SET-3_Markers_Cycle-0*” (which contains the coded molecular markers). Both datasets are in the folder named “** DATA_SET-3**”.

### Matching phenotypic and genomic simulated data

To estimate the OCMLGSI and DCMLGSI parameters and to make selections, we used the data of two folders. The first one is called “PSI_Phenotypes-05” (which contains the raw phenotypic data of six Excel files named: C1_PSI_05_Pheno to C6_PSI_05_Pheno) and a second one named “Haplotypes_GSI_April-26-15”(which contains the raw marker data of six text files named: C1_PSI_S2_05_Haplo.pop to C6_PSI_S2_05_Haplo.pop). Both datasets are in the folder named “** Simulated_Data_GSI**”. To estimate the OCMLGSI and DCMLGSI parameters, the foregoing files were matched as follows. For selection cycle 1, we matched the Excel file C1_PSI_05_Pheno with the text files C1_PSI_S2_05_Haplo.pop; for selection cycle 2, we matched the Excel file C2_PSI_05_Pheno with the text files C2_PSI_S2_05_Haplo.pop, etc. Finally, in cycle 6, we matched the Excel file C6_PSI_05_Pheno with the text files C6_PSI_S2_05_Haplo.pop.

## Results

### Real data

#### Estimated OCMLGSI parameters for stages 1 and 2:

The estimated OCMLGSI values at stages 1 and 2 were and , respectively, where and were the estimated vector or coefficient (Appendix 1, Equation A9), (Appendix 1, Equations A3 and A4) was a vector of GEBV associated with the vector of traits y, and . The maximum estimated total OCMLGSI selection response was , where and were the estimated selection responses at each stage, and matrix was the adjusted matrix for prior selection on (Appendix 3, Equation A13).

Figure 1 shows the theoretical relationship between one truncation point () value, the proportion retained (), and the density values () of the truncation point at stage 1, while Figure 2 describes the theoretical relationship between two truncation point ( and ) values and their density values [, Appendix 2, Equation A11] for two stages. In Appendix 2 (Equation A12), we described a method to obtain the OCMLGSI selection intensities ( and ) that maximize for both stages. We found the and values as follows: for a fixed value , we used an iterative process with an R-code (Kabakoff 2011) where, by successively changing the possible values of (), , and , we found the maximum value of the estimated total OCMLGSI selection response (Figure 3). Thus, for , the values of the truncation points ( and ), proportions retained ( and ) and selection intensities ( and ) at both stages, were those associated with the maximum value (Figure 3).

In the OCMLGSI and DCMLGSI context, , , that is, the only parameter that changes is because p is fixed. The same is true for the truncation points ( and ) because and , where is the inverse function of the standard normal distribution (Xu and Muir 1991, 1992). Thus, in this context, , , and values are mainly associated with the possible values of .

Figure 3 presents the values on the *y*-axis, whereas the *x*-axis presents the possible values of the combinations of and for all possible realizations of , and for this reason, the *x*-axis takes values from 1 to 4676. For *x*-axis number 1, we should have a possible combination of and values for a possible realization of , which we could denote as , while for *x*-axis number 4676, we should have an additional realization denoted as . The and values are different and their values should appear in the graph and on the *y*-axis. The value was between and .

In the single-stage context, when , , and the estimated CLGSI selection response was , which should be higher than, or equal to (Young 1964; Saxton 1983). In this case, explained 97.27% of the value. Thus, both selection responses were similar. In addition, because matrix had more information than matrices and (Appendix 1, Equations A5 to A9), was higher than the estimated LGSI and LPSI selection responses in the single-stage context ( and , respectively), for , and, in addition, (see Supplementary material 3 for details).

For 0.10, 0.20, and 0.30, the values were 6.97, 5.57, and 4.62, whereas the values were 7.16, 5.71, and 4.73, respectively (Table 1). Thus, the values explained 97.35, 97.55, and 97.68%, respectively, of the values. That is, the estimated selection responses of OCMLGSI and CLGSI were very similar.

The estimated maximized OCMLGSI correlations with the net genetic merit (Equation 8) at stages 1 and 2 were and , respectively, where and were the standard deviations of the variances of and , whereas and were the estimated standard deviations of the net genetic merit (Equation 4) at each stage. Matrix was the adjusted matrix for prior selection on (**Appendix 3**, Equation A14) and because matrices and had more information than matrices and , .

### Estimated DCMLGSI parameters for stages 1 and 2

The estimated DCMLGSI values for both stages were and , where was the vector of economic weights and was the estimated vector of coefficients, whereas , and y were described earlier. For , the DCMLGSI values of the truncation points ( and ), proportions retained ( and ) and selection intensities ( and ) maximized (Figure 3) and were obtained with the Xu and Muir (1992) method. In this case, (Appendix 1, Equation A10) explained 91.68% of the estimated CLGSI selection response () and was lower than the estimated total OCMLGSI selection response ().

For 0.10, 0.20, and 0.30, the values (6.49, 5.10, and 4.17, respectively) explained 90.64, 89.32, and 88.16%, respectively, of the values (7.16, 5.71, and 4.73, respectively) (Table 1). Thus, for all p values, the estimated total OCMLGSI response was higher than the estimated total DCMLGSI response.

The estimated DCMLGSI correlations with the net genetic merit at stages 1 and 2 were and , respectively, where and were the standard deviations of the variances of and , respectively, whereas and were the estimated standard deviations of the variances of the net genetic merit (Equation 4). In this case, .

### Truncation points, proportion retained and selection intensities

Table 1 presents the OCMLGSI and DCMLGSI truncation points, proportions retained and selection intensities for 0.05, 0.10, 0.20, and 0.30 in a two-stage context. When the values changed from 0.05 to 0.30, the truncation point values decreased, the proportions retained values increased and the selection intensity values decreased in both indices, as we would expect. In addition, while the DCMLGSI truncation points and proportions retained values were the same at both stages, the OCMLGSI truncation point values at stage 1 were lower than at stage 2, and then . For this reason, the OCMLGSI selection intensity was different from the DCMLGSI selection intensity.

### Simulated data

#### Estimated maximized OCMLGSI and DCMLGSI selection responses:

For 0.05, 0.10, and 0.20, Table 2 presents the estimated maximized OCMLGSI and DCMLGSI selection responses (, , ) and the estimated maximized single-stage CLGSI responses (, , ) for six simulated selection cycles in a two-stage breeding selection scheme. For 0.05, the average of the estimated total OCMLGSI selection responses (17.47) explained 98.10% of the average of the estimated CLGSI selection responses (17.81), whereas for 0.10 and 0.20, the average of the estimated total OCMLGSI selection responses (14.89 and 11.93, respectively) explained 98.28% and 99.42% of the average of the CLGSI selection response (15.15 and 12.08, respectively). Thus, for this dataset, the OCMLGSI and CLGSI results were equivalent for all p values.

For 0.05, the average of the estimated total DCMLGSI selection responses (16.50) explained 92.64% of the average of the CLGSI selection responses (17.81), whereas for 0.10 and 0.20, the average of the estimated total DCMLGSI selection responses (13.92 and 10.98, respectively) explained 91.88% and 90.89% of the average of the CLGSI selection responses (15.15 and 12.08, respectively).

The foregoing results indicate that while the average of the total OCMLGSI selection responses (for all p values) explained 98.60% of the average of the CLGSI, the average of the total DCMLGSI selection responses (for all pvalues) explained 91.80% of the average of the CLGSI. Thus, OCMLGSI accuracy was higher than DCMLGSI accuracy for predicting the selection response.

#### Estimated OCMLGSI and DCMLGSI correlations with the net genetic merit:

Table 3 presents the estimated and maximized values of the OCMLGSI ( and ) and DCMLGSI ( and ) correlations with the net genetic merit in a two-stage context for six simulated selection cycles. At stage 1, the averages of the estimated OCMLGSI and DCMLGSI correlations were the same. However, at stage 2, the average of the estimated OCMLGSI correlations with net genetic merit was 40.0% higher than the average of the estimated DCMLGSI correlations for six simulated selection cycles.

#### Histograms and quantile-quantile plots for the estimated OCMLGSI and DCMLGSI values at stage two:

We used the real dataset selected in cycle 1 to test the normality assumption of the estimated OCMLGSI and DCMLGSI values at stage 2. For 0.05 and 0.30, the values for OCMLGSI were 0.27 and 0.63, while those values for DCMLGSI were 0.22 and 0.55, respectively. Then, at stage 2, and were the number of genotypes for OCMLGSI, whereas for DCMLGSI, the number of genotypes were and , where 247 was the number of genotypes at stage 1. With these genotypes, we constructed histograms (Figure 4) and quantile–quantile plots (Figure 5) of the OCMLGSI and DCMLGSI values at stage 2. If the OCMLGSI and DCMLGSI values have normal distribution, the histograms of the values of both indices should not show a strong negative or positive skew in the OCMLGSI and DCMLGSI values seen in the histogram. Similarly, if the OCMLGSI and DCMLGSI values are normally distributed, then they should form a straight line in the quantile–quantile plots. If there are departures from normality, the OCMLGSI and DCMLGSI values should show up as various kinds of non-linearity, *e.g.*, S-shaped or banana-shaped in the quantile–quantile plots (Crawley 2015).

When the number of genotypes changed from 67 (Figure 4a) to 156 (Figure 4b), the estimated OCMLGSI values were closer to the normal distribution. The same was true for the estimated DCMLGSI values when the number of genotypes changed from 54 (Figure 4c) to 136 (Figure 4d). In addition, the quantile–quantile plots indicate that when the number of genotypes changed from 67 (Figure 5a) to 156 (Figure 5b), the estimated OCMLGSI values tended to form a straight line. The same was true for the estimated DCMLGSI values when the number of genotypes changed from 54 (Figure 5c) to 136 (Figure 5d).

#### Normality test for the estimated OCMLGSI and DCMLGSI values at stage two:

For the real dataset, we present the results of the Shapiro-Wilk and Kolmogorov-Smirnov normality tests of the estimated OCMLGSI and DCMLGSI values at stage 2 when the number of genotypes was 67 and 156 for OCMLGSI, and 54 and 136 for DCMLGSI. We tested the null hypothesis that the estimated OCMLGSI and DCMLGSI values have normal distribution. The statistical value of the Shapiro-Wilk test should be close to 1.0 to accept the null hypothesis, while the statistical value of the Kolmogorov-Smirnov test should be close to 0.0 to accept the null hypothesis (Rani Das and Rahmatullah-Imon 2016). In the present case, for the OCMLGSI values (67 and 156), the Shapiro-Wilk test values were 0.976 and 0.987, respectively, while the Kolmogorov-Smirnov test values were 0.088 and 0.089, respectively. Thus, the null hypothesis was true for the estimated OCMLGSI values. Similarly, for the DCMLGSI values (54 and 136), the Shapiro-Wilk test values were 0.988 and 0.984, respectively, while the Kolmogorov-Smirnov test values were 0.067 and 0.058, respectively. Thus, we accepted that the estimated DCMLGSI values approach the normal distribution.

## Discussion

### The DCMLGSI restrictions imposed on the covariance values

The DCMLGSI imposed the restriction that the covariance between DCMLGSI values among stages be zero. This restriction was to ensure the existence of solutions for the truncation points at different stages without resorting to numerical multiple integration (Xu and Muir 1992; Xie and Xu 1998). However, the restriction decreased the estimated DCMLGSI selection response and the estimated DCMLGSI correlation with the net genetic merit after stage 1. Xu and Muir (1992) indicated that the loss of DCMLGSI efficiency after stage 1 is justified because their method for obtaining the selection intensities and total responses gives the breeder the opportunity to implement an unlimited number of selection stages, which would otherwise be very difficult or impossible to do.

Xu and Muir (1991; 1992) indicated that the restriction imposed on the covariance between DCMLGSI values is similar to the Kempthorne and Nordskog (1959) restriction imposed on the expected genetic gain per trait, which makes some traits not change their mean values while the rest of the trait means remain without restrictions (Cerón-Rojas and Crossa, 2018, Chapter 3). Xu and Muir (1992) and Kempthorne and Nordskog (1959) used a projection matrix (*e.g.*, K) to project the LPSI vector of coefficients (*e.g.*, δ) into a space smaller than the original space of δ. The reduction of the space into which the Kempthorne and Nordskog (1959) matrix projects δ is equal to the number of zeros that appears on the expected genetic gain per trait, and the selection response and accuracy decrease as the number of restrictions increases (Cerón-Rojas and Crossa 2018, Chapter 3). However, it is not clear if under the Xu and Muir (1992) restrictions, the selection response and accuracy decrease as the number of stages increases. If this were true, the Xu and Muir (1992) method would not give the breeder the opportunity to implement an unlimited number of stages, because the selection response and accuracy would decrease as the number of stages increases and soon would be null. For example, Xie *et al.* (1997) compared the estimated single-stage LPSI selection response with the estimated DMLPSI selection response for two and three stages and found that at stages 2 and 3, the estimated total DMLPSI selection response explained only 92 and 87%, respectively, of the estimated LPSI selection response. That is, at stage 3, the estimated total DMLPSI selection response was lower (5%) than at stage 2.

### The OCMLGSI and DCMLGSI vector of coefficients

The OCMLGSI is an optimum index and we obtained its vector of coefficients based on the OMLPSI (Young 1964) and LPSI (Smith 1936) theory. For this reason, the OCMLGSI vector of coefficients was easier to obtain than the DCMLGSI vector of coefficients. This last vector is a linear transformation of the OCMLGSI vector of coefficients made by a transforming matrix (K) which is idempotent () and projects the OCMLGSI vector of coefficients (β) into a space smaller than the original space of β (Cerón-Rojas and Crossa 2018, Chapter 3). Xu and Muir (1992) and Xie and Xu (1998) did not identify that matrix, and for this reason, the equations they used to estimate the decorrelated vector of coefficients seem very complex. This matrix makes the DCMLGSI values independent among stages and is the base for assuming that the DCMLGSI values are normally distributed after stage one. However, due to this matrix, the correlation of the DCMLGSI with the net genetic merit and its selection response after stage one are lower than the correlation of the OCMLGSI with the net genetic merit and its selection response after stage one. Xu and Muir (1992) and Xie and Xu (1998) indicated that the loss of efficiency is justified because their method for obtaining the selection intensities and total responses gives the breeder the opportunity to implement an unlimited number of selection stages, which otherwise would be very difficult or impossible to do. However, Börner and Reinsch (2012) indicated that decorrelated indices should not be used due to the availability of accurate and fast algorithms for exact multidimensional integration to find the selection intensities for the OCMLGSI.

Mi *et al.* (2014) described an R-code algorithm in the multistage context, but it is useful only when the selection is made for a trait at each stage. That is, up to now, there is no quick algorithm for finding the selection intensities for the OCMLGSI for more than two stages, and for this reason, in this work we described a method to find the OCMLGSI selection intensities in the two-stage context.

### The multivariate normality assumption

Based on the normality assumption of the estimated OCMLGSI, DCMLGSI, GEBV, and phenotypic values, we developed and applied the OCMLGSI and DCMLGSI to the real and simulated datasets. The histograms, quantile–quantile plots, and the Shapiro-Wilk and Kolmogorov-Smirnov normality tests of the OCMLGSI and DCMLGSI values at stage two indicated that these values approached the normal distribution. The multivariate normality distribution is very important for breeding plant and animal quantitative traits because these traits show continuous variability and are the result of many gene effects interacting among themselves and with the environment. That is, quantitative traits are the result of unobservable gene effects distributed across plant or animal genomes, which interact among themselves and with the environment to produce the observable characteristic plant and animal phenotypes (Falconer and Mackay 1996). Under the multivariate normal distribution assumption, the indices, the traits, and GEBV can be described using only means, variances, and covariances. In addition, if the traits are not correlated, they are independent. Linear combinations of traits are also normal; and even when the trait phenotypic values do not have normal distribution, this distribution serves as a useful approximation, especially in inferences involving sample mean vectors, which, by the central limit theorem, have multivariate normal distribution (Rencher 2002). By this reasoning, a fundamental assumption in this work was that the net genetic merit and each index have multivariate normal distribution. Under the latter assumption, the regression of the net genetic merit on any linear function of the phenotypic and GEBV values was linear and the total OCMLGSI selection response was the sum of the responses obtained at each stage.

### DCMLGSI and OCMLGSI

The DCMLGSI is an application of the Xie and Xu (1998) index to the genomic selection (GS) context. Based on the LMSI (Lande and Thompson 1990) and on the DMLPSI (Xu and Muir 1992) theoretical results, Xie and Xu (1998) developed their multistage index in the MAS context before Meuwissen *et al.* (2001) GS theory. For this reason, those authors used molecular scores instead of GEBV to predict the net genetic merit at each stage. Because the Xie and Xu (1998) index has the same theoretical and practical problems as the LMSI indicated in the Introduction of this work, we extended the Xie and Xu (1998) index to the GS context and developed the DCMLGSI, which uses GEBV instead of molecular scores in the prediction. This index is free of the problems of the Xie and Xu (1998) index associated with the LMSI. However, because the DCMLGSI is based on the DMLPSI, it has the same advantages and disadvantages as the DMLPSI, indicated in the Introduction of this work.

The OCMLGSI is an application of the OMLPSI (Young 1964; Cerón-Rojas *et al.* 2019 a and b) to the GS context based on the Xie and Xu (1998) idea. The OCMLGSI has the same advantages and disadvantages as the OMLPSI, as indicated in the Introduction of this work, and is an optimum index. In this work, we showed that its selection response and correlation with the net genetic merit is higher than the DCMLGSI selection response and correlation with the net genetic merit.

The OCMLGSI and the DCMLGSI exploit the linkage disequilibrium between markers and QTL that is produced when inbred lines are crossed, which is useful for identifying markers correlated with the traits of interest and for obtaining GEBV (Meuwissen *et al.* 2001). Börner and Reinsch (2012) indicated that GS could replace traditional progeny testing when maximizing the genetic gain per year, as long as the accuracy of GEBV is higher than 0.45. For the simulated dataset, the correlation values between the GEBVs and the traits’ true breeding values in cycle two were 0.52, 0.74, 0.69, and 0.73 for each of the four traits, respectively, whereas in cycle seven, those correlations were 0.40, 0.55, 0.54, and 0.50 for each of the four traits (Cerón-Rojas and Crossa 2019). In all selection cycles, the estimated correlations were higher than, or equal to, 0.45; thus, the GEBVs obtained with the simulated datasets were good predictors of the individual trait breeding values, and the OCMLGSI and the DCMLGSI were good predictors of the net genetic merit because both indices are linear combinations of GEBV at stage 1, and of the GEBV and phenotypic values at stage 2.

The OCMLGSI and the DCMLGSI can only be used in training populations when there is phenotypic and marker information, while the LGSI (Ceron-Rojas *et al.* 2015; Cerón-Rojas and Crossa 2019) is used in testing populations where there is only marker information. However, because both indices incorporate more information than the single-stage LPSI and LGSI, their selection response and correlation with the net genetic merit are higher than the LPSI and the LGSI selection response and correlation with the net genetic merit in all selection cycles. This is the main reason why the OCMLGSI should be used instead of the LPSI and the LGSI.

### Method for obtaining the OCMLGSI selection intensity

The method used in this work to obtain the OCMLGSI selection intensities in a two-stage context is simple and can be programmed in a computer using an R code. Cochran (1951) and Young (1964) described methods to obtain OMLPSI intensities in the two-stage context; however, such methods overestimate the OMLPSI selection intensity (Cerón-Rojas *et al.* 2019 a and b). The method proposed here was good for obtaining the selection intensity values of OCMLGSI in a two-stage context and did not overestimate the OCMLGSI selection intensity. Thus, breeders should use the proposed method when they perform multistage selection.

### The estimated total DCMLGSI selection response was counter-intuitive

The estimated total OCMLGSI and DCMLGSI selection response should be lower, or equal to, the single-stage CLGSI. This implies that when the total proportion selected (*p*) increased, *e.g.*, from 0.05 to 0.30, the estimated total OCMLGSI and DCMLGSI selection response should tend to be more similar to the estimated single-stage CLGSI selection response. This was true for the estimated total OCMLGSI selection response but not true for the estimated total DCMLGSI selection response for the real and simulated datasets. Thus, for the real dataset, when 0.05, 0.10, 0.20, and 0.30, the estimated total OCMLGSI selection response explained 97.27, 97.35, 97.55, and 97.68%, respectively, of the values, while the estimated total DCMLGSI selection response explained 91.68, 90.64, 89.32, and 88.16%, respectively, of the values. That is, the estimated total DCMLGSI selection response decreased when p values increased. We found similar results for the simulated dataset. Our results were in accordance with those of Börner and Reinsch (2012) and Cerón-Rojas *et al.* (2019b) when these authors compared the optimum with the decorrelated multistage indices in the genomic and phenotypic selection context, respectively. Börner and Reinsch (2012) called the decorrelated multistage index results counter-intuitive and difficult to interpret. Thus, we are in agreement with Börner and Reinsch (2012) that breeders should not use decorrrelated indices when they make multistage selection.

## Conclusion

We evaluated the relative efficiency of two combined multistage linear genomic selection indices. We determined the efficiency of both indices based on the estimated total selection response and correlation of each index with the net genetic merit using real and simulated datasets. In both datasets, we found that the OCMLGSI was a better predictor of the net genetic merit than the DCMLGSI. Therefore, breeders should use the OCMLGSI when performing multistage selection.

## Acknowledgments

We are grateful for the financial support provided by the Bill & Melinda Gates Foundation and CIMMYT’s CGIAR CRP (maize and wheat), as well as the USAID projects (Cornell University and Kansas State University). We acknowledge the financial support provided by the Foundation for Research Levy on Agricultural Products (FFL) and the Agricultural Agreement Research Fund (JA) in Norway through NFR grant 267806.

## APPENDIX 1

### Genomic breeding values

The *i*^{th} phenotypic value () can be denoted as , where is the breeding value (*i.e.*, the average additive effects of the genes an individual receives from both parents) and is the residual. It is assumed that and are independent and that both have normal distribution with an expectation equal to zero and variance and , respectively. This means that the vector of n observations for trait i (1, 2,…,t; number of traits) can be written as , whereas the vector of genomic breeding values associated with vector () can be written as(A1)where X is an matrix ( number of observations and number of markers in the population) of coded marker values (, , and for genotypes *AA*, *Aa*, and *aa*, respectively) associated with the additive effects of the QTL and is a vector with m additive effects of the QTL associated with markers that affect the *i*^{th} trait. It is assumed that has multivariate distribution (MVN) with mean **0** and covariance matrix , where is the additive genomic variance of and is the additive genomic relationship matrix; in an F_{2} population; θ is the frequency of allele *A* and is the frequency of allele *a* in the *j*^{th} marker ().

Let (1, 2,…,t and ) be the *j*^{th} element of the vector of true genotypic breeding values ; then the covariance between and is . That is, the covariance between and is equal to the additive genomic variance of (Dekkers 2007). Let and be the vectors of t traits for the genomic and true breeding values, respectively; then(A2)is the covariance matrix of genomic breeding values.

### Estimating marker effects

Let be a vector of marker effect values associated with t traits. The *i*^{th} vector (1, 2,…,t) of marker effects in the training population can be estimated as , where , , , and the other parameters were defined earlier. In addition, we can estimate vector in the multi-trait context as(A3)where , “⊗” denotes the Kronecker product (Schott 2005); c and X were defined in Equation (A1); , R and C are the residual and genotypic covariance matrices for t traits, respectively; ∼ MVN(μ, V) is a vector of size , with covariance matrix ; and are identity matrices of size and , respectively; is a vector t of means, and 1 is a vector of n 1’s.

### Estimating genomic breeding values

We can estimate the vector of genomic breeding values for t traits and n observations as(A4)Equation (A4) is the vector of GEBV for the multi-trait case.

### Estimating the genomic covariance matrix Γ

Let and (Equation A4) be the genomic estimated breeding values (GEBV) of and (Equation A1), respectively, and denote by and the arithmetic means of the values of and . We can estimate matrix Γ (Equation A2) as(A5)where is the covariance between and values (1, 2, …, t); g is the number of genotypes; 1 is a vector of g 1’s and is the additive genomic relationship matrix.

### The phenotypic model to estimate the variance components

We estimated matrices P and C using restricted maximum likelihood (REML) because this estimation method does not require a specific design or balanced data and is useful for estimating genetic and residual variance and covariance in any arbitrary pedigree of individuals. In addition, the Expectation and Maximization algorithm allows computing the REML for the variance components (Lynch and Walsh 1998).

Letbe the phenotypic model, where is a vector of g (number of genotypes in the population) phenotypic values for the trait which has multivariate normal distribution (NMV). That is, ∼ NMV (,), where 1 is a vector of g ones, is the mean of the trait, Z is an identity matrix ; ∼NMV (**0**, ) is a vector of true breeding values, and ∼NMV(**0**,) is a vector of residuals. Matrix A denotes the numerical relationship matrix between individuals (Lynch and Walsh 1998), and . We estimated and assuming absence of dominance and epistatic effects.

### Estimating matrices P and C using the Expectation and Maximization algorithm

The Expectation and Maximization algorithm allows computing the REML for the variance components and by iterating the following equations:(A6)and(A7)where, after n iterations, and are the estimated variance components of and , respectively. In Equations (A6) and (A7), denotes the trace of the matrices within brackets; and is the inverse of matrix . In , is the inverse of matrix .

The additive genetic and residual covariances between the observations of the and *i*^{th} traits, and (and , 1,2,…,t), can be estimated with REML by adapting Equations (A7) and (A8) as follows. The variance of the sum of and can be written as , where is the variance of and is the variance of ; in addition, is the covariance of and , whereas and are the additive and residual covariances, respectively, associated with the covariance of and . Thus, one way of estimating and is using the following equation:(A8)for which Equations (A6) and (A7) can be used.

### Estimating the OCMLGSI and DCMLGSI selection response at stages 1 and *2*

By Equations (A5) to (A8), the estimates of matrices P, C, and Γcan be denoted as , , and , from where the estimated block matrices of Equation (5) can be denoted as and . Thus, at stages 1 and 2, the estimated OCMLGSI selection responses (Equations 2 and 7) are(A9)where w is the vector of economic weights, and is an estimate of . In a similar manner, the estimated DCMLGSI selection responses at stages 1 and *2* are(A10)where is an estimate of .

## APPENDIX 2

### The OCMLGSI selection intensity for two stages

We describe a method to obtain the OCMLGSI selection intensity for a fixed total proportion , where and are the proportions of individuals selected at stages 1 and 2, respectively. Let and be the OCMLGSI at stages 1 and 2, respectively, and assume that the indices have bivariate normal distribution. Let and be transformed as and with mean zero and variance 1.0, where and are the means, whereas and are the standard deviations of the variance of and , respectively. The selected population has bivariate left truncated normal distribution with probability density function , where ,(A11)and is the correlation between and .

Consider the transformations (Springer 1979, Chapter 3): and , with Jacobian j, where , denotes the determinant function and ∂ the partial derivatives of and with respect to and . Thus, and . The transformations indicate that variables and are independent, each with a standard normal distribution.

Variables and are associated with the truncation points as and . This means that and values should be obtained in two steps. First, we obtained the values of and from two independent standard normal distributions; then we obtained the values of , , and ; finally, we obtained the OCMLGSI selection intensity at stages 1 and 2 as:(A12)respectively, where and are the height of the ordinate of the normal curve at the lowest values of and retained, whereas and are the proportions of the population of animals or plants selected at each stage (Figures 1 and 2). The values of Equation (A12), should maximize the value (Figure 3), where and are the selection responses, whereas and are the standard deviations of the variance of and at stages 1 and 2, respectively.

## APPENDIX 3

### Adjusting the OCMLGSI covariance matrices at stage two

Matrices and are affected by prior selection at stage 1, and it is necessary to adjust them to take into consideration the effects that prior selection has on them (Cochran 1951; Cunningham 1975). Because at stage 1 we performed only genomic selection, at stage 2 we adjusted matrices T and Ψ as follows:(A13)and(A14)where and are the adjusted matrices, and , where and are the selection intensity and truncation point, respectively, at stage 1; Γ and w were defined earlier. Thus, the maximized OCMLGSI selection response (Equation 7) and correlation between the OCMLGSI and the net genetic merit (Equation 8) at stage two can be written as and . We did not use Equations (A13) and (A14) to adjust matrices T and Ψ for the DCMLGSI because the values between stages of this index are independent.

## Footnotes

*Communicating editor: R. Wisser*

- Received February 22, 2020.
- Accepted April 18, 2020.

- Copyright © 2020 Cerón-Rojas and Crossa

This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.