Increased Prediction Accuracy in Wheat Breeding Trials Using a Marker × Environment Interaction Genomic Selection Model

Genomic selection (GS) models use genome-wide genetic information to predict genetic values of candidates of selection. Originally, these models were developed without considering genotype × environment interaction(G×E). Several authors have proposed extensions of the single-environment GS model that accommodate G×E using either covariance functions or environmental covariates. In this study, we model G×E using a marker × environment interaction (M×E) GS model; the approach is conceptually simple and can be implemented with existing GS software. We discuss how the model can be implemented by using an explicit regression of phenotypes on markers or using co-variance structures (a genomic best linear unbiased prediction-type model). We used the M×E model to analyze three CIMMYT wheat data sets (W1, W2, and W3), where more than 1000 lines were genotyped using genotyping-by-sequencing and evaluated at CIMMYT’s research station in Ciudad Obregon, Mexico, under simulated environmental conditions that covered different irrigation levels, sowing dates and planting systems. We compared the M×E model with a stratified (i.e., within-environment) analysis and with a standard (across-environment) GS model that assumes that effects are constant across environments (i.e., ignoring G×E). The prediction accuracy of the M×E model was substantially greater of that of an across-environment analysis that ignores G×E. Depending on the prediction problem, the M×E model had either similar or greater levels of prediction accuracy than the stratified analyses. The M×E model decomposes marker effects and genomic values into components that are stable across environments (main effects) and others that are environment-specific (interactions). Therefore, in principle, the interaction model could shed light over which variants have effects that are stable across environments and which ones are responsible for G×E. The data set and the scripts required to reproduce the analysis are publicly available as Supporting Information.

genomic selection multienvironment genomic best linear unbiased prediction (GBLUP) marker · environment interaction International Bread Wheat Screening Nursery GenPred shared data resource The presence of genotype · environment(G·E) interactions in agricultural experiments usually is expressed as changes in the relative performance of genetic materials across environments; this can man-ifest as modifications of the ranking of genotypes across environments or simply as changes in the absolute difference in performance between pairs of genotypes. Accounting for G·E has always been a concern in the analysis of multienvironment plant breeding trials, and several models have been proposed and used for describing the mean response of genotypes over environments and for studying and interpreting G·E in agricultural experiments (e.g., Yates and Cochran 1938;Finlay and Wilkinson, 1963;Eberhart and Russell 1966). The statistical treatment of G·E has evolved over time due to the development of statistical methods and because of changes in the information available, including the increased availability of DNA markers and of precise environmental information. Some approaches deal with G·E implicitly, without explicitly modeling gene · environment interactions; these include some of the early treatment of G·E (e.g., the joint-regression analysis of Yates and Cochran 1938), as well as more modern methods such as the multivariate pedigree-or marker-based models where G·E is modeled using structured or unstructured covariance functions (Piepho, 1997(Piepho, , 1998Smith et al. 2005;Crossa et al. 2006;Burgueño et al. 2007). These approaches have proved to be effective for exploiting G·E; however, they do not shed light on the underlying basis of G·E (e.g., the relative contribution of different genetic regions to stability and to G·E). When genomic and environmental covariate data are available, G·E can be modeled explicitly by means of marker · environment interactions (M·E). This approach was first used with sparse marker data in quantitative trait loci (QTL) analysis (QTL·E Moreau et al. 2004) and also in multilocus models with markers that exhibited "significant" association with the trait of interest (Boer et al. 2007). The QTL·E approach has been further extended to multitrait, multienvironment settings (e.g., Malosetti et al. 2004Malosetti et al. , 2008. Recent developments in genotyping and sequencing technologies have made it possible to use dense genotypic information for genomic selection (GS) (Meuwissen et al. 2001). Empirical evidence obtained with plant and animal breeding data has demonstrated that GS can outperform the prediction accuracy of pedigree-based methods or that of models based on a reduced number of loci (de los Campos et al. 2009Crossa et al. 2010Crossa et al. , 2011Heslot et al. 2012;Pérez-Rodríguez et al. 2012). This has prompted the relatively fast adoption of GS in plant and animal breeding. GS models originally were developed for a single trait evaluated in a single environment, and most analyses published so far are based on within-environment analyses.
Recently, several studies have proposed using GS models that accommodate G·E. For instance, Burgueño et al.(2012) extended the single-trait, single-environment genomic best linear unbiased prediction (GBLUP) model to a multienvironment context and reported important gains in prediction accuracy with the multienvironment model relative to single-environment analysis. More recently, Heslot et al. (2014) and Jarquin et al. (2014) considered modeling G·E using both genetic markers and environmental covariates. These studies also showed that modeling G·E can give substantial gains in prediction accuracy.
Following ideas originally used for QTL analysis in multienvironment trials (Van Eeuwijk et al. 2005;Boer et al. 2007;Malosetti et al. 2004Malosetti et al. , 2008, we present GS models that accommodate G·E by explicitly modeling interactions between all available markers and environments. Relative to multivariate approaches where G·E is modeled using covariance parameters(e.g., Smith et al. 2005), the M·E approach has advantages and disadvantages. First, the M·E models presented here can be easily implemented using existing software for GS. Second, the model can be implemented using both shrinkage methods as well as variable selection methods. Third, the M·E model decomposes effects into components that are common across environments (stability) and environment-specific deviations; this information, which is not provided by standard multienvironment mixed models, can be used to identify genomic regions whose effects are stable across environments and others that are responsible for G·E. On the other hand, the M·E model imposes restrictions on the patterns of G·E, and, for reasons that we discuss in this article, the model is best suited for the joint analysis of positively correlated environments.
In this study, we applied the M·E model to extensive data sets where wheat lines were evaluated under contrasting environmental conditions in replicated field trials. This allowed us to identify under which conditions the M·E model is most effective. We show theoretically and demonstrate empirically that the magnitude of the main and interaction variance is directly related to the phenotypic correlation between environments and that the M·E model performs best when the set of environments analyzed showed positive and similar correlations. Indeed, when the set of environments analyzed had moderate or high positive correlations, the M·E model yielded substantial gains in prediction accuracy relative to an across-environment GS model that assumes homogeneity of effects across environments, and, depending on the prediction problem, it either performed similarly or outperformed the stratified (i.e., within-environment) analyses. In the rest of this article, we describe the methods used and present empirical results obtained when the M·E model was applied to three wheat data sets. We also provide, as online materials, scripts that implement the interaction models using the BGLR R-package (de los Campos and Pérez-Rodriguez 2014).

MATERIALS AND METHODS
The data used in this study are from CIMMYT's Global Wheat Program and consist of a set of wheat lines evaluated under managed environmental conditions; these conditions were designed to simulate target mega-environments. The wheat lines included in this study were later part of the 45th, 46th, and 47th International Bread Wheat Screening Nurseries and distributed worldwide.
Three files containing phenotypic and genotypic information on the three data sets used in this study (45th, 46th and 47th International Bread Wheat Screening Nurseries) are provided as Supporting Information, File S1, File S2, and File S3, respectively.

Phenotypic data
The phenotypic data consisted of adjusted grain yield(ton/ha) records collected during three evaluation cycles (W1: cycle 201022011, N = 732; W2: cycle 201122012, N = 672; and W3: cycle 201222013, N = 811); each cycle included a different set of advanced breeding lines. All trials were established at CIMMYT's main wheat breeding station at Cd. Obregon, Mexico. The experimental design was an alpha-lattice with three replicates per line and environment. Wheat lines were evaluated under three irrigation regimes(2i = two irrigations giving moderate drought stress, 5i = five irrigations representing an optimally irrigated crop, and 0i = no irrigation or drip irrigation, representing high drought stress), two planting systems (B = bed planting; F = planting on the flat) and two planting dates (N = normal and H = late, simulating heat at the grain-filling stage). In the 2i and 5i regimes, irrigation was applied without measuring soil moisture, and each irrigation added 100 mm of water. Some of the trials were managed using no-tillage(hereinafter denoted as Z). Table 1 gives the number of phenotypic records per simulated environment and cycle. The phenotype used in the analysis was the best linear unbiased estimate of grain yield obtained from a linear model applied to the alpha-lattice design of each cycle-environment combination.

Genotypic data
Genotypes were derived using genotyping-by-sequencing technology (GBS; Poland et al. 2012). GBS markers with a minor allele frequency lower than 0.05 were removed. As is typical of GBS genotypes, all markers had a high incidence of uncalled genotypes. In our quality control pipeline, we applied thresholds for incidence of missing values aimed at maintaining relatively large and similar numbers of markers per data set. To this end, we removed markers with more than 60% (W1 and W2) or 80% (W3) missing values; this left 15,744 (W1 and W2) and 14,217(W3) GBS markers available for analysis. Finally, only lines with more than 2000 called GBS markers were used in the data analysis; this left 693 (W1), 670 (W2), and 807(W3) lines.

Statistical models
For each evaluation cycle (W1, W2, and W3), we considered three approaches: (i) a stratified analysis obtained by regressing phenotypes on markers separately in each environment (we refer to this approach, indistinctively, as to single-environment, within-environment or stratified analysis); (ii) a combined analysis based on a GS model where marker effects are assumed to be constant across environments (i.e., ignoring M·E) (hereinafter referred to as the "across-environment" model); and (iii) using a M·E model that allows analyzing data from multiple environments jointly and accounts for G·E. Each of these approaches is discussed in the sections to follow.
Stratified analysis: This model is obtained by regressing the phenotype vector containing the records available in the j th environment, y j ¼ fy ij g, where i indexes lines (individuals) and j indexes environments, on markers using a linear model in the form: y ij ¼ m j þ P p k¼1 x ijk b jk þ e ij , (i = 1,2,. . .,n individuals; j = 1,2,. . .s environments; k = 1,2,. . .,p markers) or, in matrix notation, where m j is an intercept, X j ¼ fx ijk g is a matrix of marker-centered and standardized genotypes(i.e., each marker was centered by subtracting the mean and standardized by dividing by the sample standard deviation), b j ¼ fb jk g is a vector of marker effects and e j is a vector of model residuals. Note that, in a full-factorial design where all lines are evaluated in all environments, Following the standard assumptions of the GBLUP model (e.g., Vanraden, 2007Vanraden, , 2008, marker effects and model residuals were assumed to be independent of each other and both normally distributed: b j Nð0; Is 2 b j Þ, and e j Nð0; Is 2 ej Þ. Setting u j ¼ X j b j ; we have that the aforementioned model also can be represented as follows: was obtained using the cross-product of (centered and standarized) marker genotypes and scaled by dividing by the number of markers. Because all markers were standardized to a unit variance, this gives an average diagonal value of G j equal to one.
Box 1a in File S4 provides an R-script that implements the singleenvironment model described previously using the BGLR R-package (de los Campos and Pérez-Rodriguez 2014).
Across-environment GBLUP model: another approach consists of assuming that effects of markers are the same across environments, that becomes (assuming s = 3, for ease of notation): In a GBLUP-context, one will assume b Nð0; Is 2 b Þ and the aforementioned model can be represented as a random effect model as follows: 2 4 y 1 y 2 y 3 X 1 X 1 9 X 1 X 2 9 X 1 X 3 9 X 2 X 1 9 X 2 X 2 9 X 2 X 3 9 X 3 X 1 9 X 3 X 2 9 X 3 X 3 9 3 5 , p is a marker-derived genomic relationship matrix. It is worth noting that, for balanced data, the model of expression (2a) is equivalent to fitting a genomic regression model using the average performance of each line across environments as a phenotype. Box 2a in File S4 provides an R-script that implements the acrossenvironment model described above using the BGLR R-package (de los Campos and Pérez-Rodriguez 2014).

M·E GBLUP model:
In the model in expression (1a), marker effects (b j ) are estimated separately for each environment; therefore, in this model there is no borrowing of information across environments. On the other hand, in the model in expression (2a), all data are used to estimate marker effects; however, in that model borrowing of information is achieved by assuming that effects are constant across environments. We now consider an interaction model that aims at benefiting from borrowing information across environments while allowing marker effects to change across environments. In the M·E model, the effect of the k th marker on the j th environment (b jk ) is described as the sum of an effect common to all environments (b 0k ), plus a random deviation (b jk ) peculiar to the j th environment, that is b jk ¼ b 0k þ b jk . Therefore, the equation for data from the j th n or, in matrix notation and assuming, for ease of notation, only three environments, 2 4 y 1 y 2 y 3 where the vectors of main and interaction effects and model residuals were all assumed to be normally distributed, specifically: b 0 Nð0; Is 2 b0 Þ, b j Nð0; Is 2 bj Þ and e j Nð0; Is 2 e Þ. The aforementioned model can be represented as a two-variance component GBLUP model, specifically, letting y ¼ ðy 1 9; y 2 9; y 3 9Þ9, the model can be represented as where u 0 Nð0; s 2 u0 G 0 Þ, u 1 Nð0; G 1 Þ, e Nð0; Is 2 e Þ where G 0 is as described previously and 3 5 , p Figure 1 Box-plot of adjusted grain yield by environment and cycle. Environments are denoted by a sequence of codes: 0i, 2i, and 5i denote the number of irrigation cycles; B/F denotes whether the planting system was "bed" (B) or "flat" (F); N/H denotes whether planting date was normal (N) or late (H, simulating heat); Z indicates no tillage.
n In this model, the main effect (u 0 ) allows borrowing information between environments (through the off-diagonal blocks of G 0 ) and u 1 captures environment-specific effects. The relative importance of these two terms is determined by the corresponding variance components that are inferred from the data.
n a Environments are described by a sequence of codes: 0i, 2i, and 5i denote the number of irrigation cycles; B/F denotes whether the planting system was "bed" (B) or "flat"(F); N/H denotes whether planting date was normal (N) or late (H, simulating heat); Z indicates no tillage. b Model R 2 was computed as the ratio of the sum of the main and interaction variance, relative to the total variance (residual + main effect + interaction). Env, environment.
n Table 4 Estimates of variance components (estimated posterior SD) by model and environment, cycle 201122012 (W2) a Environments are described by a sequence of codes: 0i, 2i, and 5i denote the number of irrigation cycles; B/F denotes whether the planting system was "bed" (B) or "flat" (F); N/H denotes whether planting date was normal (N) or late (H, simulating heat); Z indicates no tillage. b Model R 2 was computed as the ratio of the sum of the main and interaction variance, relative to the total variance (residual + main effect + interaction).

Models/Environments
Box 3a in File S4 provides an R-script that implements the M·E model described previously using the BGLR R-package (de los Campos and Pérez-Rodriguez 2014).

Software
The aforementioned models can be implemented using standard software for GS. For our implementation, we used the R (R Core Team 2013) package Bayesian generalized linear regression (BGLR, de los Campos and Pérez-Rodriguez 2014). This software does not allow fitting group-specific error variances; therefore, in our interaction model and in our across-environment analyses, we fit the models assuming homogeneous error variances across environments. The code used to implement these models is provided in the Supporting Information; technical details and several examples of the use of the package for genome-enabled prediction can be found in de los Campos and Pérez-Rodriguez (2014). We used a Bayesian model assuming Gaussian priors for the marker effects. The BGLR package assigns scaled-inverse x 2 densities to the variance parameters whose hyperparameters were given values using the default rules implemented in BGLR, which assign 5 degrees of freedom and calculates the scale parameter based on the sample variance of the phenotypes. Further details are given in de los Campos and Pérez-Rodriguez (2014).

Statistical analysis
The aforementioned models were fitted to data from each of the cycles (W12W3), separately. For each cycle data we performed analysis: (i) within-environment (see expression 1a), or (ii) by pairs of environments or, (iii) using data from all environments together. These last two approaches were implemented either using the model in expression (2a) or the interaction model expression (3a).
Models were fitted to each of the full data sets to derive estimates of variance components. Subsequently, we assessed prediction accuracy using training-testing (i.e., TRN-TST) random partitions (see below). For this validation procedure, all the parameters of the models, including variance components, were re-estimated from TRN data in each of the TRN-TST partitions. In all cases, inferences and predictions were based on 55,000 samples collected from the posterior distribution after discarding 5000 samples for burn-in.
Prediction accuracy was assessed using 50 TRN-TST random partitions; we used this approach because with a replicated TRN-TST design one can obtain as many partitions as one needs and this allows estimating SEs of estimates of prediction accuracy more precisely than with a cross-validation approach. Following Burgueño et al. (2012), we considered two different prediction problems. First (CV1), we assessed prediction accuracy of the models when TRN and TST data consist of disjoint sets of lines; this approach mimics the prediction problem n a Environments are described by a sequence of codes: 0i, 2i, and 5i denote the number of irrigation cycles; B/F denotes whether the planting system was "bed" (B) or "flat" (F); N/H denotes whether planting date was normal (N) or late (H, simulating heat); Z indicates no tillage. b Model R 2 was computed as the ratio of the sum of the main and interaction variance, relative to the total variance (residual + main effect + interaction).
faced by breeders when lines have not been evaluated in any field trials. To generate TRN and TST sets in CV1 we simply assigned completely at random 70% of the lines to TRN and the remaining 30% to TST.
Second (CV2), we considered the problem of predicting the performance of lines in environments in which lines have not been evaluated. This validation design mimics the prediction problem faced by breeders in incomplete field trials where lines are evaluated in some but not all target environments. TRN-TST partitions for CV2 were obtained as follows: in each data set (w = 1,2,3), each line (i = 1,. . ., n w ) had records in s w environments (j = 1,. . ., s w ; s w ¼ 4 in W1 and W2, and s w ¼ 5 in W3); therefore, the total number of records available per data set was N w ¼ n w · s w . To select the entries in the TST data set, we first chose 0:3 · N w IDs (i.e., lines) at random and subsequently randomly picked one environment per line from the index j = 1,. . ., s w . The resulting cells ðijÞ were assigned to the TST data set, and the ones not selected through this algorithm were used for model TRN. Lines were sampled without replacement if n w $ 0:3 · N w and with replacement otherwise. Boxes 4a and 4b in File S4 provide the R-code used to generate TRN-TST partitions in CV1 and CV2, and Boxes 5 and 6 illustrate how to fit models and evaluate prediction accuracy for a TRN-TST partition.
For each TRN-TST partition, models were fitted to the TRN data set and prediction accuracy was assessed by computing Pearson's product-moment correlation between predictions and phenotypes in the TST data set, within environment. The same TRN-TST partitions were used to assess the prediction accuracy of each of the models; this yielded 50 correlation estimates for each model, data set, and we report the average correlation (across partitions) and SEs. Figure 1 shows box-plots of adjusted yield per data set and environmental condition. As expected, average yield increased with the number of irrigation events and, other factors being equal, late planting (H) produced lower yields than normal planting (N). In all cases, the empirical distribution of grain yield within data set and environment was reasonably symmetric. Table 2 gives the SDs of grain yield for each environment-cycle combination and the (empirical) phenotypic correlations of adjusted grain yield across environments, within cycle. The average SD of grain yield was rather stable across environments, ranging from 0.41 to 0.65. In the first cycle (W1), correlations across environments were moderately positive, ranging from 0.22 to 0.53 and, as one would expect, environments with similar irrigation levels (e.g., 0i and 2i) exhibited greater correlations than environments with very different numbers of irrigations (e.g., 0i vs. 5i). In the second evaluation cycle (W2), three environments received five irrigations (5i) and one received none (0i). The correlations among environments with five irrigations were also moderately positive, ranging from 0.33 to 0.41. However, the correlation between drought environments (0i) and environments having five irrigations was almost zero (20.05). Finally, in the third evaluation cycle (W3), the two environments with five irrigations and normal planting dates (5iBN and 5iFN) were positively correlated (0.55); however, environments with different irrigation levels or different n TRN-TST, training-testing. a Environments are described by a sequence of codes: 0i, 2i, and 5i denote the number of irrigation cycles; B/F denotes whether the planting system was "bed" (B) or "flat" (F); N/H denotes whether planting date was normal (N) or late (H, simulating heat); Z indicates no tillage. b Change in prediction accuracy of the M·E model relative to the prediction accuracy of the single-environment (before semicolon) and relative to the prediction accuracy of the across-environment model (after semicolon). c Number of partitions (of 50) for which the M·E model gave greater accuracy than the single-environment (before semicolon) and the across-environment model (after semicolon).

RESULTS
planting dates (N/H) showed very low, and even negative, correlations. Interestingly, environments 0iFN and 5iBH, which differ across all factors, showed a moderate sample correlation (0.30). Table 3, Table 4, and Table 5 provide estimates of variance components per model and environment for cycles W1, W2, and W3, respectively. The estimates reported in these tables correspond to the full-data analyses.

Estimates of variance components
Stratified analysis: The proportion of variance explained by the regression on markers (R 2 computed based on estimates of variance components) estimated from the stratified (single-environment) analysis ranged from moderate (0.4) to high (0.7); in general, models fitted data better in W3 (Table 5) than in W1 or W2. Across-environment model: The estimated residual variance of the across-environment model was typically larger (and consequently the R 2 was lower) than that of the interaction model indicating that a sizable proportion of the G·E went to the residual of the across-environment model.

M·E model:
In the interaction models, the total genomic variance can be partitioned into a main effect and an interaction component. This partition showed that the relative importance of the main effect was greater when the environments analyzed jointly were positively correlated. On the other hand, as one would expect, when the environments analyzed jointly had low correlations, the estimated interaction variance was greater. For instance, in W1, the analysis of pairs of environments exhibiting sample phenotypic correlations smaller than 0.3 (0iBN + 5iBNZ, 0iBN + 5iFN, and 5iBNZ + 5iFN) yielded estimates of variance components in the M·E model where the main effect explained less than 50% of the total genomic variance, computed as the sum of the main effect plus interaction variance estimates (see Table 3). On the other hand, the pairs of environments showing correlations larger than 0.3 (2iBN + 5iBNZ and 2iBN + 5iFN) gave estimates of variance components where the main effect explained between 50 and 70% of the genomic variance. Finally, in W1, the pair of environments with the largest sample phenotypic correlation (0iBN + 2iBN) had estimates of variance components such that the main effect explained about 80% of the total genomic variance.
Similar patterns were observed in W2 and W3. Indeed, in W2, the main effects of markers explained more than 60% of the genomic variance for pairs of environments having sample phenotypic correlations greater than 0.33 (Table 4); on the other hand, in the two environments showing a low correlation (0iFN + 5iBN), the main effect explained only about 10% of the genomic variance. In W3 data set, pairs of environments with sample phenotypic correlations smaller than 0.1 had an estimated proportion of genomic variance explained by main effects that was smaller than 0.2. At the other extreme, for the pair of environments showing the greatest correlation (5iBN + 5iFN), the proportion of variance explained by main effects was close to 0.8 (Table 5). Finally, as expected, in the joint analysis of all environments, the variance of the main effect was largest in W1 and W2 (0.316 and 0.436, respectively), where several pairs of n TRN-TST, training-testing. a Environments are described by a sequence of codes: 0i, 2i, and 5i denote the number of irrigation cycles; B/F denotes whether the planting system was "bed" (B) or "flat" (F); N/H denotes whether planting date was normal (N) or late (H, simulating heat); Z indicates no tillage. b Change in prediction accuracy of the M·E model relative to the prediction accuracy of the single-environment (before semicolon) and relative to the prediction accuracy of the across-environment model (after semicolon). c Number of partitions (of 50) for which the M·E model gave greater accuracy than the single-environment (before semicolon) and he across-environment model (after semicolon). environments had sample phenotypic correlations that were moderately high, and considerably smaller in W3 (0.113), where many pairs of environments had phenotypic correlations that were negative or close to zero.

Assessment of prediction accuracy
The average correlation and the estimated SD (both computed using 50 TRN-TST partitions, each with 70% of records in the TRN data set and 30% in TST data set) obtained in CV1 are reported in Table 6,  Table 7, and Table 8 and those obtained in CV2 are reported in Table  9, Table 10, and Table 11. A summary of these results is given in Figure 2. As one would expect, the levels of prediction accuracy (correlation) were slightly greater in CV2 than in CV1. In CV1, the stratified analysis and the interaction model performed similarly (average correlation of 0.48 and 0.47 for the stratified and interaction models, respectively), and the across-environment analysis was the worst one (the average correlation in CV1 was 0.33, that is about 30% lower correlation than the stratified analysis or the interaction model). On the other hand, in CV2, the interaction model gave the greatest levels of prediction accuracy (average correlation 0.53), this method was followed by the stratified analysis (average correlation of 0.48, that is, about 10% less, in the scale of correlation, than the interaction model), and the worst performing method was the across-environment analysis (this method had an average correlation in CV2 of 0.38, that is about 27% less than the interaction model).
Stratified analysis: The within-environment analysis yielded prediction correlations ranging from moderately low (0.307 for environment 5iFN in W3) to moderately high (0.630 for environment 5iBH in W3).
Across-environment analysis: Overall, this method was the worst performing one. In CV1 the across-environment analysis performed worse than the stratified analysis and the interaction model; this in every environment and dataset. In CV2, the joint analysis of data from different environments ignoring G·E performed worse than the n TRN-TST, training-testing. a Environments are described by a sequence of codes: 0i, 2i, and 5i denote the number of irrigation cycles; B/F denotes whether the planting system was "bed" (B) or "flat" (F); N/H denotes whether planting date was normal (N) or late (H, simulating heat); Z indicates no tillage. b Change in prediction accuracy of the M·E model relative to the prediction accuracy of the single-environment (before semicolon) and relative to the prediction accuracy of the across-environment model (after semicolon). c Number of partitions (out of 50) for which the M·E model gave greater accuracy than the single-environment (before semicolon) and the across-environment model (after semicolon). stratified analysis when the pairs of environments analyzed together had a correlation lower than 0.3; however, the across-environment model tended to outperform the stratified analysis whenever the correlation between environments was larger than 0.4. On the other hand, the across-environment model was systematically outperformed by the M·E model both in CV1 and in CV2. In CV2, the difference in prediction accuracy between these two methods was low when the pairs of environments analyzed were positively correlated (e.g., in W1, environments 0iBN and 2iBN; see Table 9) and large, and in favor of the M·E model, when the set of environments analyzed were negatively correlated (e.g., in W3, in the joint analysis of 0iFN and 5iBN).
M·E model: As previously stated, in CV1, the M·E model performed similarly to the stratified analysis. However, in CV2, the joint analysis of all environments using the M·E model gave, relative to the stratified analysis, average gains in prediction accuracy ranging from 4.7 to 12.1% in W1, 4.1 to 29.6% in W2 and 20.2 to 22.3% in W3. The patterns of gain/loss in prediction accuracy achieved in CV2 with the M·E model, relative to the stratified analysis, were directly related to the correlation between environments and to the proportion of genomic variance explained by the main effects. Environments exhibiting positive correlations with other environments benefited greatly from the use of multienvironment models (e.g., 0iBN or 2iBN in W1, 5iBH and 5iFN in W2, and 5iFN and 5iBN in W3). For these environments in all 50 partitions, the multienvironment model fitted to all environments jointly had higher prediction accuracy than the within-environment model (Figure 2).
In CV2, the joint analysis of all environments using a M·E model gave, in general, a greater prediction accuracy than the one achieved with analyses of pairs of environments; this was particularly clear in W1 and W2 (Table 9 and Table 10, respectively), with the only exception of 5iBN in W2, where the average prediction accuracy was slightly higher in the bivariate analysis than in the multienvironment model applied to all environments jointly. The situation in W3 was slightly different; here, the predictive performance of analyses based on pairs of environments was similar to that of the joint multienvironment analysis of all conditions (Table 11).

DISCUSSION
Several studies have documented the benefits of using multienvironment models, relative to single-environment analysis (Burgueño et al. 2012;Dawson et al. 2013;Jarquin et al. 2014). Multienvironment analysis can model G·E interactions using covariance functions (Burgueño et al. 2012), markers and environmental covariates (Jarquin et al. 2014;Heslot et al. 2014) ,or by modeling M·E interactions. In this article, we adapted this approach, previously used in QTL models by Moreau et al.(2004), Van Eeuwijk et al. (2005), Boer et al. (2007, and Malosetti et al. (2004Malosetti et al. ( , 2008, to whole-genome regression models where phenotypes were regressed on large numbers of genome-wide markers. Relative to the standard multienvironment models with structured or unstructured covariances such as those used by Burgueño et al. (2012), the M·E model has advantages and disadvantages. On one hand, the interaction model: (i) is easy to implement using existing software for GS; (ii) leads to a decomposition of marker effects into n TRN-TST, training-testing. a Environments are described by a sequence of codes: 0i, 2i, and 5i denote the number of irrigation cycles; B/F denotes whether the planting system was "bed" (B) or "flat" (F); N/H denotes whether planting date was normal (N) or late (H, simulating heat); Z indicates no tillage. b Change in prediction accuracy of the M·E model relative to the prediction accuracy of the single-environment (before semicolon) and relative to the prediction accuracy of the across-environment model (after semicolon). c Number of partitions (of 50) for which the M·E model gave higher accuracy than the single-environment (before semicolon) and the across-environment model (after semicolon).
components that are stable across environments (main effects) and environment-specific deviations (interactions) that can shed light on which genomic regions are most responsible for G·E; and (iii) can be implemented with any of the priors commonly used in GS, including not only shrinkage methods such as the GBLUP, but also variable selection methods. On the other hand, the M·E model imposes restrictions on co-variance patterns: the covariance between environments is forced to be positive and constant across pairs of environments. Therefore, the M·E model is best suited for the joint analysis of sets of environments that are positively and similarly correlated. In those cases, the parsimony of the M·E model can be advantageous; however, the pattern may be too restrictive in cases where the genomic variance cannot be approximated with the structure imposed by the interaction model (Meyer and Kirkpatrick 2008).
Among the three data sets considered here, two of them, (W1 and W2) had patterns of phenotypic (sample) correlations with moderately positive correlations between pairs of environments, while the third one (W3) exhibited a great deal of G·E, with some pairs of environments exhibiting either null or negative sample phenotypic correlations. The environments included in our data sets were managed field conditions aiming to approximate three basic target sets of environments (mega-environments) comprising three main agroclimatic regions previously defined and widely used by CIMMYT's Global Wheat Breeding Program (Braun et al. 1996). These three agroclimatic regions are represented by megaenvironment 1 (low rainfall and irrigated), megaenvironment 4 (drought), and megaenvironment 5 (heat). Results indicated that environments clustered more along the lines of full irrigation, drought, and heat than based on whether they include beds or flat planting systems, and zero or conventional tillage.

Variance components
The M·E model fitted the data much better than the across-environment model that ignored G·E. Furthermore, the estimates of variance components from the M·E model indicated that the proportion of genomic variance explained by the main effect of markers is directly related to the (sample empirical) phenotypic correlation between environments. Analytically, under the assumptions of the M·E model described and used in this article, if y 1i ¼ m 1 þ u 0i þ u 1i þ e 1i and y 2i ¼ m 2 þ u 0i þ u 2i þ e 2i are the equations for the phenotype of the i th line in environments 1 and 2, respectively, then the phenotypic correlation can be expressed as a function of variance components,  Table 3, Table 4, and Table 5) vs. the observed sample phenotypic correlations. Overall, the estimated phenotypic correlation based on variance components was linearly related to the sample phenotypic correlation for pairs of environments having positive sample phenotypic correlations. However, as the sample phenotypic correlation approached zero or became negative, the relationship flattened out. This happens because, in the interaction model, the covariance is represented by the variance of the main effects and, therefore, it is bound to be non-negative.
n Table 10 Estimated prediction accuracy (correlation between predicted and adjusted grain yield, averaged over 50 TRN-TST partitions, cycle 2011 TRN-TST, training-testing. a Environments are described by a sequence of codes: 0i, 2i, and 5i denote the number of irrigation cycles; B/F denotes whether the planting system was "bed" (B) or "flat" (F); N/H denotes whether planting date was normal (N) or late (H, simulating heat); Z indicates no tillage. b Change in prediction accuracy of the M·E model relative to the prediction accuracy of the single-environment (before semicolon) and relative to the prediction accuracy of the across-environment model (after semicolon). c Number of partitions (of 50) for which the M·E model gave greater accuracy than the single-environment (before semicolon) and the across-environment model (after semicolon).

Prediction accuracy
The prediction analysis conducted in this study yielded levels of accuracy (correlation between phenotypes and predicted genomic values) that are consistent with previous reports for grain yield prediction accuracy using single-and multi-environment models (Burgueño et al. 2012). Overall, the interaction model was either the best performing method (CV2) or performed close to the best performing method (in CV1 the stratified analysis and the interaction model performed similarly). These results are consistent with those of previous studies (e.g., Burgueño et al. 2012;Jarquin et al. 2014) that have used similar validation designs (CV1 and CV2) and have reported similar predictive performance of the stratified and multivariate analysis in CV1, and clear superiority of the multivariate approach in CV2. Additionally, we also considered an across-environment analysis that ignores G·E. In our study this approach was clearly the worst-performing one; this finding highlights the importance of considering G·E when analyzing multi-environment data.
The gains in prediction accuracy obtained in CV2 with the M·E model were directly related to the correlations between environments.
Considering environments that had positive phenotypic correlations among them, the use of the M·E model yielded in CV2 gains, relative to the stratified analysis, that were either moderate (on the order of 5%) or very substantial (on the order of 29%). In CV2 the only cases where the stratified analysis was better than the interaction model are those based on the joint analysis of pairs of environments that had close to null or negative correlations (e.g., 5iBH in W3). This happened because, as discussed previously, the interaction model forces the covariance between environments to be non-negative.
In prediction problems such as that of CV2, the superiority of the M·E relative to the stratified analysis can be attributed to the fact that the M·E model allows borrowing of information within line across environments, that is: when deriving predictions for a given line, the M·E model benefits from records from the same line collected in correlated environments (this borrowing of information also happens in the across-environment GBLUP; however, in the across-environment GBLUP borrowing of information within line across environments is achieved at the expense of forcing the effects to be constant across environments). This feature of the M·E model can be exploited n Table 11 Estimated prediction accuracy (correlation between predicted and adjusted grain yield, averaged over 50 TRN-TST partitions, cycle 2012 TRN-TST, training-testing. a Environments are described by a sequence of codes: 0i, 2i, and 5i denote the number of irrigation cycles; B/F denotes whether the planting system was "bed" (B) or "flat" (F); N/H denotes whether planting date was normal (N) or late (H, simulating heat); Z indicates no tillage. b Change in prediction accuracy of the M·E model relative to the prediction accuracy of the single-environment (before semicolon) and relative to the prediction accuracy of the across-environment model (after semicolon). c Number of partitions (out of 50) for which the M·E model gave greater accuracy than the single-environment (before semicolon) and the across-environment model (after semicolon).
in prediction problems such as CV2; however, such borrowing of information within line is not possible in CV1 and, consequently, the M·E model performs similarly to the stratified analysis for prediction of performance of lines that have no phenotypic records.

How many environments?
The interaction model can be applied to all available environments or to other sets (e.g., pairs) of environments. For data sets such as W1, where all environments showed positive correlations of similar magnitude, the joint analysis of all environments was clearly superior to analyses based on pairs of environments. However, in data sets exhibiting complex covariance patterns (such as W3), joint analysis of all environments using an interaction model imposes inadequate restrictions on co-variance patterns and, consequently, bivariate analysis seems more appropriate.

Extensions
In our study, because of the limitations of the software used, we implemented the M·E model in which the error variance was assumed to be homogeneous across environments. In principle, the model can be easily extended to accommodate environment-specific variances.
In this study, we presented and applied the interaction model using Gaussian priors. We did this because the GBS marker data contain a large proportion of missing values. However, with high-density panels of high-quality markers (e.g., single-nucleotide polymorphisms) it would make perfect sense to use other priors. For instance, it could be used with priors that induce differential shrinkage of estimates or variable selection (de los Campos et al. 2013); such treatment would potentially aid in identifying sets of markers with effects that are stable across environments and others that are responsible for G·E.
The M·E model presented in this article is an easy-to-implement and easy-to-interpret approach for modeling G·E in genomic models. The model allows decomposing marker effects and genomic variance into components that are stable across environments (main effects) and components that are environment-specific (interaction terms). The model can be implemented easily using existing software for GS. Predictions from the interaction model had either similar (CV1) or greater (CV2) accuracy than the single-environment analysis and were always more accurate than those derived from an across-environment analysis that ignored G·E; therefore, the proposed model should be useful for selection based on either stability (main effects only) or for target environments (based on total genomic value). The interaction model is not free of limitations; in particular, it is important to note that the genetic covariance between any pair of environments is represented by the variance of the main effect; therefore, it is restricted to being positive and the same for all pairs of environments analyzed jointly. Therefore, the model should be more effective when applied to subsets of environments that have positive and similar correlations.

ACKNOWLEDGMENTS
We thank CIMMYT's Global Wheat Program that performed the experiments and collected the data analyzed in this study. We ac- Figure 2 Correlation between phenotypes and predictions (average over 50 TRN-TST partitions) by model (stratified, interaction, and acrossenvironment analysis), validation design (CV1, left, CV2 right), data set and simulated environment (horizontal axis). TRN-TST, training-testing.

Figure 3
Estimated phenotypic correlation (based on estimates of variance components) vs. sample phenotypic correlation. The estimated phenotypic correlation was obtained from estimates of variance components (see Table 3, Table 4, and Table 5) from the bivariate analysis.