G3: Genes | Genomes | Genetics


Genomic-Enabled Prediction Kernel Models with Random Intercepts for Multi-environment Trials

Jaime Cuevas, Italo Granato, Roberto Fritsche-Neto, Osval A. Montesinos-Lopez, Juan Burgueño, Massaine Bandeira e Sousa and José Crossa
G3: Genes, Genomes, Genetics April 1, 2018 vol. 8 no. 4 1347-1365; https://doi.org/10.1534/g3.117.300454
Jaime Cuevas
Universidad de Quintana Roo, Chetumal, Quintana Roo, México
Italo Granato
Department of Genetics, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, São Paulo, Brazil
Roberto Fritsche-Neto
Department of Genetics, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, São Paulo, Brazil
Osval A. Montesinos-Lopez
Facultad de Telemática, Universidad de Colima, CP 28040 Colima, Edo. de Colima, México
Juan Burgueño
Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT). Apdo. Postal 6-641, 06600 México DF, México
Massaine Bandeira e Sousa
Department of Genetics, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, São Paulo, Brazil
José Crossa
Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT). Apdo. Postal 6-641, 06600 México DF, México
  • For correspondence: j.crossa@cgiar.org

Abstract

In this study, we compared the prediction accuracy of the main genotypic effect model (MM) without G×E interactions, the multi-environment single variance G×E deviation model (MDs), and the multi-environment environment-specific variance G×E deviation model (MDe), where the random genetic effects of the lines are modeled with the markers (or pedigree). With the objective of further modeling the genetic residual of the lines, we incorporated the random intercepts of the lines ($l$) and generated another three models. Each of these 6 models was fitted with a linear kernel method (Genomic Best Linear Unbiased Predictor, GB) and a Gaussian kernel (GK) method. We compared these 12 model-method combinations with another two multi-environment G×E interaction models with unstructured variance-covariances (MUC) using GB and GK kernels (4 model-method combinations). Thus, we compared the genomic-enabled prediction accuracy of a total of 16 model-method combinations on two maize data sets with positive phenotypic correlations among environments, and on two wheat data sets with complex G×E that includes some negative and close to zero phenotypic correlations among environments. The two models MDs and MDe, with the random intercept of the lines and the GK method, were computationally efficient and gave high prediction accuracy in the two maize data sets. For the more complex G×E wheat data sets, the G×E model-method combinations MDs and MDe, including the random intercepts of the lines with the GK method, gave important savings in computing time compared with the G×E interaction multi-environment models with unstructured variance-covariances, but had lower genomic prediction accuracy.

  • Genomic-enabled prediction accuracy
  • genotype × environment interaction
  • main genetic effects
  • deviations from main genetic effects
  • random intercepts
  • Genomic Selection
  • shared data resource
  • GenPred

Genomic selection (GS) predicts breeding values of complex traits based on dense marker information (Meuwissen et al. 2001) and has shown good prediction accuracy in random cross-validation partitions of plant breeding data (de los Campos et al. 2009, 2013; Crossa et al. 2010, 2011, 2013; Pérez-Rodríguez et al. 2012). As molecular markers have become cheaper and more abundant, GS-assisted breeding has become commonly used in plant and animal improvement. When performing genomic prediction of breeding values of unobserved individuals, the relationship between individuals in the training and testing sets is computed through the genomic relationship matrix, and the prediction model is referred to as the Genomic Best Linear Unbiased Predictor (GBLUP) (VanRaden 2007, 2008).

Multi-environment trials are routinely conducted in plant breeding to estimate and take advantage of genotype × environment interaction (G×E) for selecting stable and high performing lines across environments and within environments. Therefore, implementation of GS strategies in plant breeding should be useful for estimating the parameters of the model and predicting G×E, as is commonly done in conventional plant breeding. Modern statistical analyses of multi-environment trials assess G×E by using pedigree information with linear mixed models (Piepho, 1997, 1998; Smith et al. 2005; Crossa et al. 2006; Burgueño et al. 2007); however, these models do not incorporate marker information.

A Bayesian GBLUP regression model for assessing genomic-enabled prediction combining G×E introduces the main effects of environments and lines and the interaction effects of markers and environmental co-variables via random variance-covariance structures (Jarquín et al. 2014). The Bayesian regression model of López-Cruz et al. (2015) is similar to that of Jarquín et al. (2014) with one difference: that genomic values are partitioned into components that are stable across environments (main genomic effects) and others that are environment-specific (genomic G×E) (Crossa et al. 2016). Although both models assume positive sample correlations among environments and can be fitted using the BGLR package (de los Campos and Pérez-Rodríguez 2016), the advantage of the model of López-Cruz et al. (2015) over the model of Jarquín et al. (2014) is that it can be implemented using both shrinkage methods and variable selection methods and is efficient when applied to sets of environments that have positive correlations because the genetic covariance between any pair of environments is the variance of the main effect, which makes the covariance between pairs of environments positive (López-Cruz et al. 2015).
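The positivity constraint mentioned above can be made explicit with a short worked equation. As a sketch, assume the genomic values in environment $j$ are written as a main effect plus an environment-specific deviation, with the two terms independent of each other and the deviations independent across environments. The symbols $u_j$, $g$, $g_{E_j}$, $\sigma^2_g$, $\sigma^2_{gE_j}$ and $K$ are illustrative notation introduced here, not taken from this passage.

```latex
\begin{aligned}
u_j &= g + g_{E_j}, \qquad g \sim N(0, \sigma^2_g K), \qquad g_{E_j} \sim N(0, \sigma^2_{gE_j} K), \\
\operatorname{Var}(u_j) &= (\sigma^2_g + \sigma^2_{gE_j})\, K, \\
\operatorname{Cov}(u_j, u_{j'}) &= \sigma^2_g\, K \quad (j \neq j'), \\
\operatorname{Cor}(u_{ij}, u_{ij'}) &= \frac{\sigma^2_g}{\sqrt{(\sigma^2_g + \sigma^2_{gE_j})(\sigma^2_g + \sigma^2_{gE_{j'}})}} \;\geq\; 0 .
\end{aligned}
```

Because $\sigma^2_g$ is a variance, the implied genetic correlation between any two environments cannot be negative under these assumptions, which is why such models suit sets of positively correlated environments.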

Cuevas et al. (2016) used the Bayesian model of López-Cruz et al. (2015) to compare methods that apply GS models with G×E using a linear kernel (GBLUP) (GB) and a non-linear Gaussian kernel (GK) for single-environment and multi-environment breeding data sets. The authors found the GK models had higher prediction accuracy than the GB models and explained that the GK models captured major and complex marker effects in addition to their interaction effects. Sousa et al. (2017) compared the prediction accuracy of the multi-environment, single variance G×E deviation model (MDs) of Jarquín et al. (2014) with GK (MDs-GK) and the prediction accuracy of the multi-environment environment-specific variance G×E deviation model (MDe) of López-Cruz et al. (2015) with the GK method (MDe-GK). Then, Sousa et al. (2017) compared the models including the GK method with the prediction accuracy of their counterpart models using the GB methods (MDs-GB and MDe-GB). In addition, Sousa et al. (2017) also compared the accuracy of the four previous models with the accuracy of the multi-environment, main genotypic effect (MM) of Jarquín et al. (2014) using the GB and GK methods (MM-GB, and MM-GK). Results show that for grain yield, a notable increase in prediction accuracy of GK over the GB methods ranged from 9 to 49% in one data set and from 34 to 70% in another data set.

In general, the previous linear mixed multi-environment models assumed the environments as fixed or random effects, and lines as random effects incorporating into the model the random slope of the genetic effect of the lines distributed as a normal random variable with zero mean and variance-covariance structure constructed from markers or pedigree; also, the genetic effect (intercept) of the lines can be considered as having a normal distribution with zero mean and constant variance (Mota et al. 2016). The random intercept of the lines is often not included in the model when no exchange of information occurs, assuming the intercepts are independent (Pérez-Rodríguez et al. 2015). However, recent studies have incorporated random intercepts (Mota et al. 2016; Cuevas et al. 2017; Sukumaran et al. 2017; Jarquín et al. 2017) in order to achieve higher genomic-prediction accuracy in cases where lines were observed in some environments but not in others (random cross-validation 2, CV2 of Burgueño et al. 2012); this is because the posterior distribution of the intercept generates a variance-covariance structure that allows exchanging information between the lines of the training and testing sets. When newly developed lines have never been observed (untested) (random cross-validation CV1, Burgueño et al. 2012), models do not improve the prediction accuracy with or without random intercept when compared with the single-environment model. One limitation of these multi-environment genomic G×E models for achieving relatively high genomic-enabled predictions is that correlations among environments should be positive. Also, none of the applications of the models of Jarquín et al. (2014), Sukumaran et al. (2017), and Jarquín et al. (2017) compared genomic-enabled prediction accuracy with GB kernel vs. GK kernel.

The previous Bayesian regression models of Jarquín et al. (2014) and López-Cruz et al. (2015) use the Hadamard product for modeling G×E and show that the exchange of information between environments is achieved by means of the variance-covariance matrix of the main effects. Thus, the variance component of the main effects measures the stability across environments and the variance component of the specific effects measures the deviations from the main effects due to specific combinations of lines in environments (G×E). This approach has the advantage that it can be used when the number of lines in each environment is the same, but also when there is an unbalanced number of lines in environments, as shown by Sousa et al. (2017).

On the other hand, GBLUP methodology (together with pedigree) can incorporate and model G×E effects by means of the Kronecker product of the variance-covariance matrices of the genetic relationship between environments and the genomic or pedigree relationship between the lines (Burgueño et al. 2012; Oakey et al. 2016), where the structure of the models allows estimating negative genetic correlations between environments. Based on this, Cuevas et al. (2017) recently compared a Bayesian regression model for the genetic effects described by the Kronecker product of unstructured variance-covariance matrices of genetic correlations between environments and genomic kernels under the GB and GK methods. An extension includes an extra genetic residual component with random intercepts. Results of the analyses of five data sets indicated that including the random intercepts is still beneficial for increasing genomic prediction accuracy when lines have been tested in some environments. However, one drawback of the Bayesian regression models of Cuevas et al. (2017) is the computing time of the iterations required for the Markov chain Monte Carlo (MCMC) method to achieve convergence of the posterior and predictive distributions.

Recently, Granato et al. (2017) proposed an R package called Bayesian Genomic G×E (BGGE) to obtain a rapid fit of Bayesian mixed linear models with homogeneous error variances for the models of Jarquín et al. (2014), López-Cruz et al. (2015) and also for the models used by Sousa et al. (2017) (MM, MDs, and MDe). The approach of Granato et al. (2017) uses an R library that saves time by using the structure of the block diagonal matrices with additional parameterizations to shorten the iteration time without losing precision.

Based on the above, the main objective of this study was to compute the prediction accuracy of 16 model-method combinations and compare their prediction accuracy for four different data sets (two maize and two wheat multi-environment trials) with an unbalanced number of lines in environments, and different complexity of the G×E interaction. The 16 model-methods comprise the multi-environment, main genotypic effect (MM), the multi-environment, single variance G×E deviation model (MDs) and the multi-environment environment-specific variance G×E deviation model (MDe) with the GB and GK kernel methods and with and without including random intercepts (12 model-methods) plus 4 Bayesian regression models for the genetic effects described by the Kronecker product of unstructured variance-covariance (MUC) matrices of genetic correlations between environments and genomic kernels under the GB and GK methods and their extensions, including an extra genetic residual component with random intercepts. We discuss the advantages and disadvantages of the different model-methods for sets of environments with different G×E characteristics and different degrees of unbalance among lines.

Materials and Methods

This study uses four multi-environment plant breeding data sets with different characteristics. Two maize data sets used by Sousa et al. (2017) (HEL and USP) had different numbers of maize hybrids in each environment and positive correlations between environments, whereas the two wheat data sets used by Cuevas et al. (2017) (WHE1 and WHE5) had environments with negative or zero correlations but with the same number of wheat lines in each location.

We used the same models of Sousa et al. (2017) (MM, MDs, and MDe) with linear (GB) and non-linear kernels (GK) (MM-GB, MM-GK, MDs-GB, MDs-GK, MDe-GB, MDe-GK) plus the addition of one random intercept component ($l$) that captures the variation of genetic residuals (MM$l$-GB, MM$l$-GK, MDs$l$-GB, MDs$l$-GK, MDe$l$-GB, MDe$l$-GK). These 12 model-methods were fitted with the BGGE package (Granato et al. 2017).

In this study models 2 and 3 of Cuevas et al. (2017) are renamed as Multi-environment Unstructured Covariance (MUC) and Multi-environment Unstructured Covariance with random intercept vector f (MUCf), respectively, each fitted with the GB and GK kernel methods. Therefore, 4 additional models are included, MUC-GB, MUC-GK, MUCf-GB, and MUCf-GK. These models were fitted with the MTM package (de los Campos and Grüneberg 2016) and their prediction accuracy was compared with the other 12 model-method combinations.

In the first step, the phenotypic data were fitted according to the experimental design employed for each experiment, and the Best Linear Unbiased Estimates (BLUE) of the lines or hybrids for each location or environment were computed. In the second step, the various genomic models were fitted to perform random cross-validation and compute the prediction accuracy of the 16 model-method combinations.

Experimental data

Maize data set HEL:

This maize data set comprises 452 maize hybrids evaluated in 2015 at five sites in Brazil: Nova Mutum (NM) and Sorriso (SO) in the state of Mato Grosso; Pato de Minas (PM) and Ipiaçú (IP) in the state of Minas Gerais; and Sertanópolis (SE) in the state of Paraná. The experimental design was a randomized block with two replicates per genotype and environment. Different numbers of hybrids were planted in each environment. The HEL parent lines were genotyped with an Affymetrix Axiom Maize Genotyping Array of 616 K SNPs, with standard quality controls filtering markers at a Call Rate threshold of 0.95.

Maize data set USP:

This data set comprises 740 maize hybrids evaluated at Piracicaba and Anhumas, each with two levels of nitrogen (N) fertilization: Ideal N (IN) and Low N (LN), for a total of four artificial environments (P-IN, P-LN, A-IN, and A-LN). The hybrids were evaluated using an augmented block design including two replicated commercial hybrids as checks. There was an imbalance because not all hybrids were evaluated in all locations. As with the maize data set HEL, the USP parent lines were genotyped with an Affymetrix Axiom Maize Genotyping Array of 616 K SNPs, with standard quality controls filtering markers at a Call Rate threshold of 0.95.

Wheat data set WHE1:

A historical set of 599 wheat lines from CIMMYT’s Global Wheat Program was evaluated in four mega-environments (Crossa et al. 2010; Cuevas et al. 2016) and genotyped using 1447 Diversity Array Technology (DArT) markers generated by Triticarte Pty. Ltd. (Canberra, Australia; http://www.triticarte.com.au). Markers with a minor allele frequency lower than 0.05 were not included.

Wheat data set WHE5:

This data set is described by López-Cruz et al. (2015) and includes 807 wheat lines evaluated in five environments using an alpha-lattice design with three replicates in each environment at CIMMYT’s wheat breeding station at Cd. Obregon, Mexico. The environments were three irrigation regimes (0i = zero irrigation, 2i = two irrigations, and 5i = five irrigations), two planting systems (B = bed planting and F = flat planting) and two different planting dates (N = normal and L = late).

Genotypic data consisted of genotyping-by-sequencing (GBS) data, and markers with a minor allele frequency (MAF) lower than 0.05 were removed. After editing the missing markers, a total of 14,217 GBS markers were available for analyzing this data set.
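The minor allele frequency filter described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the pipeline used in the study: it assumes markers are coded 0/1/2 as counts of one allele with `np.nan` for missing calls, and the function name `filter_by_maf` and the toy matrix are hypothetical.

```python
import numpy as np

def filter_by_maf(markers: np.ndarray, threshold: float = 0.05) -> np.ndarray:
    """Return a boolean mask of marker columns whose MAF is >= threshold.

    `markers` is an (individuals x markers) array coded 0/1/2 (assumed
    coding); np.nan marks a missing call and is ignored.
    """
    # Frequency of the counted allele per marker, ignoring missing calls.
    allele_freq = np.nanmean(markers, axis=0) / 2.0
    # The minor allele is whichever of the two alleles is rarer.
    maf = np.minimum(allele_freq, 1.0 - allele_freq)
    return maf >= threshold

# Toy example: 4 individuals x 3 markers (made-up values).
toy = np.array([
    [0, 2, 0],
    [0, 1, 1],
    [0, 0, 2],
    [0, 2, np.nan],
], dtype=float)

keep = filter_by_maf(toy, threshold=0.05)
filtered = toy[:, keep]  # the monomorphic first marker is dropped
```

Markers that are presence/absence scores (0/1) rather than allele counts would need the division by 2 removed.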

Availability of the phenotypic and genotypic experimental data:

Sousa et al. (2017) describe the two maize data sets and Cuevas et al. (2017) give details of the two wheat data sets. The two maize data sets, HEL and USP, can be downloaded from the link http://hdl.handle.net/11529/10887, whereas the two wheat data sets can be found at the link http://hdl.handle.net/11529/10710, from where DATASET1.Wheat_GY.Rdata (Wheat data set WHE1) and DATASET5.Wheat_GY.Rdata (Wheat data set WHE5) were obtained.

Statistical models

The components of the 8 basic models are summarized in Table 1 and their full descriptions are given below and in Appendix 1. They include an overall mean ($\mu$) and the fixed effects of the environments (other effects can be incorporated) modeled with the incidence matrix $Z_E$ and one vector of fixed effects $\beta_E$ for each environment. For the first group of six models (MM, MM$l$, MDs, MDs$l$, MDe, and MDe$l$), it is assumed that their genetic random components $g$ have a normal distribution with mean zero and a variance-covariance structure comprising a known matrix $K$ generated from markers (and computed using the GB or GK methods) multiplied by an unknown scale parameter (variance component). Four models in this group (MDs and MDe, with or without the random intercepts $l$) model the G×E in different forms, with a variance-covariance structure constructed by the Hadamard product of the corresponding matrices.

Table 1 Components of the 8 models included in this study. Each of these models is fitted with the linear kernel (GB) and the Gaussian kernel (GK)

A second group of models (MUC) considers that their random components have a normal distribution with zero mean and a variance-covariance structure modeled by the Kronecker product of a matrix of unknown covariances among environments and a known matrix $K$ (computed using the GB or GK methods), incorporating (or not) the random intercepts ($f$).

The multi-environment main genotypic effect model (MM):

Model MM (1) (Appendix 1) is equivalent to the across-environment model of Jarquín et al. (2014). When the random genetic effects $g \sim N(0, \sigma^2_g K)$ of model MM use $K = G = XX'/p$ in the covariance (de los Campos et al. 2013; VanRaden 2007, 2008), the model is the GBLUP across environments (MM-GB), where $X$ is the standardized matrix of molecular markers for the individuals, of order $n \times p$, and $p$ is the number of markers.

However, markers can have a more complex function than the linear GBLUP. For example, the Gaussian kernel (GK) function (Cuevas et al. 2016) is computed as $K(x_i, x_{i'}) = \exp(-h\, d^2_{ii'})$, where $d_{ii'}$ is the Euclidean distance between the $i$th and $i'$th individuals given by the markers; $h > 0$ is the bandwidth parameter that controls the rate of decay of $K$ values (de los Campos et al. 2009; Pérez-Rodríguez et al. 2012; Pérez-Elizalde et al. 2015; Cuevas et al. 2016). In this work, GK is $K(x_i, x_{i'}) = \exp(-h\, d^2_{ii'} / \mathrm{median}(d^2_{ii'}))$, where $h = 1$ and the median of the distances is used as a scaling factor (Crossa et al. 2010). When the random genetic effects $g$ of the MM model (1) use $K(x_i, x_{i'})$ in the covariance, the model is the Gaussian kernel across environments (MM-GK) (Sousa et al. 2017).
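The two kernels described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the BGGE implementation: the function names (`standardize`, `gblup_kernel`, `gaussian_kernel`), the simulated 0/1/2 marker matrix, and the choice to take the median over off-diagonal squared distances are all assumptions made for the example.

```python
import numpy as np

def standardize(markers):
    """Center and scale each marker column; monomorphic columns are dropped."""
    std = markers.std(axis=0)
    keep = std > 0
    return (markers[:, keep] - markers[:, keep].mean(axis=0)) / std[keep]

def gblup_kernel(X):
    """Linear kernel G = X X' / p from a standardized marker matrix X (n x p)."""
    p = X.shape[1]
    return X @ X.T / p

def gaussian_kernel(X, h=1.0):
    """Gaussian kernel exp(-h * d^2 / median(d^2)) from squared Euclidean distances."""
    sq_norms = np.sum(X ** 2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    d2 = np.maximum(d2, 0.0)      # clip tiny negatives from rounding
    np.fill_diagonal(d2, 0.0)     # distance of an individual to itself
    # Assumption: the scaling median is taken over off-diagonal entries only.
    off_diagonal = d2[~np.eye(X.shape[0], dtype=bool)]
    return np.exp(-h * d2 / np.median(off_diagonal))

# Simulated markers for 6 individuals and 50 loci (illustrative only).
rng = np.random.default_rng(0)
markers = rng.integers(0, 3, size=(6, 50)).astype(float)
X = standardize(markers)

G = gblup_kernel(X)               # used by the GB method
K = gaussian_kernel(X, h=1.0)     # used by the GK method
```

Either matrix can then play the role of the known kernel in the covariance of the genetic effects; only the kernel changes between the GB and GK versions of a model.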

The genetic variation between lines that is not explained by $g$ in (1) (Appendix 1) can be captured by the random vector $l$ that is considered a random intercept for each line; thus when random effects $l$ are added, model MM becomes model MM$l$:
$$y = \mu\mathbf{1} + Z_E \beta_E + Z_g g + Z_g l + \varepsilon,$$
where the random intercepts $l \sim N(0, \sigma^2_l I)$, with $I$ being the identity matrix whose size equals the number of lines, and $\sigma^2_l$ the variance component that indicates the influence of $l$; the incidence matrix $Z_g$ connects the genotypes to the phenotypes. As in MM, the kernel matrix $K$ of the random effect $g$ of model MM$l$ can be fitted with GBLUP (MM$l$-GB) or with the Gaussian kernel (MM$l$-GK).

The multi-environment single variance genotype × environment interaction deviation model (MDs):

Model (2) (Appendix 1) (MDs) adds to model (1) the random interaction effect of the environments with the genetic information of the lines ($ge$). When the random component $l$ is added to model (2), the MDs model becomes MDs$l$:
$$y = \mu\mathbf{1} + Z_E \beta_E + Z_g g + ge + Z_g l + \varepsilon.$$
Each environment matrix $K$ (Appendix 1) of models MDs and MDs$l$ can be fitted with a linear kernel (MDs-GB, MDs$l$-GB) or a Gaussian kernel (MDs-GK, MDs$l$-GK).
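As a sketch of how a Hadamard-product covariance for the interaction term can be assembled with an unbalanced number of lines per environment, the example below assumes the interaction covariance is proportional to $(Z_g K Z_g') \circ (Z_E Z_E')$, the form commonly used for this type of model; Appendix 1 is not reproduced here, so treat that form as an assumption. The helper `incidence`, the line and environment labels, and the kernel values are all hypothetical.

```python
import numpy as np

def incidence(labels, levels):
    """0/1 incidence matrix: one row per record, one column per level."""
    index = {level: j for j, level in enumerate(levels)}
    Z = np.zeros((len(labels), len(levels)))
    for row, label in enumerate(labels):
        Z[row, index[label]] = 1.0
    return Z

lines = ["L1", "L2", "L3"]
envs = ["E1", "E2"]

# Unbalanced records: line L3 was not observed in environment E2.
rec_line = ["L1", "L2", "L3", "L1", "L2"]
rec_env = ["E1", "E1", "E1", "E2", "E2"]

# Hypothetical kernel among the three lines (GB or GK would supply this).
K = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.3],
              [0.2, 0.3, 1.0]])

Zg = incidence(rec_line, lines)   # connects records to lines
ZE = incidence(rec_env, envs)     # connects records to environments

K_main = Zg @ K @ Zg.T            # main-effect kernel at the record level
K_ge = K_main * (ZE @ ZE.T)       # Hadamard product: zero across environments
```

The elementwise product keeps the marker-based similarity for pairs of records in the same environment and sets it to zero for pairs in different environments, so no balancing of the data is needed.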

Multi-environment environment-specific variance genotype × environment deviation model (MDe):

The environment-specific variance genotype × environment deviation model (MDe) (López-Cruz et al. 2015) differs from MDs in how the interaction component is considered; $g$ is the main genetic effect across environments and $g_E$ is the specific genetic effect in each environment. When the random component $l$ is added to (3) (Appendix 1), the MDe model becomes MDe$l$:
$$y = \mu\mathbf{1} + Z_E \beta_E + Z_g g + g_E + Z_g l + \varepsilon,$$
where matrices $K$ for $g$ and $K_E$ for $g_E$ of models MDe and MDe$l$ can be fitted with a linear kernel (MDe-GB, MDe$l$-GB) or with a Gaussian kernel (MDe-GK, MDe$l$-GK).

Multi-environment with unstructured variance-covariance (MUC):

This model considers that there is a genetic correlation between environments that can be modeled with matrices of order $m \times m$ (where $m$ denotes the number of environments) (Cuevas et al. 2017). The MUC is expressed as
$$y = \mu + u + \varepsilon,$$
where $y = (y_1', \ldots, y_m')'$ is a vector with the observations $y_j$ belonging to the $j$th environment ($j = 1, \ldots, m$), each of the same size ($n$); the random vector $u$ is the vector of genetic values, and $\varepsilon$ the vector of random errors, both assumed normally distributed with $u \sim N(0, U_E \otimes K)$ and $\varepsilon \sim N(0, \Sigma \otimes I)$, where $\otimes$ is the Kronecker product.

The variance-covariance matrix of $u$ is the Kronecker product of one unstructured matrix with information between environments, $U_E$, that needs to be estimated and another known matrix with information between the lines based on $K$. The $m \times m$ matrix $U_E$ is
$$U_E = \begin{bmatrix} \sigma^2_{u_1} & \cdots & \sigma_{u_1 u_m} \\ \vdots & \ddots & \vdots \\ \sigma_{u_m u_1} & \cdots & \sigma^2_{u_m} \end{bmatrix},$$
where the $j$th diagonal element is the genetic variance $\sigma^2_{u_j}$ within the $j$th environment, and the off-diagonal elements are the genetic covariances $\sigma_{u_j u_{j'}}$ between environments $j$ and $j'$. For a large number of environments, a factor analytical model usually performs better than the unstructured model (Burgueño et al. 2012; Oakey et al. 2016). Furthermore, matrix $\Sigma$ is a diagonal error matrix of order $m \times m$, i.e., $\Sigma = \mathrm{diag}(\sigma^2_{\varepsilon_1}, \ldots, \sigma^2_{\varepsilon_m})$.
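A small numerical sketch may help show what the Kronecker structure implies. It assumes records are ordered environment by environment with the same lines in each, and it uses made-up values for the between-environment matrix (named `U_E` here) and the line kernel `K`; none of the numbers come from the study.

```python
import numpy as np

# Hypothetical unstructured covariance among m = 2 environments.
# The negative off-diagonal entry is allowed by this model.
U_E = np.array([[1.0, -0.3],
                [-0.3, 0.8]])

# Hypothetical kernel among n = 3 lines (GB or GK would supply this).
K = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.3],
              [0.2, 0.3, 1.0]])

# Covariance of the stacked genetic values (environment-major order).
cov_u = np.kron(U_E, K)

# Block (j, j') of cov_u equals U_E[j, j'] * K, so the covariance between
# line i in environment 1 and line i' in environment 2 is U_E[0, 1] * K[i, i'].
n = K.shape[0]
block_12 = cov_u[:n, n:]
```

Unlike a model whose between-environment covariance is a single main-effect variance, the off-diagonal entries of the unstructured matrix are free in sign, which is what lets this model accommodate negatively correlated environments.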

Multi-environment with unstructured variance-covariance and random intercepts (MUCf):

The MUC model can be extended by adding an extra variability to account for genetic variance among individuals across environments, that is, by adding the random vector $f$ (Cuevas et al. 2017). Therefore, the extension of the previous random linear model is
$$y = \mu + u + f + \varepsilon,$$
where $f = (f_1', \ldots, f_m')'$, with the random vectors $f$ being independent of $u$ and normally distributed, $f \sim N(0, F_E \otimes I)$. Matrix $F_E$ is unstructured and captures genetic variance-covariance effects between the individuals across environments that were not captured by the $U_E$ matrix; matrix $F_E$ can be expressed as
$$F_E = \begin{bmatrix} \sigma^2_{f_1} & \cdots & \sigma_{f_1 f_m} \\ \vdots & \ddots & \vdots \\ \sigma_{f_m f_1} & \cdots & \sigma^2_{f_m} \end{bmatrix},$$
where the $j$th diagonal element of the $m \times m$ matrix $F_E$ is the genetic environmental variance $\sigma^2_{f_j}$ within the $j$th environment, and the off-diagonal element is the genetic covariance $\sigma_{f_j f_{j'}}$ between environments $j$ and $j'$. Similar to the previous cases, models MUC and MUCf can be fitted using GB or GK kernels to generate the four model-methods MUC-GB, MUC-GK, MUCf-GB, and MUCf-GK.
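The role of the extra random vector can be seen by writing out the implied covariance of the observations. The sketch below introduces its own notation as an assumption: $U_E$ and $F_E$ are the $m \times m$ unstructured between-environment matrices for the marker-based effects and the random intercepts, $\Sigma$ is the diagonal matrix of environment error variances, $K$ is the kernel among the $n$ lines, and the three random terms are taken to be mutually independent.

```latex
\begin{aligned}
\operatorname{Var}(y \mid \mu) &= U_E \otimes K \;+\; F_E \otimes I_n \;+\; \Sigma \otimes I_n, \\
\operatorname{Cov}(y_{ij}, y_{i'j'} \mid \mu) &=
\begin{cases}
\sigma_{u_j u_{j'}}\, K_{ii'} & i \neq i', \\
\sigma_{u_j u_{j'}}\, K_{ii} + \sigma_{f_j f_{j'}} & i = i',\; j \neq j', \\
\sigma^2_{u_j}\, K_{ii} + \sigma^2_{f_j} + \sigma^2_{\varepsilon_j} & i = i',\; j = j'.
\end{cases}
\end{aligned}
```

Under these assumptions the term $\sigma_{f_j f_{j'}}$ links records of the same line in different environments beyond what the kernel explains, which is consistent with the benefit of random intercepts being tied to lines that were tested in some environments.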

Model implementation and random cross-validation for assessing prediction accuracy in the four data sets:

For the two maize data sets, models MM-GB, MM-GK, MDs-GB, MDs-GK, MDe-GB, and MDe-GK were fitted with the new software BGGE (Granato et al. 2017). Models MM$l$-GB, MM$l$-GK, MDs$l$-GB, MDs$l$-GK, MDe$l$-GB, and MDe$l$-GK were also fitted with BGGE with the same random partitions used by Sousa et al. (2017) to make results comparable for random cross-validation 1 (CV1) and random cross-validation 2 (CV2) (Burgueño et al. 2012). Models MUCf and MUC of Cuevas et al. (2017) were fitted using the software MTM (de los Campos and Grüneberg 2016) with the GB and GK kernel methods and with the same random partitions used for the 12 model-method combinations previously defined for random cross-validations CV1 and CV2. A fivefold random cross-validation was used, assigning 80% of the observations to the training sets and 20% to the testing (validation) set. However, most of the results and discussion focus on cross-validation CV2. The two wheat data sets were fitted with the 12 model-method combinations (models MM$l$-GB, MM$l$-GK, MDs$l$-GB, MDs$l$-GK, MDe$l$-GB, MDe$l$-GK, MM-GB, MM-GK, MDs-GB, MDs-GK, MDe-GB, MDe-GK) using the BGGE software of Granato et al. (2017).

Two random cross-validations (CV1 and CV2) were generated; CV1 attempts to mimic a situation where a set of lines were never evaluated in a set of environments, whereas CV2 mimics a sparse testing scheme where some lines were evaluated in some environments but not in others. Results based on CV2 are shown in the main text, tables and figures. Results of random cross-validation CV1 are given in Tables S1-S4 of Appendix 2. To implement the proposed 12 model-method combinations, 50 random partitions were performed with 80% of the lines used for training and the remaining 20% of the lines used for testing. The metric for measuring the performance of prediction accuracy was the Pearson correlation calculated between the observed and predicted values of the testing sets.
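The difference between the two cross-validation schemes, and the accuracy metric, can be sketched as follows. This is a simplified illustration under stated assumptions: the function names (`cv1_mask`, `cv2_mask`, `accuracy_by_env`) are hypothetical, the data are simulated, and the CV2 mask here simply samples records at random without any further constraint that the original partitions may have imposed.

```python
import numpy as np

def cv1_mask(rec_line, test_fraction, rng):
    """CV1: hold out every record of a random subset of lines."""
    lines = np.unique(rec_line)
    n_test = max(1, int(round(test_fraction * len(lines))))
    test_lines = rng.choice(lines, size=n_test, replace=False)
    return np.isin(rec_line, test_lines)

def cv2_mask(n_records, test_fraction, rng):
    """CV2: hold out random line-by-environment records (sparse testing)."""
    n_test = max(1, int(round(test_fraction * n_records)))
    mask = np.zeros(n_records, dtype=bool)
    mask[rng.choice(n_records, size=n_test, replace=False)] = True
    return mask

def accuracy_by_env(y_obs, y_pred, rec_env, test_mask):
    """Pearson correlation of observed vs. predicted test values per environment.

    Needs at least two test records per environment to be defined.
    """
    out = {}
    for env in np.unique(rec_env):
        sel = test_mask & (rec_env == env)
        out[env] = np.corrcoef(y_obs[sel], y_pred[sel])[0, 1]
    return out

# Simulated layout: 10 lines observed in each of 2 environments.
rng = np.random.default_rng(1)
rec_line = np.tile(np.arange(10), 2)
rec_env = np.repeat(np.array(["E1", "E2"]), 10)
y_obs = rng.normal(size=20)
y_pred = 2.0 * y_obs + 1.0   # stand-in for predictions from a fitted model

cv1 = cv1_mask(rec_line, 0.2, rng)        # 2 lines held out everywhere
cv2 = cv2_mask(len(rec_line), 0.2, rng)   # 4 records held out at random
acc = accuracy_by_env(y_obs, y_pred, rec_env, cv1)
```

In practice the masked phenotypes would be set to missing before fitting, and the partitioning would be repeated many times with the correlations averaged per environment.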

Results

The results are given in four sections, one for each data set. In each section, we provide the results of the variance component estimates and the prediction accuracy for each of the 12 model-method combinations.

Maize data set HEL

This maize data set has a total of 452 maize hybrids with a different number in each of the five sites (247, 330, 452, 367, and 330 hybrids). The sample phenotypic correlations among locations are positive with intermediate-to-low values, where location SE has low correlations with all the other locations, and locations NM, IP, and PM show relatively high correlations with the other locations (Table A1, Appendix 3).

Models without the random component $l$ always show a lower residual variance component in the GK models than in the GB models; for example, for model MDs-GK, $\sigma^2_\varepsilon$ = 0.278 and for MDs-GB, $\sigma^2_\varepsilon$ = 0.591 (Table 2). However, when the models include $l$, these differences become smaller; for example, for MDs$l$-GK, $\sigma^2_\varepsilon$ = 0.277 and for MDs$l$-GB, $\sigma^2_\varepsilon$ = 0.368, indicating that for method GB, the random component $l$ explains the variation of the observations better, whereas for GK, including $l$ does not have much influence on the residual. This is also reflected in the small value of $\sigma^2_l$ = 0.013 for MDs$l$-GK as compared with $\sigma^2_l$ = 0.243 for MDs$l$-GB.

Table 2 Maize HEL data set. Estimated variance components for the multi-environment models, main genotypic effect model (MM), single variance G×E deviation model (MDs) and environment-specific variance G×E deviation model (MDe) with two kernels, GBLUP (GB) and Gaussian kernel (GK), with Embedded Image and without Embedded Image, for grain yield (standard deviation in parentheses)

The size of the genetic component, Embedded Image, is always much higher for MM-GK, MDs-GK, and MDe-GK than for models with the GB method. For models MMEmbedded Image, MDsEmbedded Image, and MDeEmbedded Image, the sum of Embedded Image and Embedded Image is higher than the component Embedded Image for models MM, MDs, and MDe. For example, for model MMEmbedded Image-GB, Embedded Image is 0.429, whereas for model MM-GB, Embedded Image is 0.356; for MDsEmbedded Image-GB, the summation Embedded Image is 0.415 vs. MDs-GB with Embedded Image = 0.370, and for model MDeEmbedded Image-GB, Embedded Image = 0.430 vs. MDs-GB with Embedded Image = 0.370. The variance explained by the G×E of MDs, Embedded Image, is higher for GK than for GB and slightly higher for models with the random component Embedded Image than for models without Embedded Image. The variance components for the specific environments show increases in MDeEmbedded Image-GK compared to MDeEmbedded Image-GB, and in MDe-GK compared to MDe-GB (Table 2).

Models including the random component Embedded Image with GK did not improve the prediction accuracy of the locations as compared with the prediction accuracy of models without Embedded Image with GK (Table 3 and Figure 1); however, models with Embedded Image had consistently higher prediction accuracies than models with GB. In all cases, MMEmbedded Image showed lower prediction accuracy than models with G×E (MDsEmbedded Image and MDeEmbedded Image). Similarly, model MM had lower prediction accuracies than models that incorporate G×E (MDs and MDe). These differences are smaller for locations that had higher sample phenotypic correlations with other locations than for locations with low phenotypic correlations. For example, location NM had prediction accuracies of 0.569, 0.589, and 0.588 for models MMEmbedded Image-GB, MDsEmbedded Image-GB, and MDeEmbedded Image-GB, respectively, whereas location SE, with low sample phenotypic correlations with the other locations, had prediction accuracies of 0.372, 0.544, and 0.548 for models MMEmbedded Image-GB, MDsEmbedded Image-GB, and MDeEmbedded Image-GB, respectively.

Table 3 Maize HEL data set. Mean Pearson’s correlation (50 partitions) of each location for random cross-validation CV2, for the multi-environment models, main genotypic effect model (MM), single variance G×E deviation model (MDs), environment-specific variance G×E deviation model (MDe), multi-environment unstructured covariance models (MUC and MUCf) with two kernels, GBLUP (GB) and Gaussian kernel (GK) for grain yield with the proposed random effect Embedded Image and without the random effect Embedded Image (standard deviation in parentheses)
Figure 1

Plot of the prediction accuracy using Pearson’s correlation for each of the 5 locations (SO, SE, PM, NM, and IP) of maize data set HEL for the proposed models MDel-GK, MDel-GB, MUCf-GK, MUCf-GB, MDe-GK, MDe-GB, MUC-GK, and MUC-GB.

All models with kernel GK had higher prediction accuracies (with and without the random component Embedded Image) than models with kernel GB (Table 3 and Figure 1). However, these differences are lower for models that include the random component Embedded Image (Table 3). For example, for location SO, the prediction accuracies for models MDsEmbedded Image-GK and MDsEmbedded Image-GB were 0.673 and 0.639, respectively, whereas for MDs-GK and MDs-GB, the mean prediction accuracies were 0.666 and 0.466, respectively. Comparing models with kernel GB, with and without Embedded Image, the predictions are always higher when the model includes Embedded Image than when the model excludes Embedded Image; for example, for location IP, the mean prediction accuracies were 0.778 and 0.683 for MDsEmbedded Image-GB and MDs-GB, respectively (Table 3). Note that the variance component of the random effect Embedded Image Embedded Image was 0.243 for model MDsEmbedded Image-GB (Table 2). Furthermore, model 3 from Cuevas et al. (2017) with the unstructured variance-covariance component f or model 2 without f did not show any clear superiority, in terms of mean prediction accuracy, over models MDsEmbedded Image and MDeEmbedded Image and MDs and MDe with GK and GB (Table 3 and Figure 1).

Random cross-validation CV1 decreased the prediction accuracy as compared with results achieved for CV2 (Table S1, Appendix 2); the trends and patterns of the prediction accuracy of the locations between models and methods are similar to those found for CV2, including those found for models MUC and MUCf.

In summary, results from maize data HEL indicated that models with the random component Embedded Image with GK including G×E (MDsEmbedded Image-GK and MDeEmbedded Image-GK) show similar mean prediction accuracy as models excluding the random component Embedded Image. However, this did not occur with GB models where including the random component Embedded Image increased the prediction accuracy for all 5 locations. Prediction accuracy using GK was always higher than using GB with or without the random component Embedded Image. Also, the differences between the models with and without Embedded Image and between GK and GB were smaller for locations that had higher sample phenotypic correlations with other locations. Finally, the differences in prediction accuracy were negligible between the proposed models including G×E with GK and GB and with and without the random effect Embedded Image and models MUCf and MUC for all locations.

Maize data set USP

This maize data set comprises 739 maize hybrids with different numbers of lines in each of the four sites (Embedded Image 731, Embedded Image 732, Embedded Image 731, Embedded Image 737). Locations P-IN and A-IN had relatively high correlations with the other locations, whereas A-LN had low ones (Table A1, Appendix 3). The residual variance components for GK are smaller than those for GB for models MM, MDs and MDe; for instance, MM-GK had Embedded Image = 0.589 while MM-GB had Embedded Image = 0.854. Similarly, the residual variance components for MDsEmbedded Image and MDeEmbedded Image with GK are lower than for MDsEmbedded Image and MDeEmbedded Image with GB. The variance components of the random intercept (Embedded Image) of GK methods are not negligible, unlike in data set HEL, and are always lower than for the corresponding GB methods (Table 4).

Table 4 Maize USP data set. Estimated variance components for the multi-environment models, main genotypic effect model (MM), single variance G×E deviation model (MDs) and environment-specific variance G×E deviation model (MDe) with two kernels, GBLUP (GB) and Gaussian kernel (GK), with Embedded Image and without Embedded Image for grain yield (standard deviation in parentheses)

The estimated genetic variance components Embedded Image for GB in models MM, MDs and MDe were 0.214, 0.209, and 0.206, respectively (Table 4), increasing the genetic environmental stability (Embedded Image) of models MMEmbedded Image-GB, MDsEmbedded Image-GB, and MDeEmbedded Image-GB to 0.511, 0.513, and 0.514, respectively. The specific components for each environment of models MDe-GB and MDeEmbedded Image-GB were negligible. The variance component (Embedded Image) of the G×E models MDs and MDsEmbedded Image for GB and GK was also negligible.

In general, models with Embedded Image-GB had similar prediction accuracy as models with Embedded Image-GK, whereas the increase in prediction accuracy of models without Embedded Image-GK over models with GB is clear. For example, for P-LN, models MDsEmbedded Image-GK and MDsEmbedded Image-GB had prediction accuracies of 0.545 and 0.546, respectively, whereas for MDs-GK and MDs-GB, the prediction accuracies were 0.524 and 0.325 (Table 5 and Figure 2). Models with Embedded Image-GB showed significant improvement in prediction accuracy compared to GB models without Embedded Image; for example, for location P-IN, the mean prediction accuracies of MDsEmbedded Image-GB and MDs-GB were 0.591 and 0.368, respectively (due to the influence of Embedded Image = 0.349 for model MDsEmbedded Image-GB; see Table 4). All models with GK with the random intercept Embedded Image and with high values of Embedded Image gave higher prediction accuracies than GK models without Embedded Image. There are no clear differences between model MUCf and the proposed model with the random component Embedded Image with GK and GB in all the locations. Similar results were found for model MUC when compared to models without Embedded Image. For this data set, results from CV1 (Table S2, Appendix 2) were all similar and lower than those obtained for CV2.

Table 5 Maize USP data set. Mean Pearson’s correlation (50 partitions) of each environment for random cross-validation CV2, for the multi-environment models, main genotypic effect model (MM), single variance G×E deviation model (MDs), environment-specific variance G×E deviation model (MDe), multi-environment unstructured covariance models (MUC and MUCf) with two kernels, GBLUP (GB) and Gaussian kernel (GK) for grain yield with the proposed random effect Embedded Image and without the random effect Embedded Image (standard deviation in parentheses)
Figure 2

Plot of the prediction accuracy using Pearson’s correlation for each of the 4 environments (P-LN, P-IN, A-LN, A-IN) of maize data set USP for the proposed models MDel-GK, MDel-GB, MUCf-GK, MUCf-GB, MDe-GK, MDe-GB, MUC-GK, and MUC-GB.

In summary, results from maize data USP indicate that models with the random component Embedded Image (MDsEmbedded Image-GK and MDeEmbedded Image-GK) show higher mean prediction accuracy than models without Embedded Image and using the linear kernel GB. The G×E variance component of models MDs and MDsEmbedded Image with GK and GB had negligible Embedded Image, indicating less complex G×E than that found for maize data set HEL. The differences in the mean prediction accuracy between the proposed models with or without the random effect Embedded Image and models MUCf and MUC are small for models with GK and not clearly superior to the proposed models with GB.

Wheat data set WHE1

For this data set, environment E1 had negative correlations with the other environments (E2-E4), whereas environments E2-E4 had high correlations among themselves (Table A1, Appendix 3). Models with GK fitted the WHE1 data better than models with kernel GB (low residual variances of GK models as compared to GB models). Also, models with random component Embedded Image had lower residual variance components than models without Embedded Image. As opposed to the previous two maize data sets, where the magnitude of the variance components determines the prediction ability, the presence of environments with negative correlations with other environments makes interpreting the variance components in relation to their predictive ability not as straightforward as in the previous two data sets (Table 6). For example, models MMEmbedded Image and MM with GK and GB had estimates of the random error variance that were much higher (∼0.8) than those of the other models; thus the prediction accuracy of these models is expected to be low for at least the environments with negative correlations.

Table 6 Wheat WHE1 data set. Estimated variance components for the multi-environment models, main genotypic effect model (MM), single variance G×E deviation model (MDs) and environment-specific variance G×E deviation model (MDe) with two kernels, GBLUP (GB) and Gaussian (GK), with Embedded Image and without Embedded Image for grain yield (standard deviation in parentheses)

The genetic variance component Embedded Image varied for models MM-GB, MDs-GB, and MDe-GB (0.192, 0.219, and 0.414, respectively) as well as for the GK models (0.599, 0.752, and 1.404, respectively). The contribution of Embedded Image measured in Embedded Image was small for MDsEmbedded Image-GK and MDsEmbedded Image-GB (0.101 and 0.107) (Table 6) and negligible for the other models with Embedded Image. On the other hand, the G×E interaction variance components Embedded Image for GK and GB are important (MDsEmbedded Image-GK Embedded Image = 1.637, MDsEmbedded Image-GB Embedded Image = 0.42; MDs-GK Embedded Image = 1.349, MDs-GB Embedded Image = 0.349) and much higher than in the two maize data sets. Models MDeEmbedded Image-GK and MDeEmbedded Image-GB showed high specific variance components for E1 (3.356 and 1.058, respectively) and for E4 (1.147 and 0.3) causing most of the interaction in this data set (these are the environments with the lowest sample correlations with the other environments) and contributed the least to genetic environmental stability.

Models with G×E (MDs and MDe) had higher mean prediction accuracies than MM models, and mean prediction accuracy was lower in E1 and E4 than in E2 and E3 (Table 7 and Figure 3). The exceptions are models MM-GB and MM-GK, which had higher prediction accuracy than models MDs-GB and MDS-GB in E3. Models MDeEmbedded Image-GK and MDe-GK had higher prediction accuracy than models MM, MDs and MDe with and without Embedded Image for GB and GK in all locations, except MDe-GK in E1. However, in all cases and environments, models MUCf and MUC had better prediction accuracies than all 12 genomic model-method combinations (Figure 3). Lower prediction accuracies were found for CV1 (Table S3, Appendix 2) than for CV2; however, the decrease in prediction accuracy of CV1 was lower than for the two maize data sets.

Table 7 Wheat WHE1 data set. Mean Pearson’s correlation (50 partitions) of each environment for random cross-validation CV2, for the multi-environment models, main genotypic effect model (MM), single variance G×E deviation model (MDs), environment-specific variance G×E deviation model (MDe), multi-environment unstructured covariance models (MUC and MUCf) with two kernels, GBLUP (GB) and Gaussian kernel (GK) for grain yield with the proposed random effect Embedded Image and without the random effect Embedded Image (standard deviation in parentheses)
Figure 3

Plot of the prediction accuracy using Pearson’s correlation for each of the 4 environments (E1-E4) of wheat data set WHE1 for the proposed models, MDel-GK, MDel-GB, MUCf-GK, MUCf-GB, MDe-GK, MDe-GB, MUC-GK, and MUC-GB.

In summary, G×E for this data set is more complex than for the two previous maize data sets. This is expressed by higher values of Embedded Image (given by models MDsEmbedded Image and MDs) compared to those computed for the maize data sets, as well as the higher values of the variance components specific to environments (Embedded Image and Embedded Image) compared to those computed for other environments in this data set, as well as in the maize data sets. For the 12 model-method combinations, the models with the highest prediction accuracy for the environments were MDeEmbedded Image and MDe. However, models MUCf and MUC had the highest prediction accuracy for each environment and for both methods, GK and GB.

Wheat data set WHE5

This data set has sample phenotypic correlations between environments that are close to zero or negative (Table A1, Appendix 3). Only one high phenotypic correlation was observed between environments 5iBN and 5iFN (0.546). Table 8 shows the high residual variance components of models MMEmbedded Image-GK, MMEmbedded Image-GB, MM-GK and MM-GB, whereas for models incorporating G×E (MDs and MDe with GK and GB and with and without Embedded Image), the residual variance components were much smaller.

Table 8 Wheat WHE5 data set. Estimated variance components for the multi-environment models, main genotypic effect model (MM), single variance G×E deviation model (MDs) and environment-specific variance G×E deviation model (MDe) with two kernels, GBLUP (GB) and Gaussian kernel (GK), with Embedded Image and without Embedded Image (Sousa et al. 2017), for grain yield (standard deviation in parentheses)

The variance components of the genetic main effects with GB and Embedded Image were low (0.064 and 0.061 for MDsEmbedded Image-GB and MDeEmbedded Image-GB, respectively), indicating low exchange of information between environments. The most influential variance components were related to the G×E, Embedded Image. For example, the variance component Embedded Image is 0.618 for MDs-GB and 0.636 for MDsEmbedded Image-GB, whereas it increases to Embedded Image = 1.482 for MDs-GK and to Embedded Image = 1.49 for MDsEmbedded Image-GK (Table 8); this result indicates the importance of G×E interaction. The influence of the random component Embedded Image in this data set is negligible. The variance components related to specific environments are similar for the five environments and for MDe models with and without random component Embedded Image.

Among the 12 model-method combinations, the best predictive models were MDeEmbedded Image-GK and MDe-GK in all locations (Table 9, Figure 4). However, models MDsEmbedded Image-GK and MDs-GK also had relatively high prediction accuracies that were very similar to those of models MDeEmbedded Image-GK and MDe-GK. Similar results were found for models with linear kernel GB (Table 9). Models with the random intercept Embedded Image showed no increase in prediction accuracy (values of Embedded Image close to zero) as compared to models without Embedded Image.

Table 9 Wheat WHE5 data set. Mean Pearson’s correlation (50 partitions) of each environment for random cross-validation CV2, for the multi-environment models, main genotypic effect model (MM), single variance G×E deviation model (MDs), environment-specific variance G×E deviation model (MDe), multi-environment unstructured covariance models (MUC and MUCf) with two kernels, GBLUP (GB) and Gaussian kernel (GK) for grain yield with the proposed random effect Embedded Image and without the random effect Embedded Image (standard deviation in parentheses)
Figure 4

Plot of the prediction accuracy using Pearson’s correlation for each of the 5 environments (0iFN, 2iBH, 5iBH, 5iBN, 5iFN) of wheat data set WHE5 for the proposed models MDel-GK, MDel-GB, MUCf-GK, MUCf-GB, MDe-GK, MDe-GB, MUC-GK, and MUC-GB.

The comparison of the prediction accuracy of these 12 model-method combinations with the mean prediction accuracy of models MUCf and MUC (Figure 4) indicated the higher mean prediction accuracy of MUCf and MUC over the mean prediction accuracy of the proposed models with (or without) the random effect Embedded Image. For this data set, the prediction accuracies of CV1 were similar to those found under CV2 (Table S4, Appendix 2).

In summary, the complex G×E interaction in this data set is expressed by the large variance component Embedded Image. Models with random component Embedded Image did not increase the prediction accuracy of the corresponding models without Embedded Image (reflected in their values of Embedded Image close to zero). Of the 12 model-method combinations, models MDeEmbedded Image-GK and MDe-GK gave the highest prediction accuracies. However, the best predictive models overall and for each environment were MUCf and MUC.

Discussion

Effect of random component Embedded Image

From a statistical perspective, the mixed models can better explain the variation among lines in environments (G×E) by considering two factors: environments and lines. The environmental effects (Embedded Image) are considered as fixed effects with the relationship Embedded Image; however, the effects of the lines are considered random in Embedded Image for model MMEmbedded Image. The Embedded Image is the common random effect of each line derived from the markers and Embedded Image is considered the random intercept for each line. If we make the transformation Embedded Image, as in López-Cruz et al. (2015), then Embedded Image0, Embedded Image where matrix Embedded Image comprises submatrices (or blocks) where the submatrices off the block diagonal generated the exchange of information between environments with positive correlations. As discussed by López-Cruz et al. (2015), this exchange of information is not effective when there are negative correlations between sites (or environments) due to the fact that they are based on Embedded Image. Similarly, if Embedded Image, then Embedded Image0, Embedded Image and the exchange of information occurred in the submatrices off the block diagonal between the environments with positive correlations and when Embedded Image is not zero. On the other hand, in models MDsEmbedded Image and MDeEmbedded Image, the component Embedded Image has influence only when there is exchange of information across environments and the G×E is simple; otherwise, as in the WHE5 data set, the contribution of Embedded Image is negligible when the G×E is complex.

The random effects l are independent and identically distributed (iid); they therefore cannot exchange information between tested and untested lines, and no estimate of l is available for a line with no evaluation data (CV1). Thus, when predicting untested lines, the only information shared between lines comes from the g part of the model. In a number of cases, substantial variation for the l effects was found, suggesting that the additive part of the model (g) does not capture the total genetic value very well. In these cases, since the GK method usually did as well as the GB model with l, the GK method has a major advantage: it can better predict untested genotypes because it uses the marker information in a way that captures more of the genetic variation. On the other hand, a breeder concerned with gain from selection after intermating and generating a new population should select only on the additive breeding values, recognizing that breeding values are not the complete genotypic value (commercial value), so that response to selection after intermating will be less than expected based on total genetic variance.

Effects of including G×E interaction

In general, results show that when GBLUP is used for prediction under random cross-validation CV2, models MDsEmbedded Image-GB and MDeEmbedded Image-GB that incorporate G×E had higher prediction accuracy than models MDs-GB and MDe-GB also with G×E. This improvement depends on Embedded Image and the magnitude of the correlations between environments. For maize data sets (HEL and USP) with positive sample correlations between environments, models MDsEmbedded Image-GB and MDeEmbedded Image-GB had higher prediction accuracy than models MDs-GB and MDe-GB, whereas in wheat data set WHE1, models MDsEmbedded Image and MDeEmbedded Image had better prediction accuracy than models MDs and MDe only in environments with positive correlations. Finally, for environments in wheat data set WHE5 with negligible Embedded Image, the accuracy of models MDsEmbedded Image-GB and MDeEmbedded Image-GB did not improve much over that of models MDs-GB and MDe-GB without Embedded Image.

Effects of including the Gaussian kernel

In general, models MDs and MDe with the Gaussian kernel (GK) had higher prediction accuracy than models with GB, although these differences were smaller for models MDsEmbedded Image and MDeEmbedded Image. When GK models were better than GB models, Embedded Image was negligible for GK models and the prediction accuracy of MDsEmbedded Image and MDeEmbedded Image was only slightly superior to that of models MDs and MDe (as in maize data set HEL). Conversely, when the prediction accuracy with GK was not better than with GB, as in the case of maize data set USP, the contribution of Embedded Image was important and the prediction accuracy of MDsEmbedded Image and MDeEmbedded Image was superior to that of their counterparts MDs and MDe. These results indicate that models with random intercepts are useful when used with the linear kernel (GB) but not when used with the Gaussian kernel (GK). This is because the GK method without Embedded Image already explains most of the genetic variance (additive and epistatic effects) between lines, leaving only negligible genetic residuals for l to pick up.
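To make the contrast between the two kernels concrete, the sketch below builds both from a marker matrix. It uses common formulations as assumptions, since this section does not restate the definitions: a linear GBLUP kernel XX'/p on standardized markers, and a Gaussian kernel exp(-h d²/q) on squared Euclidean distances scaled by their median q. The function names and the bandwidth parameter `h` are illustrative.

```python
import numpy as np

def standardize(X):
    """Centre and scale marker columns, dropping monomorphic markers."""
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0)
    keep = sd > 0
    return (X[:, keep] - X[:, keep].mean(axis=0)) / sd[keep]

def gblup_kernel(X):
    """Linear (GB) kernel: K = X X' / p on standardized markers."""
    Xs = standardize(X)
    return Xs @ Xs.T / Xs.shape[1]

def gaussian_kernel(X, h=1.0):
    """Gaussian (GK) kernel: K = exp(-h * d^2 / q), with q the median squared distance."""
    Xs = standardize(X)
    sq = (Xs ** 2).sum(axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Xs @ Xs.T, 0.0)
    np.fill_diagonal(d2, 0.0)  # a line is at distance zero from itself
    q = np.median(d2[np.triu_indices_from(d2, k=1)])
    return np.exp(-h * d2 / q)
```

Because the Gaussian kernel is a non-linear function of marker distance, it can absorb some non-additive similarity between lines that the linear kernel leaves to other terms in the model.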

The effect of the sample covariance among environments

The behavior of the covariance between observations of the ith line in the jth and j’th environments explains some of the results obtained in the four data sets. The covariance between Embedded Image and Embedded Image of models MM, MDs and MDe is the same; it is determined by the genetic variance component Embedded Image. It would be expected that the estimate of Embedded Image would be proportional to the sample covariance of the observations. This only occurred when the sample covariances were positive because Embedded Image can take only positive values; when the sample covariances between some environments are negative, this distorts the estimations of the genetic variance component (Embedded Image) and therefore affects the prediction accuracy of the unobserved phenotypes of the lines in the testing set.

On the other hand, when the sample covariance between Embedded Image and Embedded Image of models MMEmbedded Image, MDsEmbedded Image and MDeEmbedded Image is determined by the summation Embedded Image + Embedded Image, then the higher Embedded Image, the higher the estimated sample covariance (association) of the lines in environments and, therefore, the higher the prediction accuracy compared with those achieved by models MM, MDs and MDe (without the random effect Embedded Image). Again, the presence of negative sample covariances distorts the behavior of the estimated genetic variance components and this negatively affects the prediction accuracy of these models.
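The two covariance statements in this subsection can be written compactly. The symbols here are assumptions, because the original notation was lost in extraction: y_ij is the observation of line i in environment j, σ²_g and σ²_l are the variance components of the marker-based genetic effect and the random intercept, and K_ii is the ith diagonal element of the genomic kernel. For two different environments j ≠ j':

```latex
\begin{align}
\operatorname{Cov}(y_{ij}, y_{ij'}) &= \sigma^2_g K_{ii}
  && \text{(MM, MDs, MDe)} \\
\operatorname{Cov}(y_{ij}, y_{ij'}) &= \sigma^2_g K_{ii} + \sigma^2_l
  && \text{(same models with the random intercept } l\text{)}
\end{align}
```

Under these assumptions both expressions are non-negative, which is why a negative sample covariance between two environments cannot be reproduced by either family of models.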

Models with G×E with the Kronecker product vs. models with G×E with the Hadamard product

Less restrictive G×E genomic-enabled prediction models that allow any covariance value between environments had better prediction accuracy than models with more restrictive assumptions at the level of association between lines in environments affecting the estimation of the genetic variance components. Less restrictive models consider variance-covariance matrices represented by the Kronecker product of the variances and covariances of the environmental and genetic values (with the linear or non-linear kernels constructed with the markers) (Burgueño et al. 2012; Cuevas et al. 2017). When a random intercept (Embedded Image) is added to these models based on the Kronecker product (Cuevas et al. 2017), the genomic-enabled prediction accuracy increased for random cross-validation CV2 and for environments with negative sample covariance. These advantages of the G×E genomic-enabled prediction models using the Kronecker product for defining variance-covariance environmental matrices with negative or zero environmental relationship over the Hadamard product defined by models MDsEmbedded Image and MDeEmbedded Image are smaller when sample covariances between environments are all positive. The disadvantages of models with Kronecker products are that defining and measuring environmental stability is not clear and that they demand more computing resources than G×E genomic-enabled prediction models using the Hadamard product.
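The structural difference between the two products can be checked numerically on a small balanced example. The sketch below is an illustration under stated assumptions, not the fitted models: records are sorted by environment, `Z_g` and `Z_E` are the assumed incidence matrices of lines and environments, the variance values are arbitrary, and `U_E` is a made-up environmental covariance matrix with one negative entry.

```python
import numpy as np

n, m = 4, 3  # lines, environments (balanced, records sorted by environment)
rng = np.random.default_rng(1)
X = rng.normal(size=(n, 10))
K = X @ X.T / 10  # toy genomic kernel

Z_g = np.kron(np.ones((m, 1)), np.eye(n))  # record -> line
Z_E = np.kron(np.eye(m), np.ones((n, 1)))  # record -> environment

# Hadamard construction (MDs-type): main effect plus a G×E term that is
# the element-wise product of genomic and environmental relationships.
main = Z_g @ K @ Z_g.T
gxe = main * (Z_E @ Z_E.T)
sigma2_g, sigma2_ge = 0.4, 0.3  # arbitrary illustrative values
cov_hadamard = sigma2_g * main + sigma2_ge * gxe

# In this balanced case the Hadamard model is a Kronecker model whose
# environmental matrix is compound symmetric with non-negative covariance.
U_cs = sigma2_g * np.ones((m, m)) + sigma2_ge * np.eye(m)
assert np.allclose(cov_hadamard, np.kron(U_cs, K))

# Kronecker construction (MUC-type): any valid covariance between
# environments is allowed, including negative ones.
U_E = np.array([[1.0, -0.3, 0.2],
                [-0.3, 1.0, 0.5],
                [0.2, 0.5, 1.0]])
cov_kronecker = np.kron(U_E, K)
```

The assertion shows that, for balanced data, the Hadamard model forces every pair of environments to share the same non-negative covariance, whereas the Kronecker form estimates each pair separately at the cost of more parameters.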

Required computing time for fitting the models

We performed all the analyses on an Ubuntu Linux server with 256 GB of RAM and 32 CPU cores. To compare computing times, we recorded the mean time in seconds needed to fit one random partition under random cross-validation CV1 for the maize data set HELIX, using the same 50 partitions and the same number of iterations for every model. For the models with G×E without Embedded Image or f, the mean computing time for one random partition was 290, 319, and 3110 s for models MDs, MDe, and MUC, respectively. For models with G×E with random intercept Embedded Image or f, the mean computing time for one random partition was 489, 541, and 4938 s for models MDsEmbedded Image, MDeEmbedded Image, and MUCf, respectively. The difference in computing time between models MDs and MDe is small, but fitting model MUC took roughly 10 times longer per random partition.
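The relative costs implied by these timings can be worked out directly. The short script below uses only the six mean times reported above; the labels `MDsl` and `MDel` for the models with the random intercept are shorthand chosen for this example.

```python
# Mean seconds per random partition, as reported in the text.
seconds = {"MDs": 290, "MDe": 319, "MUC": 3110,
           "MDsl": 489, "MDel": 541, "MUCf": 4938}

def ratio(slow, fast):
    """How many times longer `slow` takes than `fast`."""
    return seconds[slow] / seconds[fast]

# Kronecker-type models relative to the single-variance Hadamard models.
kronecker_vs_hadamard = {
    "MUC/MDs": ratio("MUC", "MDs"),
    "MUCf/MDsl": ratio("MUCf", "MDsl"),
}

# Extra cost of adding the random intercept (or f) to each model.
intercept_overhead = {
    "MDsl/MDs": ratio("MDsl", "MDs"),
    "MDel/MDe": ratio("MDel", "MDe"),
    "MUCf/MUC": ratio("MUCf", "MUC"),
}

for name, value in {**kronecker_vs_hadamard, **intercept_overhead}.items():
    print(f"{name}: {value:.2f}x")
```

Both Kronecker-to-Hadamard ratios come out slightly above 10, and adding the random intercept costs between about 1.6 and 1.7 times the base model in every case.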

Advantages and disadvantages of the proposed models

In general, G×E genomic-enabled prediction models MDsEmbedded Image and MDeEmbedded Image had similar prediction accuracy and, in both cases, environmental stability and G×E can be assessed and measured. Furthermore, in models MDsEmbedded Image and MDeEmbedded Image, when the sample correlation among environments is positive, their prediction accuracy is similar or slightly higher than the accuracy achieved with the more flexible Kronecker product models (Burgueño et al. 2012; Cuevas et al. 2017) for the variance-covariance matrices. The advantage of models MDsEmbedded Image and MDeEmbedded Image with the Hadamard product for the variance-covariance is that they can perform highly dimensional matrix operations very fast and, therefore, save time when fitting these models. The BGGE software developed by Granato et al. (2017) is indeed an example of this efficiency for fitting models MDsEmbedded Image and MDeEmbedded Image by means of the Hadamard product.

When the main objective is prediction accuracy, we recommend checking for sample covariance (or correlations) between environments before using MDsEmbedded Image and MDeEmbedded Image G×E genomic-enabled prediction models. Models MDsEmbedded Image, MDeEmbedded Image, MDs and MDe are recommended when the sample correlations are positive and not close to zero. We also recommend fitting models MDsEmbedded Image and MDeEmbedded Image to the training set and estimating the variance component of the random intercept Embedded Image; if it is negligible, only models MDs and MDe should be used. When the number of lines in each environment is not the same, models MDsEmbedded Image, MDeEmbedded Image, MDs, and MDe can be efficiently fitted with the BGGE software, whereas models MUCf and MUC of Cuevas et al. (2017) with an unbalanced number of lines in each environment require intensive computational resources.
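The first recommendation, checking the sample correlations between environments before choosing a model family, could be automated along the following lines. This is a hedged sketch: the function name, the returned labels, and especially the `min_corr` threshold are hypothetical, since the text only says the correlations should be "positive and not close to zero" without giving a cutoff.

```python
import numpy as np
import pandas as pd

def suggest_model_family(Y, min_corr=0.1):
    """Suggest a model family from a lines-by-environments phenotype table.

    Y may contain NaN for lines not evaluated in an environment; correlations
    are computed on pairwise-complete observations.
    min_corr is a hypothetical cutoff for "positive and not close to zero".
    """
    corr = pd.DataFrame(Y).corr()  # Pearson, pairwise-complete
    off_diag = corr.values[~np.eye(corr.shape[0], dtype=bool)]
    if np.all(off_diag > min_corr):
        return "hadamard", corr   # MDs / MDe, with or without the random intercept
    return "kronecker", corr      # MUC / MUCf
```

A result of `"hadamard"` would then be followed by the second recommendation: fit the models with the random intercept on the training set and drop the intercept if its estimated variance component is negligible.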

CONCLUSIONS

Results indicate that when the sample phenotypic correlations between environments were intermediate to moderate (HEL, USP), models with G×E, random intercept Embedded Image (MDsEmbedded Image, MDeEmbedded Image), and the Gaussian kernel (GK) had the advantages of other models without their disadvantages. These models: (i) allow finding regions of the chromosomes with environmental stability (Jarquín et al. 2014; López-Cruz et al. 2015), (ii) are fast to fit (Granato et al. 2017), and (iii) increase the prediction accuracy under CV2 to the level of the Gaussian kernels of Cuevas et al. (2016) and Sousa et al. (2017) or other more flexible models such as those used by Burgueño et al. (2012) and Cuevas et al. (2017). When sample phenotypic correlations are low or negative, as in data sets WHE1 and WHE5, model MUCf with GK of Cuevas et al. (2017) is the one that should be used.

Including the random intercept for each line made it possible to capture some extra genetic variability. Models MDs and MDe assessed the complexity of the genomic G×E present in the two maize data sets (in which all environments are positively correlated) by means of the Hadamard product between markers and environments, as in the models of Jarquín et al. (2014) (MM and MDs) and López-Cruz et al. (2015) (MDe). For these two maize data sets, the Hadamard models MM, MDs and MDe with random intercepts had prediction accuracies similar to those of models MUCf and MUC, which use a Kronecker product for assessing G×E. The advantage of models MM, MDs and MDe with random intercepts over models MUCf and MUC is shorter computing time when the number of lines in different environments is very unbalanced, as in the two maize data sets.

For the two wheat data sets, the number of lines in each environment is the same. However, because the sample correlation among environments is not positive for all pairwise environment combinations, using models MM, MDs and MDe with or without random intercepts is less favorable than using models MUCf and MUC with a Kronecker product for modeling G×E. The reduced prediction accuracy of the Hadamard product models vs. the Kronecker product models indicates the flexibility of models MUCf and MUC for assessing multi-environment data sets with complex G×E. Regardless of (i) whether the random intercept is included, and (ii) the type of data set at hand (more or less complex G×E; balanced or unbalanced data structure), the prediction accuracy of the Gaussian kernel was better than that of the linear kernel GBLUP for all four data sets.

APPENDIX 1

The multi-environment main genotypic effect model (MM)

The multi-environment model (MM) considers the fixed effects of environments (Embedded Image), as well as the random genetic effects across environments (Embedded Image):

Embedded Image (1)

where Embedded Image is a vector with the observations Embedded Image of the jth environment Embedded Image, each of size Embedded Image, such that one line in one environment represents the Embedded Image observation of the Embedded Imageth line Embedded Image in the jth environment. The scalar Embedded Image is a general mean and the vector Embedded Image is of size Embedded Image. The fixed effects of the environment for the data used in this study are modeled with the incidence matrix of the environments Embedded Image of order Embedded Image, where the parameters to be estimated are the intercept for each environment (Embedded Image) with the vector Embedded Image of order Embedded Image. Incorporating other fixed effects into the model is straightforward.

The random vector of genetic effects Embedded Image follows a multivariate normal distribution with mean zero and covariance matrix Embedded Image, that is, Embedded Image, where the vector Embedded Image of order Embedded Image represents the genetic random effects across all environments for each line Embedded Image, and the kernel matrix Embedded Image is a symmetric positive semidefinite matrix of order Embedded Image constructed with molecular markers. If the number of lines is the same in each environment, then Embedded Image; otherwise, when there are different numbers of lines in each environment, Embedded Image represents the number of unique lines included in the model in some environments. The incidence matrix Embedded Image, of order Embedded Image, connects genotypes with phenotypes for each environment. Variance component Embedded Image is the genetic variance of the lines across all environments and represents the sensitivity or environmental stability. Finally, the random errors are assumed to be homoscedastic and independent, Embedded Image, where Embedded Image is the error variance.
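As an illustration of how the incidence matrix spreads the marker-based kernel over the records of an unbalanced trial, the sketch below builds the covariance of the main genetic effect for a toy data set. The names `Z_u` and `K`, the helper functions, and the linear-kernel scaling (centered markers, cross-product divided by the number of markers) are assumptions made for this example, since the original symbols are not recoverable here.

```python
import numpy as np

def incidence_matrix(labels, levels):
    """One-hot matrix linking each record (row) to its level (column)."""
    index = {level: i for i, level in enumerate(levels)}
    Z = np.zeros((len(labels), len(levels)))
    for row, label in enumerate(labels):
        Z[row, index[label]] = 1.0
    return Z

def linear_kernel(X):
    """GBLUP-style kernel from centered markers (assumed scaling: XX'/p)."""
    Xc = X - X.mean(axis=0)
    return Xc @ Xc.T / X.shape[1]

# Toy unbalanced trial: 3 unique lines, 2 environments, 5 records.
lines = ["L1", "L2", "L3"]
record_line = ["L1", "L2", "L3", "L1", "L3"]  # L2 was not sown in the second environment
X = np.array([[0.0, 1.0, 2.0, 0.0],
              [1.0, 1.0, 0.0, 2.0],
              [2.0, 0.0, 1.0, 1.0]])            # lines x markers

K = linear_kernel(X)                  # order = number of unique lines
Z_u = incidence_matrix(record_line, lines)
cov_main = Z_u @ K @ Z_u.T            # covariance of Z_u u up to the genetic variance
print(cov_main.shape)
```

Because the same line keeps the same genetic effect in every environment, records of one line in different environments are fully connected through `cov_main`, which is how the main-effect model borrows information across environments.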

The multi-environment single variance genotype × environment interaction deviation model (MDs)

This model adds to the MM model the random interaction effects of the environments with the genetic information of the lines (Embedded Image) (Sousa et al., 2017; Jarquín et al., 2014):

Embedded Image (2)

The vector of random effects of the G×E interaction, Embedded Image, has a multivariate normal distribution, Embedded Image, where Embedded Image is the Hadamard product operator and Embedded Image is the variance component of the G×E interaction. Matrix Embedded Image is a block diagonal matrix constructed with the matrices Embedded Image (Embedded Image … Embedded Image) for each environment; therefore, there is no exchange (borrowing) of information between environments:

Embedded Image
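The block-diagonal structure produced by the Hadamard product can be checked numerically. The sketch below assumes the usual form of the interaction covariance, (Z_u K Z_u') ∘ (Z_E Z_E'), with records sorted by environment; the symbols `Z_u`, `Z_E`, and `K`, the helper `one_hot`, and the toy values are assumptions for illustration only.

```python
import numpy as np

def one_hot(labels):
    """Incidence matrix with one column per sorted unique label."""
    levels = sorted(set(labels))
    return (np.array(labels)[:, None] == np.array(levels)[None, :]).astype(float)

# Toy unbalanced trial, records sorted by environment.
record_line = ["L1", "L2", "L3", "L1", "L3"]
record_env = ["E1", "E1", "E1", "E2", "E2"]
K = np.array([[1.0, 0.3, 0.1],
              [0.3, 1.0, 0.2],
              [0.1, 0.2, 1.0]])        # kernel among the 3 unique lines

Z_u = one_hot(record_line)             # records x lines
Z_E = one_hot(record_env)              # records x environments
same_env = Z_E @ Z_E.T                 # 1 if two records share an environment, else 0

# Hadamard (element-wise) product: zeroes every between-environment covariance.
cov_gxe = (Z_u @ K @ Z_u.T) * same_env
print(np.round(cov_gxe, 2))
```

Only element-wise multiplication of two matrices of the order of the number of records is needed, which is the source of the computational advantage over Kronecker product structures noted in the discussion.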

Multi-environment environment-specific variance genotype × environment deviation model (MDe)

The multi-environment, environment-specific variance genotype × environment deviation model (MDe) (López-Cruz et al., 2015) differs from MDs in how the random interaction component is modeled:

Embedded Image (3)

where Embedded Image is the main genetic effect across all the environments and Embedded Image represents the specific genetic effects in each environment such that Embedded Image, where Embedded Image is a block diagonal matrix generated with the individuals included in each environment:

Embedded Image

with a variance component specific to each environment, Embedded Image (Sousa et al., 2017).
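A minimal sketch of the environment-specific structure follows, assuming that each diagonal block is the kernel among the lines present in that environment multiplied by that environment's own variance. The function name, its arguments, and the toy variances are hypothetical and serve only to illustrate the block-diagonal form described above.

```python
import numpy as np

def mde_interaction_cov(K, record_line_idx, record_env_idx, env_var):
    """Block-diagonal G×E covariance with one variance per environment.

    K: kernel among unique lines; record_line_idx / record_env_idx: integer
    line and environment index of each record; env_var: one variance per
    environment (illustrative values, normally estimated from the data).
    """
    line_idx = np.asarray(record_line_idx)
    env_idx = np.asarray(record_env_idx)
    base = K[np.ix_(line_idx, line_idx)]             # kernel expanded to records
    same_env = env_idx[:, None] == env_idx[None, :]  # True within an environment
    scale = np.asarray(env_var)[env_idx]             # variance of each record's environment
    return base * same_env * scale[:, None]

K = np.array([[1.0, 0.3, 0.1],
              [0.3, 1.0, 0.2],
              [0.1, 0.2, 1.0]])
cov_mde = mde_interaction_cov(K, [0, 1, 2, 0, 2], [0, 0, 0, 1, 1], [2.0, 0.5])
print(np.round(cov_mde, 2))
```

Setting all entries of `env_var` to the same value gives a single-variance block-diagonal structure, which is the case described for MDs.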

APPENDIX 2

Proposed models with random effects (the random intercept in MM, MDs and MDe; f in MUCf)

| Location* | MM-GK | MM-GB | MDs-GK | MDs-GB | MDe-GK | MDe-GB | MUCf-GK | MUCf-GB |
|---|---|---|---|---|---|---|---|---|
| IP | 0.571 (0.1) | 0.439 (0.12) | 0.745 (0.05) | 0.644 (0.1) | 0.749 (0.06) | 0.634 (0.09) | 0.756 (0.06) | 0.659 (0.06) |
| NM | 0.503 (0.08) | 0.354 (0.09) | 0.532 (0.08) | 0.385 (0.11) | 0.525 (0.09) | 0.365 (0.11) | 0.537 (0.08) | 0.381 (0.11) |
| PM | 0.661 (0.06) | 0.574 (0.07) | 0.753 (0.05) | 0.682 (0.07) | 0.753 (0.04) | 0.685 (0.05) | 0.751 (0.04) | 0.685 (0.05) |
| SE | 0.347 (0.09) | 0.202 (0.11) | 0.505 (0.08) | 0.370 (0.1) | 0.513 (0.08) | 0.366 (0.09) | 0.489 (0.09) | 0.349 (0.09) |
| SO | 0.442 (0.1) | 0.287 (0.09) | 0.552 (0.08) | 0.402 (0.1) | 0.552 (0.08) | 0.395 (0.12) | 0.551 (0.08) | 0.39 (0.09) |

Proposed models without random effects

| Location* | MM-GK | MM-GB | MDs-GK | MDs-GB | MDe-GK | MDe-GB | MUC-GK | MUC-GB |
|---|---|---|---|---|---|---|---|---|
| IP | 0.575 (0.09) | 0.426 (0.11) | 0.752 (0.06) | 0.607 (0.08) | 0.755 (0.05) | 0.618 (0.09) | 0.758 (0.05) | 0.641 (0.08) |
| NM | 0.506 (0.09) | 0.361 (0.07) | 0.54 (0.09) | 0.394 (0.08) | 0.538 (0.09) | 0.394 (0.08) | 0.545 (0.06) | 0.391 (0.1) |
| PM | 0.662 (0.06) | 0.533 (0.07) | 0.758 (0.05) | 0.662 (0.07) | 0.754 (0.05) | 0.671 (0.04) | 0.754 (0.04) | 0.669 (0.05) |
| SE | 0.346 (0.1) | 0.219 (0.1) | 0.527 (0.06) | 0.321 (0.1) | 0.524 (0.08) | 0.339 (0.09) | 0.505 (0.07) | 0.319 (0.11) |
| SO | 0.455 (0.1) | 0.293 (0.11) | 0.576 (0.07) | 0.376 (0.1) | 0.555 (0.09) | 0.383 (0.11) | 0.56 (0.07) | 0.377 (0.1) |

* Locations are: IP: Ipiaçú-MG, NM: Nova Mutum-MT, PM: Pato de Minas-MG, SE: Sertanópolis-PR, and SO: Sorriso-MT.

Table S1. Maize HEL data set. Mean Pearson’s correlation (50 partitions) at each location for random cross-validation CV1 for the multi-environment models: main genotypic effect model (MM), single variance G×E deviation model (MDs), environment-specific variance G×E deviation model (MDe), and multi-environment unstructured covariance models (MUC and MUCf), each with two kernels, GBLUP (GB) and Gaussian kernel (GK), for grain yield, with and without the proposed random intercept (standard deviation in parentheses).
Proposed models with random effects (the random intercept in MM, MDs and MDe; f in MUCf)

| Environment* | MM-GK | MM-GB | MDs-GK | MDs-GB | MDe-GK | MDe-GB | MUCf-GK | MUCf-GB |
|---|---|---|---|---|---|---|---|---|
| P-LN | 0.28 (0.06) | 0.272 (0.07) | 0.307 (0.06) | 0.293 (0.07) | 0.303 (0.06) | 0.286 (0.07) | 0.294 (0.07) | 0.286 (0.06) |
| P-IN | 0.304 (0.06) | 0.298 (0.08) | 0.335 (0.08) | 0.329 (0.06) | 0.335 (0.07) | 0.332 (0.08) | 0.331 (0.08) | 0.327 (0.06) |
| A-LN | 0.287 (0.07) | 0.283 (0.05) | 0.305 (0.08) | 0.31 (0.06) | 0.303 (0.06) | 0.309 (0.06) | 0.321 (0.08) | 0.309 (0.07) |
| A-IN | 0.389 (0.07) | 0.386 (0.08) | 0.42 (0.07) | 0.413 (0.07) | 0.425 (0.07) | 0.422 (0.06) | 0.418 (0.05) | 0.417 (0.07) |

Proposed models without random effects

| Environment* | MM-GK | MM-GB | MDs-GK | MDs-GB | MDe-GK | MDe-GB | MUC-GK | MUC-GB |
|---|---|---|---|---|---|---|---|---|
| P-LN | 0.286 (0.07) | 0.278 (0.05) | 0.305 (0.05) | 0.289 (0.07) | 0.313 (0.08) | 0.295 (0.07) | 0.311 (0.06) | 0.30 (0.06) |
| P-IN | 0.285 (0.08) | 0.313 (0.06) | 0.324 (0.06) | 0.332 (0.07) | 0.324 (0.07) | 0.33 (0.05) | 0.318 (0.05) | 0.341 (0.06) |
| A-LN | 0.262 (0.07) | 0.292 (0.07) | 0.278 (0.06) | 0.313 (0.06) | 0.285 (0.07) | 0.308 (0.06) | 0.300 (0.06) | 0.318 (0.07) |
| A-IN | 0.365 (0.06) | 0.391 (0.07) | 0.395 (0.06) | 0.415 (0.07) | 0.403 (0.07) | 0.417 (0.06) | 0.406 (0.05) | 0.424 (0.07) |

* Environments are: Anhumas ideal N (A-IN), Anhumas low N (A-LN), Piracicaba ideal N (P-IN) and Piracicaba low N (P-LN).

Table S2. Maize USP data set. Mean Pearson’s correlation (50 partitions) in each environment for random cross-validation CV1 for the multi-environment models: main genotypic effect model (MM), single variance G×E deviation model (MDs), environment-specific variance G×E deviation model (MDe), and multi-environment unstructured covariance models (MUC and MUCf), each with two kernels, GBLUP (GB) and Gaussian kernel (GK), for grain yield, with and without the proposed random intercept (standard deviation in parentheses).
Proposed models with random effects (the random intercept in MM, MDs and MDe; f in MUCf)

| Environment | MM-GK | MM-GB | MDs-GK | MDs-GB | MDe-GK | MDe-GB | MUCf-GK | MUCf-GB |
|---|---|---|---|---|---|---|---|---|
| E1 | 0.048 (0.06) | 0.054 (0.07) | 0.545 (0.05) | 0.512 (0.05) | 0.558 (0.05) | 0.510 (0.05) | 0.560 (0.05) | 0.515 (0.06) |
| E2 | 0.397 (0.06) | 0.405 (0.06) | 0.49 (0.06) | 0.476 (0.06) | 0.48 (0.05) | 0.474 (0.05) | 0.472 (0.05) | 0.478 (0.05) |
| E3 | 0.368 (0.07) | 0.373 (0.06) | 0.405 (0.05) | 0.366 (0.06) | 0.416 (0.06) | 0.399 (0.05) | 0.413 (0.06) | 0.386 (0.06) |
| E4 | 0.341 (0.06) | 0.329 (0.05) | 0.472 (0.04) | 0.439 (0.05) | 0.467 (0.05) | 0.441 (0.06) | 0.464 (0.06) | 0.450 (0.04) |

Proposed models without random effects

| Environment | MM-GK | MM-GB | MDs-GK | MDs-GB | MDe-GK | MDe-GB | MUC-GK | MUC-GB |
|---|---|---|---|---|---|---|---|---|
| E1 | 0.066 (0.06) | 0.049 (0.06) | 0.544 (0.05) | 0.472 (0.06) | 0.539 (0.04) | 0.495 (0.05) | 0.571 (0.04) | 0.513 (0.04) |
| E2 | 0.416 (0.06) | 0.414 (0.06) | 0.476 (0.05) | 0.475 (0.06) | 0.472 (0.05) | 0.464 (0.05) | 0.465 (0.05) | 0.454 (0.05) |
| E3 | 0.377 (0.05) | 0.384 (0.05) | 0.397 (0.05) | 0.388 (0.06) | 0.423 (0.05) | 0.392 (0.05) | 0.405 (0.05) | 0.381 (0.05) |
| E4 | 0.339 (0.05) | 0.339 (0.05) | 0.469 (0.04) | 0.437 (0.04) | 0.46 (0.05) | 0.416 (0.05) | 0.456 (0.05) | 0.418 (0.05) |

Table S3. Wheat data set WHE1. Mean Pearson’s correlation (50 partitions) in each environment for random cross-validation CV1 for the multi-environment models: main genotypic effect model (MM), single variance G×E deviation model (MDs), environment-specific variance G×E deviation model (MDe), and multi-environment unstructured covariance models (MUC and MUCf), each with two kernels, GBLUP (GB) and Gaussian kernel (GK), for grain yield, with and without the proposed random intercept (standard deviation in parentheses).
Proposed models with random effects (the random intercept in MM, MDs and MDe; f in MUCf)

| Environment* | MM-GK | MM-GB | MDs-GK | MDs-GB | MDe-GK | MDe-GB | MUCf-GK | MUCf-GB |
|---|---|---|---|---|---|---|---|---|
| 0iFN | 0.348 (0.05) | 0.301 (0.05) | 0.601 (0.04) | 0.553 (0.03) | 0.611 (0.04) | 0.555 (0.04) | 0.614 (0.03) | 0.554 (0.03) |
| 2iBN | 0.217 (0.05) | 0.201 (0.05) | 0.474 (0.04) | 0.431 (0.04) | 0.47 (0.05) | 0.439 (0.04) | 0.475 (0.04) | 0.448 (0.04) |
| 5iBH | 0.321 (0.05) | 0.35 (0.05) | 0.67 (0.03) | 0.635 (0.03) | 0.668 (0.03) | 0.634 (0.03) | 0.679 (0.03) | 0.633 (0.03) |
| 5iBN | 0.163 (0.06) | 0.136 (0.06) | 0.399 (0.04) | 0.353 (0.05) | 0.395 (0.05) | 0.345 (0.06) | 0.401 (0.04) | 0.358 (0.05) |
| 5iFN | 0.084 (0.06) | 0.082 (0.06) | 0.334 (0.04) | 0.309 (0.04) | 0.328 (0.05) | 0.306 (0.05) | 0.336 (0.05) | 0.315 (0.04) |

Proposed models without random effects

| Environment* | MM-GK | MM-GB | MDs-GK | MDs-GB | MDe-GK | MDe-GB | MUC-GK | MUC-GB |
|---|---|---|---|---|---|---|---|---|
| 0iFN | 0.341 (0.05) | 0.288 (0.05) | 0.61 (0.04) | 0.562 (0.03) | 0.612 (0.03) | 0.557 (0.03) | 0.625 (0.04) | 0.558 (0.04) |
| 2iBN | 0.205 (0.05) | 0.216 (0.05) | 0.478 (0.05) | 0.439 (0.05) | 0.473 (0.05) | 0.436 (0.04) | 0.476 (0.05) | 0.429 (0.06) |
| 5iBH | 0.323 (0.04) | 0.333 (0.05) | 0.67 (0.02) | 0.624 (0.03) | 0.662 (0.03) | 0.627 (0.03) | 0.680 (0.03) | 0.638 (0.03) |
| 5iBN | 0.171 (0.05) | 0.163 (0.05) | 0.397 (0.05) | 0.357 (0.04) | 0.405 (0.04) | 0.356 (0.04) | 0.407 (0.04) | 0.354 (0.05) |
| 5iFN | 0.107 (0.05) | 0.114 (0.06) | 0.33 (0.05) | 0.311 (0.04) | 0.329 (0.05) | 0.307 (0.05) | 0.337 (0.04) | 0.303 (0.04) |

* Environments are described by a sequence of codes: 0i, 2i and 5i denote the number of irrigations; B/F denotes whether the planting system was ‘bed’ (B) or ‘flat’ (F); N/H denotes whether the planting date was normal (N) or late (H, simulating heat).

Table S4. Wheat data set WHE5. Mean Pearson’s correlation (50 partitions) in each environment for random cross-validation CV1 for the multi-environment models: main genotypic effect model (MM), single variance G×E deviation model (MDs), environment-specific variance G×E deviation model (MDe), and multi-environment unstructured covariance models (MUC and MUCf), each with two kernels, GBLUP (GB) and Gaussian kernel (GK), for grain yield, with and without the proposed random intercept (standard deviation in parentheses).
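The entries of Tables S1 to S4 are means and standard deviations, over partitions, of the Pearson correlation between observed and predicted values within each environment. A minimal sketch of that summary is given below; the function name, the argument layout, and the use of the sample standard deviation (`ddof=1`) are assumptions, and the way the partitions themselves are drawn is not shown.

```python
import numpy as np

def summarize_partitions(y_obs, y_pred, env, test_masks):
    """Mean and SD over partitions of the within-environment Pearson correlation.

    y_obs: observed values (one per record); y_pred: one prediction vector per
    partition; env: environment label of each record; test_masks: one boolean
    vector per partition marking the records in the testing set.
    """
    y_obs, y_pred, env = map(np.asarray, (y_obs, y_pred, env))
    summary = {}
    for e in np.unique(env):
        r = []
        for mask, pred in zip(test_masks, y_pred):
            sel = mask & (env == e)                  # testing records of environment e
            r.append(np.corrcoef(y_obs[sel], pred[sel])[0, 1])
        summary[str(e)] = (float(np.mean(r)), float(np.std(r, ddof=1)))
    return summary

# Toy example: 8 records, 2 environments, 2 partitions.
y_obs = np.array([1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0])
env = ["E1"] * 4 + ["E2"] * 4
test_masks = [np.ones(8, dtype=bool), np.ones(8, dtype=bool)]
y_pred = [y_obs.copy(), np.array([1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.0])]
summary = summarize_partitions(y_obs, y_pred, env, test_masks)
print(summary)
```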

APPENDIX 3

HEL (452 maize lines) (Sousa et al. 2017)

| Location* | Ipiaçú (IP) (247) | Nova Mutum (NM) (330) | Pato de Minas (PM) (452) | Sertanópolis (SE) (367) | Sorriso (SO) (330) |
|---|---|---|---|---|---|
| Nova Mutum (NM) | 0.46 | - | - | - | - |
| Pato de Minas (PM) | 0.51 | 0.44 | - | - | - |
| Sertanópolis (SE) | 0.29 | 0.36 | 0.30 | - | - |
| Sorriso (SO) | 0.43 | 0.48 | 0.39 | 0.38 | - |

USP (739 maize lines) (Sousa et al. 2017)

| Environment | Piracicaba-LN (P-LN) (731) | Piracicaba-IN (P-IN) (732) | Anhumas-LN (A-LN) (731) | Anhumas-IN (A-IN) (737) |
|---|---|---|---|---|
| Piracicaba-IN (P-IN) | 0.54 | - | - | - |
| Anhumas-LN (A-LN) | 0.31 | 0.35 | - | - |
| Anhumas-IN (A-IN) | 0.43 | 0.47 | 0.47 | - |

WHE1 (599 wheat lines)

| Location* | E1 | E2 | E3 | E4 |
|---|---|---|---|---|
| E2 | −0.19 | - | - | - |
| E3 | −0.19 | 0.661 | - | - |
| E4 | −0.12 | 0.411 | 0.388 | - |

WHE5 (807 wheat lines)

| Location* | 0iFN | 2iBN | 5iBH | 5iBN | 5iFN |
|---|---|---|---|---|---|
| 2iBN | 0.166 | - | - | - | - |
| 5iBH | 0.30 | −0.033 | - | - | - |
| 5iBN | −0.10 | 0.122 | −0.091 | - | - |
| 5iFN | −0.01 | 0.035 | 0.023 | 0.546 | - |

* Locations in the HEL data set are: IP: Ipiaçú-MG, NM: Nova Mutum-MT, PM: Pato de Minas-MG, SE: Sertanópolis-PR, and SO: Sorriso-MT. In the USP data set, IN = ideal nitrogen and LN = low nitrogen. In the WHE5 data set, environments are described by a sequence of codes: 0i, 2i and 5i denote the number of irrigations; B/F denotes whether the planting system was ‘bed’ (B) or ‘flat’ (F); N/H denotes whether the planting date was normal (N) or late (H, simulating heat).

Table A1. Phenotypic Pearson’s correlations among locations for grain yield for the four data sets HEL (maize), USP (maize), WHE1 (wheat) and WHE5 (wheat). For the HEL and USP maize data sets, the number in parentheses after each location’s name indicates the number of lines sown. For the two wheat data sets (WHE1 and WHE5), the number of wheat lines is given in parentheses.

Footnotes

  • Communicating Editor: J. Holland

  • Received November 19, 2017.
  • Accepted February 21, 2018.
  • Copyright © 2018 Cuevas et al.

This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Literature Cited

  1. Burgueño, J., J. Crossa, P. L. Cornelius, R. Trethowan, G. McLaren et al., 2007 Modeling additive × environment and additive × additive × environment using genetic covariance of relatives of wheat genotypes. Crop Sci. 47: 311–320.
  2. Burgueño, J., G. de los Campos, K. Weigel, and J. Crossa, 2012 Genomic prediction of breeding values when modeling genotype × environment interaction using pedigree and dense molecular markers. Crop Sci. 52: 707–719.
  3. Crossa, J., J. Burgueño, P. L. Cornelius, R. Trethowan, and A. Krishnamachari, 2006 Modeling genotype × environment interaction using additive genetic covariances of relatives for predicting breeding values of wheat genotypes. Crop Sci. 46: 1722–1733.
  4. Crossa, J., G. de los Campos, P. Pérez, D. Gianola, J. Burgueño et al., 2010 Prediction of genetic values of quantitative traits in plant breeding using pedigree and molecular markers. Genetics 186: 713–724. doi:10.1534/genetics
  5. Crossa, J., P. Pérez, G. de los Campos, G. Mahuku, S. Dreisigacker et al., 2011 Genomic selection and prediction in plant breeding. J. Crop Improv. 25: 239–261.
  6. Crossa, J., Y. Beyene, S. Kassa, P. Pérez-Rodríguez, J. M. Hickey, C. Chen, G. de los Campos, J. Burgueño, V. S. Windhausen, E. Bucker, J-L. Jannink, M. A. López-Cruz, and R. Babu, 2013 Genomic prediction in maize breeding populations with genotyping-by-sequencing. G3: Genes|Genomes|Genetics doi:10.1534/g3.113.008227
  7. Crossa, J., G. de los Campos, M. Maccaferri, R. Tuberosa, J. Burgueño et al., 2016 Extending the marker × environment interaction model for genomic-enabled prediction and genome-wide association analysis in durum wheat. Crop Sci. 56(5): 2193–2209.
  8. Cuevas, J., J. Crossa, O. Montesinos-Lopez, J. Burgueno, P. Pérez-Rodríguez, and G. de los Campos, 2017 Bayesian Genomic Prediction with Genotype × Environment Interaction Kernel Models. G3: Genes|Genomes|Genetics 7: 41–53. doi:10.1534/g3.116.035584
  9. Cuevas, J., J. Crossa, V. Soberanis, S. Pérez-Elizalde, P. Pérez-Rodríguez et al., 2016 Genomic Prediction of Genotype × Environment Interaction Kernel Regression Models. Plant Genome 9(3): 1–20. doi:10.3835/plantgenome2016.03.0024
  10. de los Campos, G., H. Naya, D. Gianola, J. Crossa, A. Legarra et al., 2009 Predicting quantitative traits with regression models for dense molecular markers and pedigree. Genetics 182: 375–385.
  11. de los Campos, G., J. M. Hickey, R. Pong-Wong, H. D. Daetwyler, and M. P. L. Calus, 2013 Whole-genome regression and prediction methods applied to plant and animal breeding. Genetics 193: 327–345.
  12. de los Campos, G., and A. Grüneberg, 2016 MTM (Multiple-Trait Model) package. http://quantgen.github.io/MTM/vignette.html.
  13. de los Campos, G., and P. Pérez-Rodríguez, 2016 BGLR: Bayesian generalized linear regression. R package version 1.0.5: https://CRAN.R.
  14. Granato, I., J. Cuevas, and F. Luna, 2017 BGGE (Bayesian Genomics G×E). https://github.com/italo-granato/BGGE/tree/master/R.
  15. Jarquín, D., J. Crossa, X. Lacaze, P. D. Cheyron, J. Daucourt et al., 2014 A reaction norm model for genomic selection using high-dimensional genomic and environmental data. Theor. Appl. Genet. 127: 595–607. doi:10.1007/s00122-013-2243-1
  16. Jarquín, D., C. Lemes da Silva, R. C. Gaynor, J. Poland, A. Fritz et al., 2017 Increasing Genomic-Enabled Prediction Accuracy by Modeling Genotype × Environment Interactions in Kansas Wheat. Plant Genome 10. doi:10.3835/plantgenome2016.12.0130
  17. López-Cruz, M., J. Crossa, D. Bonnett, S. Dreisigacker, J. Poland, J.-L. Jannink, R. P. Singh, E. Autrique, and G. de los Campos, 2015 Increased prediction accuracy in wheat breeding trials using a marker × environment interaction genomic selection model. G3: Genes|Genomes|Genetics 5(4): 569–582.
  18. Meuwissen, T. H. E., B. J. Hayes, and M. E. Goddard, 2001 Prediction of total genetic value using genome-wide dense marker maps. Genetics 157: 1819–1829.
  19. Mota, R., R. Tempelman, P. Lopes, I. Aguilar, F. Silva et al., 2016 Genotype by environment interaction for tick resistance of Hereford and Braford beef cattle using reaction norm models. Genet. Sel. Evol. 48(3). doi:10.1186/s12711-015-0178-5
  20. Oakey, H., B. Cullis, R. Thompson, J. Comadran, C. Halpin, and R. Waugh, 2016 Genomic Selection in Multi-environment Crop Trials. G3: Genes|Genomes|Genetics 6: 1313–1326. doi:10.1534/g3.116.027524
  21. Pérez-Rodríguez, P., D. Gianola, J. M. González-Camacho, J. Crossa, Y. Manès, and S. Dreisigacker, 2012 Comparison between linear and non-parametric regression models for genome-enabled prediction in wheat. G3: Genes|Genomes|Genetics 2(12): 1595–605.
  22. Pérez-Rodríguez, P., J. Crossa, K. Bondalapati, G. De Meyer, F. Pita et al., 2015 A Pedigree-Based Reaction Norm Model for Prediction of Cotton Yield in Multienvironment Trials. Crop Sci. 55: 1143–1151. doi:10.2135/cropsci2014.08.0577
  23. Pérez-Elizalde, S., J. Cuevas, P. Pérez-Rodríguez, and J. Crossa, 2015 Selection of the Bandwidth Parameter in a Bayesian Kernel Regression Model for Genomic-Enabled Prediction. J. Agric. Biol. Environ. Stat. 20(4): 512–532.
  24. Piepho, H. P., 1997 Analyzing genotype-environment data by mixed models with multiplicative effects. Biometrics 53: 761–766.
  25. Piepho, H. P., 1998 Empirical best linear unbiased prediction in cultivar trials using factor analytic variance covariance structure. Theor. Appl. Genet. 97: 195–201.
  26. Smith, A. B., B. R. Cullis, and R. Thompson, 2005 The analysis of crop cultivar breeding and evaluation trials: an overview of current mixed model approaches. J. Agric. Sci. 143: 449–462. doi:10.1017/S0021859605005587
  27. Sousa, M. B., J. Cuevas, E. G. O. Couto, P. Pérez-Rodríguez, D. Jarquín et al., 2017 Genomic-enabled prediction in maize using kernel models with genotype × environment interaction. G3: Genes|Genomes|Genetics 7: 1995–2014. doi:10.1534/g3.117.042341
  28. Sukumaran, S., J. Crossa, D. Jarquin, M. Lopes, and M. P. Reynolds, 2017 Genomic Prediction with Pedigree and Genotype × Environment Interaction in Spring Wheat Grown in South and West Asia, North Africa, and Mexico. G3: Genes|Genomes|Genetics 7(2): 481–495. doi:10.1534/g3.116.036251
  29. VanRaden, P. M., 2007 Genomic measures of relationship and inbreeding. Interbull Bull. 37: 33–36.
  30. VanRaden, P. M., 2008 Efficient methods to compute genomic predictions. J. Dairy Sci. 91: 4414–4423. doi:10.3168/jds.2007-0980

PUBLICATION INFORMATION

Volume 8 Issue 4, April 2018

G3: Genes|Genomes|Genetics: 8 (4)

ARTICLE CLASSIFICATION

Genomic Selection
Genomic-Enabled Prediction Kernel Models with Random Intercepts for Multi-environment Trials

Jaime Cuevas, Italo Granato, Roberto Fritsche-Neto, Osval A. Montesinos-Lopez, Juan Burgueño, Massaine Bandeira e Sousa and José Crossa
G3: Genes, Genomes, Genetics April 1, 2018 vol. 8 no. 4 1347-1365; https://doi.org/10.1534/g3.117.300454

Jaime Cuevas
Universidad de Quintana Roo, Chetumal, Quintana Roo, México
Italo Granato
Department of Genetics, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, São Paulo, Brazil
Roberto Fritsche-Neto
Department of Genetics, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, São Paulo, Brazil
Osval A. Montesinos-Lopez
Facultad de Telemática, Universidad de Colima, CP 28040 Colima, Edo. de Colima, México
Juan Burgueño
Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT). Apdo. Postal 6-641, 06600 México DF, México
Massaine Bandeira e Sousa
Department of Genetics, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, São Paulo, Brazil
José Crossa
Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT). Apdo. Postal 6-641, 06600 México DF, México
For correspondence: j.crossa@cgiar.org
GSA

The Genetics Society of America (GSA), founded in 1931, is the professional membership organization for scientific researchers and educators in the field of genetics. Our members work to advance knowledge in the basic mechanisms of inheritance, from the molecular to the population level.

Online ISSN: 2160-1836

Copyright © 2018 by the Genetics Society of America
