G3: Genes | Genomes | Genetics


Comparison Between Linear and Non-parametric Regression Models for Genome-Enabled Prediction in Wheat

Paulino Pérez-Rodríguez, Daniel Gianola, Juan Manuel González-Camacho, José Crossa, Yann Manès and Susanne Dreisigacker
G3: Genes, Genomes, Genetics December 1, 2012 vol. 2 no. 12 1595-1605; https://doi.org/10.1534/g3.112.003665
Paulino Pérez-Rodríguez and Juan Manuel González-Camacho: Colegio de Postgraduados, Montecillo, Texcoco 56230, México
Daniel Gianola: Departments of Animal Sciences, Dairy Science, and Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin 53706
José Crossa, Yann Manès, and Susanne Dreisigacker: Biometrics and Statistics Unit and Global Wheat Program, International Maize and Wheat Improvement Center (CIMMYT), 06600 Mexico, D.F., México
For correspondence: perpdgo@gmail.com

Abstract

In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear in marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (non-linear in markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. The three non-linear models had better overall prediction accuracy than the linear regression specifications, and results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models.

  • GenPred
  • Shared data resources

Genome-enabled prediction of complex traits based on marker data is becoming important in plant and animal breeding, personalized medicine, and evolutionary biology (Meuwissen et al. 2001; Bernardo and Yu 2007; de los Campos et al. 2009, 2010; Crossa et al. 2010, 2011; Ober et al. 2012). In the standard, infinitesimal, pedigree-based model of quantitative genetics, the family structure of a population is reflected in some expected resemblance between relatives. The latter is measured as an expected covariance matrix among individuals and is used to predict genetic values (e.g. Crossa et al. 2006; Burgueño et al. 2007, 2011). Whereas pedigree-based models do not account for Mendelian segregation, and the expected covariance matrix is constructed using assumptions that do not hold (e.g. absence of selection and mutation, and random mating), marker-based models allow tracing Mendelian segregation at several positions of the genome and observing realized (as opposed to expected) covariances. This enhances the potential for improving the accuracy of estimated genetic values, thus increasing the genetic progress attainable when these predictions are used for selection in lieu of pedigree-based predictions. Recently, de los Campos et al. (2009, 2010) and Crossa et al. (2010, 2011) used Bayesian estimates from genomic parametric and semi-parametric regressions and found that models incorporating pedigree and markers simultaneously had better prediction accuracy for several traits in wheat and maize than models based only on pedigree or only on markers.

The standard linear genetic model represents the phenotypic response of the ith individual, $y_i$, as the sum of a genetic value, $g_i$, and a model residual, $\varepsilon_i$, so that for n individuals the model for $\mathbf{y} = (y_1, \dots, y_n)'$ is represented as $\mathbf{y} = \mathbf{g} + \boldsymbol{\varepsilon}$. However, building predictive models for complex traits using a large number of molecular markers (p) with a set of n lines where p ≫ n is challenging because individual marker effects are not likelihood-identified. In this case, marker effects can be estimated via penalized parametric or semi-parametric methods, or their Bayesian counterparts, rather than via ordinary least squares. This reduces the mean-squared error of estimates; it also increases prediction accuracy for out-of-sample cases and prevents over-fitting (de los Campos et al. 2010). In addition to the well-known Bayes A and Bayes B linear regression models originally proposed by Meuwissen et al. (2001) for incorporating marker effects into $g_i$, there are several penalized parametric regression methods for estimating marker effects, such as ridge regression, the least absolute shrinkage and selection operator (LASSO), and the elastic net (Hastie et al. 2009). The Bayesian counterparts of these models have proved useful because appropriate priors can be assigned to the regularization parameter(s), and uncertainty in estimates and predictions can be measured directly under the Bayesian paradigm.
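To make the p ≫ n identification problem concrete, the following Python sketch (simulated 0/1 marker codes with arbitrary dimensions and effect sizes; the paper itself used R and Matlab implementations) shows that the ordinary least-squares system is rank-deficient, while a ridge-type penalty restores a unique, shrunken solution:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 300                                       # many more markers than lines (p >> n)
X = rng.integers(0, 2, size=(n, p)).astype(float)    # simulated biallelic marker codes
beta_true = np.zeros(p)
beta_true[:10] = rng.normal(0, 0.5, 10)              # a few markers with non-null effects
y = X @ beta_true + rng.normal(0, 1.0, n)

# Ordinary least squares is not identified: X'X is p x p but has rank <= n < p.
XtX = X.T @ X
assert np.linalg.matrix_rank(XtX) <= n

# A ridge penalty shrinks all effects toward zero and makes the system solvable.
lam = 5.0
beta_ridge = np.linalg.solve(XtX + lam * np.eye(p), X.T @ y)
print(beta_ridge.shape)
```

The penalized solve here is the frequentist analogue of the Bayesian ridge regression discussed later, with the penalty λ playing the role of the ratio of the residual to the marker-effect variance components.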

Regression methods assume a linear relationship between phenotype and genotype, and they typically account for additive allelic effects only; however, evidence of epistatic effects on plant traits is vast and well documented (e.g. Holland 2001, 2008). In wheat, for instance, detailed analyses have revealed a complex circuitry of epistatic interactions in the regulation of heading time, involving different vernalization genes, day-length sensitivity genes, and earliness per se genes, as well as the environment (Laurie et al. 1995; Cockram et al. 2007). Epistatic effects have also been found to be an important component of the genetic basis of plant height and bread-making quality traits (Zhang et al. 2008; Conti et al. 2011). It is becoming common to study gene × gene interactions using a network paradigm that accommodates gene × gene interactions that exist even in the absence of main effects (McKinney and Pajewski 2012). Interactions between alleles at two or more loci could, in theory, be represented in a linear model via appropriate contrasts. However, this does not scale when the number of markers (p) is large, as the number of 2-locus, 3-locus, etc., interactions grows combinatorially.

An alternative approach to the standard parametric modeling of complex interactions is provided by non-linear, semi-parametric methods, such as kernel-based models (e.g. Gianola et al. 2006; Gianola and van Kaam 2008) or artificial neural networks (NN) (Okut et al. 2011; Gianola et al. 2011), under the assumption that such procedures can capture signals from high-order interactions. The potential of these methods, however, depends on the kernel chosen and on the neural network architecture. In a recent study, Heslot et al. (2012) compared the predictive accuracy of several genome-enabled prediction models, including reproducing kernel Hilbert space (RKHS) and NN, using barley and wheat data; the authors found that the non-linear models gave a modest but consistent predictive superiority (as measured by correlations between predictions and realizations) over the linear models. In particular, the RKHS model had a better predictive ability than that obtained using the parametric regressions.

The use of RKHS for predicting complex traits was first proposed by Gianola et al. (2006) and Gianola and van Kaam (2008). de los Campos et al. (2010) further developed the theoretical basis of RKHS with “kernel averaging” (simultaneous use of various kernels in the model) and showed its good prediction accuracy. Other empirical studies in plants have corroborated the increase in prediction accuracy of kernel methods (e.g. Crossa et al. 2010, 2011; de los Campos et al. 2010; Heslot et al. 2012). Recently, Long et al. (2010), using chicken data, and González-Camacho et al. (2012), using maize data, showed that NN methods provided prediction accuracy comparable to that obtained with the RKHS method. In NN, the basis functions (adaptive “covariates”) are inferred from the data, which gives NN great potential and flexibility for capturing complex interactions between input variables (Hastie et al. 2009). In particular, Bayesian regularized neural networks (BRNN) and radial basis function neural networks (RBFNN) have features that make them attractive for use in genomic selection (GS).

In this study, we examined the predictive ability of various linear and non-linear models, including the Bayes A and B linear regression models of Meuwissen et al. (2001); the Bayesian LASSO, as in Park and Casella (2008) and de los Campos et al. (2009); RKHS, using the “kernel averaging” strategy proposed by de los Campos et al. (2010); the RBFNN, proposed and used by González-Camacho et al. (2012); and the BRNN, as described by Neal (1996) and used in the context of GS by Gianola et al. (2011). The predictive ability of these models was compared using a cross-validation scheme applied to a wheat data set from CIMMYT’s Global Wheat Program.

Materials and Methods

Experimental data

The data set included 306 elite wheat lines: 263 candidate lines for the 29th Semi-Arid Wheat Screening Nursery (SAWSN) and 43 lines from the 18th Semi-Arid Wheat Yield Trial (SAWYT), both from CIMMYT’s Global Wheat Program. These lines were genotyped with 1717 diversity array technology (DArT) markers generated by Triticarte Pty. Ltd. (Canberra, Australia; http://www.triticarte.com.au). Two traits were analyzed: grain yield (GY) and days to heading (DTH) (see Supporting Information, File S1).

The traits were measured in a total of 12 different environments (1–12) (Table 1): GY in environments 1–7 and DTH in environments 1–5 and 8–12 (10 in all). Different agronomic practices were used. Yield trials were planted in 2009 and 2010 using prepared beds and flat plots under controlled drought or irrigated conditions. Yield data from experiments in 2010 were replicated, whereas data from trials in 2009 were adjusted means from an alpha lattice incomplete block design with adjustment for spatial variability in the direction of rows and columns using the autoregressive model fitted in both directions.

Table 1 Twelve environments representing combinations of diverse agronomic management (drought or full irrigation, sowing in standard, bed, or flat systems), sites in Mexico, and years for two traits, grain yield (GY) and days to heading (DTH), with their broad-sense heritability (h2) measured in 2010

Data used to train the models for GY and DTH in 2009 were best linear unbiased estimates (BLUEs) after spatial analysis, whereas the BLUEs for 2010 were obtained by performing analyses in each of the 12 environments and then combining them. The experimental designs in each location consisted of alpha lattice incomplete block designs of different sizes, with two replicates each.

Broad-sense heritability at individual environments was calculated as $H^2 = \sigma_g^2 / (\sigma_g^2 + \sigma_\varepsilon^2 / n_{\mathrm{reps}})$, where $\sigma_g^2$ and $\sigma_\varepsilon^2$ are the genotype and error variance components, respectively, and $n_{\mathrm{reps}}$ is the number of replicates. For the combined analyses across environments, broad-sense heritability was calculated as $H^2 = \sigma_g^2 / (\sigma_g^2 + \sigma_{ge}^2 / n_{\mathrm{env}} + \sigma_\varepsilon^2 / (n_{\mathrm{env}}\, n_{\mathrm{reps}}))$, where $\sigma_{ge}^2$ is the genotype × environment interaction variance component and $n_{\mathrm{env}}$ is the number of environments included in the analysis.
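The two heritability formulas above can be expressed directly as small Python functions (the variance components passed in below are arbitrary illustrative numbers, not estimates from the wheat data):

```python
def h2_single_env(var_g, var_e, n_reps):
    """Broad-sense heritability in a single environment:
    H2 = var_g / (var_g + var_e / n_reps)."""
    return var_g / (var_g + var_e / n_reps)

def h2_combined(var_g, var_ge, var_e, n_env, n_reps):
    """Broad-sense heritability across environments, where var_ge is the
    genotype x environment interaction variance component."""
    return var_g / (var_g + var_ge / n_env + var_e / (n_env * n_reps))

# Illustrative values: genotype variance 2.0, error variance 1.0, 2 replicates.
print(round(h2_single_env(2.0, 1.0, 2), 3))   # 0.8
```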

Statistical models

One method for incorporating markers is to define $g_i$ as a parametric linear regression on marker covariates $x_{ij}$ with form $g_i = \sum_{j=1}^{p} x_{ij}\beta_j$ (j = 1, 2, …, p markers); here, $\beta_j$ is the partial regression of $y_i$ on the jth marker covariate (Meuwissen et al. 2001). Extending the model to allow for an intercept $\mu$ gives
$$y_i = \mu + \sum_{j=1}^{p} x_{ij}\beta_j + \varepsilon_i. \qquad (1)$$

We adopted Gaussian assumptions for model residuals; specifically, the joint distribution of model residuals in Equation 1 was assumed normal with mean zero and variance $\sigma_\varepsilon^2$. The likelihood function is
$$p(\mathbf{y} \mid \mu, \boldsymbol{\beta}, \sigma_\varepsilon^2) = \prod_{i=1}^{n} N\!\left(y_i \,\Big|\, \mu + \sum_{j=1}^{p} x_{ij}\beta_j,\ \sigma_\varepsilon^2\right), \qquad (2)$$
where $N(y_i \mid \cdot\,, \sigma_\varepsilon^2)$ is a normal density for random variable $y_i$ centered at $\mu + \sum_{j} x_{ij}\beta_j$ and with variance $\sigma_\varepsilon^2$. Depending on how priors on the marker effects are assigned, different Bayesian linear regression models result.

Linear models: Bayesian ridge regression, Bayesian LASSO, Bayes A, and Bayes B

A standard penalized regression method is ridge regression (Hoerl and Kennard 1970); its Bayesian counterpart, Bayesian ridge regression (BRR), uses a prior density of marker effects that is Gaussian, centered at zero, and with a variance common to all markers, that is, $p(\beta_j \mid \sigma_\beta^2) = N(\beta_j \mid 0, \sigma_\beta^2)$, where $\sigma_\beta^2$ is a prior variance of marker effects. Marker effects are assumed independent and identically distributed a priori. We assigned scaled inverse chi-squared distributions, $\chi^{-2}(\nu, S)$, to the variance parameters $\sigma_\beta^2$ and $\sigma_\varepsilon^2$, with the prior degrees of freedom set to small values. It can be shown that the posterior mean of marker effects is the best linear unbiased predictor (BLUP) of marker effects, so Bayesian ridge regression is often referred to as RR-BLUP (de los Campos et al. 2012).

The Bayesian LASSO, Bayes A, and Bayes B relax the assumption of a common prior variance for all marker effects. The relationship among these three models is as follows: Bayes B can be considered the most general of the three, in the sense that Bayes A and Bayesian ridge regression can be viewed as special cases of Bayes B. Bayes A is obtained from Bayes B by setting $\pi = 0$, where $\pi$ is the proportion of markers with null effects, and Bayesian ridge regression is obtained from Bayes B by setting $\pi = 0$ and assuming that all markers have the same variance.

Bayes B uses a mixture distribution with a point mass at zero, such that the (conditional) prior distribution of marker effects is given by

$$\beta_j \mid \pi, \sigma_{\beta_j}^2 \sim \begin{cases} 0 & \text{with probability } \pi, \\ N(0, \sigma_{\beta_j}^2) & \text{with probability } 1 - \pi. \end{cases} \qquad (3)$$

The prior assigned to $\sigma_{\beta_j}^2$ is the same for all markers, i.e. a scaled inverse chi-squared distribution $\chi^{-2}(\nu, S)$, where $\nu$ is the degrees of freedom and $S$ is a scale parameter. Bayes B becomes Bayes A by setting π = 0.

In the case of Bayes B, we took fixed values for $\pi$ and $\nu$, and set the scale parameter as a function of the markers, $S \propto \tilde{\sigma}_a^2 / \sum_{j=1}^{p} 2 p_j (1 - p_j)$, where $p_j$ is the allele frequency for marker j and $\tilde{\sigma}_a^2$ is the additive genetic variance explained by markers [see Habier et al. (2011) and Resende et al. (2012) for more details]. In the case of the intercept $\mu$, we assigned a flat prior, as in Wang et al. (1994).
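Draws from the spike-and-slab prior of Equation 3 can be sketched in Python to make the role of $\pi$ concrete; the hyperparameter values below ($\pi$, $\nu$, $S$) are arbitrary illustrative choices, not the settings used in the paper:

```python
import numpy as np

def draw_bayesb_effects(p, pi, nu, S, rng):
    """Draw p marker effects from the Bayes B prior (Eq. 3):
    with probability pi the effect is exactly zero; otherwise it is
    N(0, sigma2_j), with sigma2_j ~ scaled inverse chi-squared(nu, S)."""
    sigma2 = nu * S / rng.chisquare(nu, size=p)   # scaled inv-chi-squared draws
    beta = rng.normal(0.0, np.sqrt(sigma2))
    null = rng.random(p) < pi                     # markers assigned a null effect
    beta[null] = 0.0
    return beta

rng = np.random.default_rng(1)
b = draw_bayesb_effects(10000, pi=0.9, nu=4.0, S=0.01, rng=rng)
print(abs((b == 0).mean() - 0.9) < 0.02)   # about 90% of effects are null
```

Setting `pi=0.0` makes every marker receive a non-null draw with its own variance, which is exactly the Bayes A prior described in the text.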

The Bayesian LASSO assigns a double exponential (DE) distribution to all marker effects (conditionally on a regularization parameter $\lambda$), centered at zero, that is, $p(\beta_j \mid \lambda) = \mathrm{DE}(\beta_j \mid 0, \lambda)$. The DE distribution is not conjugate with the Gaussian likelihood, but it can be represented as a mixture of scaled normal densities with marker-specific variances, which allows easy implementation of the model (Park and Casella 2008; de los Campos et al. 2009). The priors used were exactly the same as those used in González-Camacho et al. (2012).
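The scale-mixture representation that makes the Bayesian LASSO easy to implement can be checked numerically: drawing a marker-specific variance from an exponential distribution and then a normal effect given that variance reproduces the DE distribution (a simulation sketch with an arbitrary $\lambda$, following the Park and Casella construction):

```python
import numpy as np

rng = np.random.default_rng(2)
lam = 2.0
n = 200_000

# Scale mixture: tau2 ~ Exponential(rate = lam^2 / 2), beta | tau2 ~ N(0, tau2)
# implies beta ~ DoubleExponential with rate lam (Park and Casella 2008).
tau2 = rng.exponential(scale=2.0 / lam**2, size=n)
beta_mix = rng.normal(0.0, np.sqrt(tau2))

# Direct DE (Laplace) draws for comparison; both have variance 2 / lam^2 = 0.5.
beta_de = rng.laplace(0.0, 1.0 / lam, size=n)
print(abs(beta_mix.var() - beta_de.var()) < 0.02)
```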

The models used in this study, the Bayesian ridge regression, Bayesian LASSO (BL), Bayes A, and Bayes B, are explained in detail in several articles; for example, Bayes A and Bayes B are described in Meuwissen et al. (2001), Habier et al. (2011), and Resende et al. (2012), and an account of BL is given in de los Campos et al. (2009, 2012), Crossa et al. (2010, 2011), Perez et al. (2010), and González-Camacho et al. (2012).

Non-linear models: RBFNN, BRNN, and RKHS

In this section, we describe the basic structure of the non-linear single hidden layer feed-forward neural network (SLNN) and two of its variants, the radial basis function neural network and the Bayesian regularized neural network. We also give a brief explanation of RKHS with the kernel averaging method at the end of the section.

Single hidden layer feed-forward neural network:

In a single hidden layer feed-forward neural network (SLNN), the non-linear activation functions in the hidden layer give the NN universal approximation ability, and hence great potential and flexibility for capturing complex patterns. The structure of the SLNN for a continuous phenotypic response is depicted in Figure 1. This NN can be thought of as a two-step regression (e.g. Hastie et al. 2009). In the first step, in the non-linear hidden layer, S data-derived basis functions (k = 1, 2, …, S neurons), $z_{ik}$, are inferred; in the second step, in the linear output layer, the response is regressed on the basis functions inferred in the hidden layer. For each neuron k of the hidden layer, the inner product between the input vector $\mathbf{x}_i = (x_{i1}, \dots, x_{ip})'$ and the weight vector $\boldsymbol{\beta}^{[k]}$, plus a bias (intercept) $b_k$, is computed, that is, $u_{ik} = b_k + \sum_{j=1}^{p} x_{ij}\beta_j^{[k]}$ (j = 1, …, p markers); this is then transformed using a non-linear activation function $g(\cdot)$. One obtains $z_{ik} = g(u_{ik})$, where $(\beta_1^{[1]}, \dots, \beta_p^{[1]}; \dots; \beta_1^{[S]}, \dots, \beta_p^{[S]})'$ is the vector of regression coefficients or “weights” of each neuron k in the hidden layer. The activation function $g(\cdot)$ maps the inputs into the closed interval [−1, 1]; for example, $g(u) = (e^{u} - e^{-u})/(e^{u} + e^{-u})$ is known as the hyperbolic tangent function. Finally, in the linear output layer, phenotypes are regressed on the data-derived features, $z_{ik}$, according to

Figure 1

Structure of a single hidden layer feed-forward neural network (SLNN), adapted from González-Camacho et al. (2012). In the hidden layer, input variables $x_{ij}$ (j = 1, …, p markers) are combined for each neuron (k = 1, …, S neurons) using a linear function, $u_{ik} = b_k + \sum_{j=1}^{p} x_{ij}\beta_j^{[k]}$, and subsequently transformed using a non-linear activation function, yielding a set of inferred scores, $z_{ik} = g(u_{ik})$. These scores are used in the output layer as basis functions to regress the response, using a linear activation function, on the data-derived predictors $z_{ik}$.

$$y_i = \mu + \sum_{k=1}^{S} w_k z_{ik} + \varepsilon_i. \qquad (4)$$
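The two-step regression of Equation 4 amounts to a single forward pass, which can be sketched in Python; the weights below are drawn at random purely to illustrate the shapes involved (in the paper these parameters are estimated by Bayesian regularization, not shown here):

```python
import numpy as np

def slnn_predict(X, B, b, w, mu):
    """Forward pass of a single hidden layer network (Eq. 4):
    hidden scores  z_ik = tanh(b_k + sum_j x_ij * beta_j[k]),
    output         y_i  = mu + sum_k w_k * z_ik."""
    Z = np.tanh(b + X @ B)      # n x S matrix of data-derived basis functions
    return mu + Z @ w

rng = np.random.default_rng(3)
n, p, S = 5, 8, 3
X = rng.integers(0, 2, size=(n, p)).astype(float)   # simulated marker inputs
B = rng.normal(0, 0.1, size=(p, S))                 # connection strengths beta[k]
b = rng.normal(0, 0.1, size=S)                      # hidden-layer biases
w = rng.normal(0, 0.5, size=S)                      # output-layer weights
yhat = slnn_predict(X, B, b, w, mu=1.0)
print(yhat.shape)   # (5,)
```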

Radial basis function neural network:

The RBFNN was first proposed by Broomhead and Lowe (1988) and Poggio and Girosi (1990). Figure 2 shows the architecture of a single hidden layer RBFNN with S non-linear neurons. Each non-linear neuron in the hidden layer has a Gaussian radial basis function (RBF) defined as $\phi_k(\mathbf{x}_i) = \exp(-h \lVert \mathbf{x}_i - \mathbf{c}_k \rVert^2)$, where $\lVert \mathbf{x}_i - \mathbf{c}_k \rVert$ is the Euclidean norm between the input vector $\mathbf{x}_i$ and the center vector $\mathbf{c}_k$, and $h$ is the bandwidth of the Gaussian RBF. Subsequently, in the linear output layer, phenotypes are regressed on the data-derived features, $\phi_k(\mathbf{x}_i)$, according to $y_i = \mu + \sum_{k=1}^{S} w_k \phi_k(\mathbf{x}_i) + \varepsilon_i$, where $\varepsilon_i$ is a model residual.

Figure 2

Structure of a radial basis function neural network, adapted from González-Camacho et al. (2012). In the hidden layer, information from input variables $x_{ij}$ (j = 1, …, p markers) is first summarized by means of the Euclidean distance between each input vector $\mathbf{x}_i$ and each of the S data-inferred centers $\mathbf{c}_k$ (k = 1, …, S neurons), that is, $\lVert \mathbf{x}_i - \mathbf{c}_k \rVert$. These distances are then transformed using the Gaussian function $\phi_k(\mathbf{x}_i) = \exp(-h \lVert \mathbf{x}_i - \mathbf{c}_k \rVert^2)$. The resulting scores are used in the output layer as basis functions for the linear regression $y_i = \mu + \sum_{k=1}^{S} w_k \phi_k(\mathbf{x}_i) + \varepsilon_i$.
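The two stages of the RBFNN (Gaussian RBF scores, then a linear output regression) can be sketched as follows; for simplicity the centers here are a random subset of the input vectors rather than the orthogonal least-squares selection described below, and the bandwidth is an arbitrary illustrative value:

```python
import numpy as np

def rbf_features(X, centers, h):
    """Gaussian RBF scores phi_k(x_i) = exp(-h * ||x_i - c_k||^2)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-h * d2)

rng = np.random.default_rng(4)
n, p, S = 40, 20, 8
X = rng.integers(0, 2, size=(n, p)).astype(float)   # simulated marker inputs
y = rng.normal(size=n)                              # simulated phenotypes

centers = X[rng.choice(n, size=S, replace=False)]   # simplified center choice
Phi = np.column_stack([np.ones(n), rbf_features(X, centers, h=0.1)])

# Linear output layer: least-squares regression of y on the RBF scores.
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
yhat = Phi @ w
print(yhat.shape)
```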

Estimating the parameters of the RBFNN:

The vector of weights $\mathbf{w} = (w_1, \dots, w_S)'$ of the linear output layer is obtained by an ordinary least-squares fit that minimizes the mean squared difference between the fitted values from the RBFNN and the observed responses in the training set, once the Gaussian RBFs (centers $\mathbf{c}_k$ and bandwidth $h$) of the hidden layer are defined. The centers are selected using an orthogonal least-squares learning algorithm, as described by Chen et al. (1991) and implemented in Matlab 2010b. The centers are added iteratively, such that each newly selected center is orthogonal to the others. The selected centers maximize the decrease in the mean-squared error of the RBFNN, and the algorithm stops when the number of centers (neurons) added to the RBFNN attains a desired precision (goal error), or when the number of centers equals the number of input vectors, that is, when S = n. The bandwidth $h$ of the Gaussian RBF is defined in terms of a design parameter of the net, the spread, shared by all Gaussian RBFs of the hidden layer. To select the best RBFNN, a grid for training the net was generated, containing different values of the spread and different precision values (goal error). The initial value of the spread was the median of the Euclidean distances between each pair of input vectors, $\lVert \mathbf{x}_i - \mathbf{x}_{i'} \rVert$, and an initial value of 0.02 for the goal error was used. The spread adjusts the shape of the Gaussian RBFs so that they are large enough to respond to overlapping regions of the input space, but not so large that all the Gaussian RBFs give a similar response.
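The initial spread value described above (median of all pairwise Euclidean distances between input vectors) can be computed directly; a minimal numpy sketch with arbitrary simulated 0/1 marker inputs:

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.integers(0, 2, size=(30, 15)).astype(float)   # simulated marker inputs

# Squared Euclidean distances between every pair of input vectors.
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)

# Median over the distinct pairs (upper triangle, excluding the diagonal).
iu = np.triu_indices(X.shape[0], k=1)
spread0 = float(np.median(np.sqrt(d2[iu])))
print(spread0 > 0)
```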

Bayesian regularized neural networks:

The difference between the SLNN and the BRNN lies in the function to be minimized (see the penalized function below); therefore, the basic structure of a BRNN can also be represented by Figure 1. The SLNN described above is flexible enough to approximate any non-linear function; this great flexibility allows NN to capture complex interactions among predictor variables (Hastie et al. 2009). However, this flexibility also leads to two important issues: (1) as the number of neurons increases, the number of parameters to be estimated also increases; and (2) as the number of parameters rises, so does the risk of over-fitting. It is common practice to use penalization, often via Bayesian methods, to prevent or mitigate over-fitting.

MacKay (1992, 1994) developed a framework for obtaining estimates of all the parameters of a single hidden layer feed-forward neural network using an empirical Bayes approach. Let $\boldsymbol{\theta} = (w_1, \dots, w_S;\ b_1, \dots, b_S;\ \beta_1^{[1]}, \dots, \beta_p^{[1]}; \dots; \beta_1^{[S]}, \dots, \beta_p^{[S]},\ \mu)'$ be the vector containing all the weights, biases, and connection strengths. The author showed that the estimation problem can be solved in two steps, followed by iteration:

  • (1) Obtain the conditional posterior modes of the elements of $\boldsymbol{\theta}$, assuming that the variance components $\sigma_\varepsilon^2$ and $\sigma_\theta^2$ are known and that the prior distribution for all the elements of $\boldsymbol{\theta}$ is $N(\mathbf{0}, \mathbf{I}\sigma_\theta^2)$. It is important to note that this approach assigns the same prior to all elements of $\boldsymbol{\theta}$, even though this may not always be the best choice. The density of the conditional (given the variance parameters) posterior distribution of $\boldsymbol{\theta}$, according to Bayes’ theorem, is given by
$$p(\boldsymbol{\theta} \mid \mathbf{y}, \sigma_\varepsilon^2, \sigma_\theta^2) = \frac{p(\mathbf{y} \mid \boldsymbol{\theta}, \sigma_\varepsilon^2)\, p(\boldsymbol{\theta} \mid \sigma_\theta^2)}{p(\mathbf{y} \mid \sigma_\varepsilon^2, \sigma_\theta^2)}. \qquad (5)$$

The conditional modes can be obtained by maximizing Equation 5 over $\boldsymbol{\theta}$. The problem is equivalent to minimizing the following penalized sum of squares [see Gianola et al. (2011) for more details]:
$$F(\boldsymbol{\theta}) = \frac{1}{2\sigma_\varepsilon^2} \sum_{i=1}^{n} e_i^2 + \frac{1}{2\sigma_\theta^2} \sum_{j} \theta_j^2,$$
where $e_i = y_i - \hat{y}_i$ is the difference between observed and predicted phenotypes for the fitted model, and $\theta_j$ is the jth element of the vector $\boldsymbol{\theta}$.

  • (2) Update $\sigma_\varepsilon^2$ and $\sigma_\theta^2$. The updating formulas are obtained by maximizing an approximation to the marginal likelihood of the data, $p(\mathbf{y} \mid \sigma_\varepsilon^2, \sigma_\theta^2)$ (the “evidence”), given by the denominator of Equation 5.

  • (3) Iterate between (1) and (2) until convergence.

The original algorithm developed by MacKay was further improved by Foresee and Hagan (1997) and adopted by Gianola et al. (2011) in the context of genome- and pedigree-enabled prediction. The algorithm is equivalent to estimation via maximum penalized likelihood when “weight decay” is used, but it has the advantage of providing a way of setting the extent of “weight decay” through the variance component $\sigma_\theta^2$. Neal (1996) pointed out that the procedure of MacKay (1992, 1994) can be generalized further. For example, there is no need to approximate probabilities via Gaussian assumptions; furthermore, it is possible to estimate the entire posterior distributions of all the elements of $\boldsymbol{\theta}$, not only their (conditional) posterior modes. Next, we briefly review Neal’s approach to solving the problem; a comprehensive review can be found in Lampinen and Vehtari (2001).
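The penalized objective minimized in step (1), a data-misfit term weighted by the residual variance plus a weight-decay penalty weighted by the parameter variance, can be written as a small Python function (the numeric inputs below are arbitrary illustrative values):

```python
import numpy as np

def penalized_ss(y, yhat, theta, var_e, var_theta):
    """Penalized sum of squares from step (1):
    F = E_D / (2 * var_e) + E_W / (2 * var_theta), where
    E_D is the residual sum of squares and E_W the sum of squared weights."""
    E_D = np.sum((y - yhat) ** 2)
    E_W = np.sum(theta ** 2)
    return E_D / (2 * var_e) + E_W / (2 * var_theta)

y = np.array([1.0, 2.0, 3.0])       # observed phenotypes (illustrative)
yhat = np.array([1.1, 1.9, 3.2])    # fitted values (illustrative)
theta = np.array([0.5, -0.5])       # network weights (illustrative)
print(round(penalized_ss(y, yhat, theta, var_e=1.0, var_theta=1.0), 3))  # 0.28
```

Shrinking `var_theta` increases the weight-decay penalty, which is exactly how the variance component controls the extent of regularization in the BRNN.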

Prior distributions:
a) Variance component of the residuals:

Neal (1996) used a conjugate inverse gamma distribution as a prior for the residual variance $\sigma_\varepsilon^2$ in Equation 4, with scale and degrees of freedom parameters $S$ and $\nu$, respectively. These parameters can be set to the default values given by Neal (1996), $S = 0.05$ and $\nu = 0.5$; these values were also used by Lampinen and Vehtari (2001).

b) Connection strengths, weights, and biases:

Neal (1996) suggested dividing the network parameters in $\boldsymbol{\theta}$ into groups and then using a hierarchical model for each group of parameters; for example, the connection strengths $(\beta_1^{[1]}, \dots, \beta_p^{[1]}; \dots; \beta_1^{[S]}, \dots, \beta_p^{[S]})$, the biases $(b_1, \dots, b_S)$ of the hidden layer, the output weights $(w_1, \dots, w_S)$, and the general mean or bias ($\mu$) of the linear output layer. Suppose that $u_1, \dots, u_k$ are the parameters of a given group; then assume

$$u_l \mid \sigma_u^2 \sim N(0, \sigma_u^2), \quad l = 1, \dots, k,$$

and, at the last stage of the model, assign an inverse gamma prior to $\sigma_u^2$. The scale parameter of the distribution associated with the group containing the connection strengths changes according to the number of inputs; in this case it is scaled in proportion to $p^{-1/2}$, where p is the number of markers in the data set.

By using Markov chain Monte Carlo (MCMC) techniques, through an algorithm called hybrid Monte Carlo, Neal (1996) developed software termed flexible Bayesian modeling (FBM) that is capable of obtaining samples from the posterior distributions of all unknowns in a neural network (as in Figure 1).

Reproducing kernel Hilbert spaces regression:

RKHS models have been suggested as an alternative to multiple linear regression for capturing complex interaction patterns that may be difficult to account for in a linear model framework (Gianola et al. 2006). In the RKHS model, the regression function takes the form
$$g(\mathbf{x}_i) = \sum_{i'=1}^{n} \alpha_{i'} K(\mathbf{x}_i, \mathbf{x}_{i'}), \qquad (6)$$
where $\mathbf{x}_i$ and $\mathbf{x}_{i'}$ are input vectors of marker genotypes for individuals i and i′; the $\alpha_{i'}$ are regression coefficients; and $K(\mathbf{x}_i, \mathbf{x}_{i'})$ is the reproducing kernel, defined (here) with a Gaussian RBF, $K(\mathbf{x}_i, \mathbf{x}_{i'}) = \exp(-h \lVert \mathbf{x}_i - \mathbf{x}_{i'} \rVert^2)$, where $h$ is a bandwidth parameter and $\lVert \mathbf{x}_i - \mathbf{x}_{i'} \rVert$ is the Euclidean norm between each pair of input vectors. The strategy termed “kernel averaging” (simultaneous use of various kernels in the model) for selecting optimal values of $h$ within a set of candidate values was implemented using the Bayesian approach described in de los Campos et al. (2010). Similarities and connections between RKHS and the RBFNN are given in González-Camacho et al. (2012).
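The structure of Equation 6 can be sketched in Python with simulated markers; for illustration the candidate kernels are combined with equal weights, whereas the Bayesian kernel-averaging approach of de los Campos et al. (2010) estimates the contribution of each kernel from the data, and the bandwidth grid and regularization value below are arbitrary:

```python
import numpy as np

def gaussian_kernel(X, h):
    """K(x_i, x_i') = exp(-h * ||x_i - x_i'||^2) for all pairs of rows of X."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-h * d2)

rng = np.random.default_rng(6)
n, p = 50, 30
X = rng.integers(0, 2, size=(n, p)).astype(float)   # simulated marker inputs
y = rng.normal(size=n)                              # simulated phenotypes

# Simplified "kernel averaging": equal-weight combination over a bandwidth grid.
bandwidths = [0.05, 0.1, 0.5]
K = sum(gaussian_kernel(X, h) for h in bandwidths) / len(bandwidths)

# Regularized solve for the regression coefficients alpha of Eq. 6.
lam = 1.0
alpha = np.linalg.solve(K + lam * np.eye(n), y)
ghat = K @ alpha
print(ghat.shape)
```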

Assessment of the models’ predictive ability

The predictive ability of the models given above was compared using Pearson’s correlation and predictive mean-squared error (PMSE) using predicted and realized values. A total of 50 random partitions were generated for each of the data sets, and each partition randomly assigned 90% of the lines to the training set and the remaining 10% to the validation set. The partition scheme used was similar to that in Gianola et al. (2011) and González-Camacho et al. (2012).
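The partitioning scheme just described can be sketched as follows; the predictor inside the loop is a placeholder (the training-set mean) used only to exercise the loop with simulated phenotypes, not any of the models compared in the paper:

```python
import numpy as np

def cv_partitions(n, n_part, frac_train, rng):
    """Yield (train, validation) index pairs for random partitions that
    assign a fraction frac_train of the lines to the training set."""
    n_train = int(round(frac_train * n))
    for _ in range(n_part):
        idx = rng.permutation(n)
        yield idx[:n_train], idx[n_train:]

def pmse(y, yhat):
    """Predictive mean-squared error between realized and predicted values."""
    return float(np.mean((y - yhat) ** 2))

rng = np.random.default_rng(7)
n = 306                        # number of wheat lines in the data set
y = rng.normal(size=n)         # simulated phenotypes

errs = []
for train, valid in cv_partitions(n, n_part=50, frac_train=0.9, rng=rng):
    yhat = np.full(valid.size, y[train].mean())   # placeholder predictor
    errs.append(pmse(y[valid], yhat))
print(len(errs))   # 50
```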

All scripts were run on a Linux workstation. For Bayesian ridge regression and the Bayesian LASSO, we used the R package BLR (de los Campos and Perez 2010), whereas for RKHS we used the R implementation described in de los Campos et al. (2010), which was kindly provided by the authors. For Bayes A and Bayes B, we used a program described by Hickey and Tier (2009), which is freely available at http://sites.google.com/site/hickeyjohn/alphabayes. For the BRNN, we used the FBM software available at http://www.cs.toronto.edu/∼radford/fbm.software.html. Because the computational time required to evaluate the predictive ability of the BRNN was substantial, we used the Condor high-throughput computing system at the University of Wisconsin-Madison (http://research.cs.wisc.edu/condor). The RBFNN model was run using Matlab 2010b for Linux. The differences in computing time between the models were considerable. The computing times for evaluating the prediction ability over the 50 partitions for each trait were as follows: 10 min for RBFNN, 1.5 hr for RKHS, 3 hr for BRR, 3.5 hr for BL, 4.5 hr for Bayes B, 5.5 hr for Bayes A, and 30 days for BRNN. For RKHS, BRR, BL, Bayes A, and Bayes B, inferences were based on 35,000 MCMC samples, and on 10,000 samples for BRNN. The computing times were obtained using, as reference, a single Intel Xeon 5330 2.4-GHz CPU with 8 GB of RAM. Significant reductions in computing time were achieved by parallelizing the tasks.

RESULTS

Data from replicated experiments in 2010 were used to calculate the broad-sense heritability for each trait in each environment (Table 1). Broad-sense heritability across locations for 2010 data were 0.67 for GY and 0.92 for DTH. These high estimates can be explained, at least in part, by the strict environmental control of trials conducted at CIMMYT’s experiment station at Ciudad Obregon. The heritability of the two traits for 2009 was not estimated because the only available phenotypic data were adjusted means for each environment.
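Broad-sense heritability of the kind reported above is typically computed from variance components of the replicated trials. The following sketch uses the standard entry-mean formula under an assumed number of replicates per line; it is our own illustration, since the section does not give the exact estimator used.

```python
def broad_sense_h2(var_genetic, var_error, n_reps):
    """Entry-mean broad-sense heritability, H2 = Vg / (Vg + Ve / r),
    from a genotypic variance component Vg, a residual variance
    component Ve, and r replicates per line.
    (Illustrative only; the paper's exact estimator is not shown here.)"""
    return var_genetic / (var_genetic + var_error / n_reps)
```

For example, equal genotypic and residual variance components with two replicates give H2 = 2/3, which is in the range of the GY estimate reported above.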

Predictive assessment of the models

The predictive ability of the different models for GY and DTH varied among the 12 environments. The model ranked best by correlation (Table 2) tended to be the one with the smallest average PMSE (Table 3). The three non-parametric models had higher predictive correlations and smaller PMSE than the linear models for both GY and DTH. Within the linear models, results were mixed, and all models gave similar predictions. Within the non-parametric models, RBFNN and RKHS always gave higher correlations between predicted values and realized phenotypes, and a smaller average PMSE, than the BRNN. The means of the correlations and their associated standard errors can be used to test for statistically significant improvements in the predictability of the non-linear models vs. the linear models. A t-test based on these means and standard errors showed that RKHS gave significant improvements in prediction in 13/19 cases (Table 3) compared with the BL, whereas RBFNN was significantly better than the BL in 10/19 cases. Similar results were obtained when comparing RKHS and RBFNN with Bayes A and Bayes B.
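One plausible form of this test, computed from the per-partition correlations of two models, is sketched below; the paper's exact formula is not reproduced here, so this is an assumption on our part.

```python
import numpy as np

def t_statistic(r_a, r_b):
    """Two-sample t statistic comparing the mean predictive correlations
    of two models over a set of cross-validation partitions, using the
    standard error built from the sample variances of the correlations.
    (Illustrative; one plausible form of the test described in the text.)"""
    r_a, r_b = np.asarray(r_a, float), np.asarray(r_b, float)
    se = np.sqrt(r_a.var(ddof=1) / r_a.size + r_b.var(ddof=1) / r_b.size)
    return (r_a.mean() - r_b.mean()) / se
```

A large positive value of the statistic (relative to the t distribution) indicates that model A, e.g. RKHS, significantly outperformed model B, e.g. BL.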

Table 2 Average correlation (SE in parentheses) between observed and predicted values for grain yield (GY) and days to heading (DTH) in 12 environments for seven models

Table 3 Predictive mean-squared error (PMSE) between observed and predicted values for grain yield (GY) and days to heading (DTH) in 12 environments for seven models

Correlations between observed and predicted values for DTH were lowest overall in environments 4 and 8, in Cd. Obregon, 2009, and in Toluca, 2009. Average PMSE was in agreement with the findings based on correlations. Although accuracies in environment 4 were much lower than in other environments, the higher accuracy of the non-parametric models (RKHS, RBFNN, and BRNN) over that of the linear models (BL, BRR, Bayes A, and Bayes B) was consistent with what was observed in the other environments. Figures 3 and 4 give scatter plots of the correlations obtained with the three non-parametric models vs. the BL for DTH and GY, respectively; each circle represents the estimated correlations for each of the two models included in the plot. In Figure 3, A–C, DTH had a total of 500 points (10 environments and 50 random training-testing partitions). In Figure 4, A–C, GY had a total of 350 points (7 environments and 50 random partitions in each environment). A point above the 45-degree line represents an analysis where the method whose predictive correlation is given on the vertical axis (RKHS, RBFNN, BRNN) outperformed the one whose correlation is given on the horizontal axis (BL). Both figures show that although there is a great deal of variability due to partition, for both DTH and GY, the overall superiority of RKHS and RBFNN over the linear model BL is clear. For both traits, BL had slightly better prediction accuracy than the BRNN in terms of the number of individual correlation points. It is interesting to note that some cross-validation partitions picked subsets of training data that had negative, zero, or very low correlations with the observed values in the validation set. These results indicate that lines in the training set are not necessarily related to those in the validation set.
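The 45-degree comparison described above amounts to counting, for each partition, which model attained the higher predictive correlation. A minimal sketch (our own helper, not code from the paper):

```python
import numpy as np

def fraction_above_diagonal(r_nonparam, r_linear):
    """Fraction of cross-validation partitions in which the non-parametric
    model's predictive correlation exceeds the linear model's, i.e. the
    fraction of points above the 45-degree line in a scatter plot of the
    two models' correlations."""
    r_nonparam = np.asarray(r_nonparam, float)
    r_linear = np.asarray(r_linear, float)
    return float(np.mean(r_nonparam > r_linear))
```

A value above 0.5 indicates overall superiority of the non-parametric model across partitions, as observed here for RKHS and RBFNN vs. BL.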

Figure 3

Plots of the predictive correlation for each of 50 cross-validation partitions and 10 environments for days to heading (DTH) for different pairs of models: (A) RKHS vs. BL, (B) RBFNN vs. BL, (C) BRNN vs. BL. In each panel, an open circle marks a partition in which the non-parametric model was best, and a filled circle one in which the linear model BL was best. The histograms depict the distribution of the correlations in the testing set obtained from the 50 partitions for the different models. The horizontal (vertical) dashed line represents the average correlation for the testing set over the 50 partitions for the model shown on the Y (X) axis. The solid line represents Y = X, i.e., both models have the same prediction ability.

Figure 4

Plots of the predictive correlation for each of 50 cross-validation partitions and seven environments for grain yield (GY) for different pairs of models: (A) RKHS vs. BL, (B) RBFNN vs. BL, (C) BRNN vs. BL. In each panel, an open circle marks a partition in which the non-parametric model was best, and a filled circle one in which the linear model BL was best. The histograms depict the distribution of the correlations in the testing set obtained from the 50 partitions for the different models. The horizontal (vertical) dashed line represents the average correlation for the testing set over the 50 partitions for the model shown on the Y (X) axis. The solid line represents Y = X, i.e., both models have the same prediction ability.

DISCUSSION AND CONCLUSIONS

Understanding the impact of epistasis on quantitative traits remains a major challenge. In wheat, several studies have reported significant epistasis for grain yield and heading or flowering time (Goldringer et al. 1997). Detailed analyses have shown that vernalization, day-length sensitivity, and earliness per se genes are mainly responsible for regulating heading time. The vernalization requirement relates to the sensitivity of the plant to cold temperatures, which accelerates spike primordia formation. Transgenic and mutant analyses, for example, have suggested a pathway involving epistatic interactions that combines environment-induced suppression and upregulation of several genes, leading to final floral transition (Shimada et al. 2009).

There is evidence that the aggregation of multiple gene × gene interactions (epistasis) with small effects into small epistatic networks is important for explaining the heritability of complex traits in genome-wide association studies (McKinney and Pajewski 2012). Epistatic networks and gene × gene interactions can also be exploited for GS via suitable statistical-genetic models that incorporate network complexities. Evidence from this study, as well as from other research involving other plant and animal species, suggests that models that are non-linear in input variables (e.g. SNPs) predict outcomes in testing sets better than standard linear regression models for genome-enabled prediction. However, it should be pointed out that better predictive ability can have several causes, one of them the ability of some non-linear models to capture epistatic effects. Furthermore, the random cross-validation scheme used in this study was not designed to specifically assess epistasis but rather to compare the models’ predictive ability.

It is interesting to compare results from different prediction machineries applied to maize and wheat. Differences in the prediction accuracy of non-parametric and linear models (at least for the data sets included in this and other studies) seem to be more pronounced in wheat than in maize. Although differences depend, among other factors, on the trait-environment combination and the number of markers, it is clear from González-Camacho et al. (2012) that for flowering traits (highly additive) and traits such as grain yield (additive and epistatic) in maize, the BL model performed very similarly to the RKHS and RBFNN. On the other hand, in the present study, which involves wheat, the RKHS, RBFNN, and BRNN models had markedly better predictive accuracy than BL, BRR, Bayes A, or Bayes B. This may be due to the fact that, in wheat, additive × additive epistasis plays an important role in grain yield, as found by Crossa et al. (2006) and Burgueño et al. (2007, 2011) when assessing additive, additive × additive, additive × environment, and additive × additive × environment interactions using a pedigree-based model with the relationship matrix A.

As pointed out first by Gianola et al. (2006) and subsequently by Long et al. (2010), non-parametric models do not impose strong assumptions on the phenotype-genotype relationship, and they have the potential of capturing interactions among loci. Our results with real wheat data sets agreed with previous findings in animal and plant breeding and with simulated experiments, in that a non-parametric treatment of markers may account for epistatic effects that are not captured by linear additive regression models. Using extensive maize data sets, González-Camacho et al. (2012) found that RBFNN and RKHS had some similarities and seemed to be useful for predicting quantitative traits with different complex underlying gene action under varying types of interaction in different environmental conditions. These authors suggested that it is possible to make further improvements in the accuracy of the RKHS and RBFNN models by introducing differential weights in SNPs, as shown by Long et al. (2010) for RBFs.

The training population used here was not developed specifically for this study; it was made up of a set of elite lines from the CIMMYT rain-fed spring wheat breeding program. Our results show that it is possible to achieve good predictions of line performance by combining phenotypic and genotypic data generated on elite lines. As genotyping costs decrease, breeding programs could make use of genome-enabled prediction models to predict the values of new breeding lines generated from crosses between elite lines in the training set before they reach the yield testing stage. Lines with the highest estimated breeding values could be intercrossed before being phenotyped. Such a “rapid cycling” scheme would accelerate the fixation rate of favorable alleles in elite materials and should increase the genetic gain per unit of time, as described by Heffner et al. (2009).

It is important to point out that proof-of-concept experiments are required before genome-enabled selection can be implemented successfully in plant breeding programs. It is necessary to test genomic predictions on breeding materials derived from crosses between lines of the training population. If predictions are reliable enough, an experiment using the same set of parental materials could be carried out to compare the field performance of lines coming from a genomic-assisted recurrent selection scheme vs. lines coming from a conventional breeding scheme. The accuracies reported in this study represent prediction of wheat lines using a training set comprising lines with some degree of relatedness to lines in the validation set. When the validation and training sets are not genetically related (unrelated families) or represent populations with different genetic structures and different linkage disequilibrium patterns, negligible accuracies are to be expected. It seems that successful application of genomic selection in plant breeding requires some genetic relatedness between individuals in the training and validation sets, and that linkage disequilibrium information per se does not suffice (e.g. Makowsky et al. 2011).

Acknowledgments

Financial support by the Wisconsin Agriculture Experiment Station and the AVIAGEN, Ltd. (Newbridge, Scotland) to Paulino Pérez and Daniel Gianola is acknowledged. We thank the Centro Internacional de Mejoramiento de Maíz y Trigo (CIMMYT) researchers who carried out the wheat trials and provided the phenotypic data analyzed in this article.

Footnotes

  • Supporting information is available online at http://www.g3journal.org/lookup/suppl/doi:10.1534/g3.112.003665/-/DC1

  • Communicating editor: J. B. Holland

  • Received July 9, 2012.
  • Accepted October 5, 2012.
  • Copyright © 2012 Pérez-Rodríguez et al.

This is an open-access article distributed under the terms of the Creative Commons Attribution Unported License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Literature Cited

  1. Bernardo R., Yu J. M., 2007 Prospects for genome-wide selection for quantitative traits in maize. Crop Sci. 47(3): 1082–1090.
  2. Broomhead D. S., Lowe D., 1988 Multivariable functional interpolation and adaptive networks. Complex Systems 2: 321–355.
  3. Burgueño J., Crossa J., Cornelius P. L., Trethowan R., McLaren G., et al., 2007 Modeling additive × environment and additive × additive × environment using genetic covariances of relatives of wheat genotypes. Crop Sci. 47(1): 311–320.
  4. Burgueño J., Crossa J., Cotes J. M., San Vicente F., Das B., 2011 Prediction assessment of linear mixed models for multienvironment trials. Crop Sci. 51(3): 944–954.
  5. Chen S., Cowan C. F. N., Grant P. M., 1991 Orthogonal least squares learning algorithm for radial basis function networks. IEEE Trans. Neural Netw. 2(2): 302–309.
  6. Cockram J., Jones H., Leigh F. J., O’Sullivan D., Powell W., et al., 2007 Control of flowering time in temperate cereals: genes, domestication, and sustainable productivity. J. Exp. Bot. 58(6): 1231–1244.
  7. Conti V., Roncallo P. F., Beaufort V., Cervigni G. L., Miranda R., et al., 2011 Mapping of main and epistatic effect QTLs associated to grain protein and gluten strength using a RIL population of durum wheat. J. Appl. Genet. 52(3): 287–298.
  8. Crossa J., Burgueño J., Cornelius P. L., McLaren G., Trethowan R., et al., 2006 Modeling genotype × environment interaction using additive genetic covariances of relatives for predicting breeding values of wheat genotypes. Crop Sci. 46(4): 1722–1733.
  9. Crossa J., de los Campos G., Perez P., Gianola D., Burgueño J., et al., 2010 Prediction of genetic values of quantitative traits in plant breeding using pedigree and molecular markers. Genetics 186(2): 713–724.
  10. Crossa J., Perez P., de los Campos G., Mahuku G., Dreisigacker S., et al., 2011 Genomic selection and prediction in plant breeding. J. Crop Improv. 25(3): 239–261.
  11. de los Campos G., Perez P., 2010 BLR: Bayesian Linear Regression R package, version 1.2.
  12. de los Campos G., Naya H., Gianola D., Crossa J., Legarra A., et al., 2009 Predicting quantitative traits with regression models for dense molecular markers and pedigree. Genetics 182(1): 375–385.
  13. de los Campos G., Gianola D., Rosa G. J. M., Weigel K. A., Crossa J., 2010 Semi-parametric genomic-enabled prediction of genetic values using reproducing kernel Hilbert spaces methods. Genet. Res. 92(4): 295–308.
  14. de los Campos G., Hickey J. M., Pong-Wong R., Daetwyler H. D., Calus M. P. L., 2012 Whole genome regression and prediction methods applied to plant and animal breeding. Genetics DOI: 10.1534/genetics.112.14331.
  15. Foresee D., Hagan M. T., 1997 Gauss-Newton approximation to Bayesian learning. International Conference on Neural Networks, June 9–12, Houston, TX.
  16. Gianola D., van Kaam J. B. C. H. M., 2008 Reproducing kernel Hilbert spaces regression methods for genomic assisted prediction of quantitative traits. Genetics 178(4): 2289–2303.
  17. Gianola D., Fernando R. L., Stella A., 2006 Genomic-assisted prediction of genetic value with semiparametric procedures. Genetics 173(3): 1761–1776.
  18. Gianola D., Okut H., Weigel K. A., Rosa G. J. M., 2011 Predicting complex quantitative traits with Bayesian neural networks: a case study with Jersey cows and wheat. BMC Genet. 12: 87.
  19. Goldringer I., Brabant P., Gallais A., 1997 Estimation of additive and epistatic genetic variances for agronomic traits in a population of doubled-haploid lines of wheat. Heredity 79: 60–71.
  20. González-Camacho J. M., de los Campos G., Perez P., Gianola D., Cairns J., et al., 2012 Genome-enabled prediction of genetic values using radial basis function. Theor. Appl. Genet. 125: 759–771.
  21. Habier D., Fernando R. L., Kizilkaya K., Garrick D. J., 2011 Extension of the Bayesian alphabet for genomic selection. BMC Bioinformatics 12: 186.
  22. Hastie T., Tibshirani R., Friedman J., 2009 The Elements of Statistical Learning: Data Mining, Inference and Prediction, Ed. 2. Springer, New York.
  23. Heffner E. L., Sorrells M. E., Jannink J. L., 2009 Genomic selection for crop improvement. Crop Sci. 49(1): 1–12.
  24. Heslot N., Yang H. P., Sorrells M. E., Jannink J. L., 2012 Genomic selection in plant breeding: a comparison of models. Crop Sci. 52(1): 146–160.
  25. Hickey J. M., Tier B., 2009 AlphaBayes (Beta): Software for Polygenic and Whole Genome Analysis. User Manual. University of New England, Armidale, Australia.
  26. Hoerl A. E., Kennard R. W., 1970 Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12(1): 55–67.
  27. Holland J. B., 2001 Epistasis and plant breeding. Plant Breeding Reviews 21: 27–92.
  28. Holland J. B., 2008 Theoretical and biological foundations of plant breeding, pp. 127–140 in Plant Breeding: The Arnel R. Hallauer International Symposium, edited by K. R. Lamkey and M. Lee. Blackwell Publishing, Ames, IA.
  29. Lampinen J., Vehtari A., 2001 Bayesian approach for neural networks: review and case studies. Neural Netw. 14(3): 257–274.
  30. Laurie D. A., Pratchett N., Snape J. W., Bezant J. H., 1995 RFLP mapping of five major genes and eight quantitative trait loci controlling flowering time in a winter × spring barley (Hordeum vulgare L.) cross. Genome 38(3): 575–585.
  31. Long N. Y., Gianola D., Rosa G. J. M., Weigel K. A., Kranis A., et al., 2010 Radial basis function regression methods for predicting quantitative traits using SNP markers. Genet. Res. 92(3): 209–225.
  32. MacKay D. J. C., 1992 A practical Bayesian framework for backpropagation networks. Neural Comput. 4(3): 448–472.
  33. MacKay D. J. C., 1994 Bayesian non-linear modelling for the prediction competition. ASHRAE Transactions 100(Pt. 2): 1053–1062.
  34. Makowsky R., Pajewski N. M., Klimentidis Y. C., Vazquez A. I., Duarte C. W., et al., 2011 Beyond missing heritability: prediction of complex traits. PLoS Genet. 7(4): e1002051.
  35. McKinney B. A., Pajewski N. M., 2012 Six degrees of epistasis: statistical network models for GWAS. Front. Genet. 2: 109.
  36. Meuwissen T. H. E., Hayes B. J., Goddard M. E., 2001 Prediction of total genetic value using genome-wide dense marker maps. Genetics 157(4): 1819–1829.
  37. Neal R. M., 1996 Bayesian Learning for Neural Networks (Lecture Notes in Statistics), Vol. 118. Springer-Verlag, New York.
  38. Ober U., Ayroles J. F., Stone E. A., Richards S., Zhu D., et al., 2012 Using whole-genome sequence data to predict quantitative trait phenotypes in Drosophila melanogaster. PLoS Genet. 8(5): e1002685.
  39. Okut H., Gianola D., Rosa G. J., Weigel K. A., 2011 Prediction of body mass index in mice using dense molecular markers and a regularized neural network. Genet. Res. Camb. 93: 189–201.
  40. Park T., Casella G., 2008 The Bayesian LASSO. J. Am. Stat. Assoc. 103: 681–686.
  41. Perez P., de los Campos G., Crossa J., Gianola D., 2010 Genomic-enabled prediction based on molecular markers and pedigree using the Bayesian linear regression package in R. Plant Genome 3(2): 106–116.
  42. Poggio T., Girosi F., 1990 Networks for approximation and learning. Proc. IEEE 78(9): 1481–1497.
  43. Resende M. F. R., Muñoz P., Resende M. D. V., Garrick D. J., Fernando R. L., et al., 2012 Accuracy of genomic selection methods in a standard data set of loblolly pine (Pinus taeda L.). Genetics 190(4): 1503–1510.
  44. Shimada S., Ogawa T., Kitagawa S., 2009 A genetic network of flowering-time genes in wheat leaves, in which an APETALA1/FRUITFULL-like gene, VRN-1, is upstream of FLOWERING LOCUS T. Plant J. 58: 668–681.
  45. Wang C. S., Rutledge J. J., Gianola D., 1994 Bayesian analysis of mixed linear models via Gibbs sampling with an application to litter size in Iberian pigs. Genet. Sel. Evol. 26: 91–115.
  46. Zhang K., Tian J., Zhao L., Wang S., 2008 Mapping QTLs with epistatic effects and QTL × environment interactions for plant height using a doubled haploid population in cultivated wheat. J. Genet. Genomics 35(2): 119–127.

PUBLICATION INFORMATION

Volume 2 Issue 12, December 2012


ARTICLE CLASSIFICATION

Genomic Selection

Comparison Between Linear and Non-parametric Regression Models for Genome-Enabled Prediction in Wheat

Paulino Pérez-Rodríguez, Daniel Gianola, Juan Manuel González-Camacho, José Crossa, Yann Manès and Susanne Dreisigacker
G3: Genes, Genomes, Genetics December 1, 2012 vol. 2 no. 12 1595-1605; https://doi.org/10.1534/g3.112.003665
Paulino Pérez-Rodríguez, Colegio de Postgraduados, Montecillo, Texcoco 56230, México. Correspondence: perpdgo@gmail.com
Daniel Gianola, Departments of Animal Sciences, Dairy Science, and Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin 53706
Juan Manuel González-Camacho, Colegio de Postgraduados, Montecillo, Texcoco 56230, México
José Crossa, Yann Manès, and Susanne Dreisigacker, Biometrics and Statistics Unit and Global Wheat Program, International Maize and Wheat Improvement Center (CIMMYT), 06600 Mexico, D.F., México
