Abstract
Genomic selection (GS) has become a tool for selecting candidates in plant and animal breeding programs. In the case of quantitative traits, it is common to assume that the distribution of the response variable can be approximated by a normal distribution. However, it is known that the selection process leads to skewed distributions. There is vast statistical literature on skewed distributions, but the skew normal distribution is of particular interest in this research. This distribution includes a third parameter that drives the skewness, so that it generalizes the normal distribution. We propose an extension of Bayesian whole-genome regression to data with skew-normal distributions in the context of GS applications, where usually the number of predictors vastly exceeds the sample size. However, it can also be applied when the number of predictors is smaller than the sample size. We used a stochastic representation of a skew normal random variable, which allows the implementation of standard Markov Chain Monte Carlo (MCMC) techniques to efficiently fit the proposed model. The predictive ability and goodness of fit of the proposed model were evaluated using simulated and real data, and the results were compared to those obtained by the Bayesian Ridge Regression model. Results indicate that the proposed model has a better fit and is as good as the conventional Bayesian Ridge Regression model for prediction, based on the DIC criterion and cross-validation, respectively. A computing program coded in the R statistical package and C programming language to fit the proposed model is available as supplementary material.
- Genomic Selection
- data augmentation
- asymmetric distributions
- GBLUP
- Ridge regression
- GenPred
- Shared Data Resources
In genetic studies of plants or animals, it is common to find quantitative traits whose distribution is not normal; this is because the data are obtained from multiple sources or contain isolated observations (Li et al. 2015). Landfors et al. (2011) noted that it is often necessary to normalize the data to remove variation introduced during the experiment’s development. However, such standard normalization techniques are not always able to remove bias because a large number of observations are positively or negatively affected by some treatment. In addition, a suitable transformation for the data may be difficult to find, and may bring further complications when estimating and interpreting the results obtained (Fernandes et al. 2007). To avoid transformations, different methods have been developed that are flexible enough to represent the data and reduce unrealistic assumptions (Arellano-Valle et al. 2005). In the genomic selection framework (GS; Meuwissen et al. 2001), dense molecular marker genotypes and phenotypes are used to predict the genetic values of candidates for selection. The availability of high density molecular markers of many agricultural species, together with promising results from simulations (e.g., Meuwissen et al. 2001) and empirical studies in plants (de los Campos et al. 2009; de los Campos et al. 2010; Crossa et al. 2010, 2011) and animals (e.g., VanRaden 2008; Weigel et al. 2009), are prompting the adoption of GS in several breeding programs. The parametric model for GS explains phenotypes ($y_i$; $i = 1,\dots,n$) as functions of marker genotypes ($x_{ij}$) using a linear model of the form $y_i = \mu + \sum_{j=1}^{p} x_{ij}\beta_j + \varepsilon_i$, where $n$ is the number of phenotypic records, $p$ is the number of markers, $x_{ij}$ represents the number of copies of a bi-allelic marker (e.g., an SNP), and $\beta_j$ is the additive effect of the reference allele at the jth marker, $j = 1,\dots,p$. In matrix notation, the model is expressed as $\boldsymbol{y} = \mu\boldsymbol{1} + \boldsymbol{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, where $\boldsymbol{y}$, $\boldsymbol{\beta}$ and $\boldsymbol{\varepsilon}$ are vectors of phenotypes, marker effects and Gaussian zero mean errors, respectively, and $\boldsymbol{X}$ is a matrix of marker genotypes of dimensions $n \times p$. However, when the data are not normal, the normal regression methods generate estimates that are inconsistent with the natural distribution of the data and, therefore, the estimation of the conditional mean given the covariates is also inconsistent (Bianco et al. 2005).
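As a concrete illustration of the linear model above, the sketch below simulates a toy marker matrix and phenotypes. All sizes and parameter values here are hypothetical (not taken from the paper), and the sketch uses only the Python standard library:

```python
import random

# Hypothetical toy dimensions: n lines, p markers (illustrative only).
rng = random.Random(42)
n, p, mu = 20, 100, 3.0
# SNP dosages coded 0/1/2 (number of copies of the reference allele).
X = [[rng.randint(0, 2) for _ in range(p)] for _ in range(n)]
beta = [rng.gauss(0.0, 0.1) for _ in range(p)]    # additive marker effects
# y_i = mu + sum_j x_ij * beta_j + e_i, with Gaussian zero-mean errors.
y = [mu + sum(X[i][j] * beta[j] for j in range(p)) + rng.gauss(0.0, 1.0)
     for i in range(n)]
```

In GS applications the number of columns of `X` typically far exceeds the number of rows, which is what motivates the penalized/Bayesian estimation discussed next.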
With dense molecular markers, the number of markers exceeds the number of records in the reference population ($p \gg n$) and, therefore, penalized regression estimation methods and their Bayesian counterparts are commonly used. Penalized estimation methods produce regression parameter estimates that are often equivalent to posterior modes. The literature on GS offers a long list of Bayesian models whose main difference is the prior distributions assigned to marker effects, which leads to what is known as the Bayesian Alphabet (Gianola 2013; de los Campos et al. 2013). The above-mentioned models assume that the response follows a normal distribution. Several phenotypic traits have distributions that are skewed, for example, female flowering, male flowering, the interval between male and female flowering, categorical measurements of diseases such as ordinal scale, counting data, etc. In these cases, either a normalizing transformation for the response variable (e.g., using Box-Cox transformation) or a model that deals with skew responses may be used. Varona et al. (2008) proposed using linear mixed models with asymmetric distributions in the residuals to tackle the problem in the context of animal breeding when pedigree information is available. Nascimento et al. (2017) proposed the Regularized Quantile Regression as a way to overcome the issue of non-symmetric distributions when marker information is available.
If a population is selected based on one trait $X$, and another trait of interest $Y$ is recorded only for individuals whose $X$ value exceeds (does not exceed) some threshold, then the conditional distribution of $Y$, given the selection on $X$, is skewed (Arnold and Beaver 2000), such as the skew-normal (SN) distribution, which is of particular interest in this research. This distribution is a generalization of the normal distribution (Azzalini 1985) with a shape parameter added to adopt skewed forms. It has the advantage of being mathematically tractable and shares properties with the normal distribution; for example, the density of the SN is unimodal (Genton 2004). Varona et al. (2008) argue that the asymmetric distributions observed for the phenotypes are the result of environmental factors and that data can be modeled using non-symmetric residual distributions.
Based on the previous considerations, and motivated by the fact that a great deal of traits in plant breeding are skew-normal distributed, such as flowering time in most crop species, as well as categorical traits such as diseases (ordinal, binomial, or counting data), in this study we propose a general Bayesian genomic regression model for phenotypic traits with skew-normal random errors. The model uses a stochastic representation of the response variable (Arnold and Beaver 2000) in order to ease computations, and it also works in the case when n > p. It should be pointed out, however, that the aim of the paper is not to describe and study the causes of the skew distribution; rather, we assume that the skew data are given, and thus the objective is to propose a robust statistical model that deals with the skew-normal distribution of the phenotypes and residuals.
The structure of this paper is as follows. In section 2, we present the statistical models and describe the latent variable model used in the regression. In section 3, we describe a simulation experiment that is performed to evaluate the predictive power of the proposed model. In section 4, we present an application with real data; section 5 includes the discussion and concluding remarks.
MATERIALS AND METHODS
Statistical models
In this section we introduce the statistical models to be used in the manuscript. We begin by giving a brief review of the skew normal model. Then we introduce the concept of data augmentation, and we use this concept in order to generate a skew normal random variable. After that we introduce the “centered parametrization” of the skew normal model and the regression model with skew normal random errors. Finally, we present the prior, posterior and full conditional distributions for the regression model with skew normal residuals.
Skew-normal model:
A continuous random variable $U$ is said to follow the skew-normal law with shape parameter $\lambda$, denoted by $U \sim SN_d(\lambda)$, if its density function is:

$f(u \mid \lambda) = 2\phi(u)\Phi(\lambda u), \quad (1)$

where $\phi(\cdot)$ and $\Phi(\cdot)$ denote the density and cumulative distribution functions of a standard normal random variable, respectively. The subscript $d$ indicates the use of “direct parametrization” (Azzalini 1985). Note that the skew normal distribution reduces to the normal case when $\lambda = 0$.

The mean and variance of $U$ are given by:

$E(U) = \sqrt{2/\pi}\,\delta$ and $Var(U) = 1 - \dfrac{2\delta^2}{\pi}$, with $\delta = \dfrac{\lambda}{\sqrt{1+\lambda^2}}$.

The coefficient of skewness of $U$ is:

$\gamma_1 = \dfrac{4-\pi}{2}\,\dfrac{\left(\delta\sqrt{2/\pi}\right)^3}{\left(1 - 2\delta^2/\pi\right)^{3/2}}$.

If $Y$ is a random variable defined by $Y = \xi + \omega U$, then $Y$ is said to have a skew-normal distribution with location ($\xi$), scale ($\omega$), and shape ($\lambda$) parameters, and is denoted as $Y \sim SN_d(\xi, \omega^2, \lambda)$. The density function of $Y$ is given by:

$f(y \mid \xi, \omega, \lambda) = \dfrac{2}{\omega}\,\phi\!\left(\dfrac{y-\xi}{\omega}\right)\Phi\!\left(\lambda\,\dfrac{y-\xi}{\omega}\right)$.

It can be shown that the coefficient of skewness of $Y$ corresponds to the skewness coefficient of $U$.
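The density in (1) and the moment formulas can be checked numerically; below is a small sketch using only the Python standard library ($\phi$ and $\Phi$ are built from `math.erf`, and the integrals are simple Riemann sums over a grid we chose for illustration):

```python
import math

def phi(x):  # standard normal density
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):  # standard normal cumulative distribution function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sn_pdf(u, lam):
    """Density of SN_d(lambda): 2 * phi(u) * Phi(lambda * u)."""
    return 2.0 * phi(u) * Phi(lam * u)

def sn_moments(lam):
    """Closed-form mean and variance of SN_d(lambda)."""
    delta = lam / math.sqrt(1.0 + lam * lam)
    mean = delta * math.sqrt(2.0 / math.pi)
    var = 1.0 - 2.0 * delta * delta / math.pi
    return mean, var

# Check the closed forms against numerical integration on [-10, 10].
lam, h = 3.0, 0.001
grid = [i * h - 10.0 for i in range(20001)]
m0 = sum(sn_pdf(u, lam) for u in grid) * h       # total mass, should be ~1
m1 = sum(u * sn_pdf(u, lam) for u in grid) * h   # first moment
m2 = sum(u * u * sn_pdf(u, lam) for u in grid) * h
mean, var = sn_moments(lam)
```

The numerical moments agree with the closed forms to well below the grid resolution, which is a useful sanity check before building samplers on top of these formulas.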
Hidden truncation:
Let $V$ and $W$ be two random variables whose joint distribution is given as follows:

$\begin{pmatrix} V \\ W \end{pmatrix} \sim N_2(\boldsymbol{0}, \boldsymbol{\Sigma}), \quad \boldsymbol{\Sigma} = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix},$

where $N_2(\boldsymbol{0}, \boldsymbol{\Sigma})$ denotes a bivariate normal random variable with mean $\boldsymbol{0}$ and variance-covariance matrix $\boldsymbol{\Sigma}$, and the random variable $U$ is defined as follows: $U = V \mid W > 0$; then $U \sim SN_d(\lambda)$, with $\lambda = \rho/\sqrt{1-\rho^2}$ (Arnold and Beaver 2000; Hea-Jung 2005; Liseo and Parisi 2013). The above representation allows writing an augmented likelihood function (Hea-Jung 2005; Liseo and Parisi 2013), “as if” we had observed the latent variable $Z = W \mid W > 0$. The conditional distribution of $U \mid Z = z$ is $N(\rho z, 1-\rho^2)$, and $Z \sim TN(0, 1; 0, \infty)$, which is a truncated normal random variable with location parameter 0, scale parameter 1, lower truncation bound 0 and upper truncation bound $\infty$. Therefore, the joint distribution of $U$ and $Z$ is $f(u, z) = f(u \mid z)\,f(z)$, that is:

$f(u, z) = \dfrac{1}{\sqrt{1-\rho^2}}\,\phi\!\left(\dfrac{u - \rho z}{\sqrt{1-\rho^2}}\right) 2\phi(z), \quad z > 0. \quad (2)$

Note that the density function of $U$ can be obtained by integrating $f(u, z)$ with respect to $z$; that is, $f(u) = \int_0^\infty f(u, z)\,dz = 2\phi(u)\Phi(\lambda u)$.
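The hidden-truncation construction lends itself to a direct simulation check. The sketch below (standard library only; the shape value and sample size are arbitrary) draws $Z$ as a half-normal, then $U \mid Z = z \sim N(\rho z, 1-\rho^2)$, and compares the sample mean with the closed form $E(U) = \sqrt{2/\pi}\,\delta$:

```python
import math, random

def rsn_hidden_truncation(lam, size, seed=1):
    """Draw from SN_d(lambda) via U = V | W > 0 for a standard bivariate
    normal (V, W) with correlation rho = lam / sqrt(1 + lam^2)."""
    rho = lam / math.sqrt(1.0 + lam * lam)
    rng = random.Random(seed)
    out = []
    for _ in range(size):
        z = abs(rng.gauss(0.0, 1.0))          # Z = W | W > 0 is half-normal
        # U | Z = z ~ N(rho * z, 1 - rho^2)
        u = rho * z + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        out.append(u)
    return out

draws = rsn_hidden_truncation(3.0, 200_000)
m = sum(draws) / len(draws)   # should be near delta * sqrt(2/pi) ~ 0.757
```

This is exactly the stochastic representation the paper exploits: conditionally on the latent $z$, everything is Gaussian, which is what makes Gibbs sampling tractable.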
Estimating the parameters in the direct parametrization is troublesome, so “centered parametrization” is more appropriate for parameter estimation and interpretation (Azzalini 1985; Pewsey 2000; Azevedo et al. 2011, among others). If $\mu_z = E(U) = \sqrt{2/\pi}\,\delta$ and $\sigma_z = \sqrt{1 - 2\delta^2/\pi}$, then $E\!\left(\dfrac{U-\mu_z}{\sigma_z}\right) = 0$ and $Var\!\left(\dfrac{U-\mu_z}{\sigma_z}\right) = 1$, so that the standardized variable $(U-\mu_z)/\sigma_z$ has mean 0 and variance 1. Figure 1 shows the density function of U for several values of the shape parameter $\lambda$ and the corresponding values of $\gamma_1$. The random variable:

$Y = \mu + \sigma\left(\dfrac{U - \mu_z}{\sigma_z}\right), \quad (3)$

is said to be a skew normal random variable with parameters $\mu$, $\sigma^2$ and $\gamma_1$, where $\gamma_1$ is Pearson’s skewness coefficient given by $\gamma_1 = \dfrac{4-\pi}{2}\,\dfrac{\mu_z^3}{\sigma_z^3}$, and the range of $\gamma_1$ is (-0.99527, 0.99527). In this case, $E(Y) = \mu$ and $Var(Y) = \sigma^2$. The usual notation is $Y \sim SN_c(\mu, \sigma^2, \gamma_1)$.
Densities of the standard skew normal distribution for different values of λ and the corresponding values of $\gamma_1$.
If we consider the following transformations:

$y = \mu + \sigma\left(\dfrac{u - \mu_z}{\sigma_z}\right), \quad z = z, \quad (4)$

it can be shown, using Jacobians (Casella and Berger, 2002, Chapter 2), that the joint density of $Y$ and $Z$ is given by:

$f(y, z) = \dfrac{2\sigma_z}{\sigma\sqrt{1-\rho^2}}\,\phi(z)\,\phi\!\left(\dfrac{\mu_z + \sigma_z(y-\mu)/\sigma - \rho z}{\sqrt{1-\rho^2}}\right), \quad z > 0, \quad (5)$

where $\rho = \delta = \lambda/\sqrt{1+\lambda^2}$. Note that the density function of $Y$ can be obtained by integrating $f(y, z)$ with respect to $z$; that is, $f(y) = \int_0^\infty f(y, z)\,dz$, with $Y \sim SN_c(\mu, \sigma^2, \gamma_1)$. This representation is very convenient, because it allows us to write an augmented likelihood function (Hea-Jung 2005; Liseo and Parisi 2013), “as if” we had observed the latent value $z$. The density function of the skew normal random variable under “centered parametrization” is a complicated function that was given by Azevedo et al. (2011):

$f(y \mid \mu, \sigma^2, \gamma_1) = \dfrac{2}{\omega}\,\phi\!\left(\dfrac{y-\xi}{\omega}\right)\Phi\!\left(\lambda\,\dfrac{y-\xi}{\omega}\right),$

where

$\xi = \mu - \sigma\dfrac{\mu_z}{\sigma_z}, \quad \omega = \dfrac{\sigma}{\sigma_z}, \quad \lambda = \dfrac{\delta}{\sqrt{1-\delta^2}},$

with

$\delta = \sqrt{\dfrac{\pi}{2}}\,\dfrac{s}{\sqrt{1+s^2}}, \quad s = \left(\dfrac{2\gamma_1}{4-\pi}\right)^{1/3}.$
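Sampling under the centered parametrization requires mapping $\gamma_1$ back to the shape $\lambda$ and then standardizing; a sketch of that mapping and of drawing $Y \sim SN_c(\mu, \sigma^2, \gamma_1)$ is below (function names are ours, standard library only, and the hidden-truncation draw from the previous section is reused inline):

```python
import math, random

def gamma1_to_lambda(g1):
    """Map Pearson's skewness gamma1 (|gamma1| < 0.99527) to the shape lambda."""
    s = math.copysign(abs(2.0 * g1 / (4.0 - math.pi)) ** (1.0 / 3.0), g1)
    mu_z = s / math.sqrt(1.0 + s * s)          # E(U) for the standard SN
    delta = mu_z * math.sqrt(math.pi / 2.0)    # delta = lambda / sqrt(1 + lambda^2)
    return delta / math.sqrt(1.0 - delta * delta)

def rsn_centered(mu, sigma, g1, size, seed=7):
    """Draw Y = mu + sigma * (U - mu_z) / sigma_z, with U ~ SN_d(lambda)."""
    lam = gamma1_to_lambda(g1)
    delta = lam / math.sqrt(1.0 + lam * lam)
    mu_z = delta * math.sqrt(2.0 / math.pi)
    sigma_z = math.sqrt(1.0 - mu_z * mu_z)     # equals sqrt(1 - 2*delta^2/pi)
    rng = random.Random(seed)
    ys = []
    for _ in range(size):
        z = abs(rng.gauss(0.0, 1.0))                              # Z = W | W > 0
        u = delta * z + math.sqrt(1.0 - delta * delta) * rng.gauss(0.0, 1.0)
        ys.append(mu + sigma * (u - mu_z) / sigma_z)
    return ys
```

By construction the draws have mean $\mu$ and variance $\sigma^2$ for any admissible $\gamma_1$, which is the whole point of the centered parametrization.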
Regression with skew normal random errors:
Azzalini and Capitanio (1999) and Rusell and González (2002) proposed a simple linear regression model where the error terms are independent and identically distributed as $SN_d(0, \omega^2, \lambda)$. The proposed model is:

$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i;$

from the properties of the skew normal distribution, it follows that $y_i \sim SN_d(\beta_0 + \beta_1 x_i, \omega^2, \lambda)$. The model can be easily extended to include more covariates; that is:

$y_i = \beta_0 + \sum_{j=1}^{p} x_{ij}\beta_j + \varepsilon_i.$

Azzalini and Capitanio (1999) and Rusell and González (2002) used the maximum likelihood method to estimate the parameters in the model. These ideas can be extended to the case of errors that are independent and identically distributed as $SN_c(0, \sigma^2, \gamma_1)$.
Bayesian regression with skew normal random errors (BSN):
Let $\boldsymbol{\theta} = (\beta_0, \boldsymbol{\beta}', \sigma^2, \gamma_1)'$. Then, the likelihood function is given by:

$L(\boldsymbol{\theta} \mid \boldsymbol{y}) = \prod_{i=1}^{n} f(y_i \mid \beta_0 + \boldsymbol{x}_i'\boldsymbol{\beta}, \sigma^2, \gamma_1).$

Let $p(\boldsymbol{\theta} \mid \Omega)$ be the prior distribution of $\boldsymbol{\theta}$, with $\Omega$ a set of hyper-parameters that index the prior distributions. Then, by Bayes’ theorem, the joint posterior distribution of $\boldsymbol{\theta}$ is as follows:

$p(\boldsymbol{\theta} \mid \boldsymbol{y}, \Omega) \propto L(\boldsymbol{\theta} \mid \boldsymbol{y})\,p(\boldsymbol{\theta} \mid \Omega).$
Neither the joint posterior distribution nor the full conditional distributions of the parameters of interest have a closed form; therefore, the implementation of this model within the Bayesian framework is computing intensive. We propose using hidden truncation together with two standard MCMC techniques in Bayesian analysis: (i) Gibbs Sampling (Geman and Geman 1984) and (ii) Random Walk Metropolis Sampling to alleviate some of the computing burden.
Prior, posterior and full conditional distributions:
Consider the joint distribution of $Y$ and $Z$ given in (5). In the regression context, we set $\mu_i = \beta_0 + \boldsymbol{x}_i'\boldsymbol{\beta}$; then the augmented likelihood function is:

$L(\beta_0, \boldsymbol{\beta}, \sigma^2, \rho \mid \boldsymbol{y}, \boldsymbol{z}) = \prod_{i=1}^{n} \dfrac{2\sigma_z}{\sigma\sqrt{1-\rho^2}}\,\phi(z_i)\,\phi\!\left(\dfrac{\mu_z + \sigma_z(y_i-\mu_i)/\sigma - \rho z_i}{\sqrt{1-\rho^2}}\right). \quad (6)$

In order to fully specify the Bayesian model, prior distributions for the unknown parameters must be defined. Let $\boldsymbol{\theta} = (\beta_0, \boldsymbol{\beta}', \sigma_\beta^2, \sigma^2, \rho)'$; for $\rho \in (-1, 1)$, based on the following transformation $\rho = 2\rho^* - 1$, where $\rho^* \sim Beta(a, b)$, the prior for $\rho$ is denoted as $Beta(\rho \mid a, b)$, and depending on the hyper-parameters $a$ and $b$, it can lead to a rich variety of shapes, just as in the case of the Beta distribution. For the intercept, the prior distribution is $\beta_0 \sim N(0, \sigma_{\beta_0}^2)$; for the scale parameter, a scaled inverted chi-squared prior distribution was used, that is, $\sigma^2 \sim \chi^{-2}(\nu, S)$; and finally $\beta_j \mid \sigma_\beta^2 \sim N(0, \sigma_\beta^2)$, $j = 1,\dots,p$, with $\sigma_\beta^2 \sim \chi^{-2}(\nu_\beta, S_\beta)$ (see Sorensen and Gianola, 2002, p. 85, for details about the parametrization used in this paper). The joint prior distribution is

$p(\boldsymbol{\theta}) = p(\beta_0)\left[\prod_{j=1}^{p} p(\beta_j \mid \sigma_\beta^2)\right] p(\sigma_\beta^2)\,p(\sigma^2)\,p(\rho). \quad (7)$

By combining equations 6 and 7 through Bayes’ theorem, the posterior distribution of $\boldsymbol{\theta}$ is given by:

$p(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{z}) \propto L(\beta_0, \boldsymbol{\beta}, \sigma^2, \rho \mid \boldsymbol{y}, \boldsymbol{z})\,p(\boldsymbol{\theta}). \quad (8)$

The full conditional distributions of the parameters are given in Appendix A, which are the inputs for the Gibbs and the Random Walk Metropolis samplers. In Appendix B, some pragmatic rules to elicit values for the hyper-parameters $\nu$, $S$, $\nu_\beta$, $S_\beta$, $\sigma_{\beta_0}^2$, $a$ and $b$ are given. The rules for setting $\nu$, $S$, $\nu_\beta$ and $S_\beta$ are based on those given by de los Campos et al. (2013) and Pérez and de los Campos (2014). In this paper, we set the hyper-parameters as follows: $\nu = \nu_\beta = 5$, $S = \text{Var}(y)(1-R^2)(\nu+2)$, $S_\beta = \text{Var}(y)R^2(\nu_\beta+2)/\sum_{j=1}^{p}\text{Var}(x_j)$, where $\text{Var}(y)$ is the sample phenotypic variance and $R^2 = 0.5$. We set the prior variance of the intercept, $\sigma_{\beta_0}^2$, to a very large value in order to reduce shrinkage and because in practice it mimics a non-informative but proper distribution. To sample from the full conditionals of $\sigma^2$ and $\rho$, we implemented a Random Walk Metropolis sampler whose parameters are tuned so that the acceptance rate is about 0.23 (see Appendix A for details).
The BSN can be re-parametrized by replacing $\boldsymbol{X}\boldsymbol{\beta}$ with $\boldsymbol{u}$; if the prior distribution of marker effects is normal with mean 0 and variance $\sigma_\beta^2$, then the prior of $\boldsymbol{u}$ is $\boldsymbol{u} \sim N(\boldsymbol{0}, \sigma_\beta^2\boldsymbol{X}\boldsymbol{X}')$, which leads to a G-BLUP model (see de los Campos et al. 2013, for details about G-BLUP) but with skew normal residuals, that is, $y_i = \beta_0 + u_i + \varepsilon_i$ or, in matrix notation, $\boldsymbol{y} = \beta_0\boldsymbol{1} + \boldsymbol{u} + \boldsymbol{\varepsilon}$, which is a skew linear mixed model, a particular case of the model proposed by Arellano-Valle et al. (2005, 2007), who relaxed all normality assumptions in a standard mixed model.
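The covariance structure behind this re-parametrization can be sketched by building a genomic relationship matrix proportional to $\boldsymbol{X}\boldsymbol{X}'$ from a toy marker matrix. The $1/p$ scaling used here is one common choice and is our assumption (VanRaden-type scalings that center by allele frequencies differ):

```python
import random

def gblup_G(X):
    """Genomic relationship matrix proportional to X X' (n x n) from a
    marker matrix X (n x p), here scaled by the number of markers p."""
    n, p = len(X), len(X[0])
    return [[sum(X[i][k] * X[j][k] for k in range(p)) / p for j in range(n)]
            for i in range(n)]

rng = random.Random(11)
X = [[rng.randint(0, 1) for _ in range(200)] for _ in range(8)]  # toy 0/1 markers
G = gblup_G(X)
```

Under the prior $\boldsymbol{\beta} \sim N(\boldsymbol{0}, \sigma_\beta^2\boldsymbol{I})$, the implied covariance of $\boldsymbol{u} = \boldsymbol{X}\boldsymbol{\beta}$ is $\sigma_\beta^2\boldsymbol{X}\boldsymbol{X}'$; the matrix above is simply a rescaled version of that covariance.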
Bayesian ridge regression With random normal errors (BRR):
Regression with random normal errors is a special case of the proposed model when $\gamma_1 = 0$. The model is widely used in the GS literature (e.g., de los Campos et al. 2013). In the GS context, the model is given by:

$y_i = \beta_0 + \sum_{j=1}^{p} x_{ij}\beta_j + \varepsilon_i,$

where the $\varepsilon_i$ are independent and identically distributed as $N(0, \sigma^2)$. The prior distributions for the unknowns are: $\beta_0 \sim N(0, \sigma_{\beta_0}^2)$; for the scale parameter, a scaled inverted chi-squared distribution, that is, $\sigma^2 \sim \chi^{-2}(\nu, S)$; and finally $\beta_j \mid \sigma_\beta^2 \sim N(0, \sigma_\beta^2)$ with $\sigma_\beta^2 \sim \chi^{-2}(\nu_\beta, S_\beta)$. The joint prior distribution is

$p(\beta_0, \boldsymbol{\beta}, \sigma_\beta^2, \sigma^2) = p(\beta_0)\left[\prod_{j=1}^{p} p(\beta_j \mid \sigma_\beta^2)\right] p(\sigma_\beta^2)\,p(\sigma^2). \quad (9)$

Thus, the posterior distribution of $(\beta_0, \boldsymbol{\beta}, \sigma_\beta^2, \sigma^2)$ is proportional to the product of the likelihood and the joint prior. The required full conditional distributions of the parameters for implementing a Gibbs sampler can be found elsewhere (e.g., de los Campos et al. 2013). We set the hyper-parameters using the same rules as in the BSN model. The BRR model can be fitted easily using the BGLR statistical package (Pérez and de los Campos 2014).
Monte Carlo Simulation
In this section, we use simulated data based on marker genotypes from a wheat dataset made publicly available by Crossa et al. (2010). The dataset includes genotypic information for 599 wheat lines which were genotyped for 1279 DArT markers coded as 0 and 1. We simulated the phenotypes using the following additive genetic model:

$y_i = \beta_0 + \sum_{j=1}^{p} x_{ij}\beta_j + \varepsilon_i, \quad (9)$

where $\varepsilon_i \sim SN_c(0, \sigma^2, \gamma_1)$, with several values of $\gamma_1$, which leads to different degrees of skewness. The intercept parameter, $\beta_0$, was set equal to 3; 10 marker effects were sampled from a normal distribution with mean 0 and variance 0.5/10 (Pérez and de los Campos 2014), and the rest were set equal to 0. The idea here is to verify, through simulation, whether the proposed model works satisfactorily. We therefore obtained point estimates for $\beta_0$, $\sigma^2$, $\gamma_1$ and $\boldsymbol{\beta}$. We also fitted the Bayesian Ridge Regression model and compared the estimates of regression coefficients, predictions and estimates of genetic values of both models. Let $\hat{\boldsymbol{\beta}}$ be the vector of posterior means for regression coefficients. Pearson’s correlation between the observed ($y_i$) and predicted values ($\hat{y}_i$) is a goodness-of-fit measure; Pearson’s correlation between the “true” genetic values ($\boldsymbol{X}\boldsymbol{\beta}$) and the predicted values ($\boldsymbol{X}\hat{\boldsymbol{\beta}}$) is a measure of how well the genetic values are estimated; finally, Pearson’s correlation between the “true” marker effects ($\boldsymbol{\beta}$) and the estimated effects ($\hat{\boldsymbol{\beta}}$) is a measure that indicates how good a model is at uncovering marker effects (de los Campos et al. 2009). We also computed the effective number of parameters ($p_D$) and deviance information criterion (DIC) for the two fitted models (see Spiegelhalter et al. 2002, for more details).
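The DIC quantities can be computed directly from MCMC output. The sketch below (function and variable names are ours) illustrates the definitions for a simple Gaussian likelihood: $\bar{D}$ is the posterior mean deviance, $p_D = \bar{D} - D(\bar{\theta})$ is the effective number of parameters, and $DIC = \bar{D} + p_D$:

```python
import math

def dic_gaussian(y, mu_draws, s2_draws):
    """DIC from posterior draws of a common mean mu and variance s2:
    Dbar = E[-2 log L], pD = Dbar - D(posterior means), DIC = Dbar + pD."""
    def deviance(mu, s2):
        n = len(y)
        return n * math.log(2.0 * math.pi * s2) + sum((v - mu) ** 2 for v in y) / s2
    devs = [deviance(m, s) for m, s in zip(mu_draws, s2_draws)]
    dbar = sum(devs) / len(devs)
    d_hat = deviance(sum(mu_draws) / len(mu_draws),
                     sum(s2_draws) / len(s2_draws))
    pd = dbar - d_hat
    return dbar + pd, pd
```

Because the deviance is convex in the mean, $p_D$ is non-negative for dispersed draws and collapses to zero when the chain is degenerate, which is a quick internal check on an implementation.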
The algorithm used in this simulation experiment is described briefly below.

1. Set $\beta_0$, $\sigma^2$, $\gamma_1$ and $\boldsymbol{\beta}$ to the values described above.
2. Simulate the phenotypes using equation (9).
3. Fit the regression model with skew normal random errors and obtain point estimates for $\beta_0$, $\sigma^2$, $\gamma_1$ and $\boldsymbol{\beta}$, that is, $\hat{\beta}_0$, $\hat{\sigma}^2$, $\hat{\gamma}_1$ and $\hat{\boldsymbol{\beta}}$. The point estimates correspond to the posterior means of the posterior distribution of the parameters of interest.
4. Fit the Bayesian Ridge Regression model and obtain point estimates for $\beta_0$, $\sigma^2$ and $\boldsymbol{\beta}$, that is, $\hat{\beta}_0$, $\hat{\sigma}^2$ and $\hat{\boldsymbol{\beta}}$.
5. Compute the correlation between observed and predicted phenotypes, “true” and predicted genetic values, and “true” and estimated regression coefficients with both regression models; compute the effective number of parameters ($p_D$) and deviance information criterion (DIC) for the two fitted models.
6. Repeat steps 1 to 5 one hundred times and obtain the averages of correlations, intercept ($\hat{\beta}_0$), $\hat{\sigma}^2$ and $\hat{\gamma}_1$.
Application to real data
This dataset is from the Drought Tolerance Maize (DTMA) project of CIMMYT’s Global Maize Program (http://www.cimmyt.org). The dataset comes from a large study aimed at detecting chromosomal regions affecting drought tolerance. The genotypic data consist of information from 300 tropical inbred lines that were genotyped using 1,152 SNPs (Single Nucleotide Polymorphisms). The analyzed trait is Gray Leaf Spot (GLS) caused by the fungus Cercospora zeae-maydis, which was evaluated at three different sites, Kakamega (Kenya), San Pedro Lagunillas (Mexico) and Santa Catalina (Colombia) (see Supporting information). Crossa et al. (2011) analyzed a subset of these data; the response variable was transformed using Box-Cox transformation (Box and Cox 1964). Figure 2 shows density plots for GLS rating at the three sites. Kernel density was estimated using a Gaussian kernel, and the bandwidth for the kernel was estimated according to Venables and Ripley (2002). Figure 2 also shows the sample skewness index, $b_1 = m_3/s^3$, where $m_3 = \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^3$, $\bar{y}$ is the sample mean and $s$ is the sample standard deviation (see Joanes and Gill 1998); in the three cases, the distribution is skewed to the right, so most of the distribution is concentrated around small values of the response variable.
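The skewness index can be computed as below; Joanes and Gill (1998) compare several variants, and the denominator choices here ($n$ for $m_3$, $n-1$ inside $s$) are our assumption for this sketch:

```python
import math

def sample_skewness(y):
    """Skewness index b1 = m3 / s^3, with m3 the third central sample moment
    (n denominator) and s the sample standard deviation (n - 1 denominator)."""
    n = len(y)
    ybar = sum(y) / n
    m3 = sum((v - ybar) ** 3 for v in y) / n
    s = math.sqrt(sum((v - ybar) ** 2 for v in y) / (n - 1))
    return m3 / s ** 3
```

A positive value indicates a right-skewed sample, as observed for the GLS ratings at the three sites.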
Density plot for Gray Leaf Spot (GLS) rating (disease resistance), at each site: Kakamega (Kenya), San Pedro Lagunillas (Mexico) and Santa Catalina (Colombia).
We propose using the regression model with skew normal random errors to predict disease resistance. We fitted two models: (1) the standard Bayesian Ridge Regression, where the errors are NIID, $\varepsilon_i \sim N(0, \sigma^2)$, where “NIID” stands for “normally, independent and identically distributed”; and (2) the proposed model with skew normal random errors. The Bayesian Ridge Regression was fitted using the BGLR package (Pérez and de los Campos 2014), whereas the proposed model was fitted using the algorithm described in Appendix A. The models were first fitted using the full data, and subsequently 100 random partitions with 80% of observations in the training set and 20% of observations in the testing set were generated. The two models were fitted for each of these random partitions; then the phenotypes of the individuals in the testing set were predicted and the ability of each model to make predictions was evaluated using Pearson’s correlation between observed and predicted values. Inferences for each fit were based on 100,000 samples obtained after discarding 50,000 samples that were taken as burn-in. Convergence was checked by inspecting trace plots of the parameters.
Data and program availability
The data and programs are available as File S1 which corresponds to a compressed zip folder. The zip folder also contains a description of the data and commands to read it into the R statistical software.
RESULTS
Monte Carlo Simulation
Table 1 shows point estimates for $\beta_0$, $\sigma^2$ and $\gamma_1$ for the BSN and BRR models for different values of $\gamma_1$. It also shows an estimate of $\hat{\sigma}^2/\hat{\sigma}_\beta^2$, a regularization parameter that is widely used in Bayesian Ridge Regression. Higher values of the parameter are associated with more shrinkage; note that the estimates of this parameter are very similar in both models, so smaller values could be associated with more precise estimates of the marker effects. It is also clear from this table that the point estimates for $\beta_0$ and $\sigma^2$ are very close to the real values used in the simulation. The correlation between observed and predicted values and the mean squared error are quite similar for both models and there is no clear winner. Finally, the algorithm is not able to estimate the parameter $\gamma_1$ precisely for distributions that are only slightly asymmetric.




Table 2 shows the effective number of parameters ($p_D$), the deviance information criterion (DIC), the correlation between “true” and estimated marker effects and the correlation between “true” and estimated signals. The table also shows that in general the $p_D$ and the DIC (smaller is better) favored the BSN model. The correlation between “true” and estimated marker effects is slightly better for BSN, and the difference between the two models becomes clearer as the degree of skewness increases. The same pattern is observed for the correlations between true and estimated genetic signals.



Application to real data
Full data:
Table 3 shows estimates of the posterior means of the parameters $\beta_0$, $\sigma^2$ and $\gamma_1$, as well as the effective number of parameters ($p_D$) and the deviance information criterion (DIC). From Table 3 it is clear that the estimation of marker effects is more precise for the BSN model than for the BRR model; the $p_D$ and the DIC also favored the BSN model. The estimated $\gamma_1$ parameter also supports the assumption of skew normal random errors, since the point estimate is not around 0, except in the case of San Pedro Lagunillas.



Figure 3 shows scatterplots of the predicted GLS values obtained by fitting the BSN and BRR models to the data from each site. As expected, Pearson’s correlation between the two sets of predictions is very high (higher than 0.95). This implies that even when the data are skewed, if a BRR model is fitted in order to obtain candidates for selection, we can expect to select about the same individuals.
Scatterplot of predicted Gray Leaf Spot (GLS) obtained when fitting the BSN model and the BRR model. In the three cases considered, the Pearson’s correlation between predictions was higher than 0.95.
Cross-validation:
Figure 4 shows scatterplots for Pearson’s correlation between observed and predicted values for individuals in the testing set obtained after fitting the BSN and BRR models for the three locations. When the correlations are higher for BSN than for BRR, this is represented by a filled circle, and by an open circle otherwise. The figure also shows the number of times Pearson’s correlation is higher for the BSN than for the BRR model. From this figure, it is clear that the BSN model predicts slightly better than the BRR model. Figure 5 shows a scatterplot for the mean squared errors in the testing set for the three locations. When the MSE in BSN is smaller than the MSE in BRR, this is represented by an open circle and by a filled circle otherwise. The number of times that the MSE in BRR is greater than the MSE in BSN is also shown in the plots. From this figure, it is clear that in general, the MSE for BRR is greater than the MSE for BSN. Table 4 shows the average Pearson’s correlation and mean squared error (MSE) between observed and predicted values in the testing set. The averages and the standard deviations are very similar for both models and the differences between the models are non-significant, but the figures suggest that the BSN model predicts slightly better than the BRR model.
Plots of the predictive correlation for each of the 100 cross-validations and 3 locations. When the best model is BSN, this is represented by a filled circle, and when the best model is BRR, this is represented by an open circle. The number of times that Pearson’s correlation in BSN is better than Pearson’s correlation in BRR is also shown in the plots.
Plots of the mean squared error (MSE) in the testing set for each of the 100 cross-validations and 3 locations. When the MSE in BSN is smaller than the MSE in BRR, this is represented by an open circle and when the MSE in BRR is bigger than the MSE in BSN, this is represented by a filled circle. The number of times that the MSE in BRR is bigger than the MSE in BSN is also shown in the plots.
DISCUSSION AND CONCLUSIONS
We have proposed a Bayesian regression model for skewed responses with applications when $p \gg n$ in the GS context, but it can also be employed in other cases and, of course, when $p < n$. In addition to the generalized linear whole-genome regression models already available for various discrete distributions (ordinal, binomial, etc.), this study further completes the Bayesian toolbox for whole-genome regression. The proposed model uses a stochastic representation of a skew normal random variable in order to facilitate the computations; it also allows using standard MCMC techniques to fit the proposed model. Results of the simulation and of applications with real data suggest that the proposed model fits the data better and also predicts slightly better than the standard Ridge Regression model. The Ridge Regression model is a particular case of our model when $\gamma_1 = 0$. On the other hand, our results also suggest that BRR is a very robust model: although in the simulations we already knew that it was the wrong model to fit, its predictive power was still very good. Although the conventional Bayesian whole-genome regression is robust, it does not correctly deal with skew phenotypic data, and this can decrease its genomic-enabled prediction accuracy and its goodness of fit to the data. Thus, the advantages of the proposed Bayesian whole-genome regression compensate for its complexity and possible increases in computational time as compared to the conventional Bayesian ridge regression. The model proposed in this study is conceptually and operationally different, and presumably simpler, than the skew-normal linear mixed model of Arellano-Valle et al. (2005), which uses a multivariate skew-normal distribution in order to relax normality.
Despite the fact that skewness is a major concern for breeding data analyses and may often be a result of uneven sampling of “high” and “low” performing individuals, selection, environmental effects, etc., the theoretical developments presented in this study are also applicable to many other areas of research in agronomy and in agriculture in general. For example, most crop flowering time data are indeed skewed, as are categorical data representing different types of diseases such as those presented in this research. So, although skewness in the phenotypic response can be the result of artificial phenomena, the aim of this study was to propose a statistical model that is more appropriate for dealing with that problem.
Results of this study can be compared to results of two other studies, Crossa et al. (2011) and González-Camacho et al. (2012). Crossa et al. (2011) included one site in Mexico (San Pedro Lagunillas) that was also analyzed by transforming the original GLS ordinal scale using Box-Cox transformation; the prediction accuracy of different models (e.g., Bayesian Lasso and Reproducing Kernel Hilbert Spaces) ranged from 0.416 to 0.462. Although strict comparison with the results obtained in this study is not possible because other random cross-validations were generated, the prediction accuracies of BSN (0.5489) and BRR (0.5450) models were higher than those previously reported by Crossa et al. (2011) for the same site.
Stochastic representation can be used to extend Reproducing Kernel Hilbert Space (de los Campos et al. 2010) models that in many empirical studies have led to more accurate predictions than Bayesian Ridge Regression models and Bayesian LASSO, among others (e.g., Pérez-Rodríguez et al. 2012), so this is a topic for future research. Further studies to extend the model proposed in this study to include genotype × environment interaction should not be complicated. The proposed model can also be extended by assigning different prior distributions to the marker effects, for example, to induce variable selection. This could potentially lead to a new Bayesian alphabet with skew normal random errors.
Acknowledgments
We thank the researchers of the Global Maize Program of the International Maize and Wheat Improvement Center (CIMMYT) who carried out the maize trials and provided the phenotypic and genotypic data analyzed in this article.
Authors’ contributions: PPR, RAP, SPE, CVC, JSE and JC conceived the idea and designed the study; RAP, JC and PPR developed the statistical models; RAP wrote the R scripts; RAP and PPR carried out the statistical analysis; RAP, JC and PPR wrote the first version of the manuscript; JC, SPE, CVC, and JSE contributed critical comments and to the writing of the final manuscript.
APPENDIX A: FULL CONDITIONAL DISTRIBUTIONS
The full conditional distributions of the unknown parameters in the Bayesian Genomic Regression Model with Skew Normal Random Errors are given below. The derivation uses results from Bayesian linear models (e.g., Sorensen and Gianola, 2002), as well as results from Rusell and González (2002) and Liseo and Parisi (2013).
The joint posterior distribution is as follows:

$p(\beta_0, \boldsymbol{\beta}, \sigma_\beta^2, \sigma^2, \rho, \boldsymbol{z} \mid \boldsymbol{y}) \propto L(\beta_0, \boldsymbol{\beta}, \sigma^2, \rho \mid \boldsymbol{y}, \boldsymbol{z})\,p(\beta_0)\,p(\boldsymbol{\beta} \mid \sigma_\beta^2)\,p(\sigma_\beta^2)\,p(\sigma^2)\,p(\rho),$

where $\mu_i = \beta_0 + \boldsymbol{x}_i'\boldsymbol{\beta}$ and $v = \sigma^2(1-\rho^2)/\sigma_z^2$, $i = 1,\dots,n$.
1. Intercept ($\beta_0$)

$p(\beta_0 \mid \text{else}) \propto \exp\left\{-\dfrac{1}{2v}\sum_{i=1}^{n}(e_i - \beta_0)^2 - \dfrac{\beta_0^2}{2\sigma_{\beta_0}^2}\right\},$

where $e_i = y_i - \boldsymbol{x}_i'\boldsymbol{\beta} - \dfrac{\sigma}{\sigma_z}(\rho z_i - \mu_z)$. Note that the right-hand side of the last expression is the kernel of a normal distribution with mean $\tilde{\mu}_{\beta_0} = \tilde{\sigma}_{\beta_0}^2\dfrac{\sum_{i=1}^{n} e_i}{v}$ and variance $\tilde{\sigma}_{\beta_0}^2$, with $\tilde{\sigma}_{\beta_0}^2 = \left(\dfrac{n}{v} + \dfrac{1}{\sigma_{\beta_0}^2}\right)^{-1}$.
2. Regression coefficients ($\beta_j$)

Here we obtain the conditional distribution of each of the elements of the vector $\boldsymbol{\beta}$, i.e., $\beta_j$, $j = 1,\dots,p$:

$p(\beta_j \mid \text{else}) \propto \exp\left\{-\dfrac{1}{2v}\sum_{i=1}^{n}(e_{ij} - x_{ij}\beta_j)^2 - \dfrac{\beta_j^2}{2\sigma_\beta^2}\right\},$

where $e_{ij} = y_i - \beta_0 - \boldsymbol{x}_{i,-j}'\boldsymbol{\beta}_{-j} - \dfrac{\sigma}{\sigma_z}(\rho z_i - \mu_z)$, $\boldsymbol{x}_{i,-j}$ is the i-th row of $\boldsymbol{X}$ without the j-th column and $\boldsymbol{\beta}_{-j}$ is the $\boldsymbol{\beta}$ vector without the j-th element. After some algebraic manipulations, it can be shown that the right-hand side of the above expression corresponds to the kernel of a normal distribution with mean $\tilde{\mu}_{\beta_j} = \tilde{\sigma}_{\beta_j}^2\dfrac{\sum_{i=1}^{n} x_{ij}e_{ij}}{v}$ and variance $\tilde{\sigma}_{\beta_j}^2$, with $\tilde{\sigma}_{\beta_j}^2 = \left(\dfrac{\sum_{i=1}^{n} x_{ij}^2}{v} + \dfrac{1}{\sigma_\beta^2}\right)^{-1}$.
3. Latent variables ($z_i$)

$p(z_i \mid \text{else}) \propto \phi(z_i)\,\phi\!\left(\dfrac{u_i - \rho z_i}{\sqrt{1-\rho^2}}\right)\mathbb{1}(z_i > 0),$

where $u_i = \mu_z + \sigma_z(y_i - \mu_i)/\sigma$. After simplifying some terms, the right-hand side of the above equation can be written as:

$p(z_i \mid \text{else}) \propto \exp\left\{-\dfrac{(z_i - \rho u_i)^2}{2(1-\rho^2)}\right\}\mathbb{1}(z_i > 0).$

The above expression is the kernel of a truncated normal distribution with location parameter $\rho u_i$ and scale parameter $\sqrt{1-\rho^2}$, lower truncation bound 0 and upper truncation bound $\infty$. In this work, we used the R library truncnorm (Trautmann et al., 2014) to sample from this distribution.
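The paper samples these truncated normals with the R package truncnorm; a dependency-free illustrative sketch of the same draw by the inverse-CDF method (with a bisection quantile that is adequate for a sketch, not for production) is:

```python
import math, random

def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def Phi_inv(p):
    """Standard normal quantile by bisection (adequate for a sketch)."""
    lo, hi = -40.0, 40.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if Phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def rtruncnorm_pos(loc, scale, rng):
    """Inverse-CDF draw from N(loc, scale^2) truncated to (0, inf)."""
    a = Phi(-loc / scale)               # probability mass below the bound 0
    u = a + (1.0 - a) * rng.random()    # uniform on (a, 1)
    return loc + scale * Phi_inv(u)

rng = random.Random(5)
draws = [rtruncnorm_pos(0.0, 1.0, rng) for _ in range(20000)]
```

With `loc = 0` and `scale = 1` the draws are half-normal, so their mean should be close to $\sqrt{2/\pi} \approx 0.798$, which provides a simple check of the sampler.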
4. Residual variance ($\sigma^2$)

$p(\sigma^2 \mid \text{else}) \propto \left[\sigma^2(1-\rho^2)\right]^{-n/2} \exp\left\{-\frac{\sum_{i=1}^{n} (e_i - \sigma\rho z_i)^2}{2\sigma^2(1-\rho^2)}\right\} \times p(\sigma^2).$

Note that $p(\sigma^2 \mid \text{else})$ is a very complex function of $\sigma^2$, because $\sigma$ appears in both the mean and the variance of the conditional normal distribution, and its kernel does not correspond to any known univariate density function, so we have to sample this parameter by the Metropolis algorithm or another MCMC technique. Following Hea-Jung (2005), we considered the Random Walk Metropolis Algorithm with a de-constraint transformation to sample $\sigma^2$. Given the fact that $\sigma^2 > 0$, let $\gamma = \log(\sigma^2)$, so that $\gamma$ has support in $(-\infty, \infty)$. Note that the density function of $\gamma$ can be obtained by using the transformation method (Casella and Berger, 2002, Chapter 2), which is given by:

$p(\gamma \mid \text{else}) = p(\sigma^2 = e^{\gamma} \mid \text{else})\, e^{\gamma}.$

In the Random Walk Metropolis Algorithm, we generated candidate values of $\gamma$ by choosing a proposal transition kernel to add noise to the current state. Assuming that the actual value of $\gamma$ is $\gamma^{(t)}$, and that we wanted to update its value so that in the next iteration we had $\gamma^{(t+1)}$, we followed steps a-c below.

a. Sample $\gamma^{*} = \gamma^{(t)} + \delta$, where $\delta \sim N(0, v_\gamma)$.

b. Sample $u$, $u \sim U(0, 1)$.

c. If $u < \min\left\{1, \dfrac{p(\gamma^{*} \mid \text{else})}{p(\gamma^{(t)} \mid \text{else})}\right\}$, then $\gamma^{(t+1)} = \gamma^{*}$; otherwise $\gamma^{(t+1)} = \gamma^{(t)}$.

Once $\gamma^{(t+1)}$ has been obtained, compute $\sigma^{2\,(t+1)} = \exp\left(\gamma^{(t+1)}\right)$. The parameter $v_\gamma$ can be modified so that we have an optimal acceptance rate (Roberts et al., 1997).
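Steps a-c can be sketched as follows. This Python fragment is an illustrative re-implementation (not the authors' R/C code, and all names are our own): `log_post` is any function returning the log of the unnormalized full conditional of $\sigma^2$, and the added `gamma` term in the log target is the log of the Jacobian $e^{\gamma}$ of the transformation.

```python
import math
import random

def rw_metropolis_sigma2(log_post, sigma2_start, step_sd, n_iter, rng=random):
    """Random Walk Metropolis for sigma^2 on the scale gamma = log(sigma^2).

    log_post : log of the unnormalized full conditional of sigma^2
    step_sd  : standard deviation of the Gaussian proposal (tuning parameter)
    Returns the chain of sigma^2 values and the acceptance rate.
    """
    gamma = math.log(sigma2_start)
    log_target = log_post(math.exp(gamma)) + gamma  # + gamma is the log-Jacobian
    chain, accepted = [], 0
    for _ in range(n_iter):
        gamma_prop = gamma + rng.gauss(0.0, step_sd)              # step a
        log_target_prop = log_post(math.exp(gamma_prop)) + gamma_prop
        log_ratio = log_target_prop - log_target
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):  # steps b-c
            gamma, log_target = gamma_prop, log_target_prop
            accepted += 1
        chain.append(math.exp(gamma))               # back-transform to sigma^2
    return chain, accepted / n_iter
```

The `step_sd` tuning parameter can then be adjusted until the observed acceptance rate is neither too high nor too low, in the spirit of Roberts et al. (1997).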
5. Variance of markers ($\sigma_\beta^2$)

$p(\sigma_\beta^2 \mid \text{else}) \propto (\sigma_\beta^2)^{-p/2} \exp\left\{-\frac{\sum_{j=1}^{p} \beta_j^2}{2\sigma_\beta^2}\right\} \times p(\sigma_\beta^2),$

which is the kernel of a scaled inverse chi-squared distribution with degrees of freedom $df_\beta + p$ and scale parameter $S_\beta + \sum_{j=1}^{p} \beta_j^2$, where $df_\beta$ and $S_\beta$ are the parameters of the prior distribution.
6. Correlation coefficient ($\rho$)

$p(\rho \mid \text{else}) \propto (1-\rho^2)^{-n/2} \exp\left\{-\frac{\sum_{i=1}^{n} (e_i - \sigma\rho z_i)^2}{2\sigma^2(1-\rho^2)}\right\} \times p(\rho).$

Note that $p(\rho \mid \text{else})$ is a very complex function of $\rho$ and its kernel does not correspond to any known univariate density function, so we must obtain samples using the Metropolis algorithm or another MCMC technique. Here we propose using Fisher's (1915) transformation of $\rho$, defined as $\eta = \frac{1}{2}\log\left(\frac{1+\rho}{1-\rho}\right)$, so that $\eta$ has support in $(-\infty, \infty)$. Note that the density function of $\eta$ can be obtained using the transformation method (Casella and Berger, 2002, Chapter 2), which is given by

$p(\eta \mid \text{else}) = p(\rho = \tanh(\eta) \mid \text{else}) \left(1 - \tanh^2(\eta)\right).$

In the Random Walk Metropolis Algorithm, we generated candidate values of $\eta$ by choosing a proposal transition kernel to add noise to the current state. Assuming that the actual value of $\eta$ is $\eta^{(t)}$ and that we wanted to update its value so that in the next iteration we had $\eta^{(t+1)}$, we followed steps d-f below.

d. Sample $\eta^{*} = \eta^{(t)} + \delta$, where $\delta \sim N(0, v_\eta)$.

e. Sample $u$, $u \sim U(0, 1)$.

f. If $u < \min\left\{1, \dfrac{p(\eta^{*} \mid \text{else})}{p(\eta^{(t)} \mid \text{else})}\right\}$, then $\eta^{(t+1)} = \eta^{*}$; otherwise $\eta^{(t+1)} = \eta^{(t)}$.

Once $\eta^{(t+1)}$ has been obtained, compute $\rho^{(t+1)} = \tanh\left(\eta^{(t+1)}\right)$. The parameter $v_\eta$ can be modified so that we have an optimal acceptance rate (Roberts et al., 1997).
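The transformation pair used in steps d-f can be written compactly; this Python snippet (ours, for illustration) maps $\rho \in (-1, 1)$ to the real line and back:

```python
import math

def fisher_z(rho):
    """Fisher's transformation: eta = 0.5 * log((1 + rho) / (1 - rho))."""
    return 0.5 * math.log((1.0 + rho) / (1.0 - rho))

def fisher_z_inv(eta):
    """Inverse transformation: rho = tanh(eta), back onto (-1, 1)."""
    return math.tanh(eta)
```

Because the Metropolis update is carried out on the $\eta$ scale, the change-of-variables factor $1 - \tanh^2(\eta)$ must be retained in the target density.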
The samples from the posterior distribution can be obtained using the Gibbs Sampler (Geman and Geman, 1984) together with the Random Walk Metropolis algorithm. In the algorithm, we sampled from each of the full conditional distributions in turn until we obtained a sample of the desired size. We implemented the algorithm in the R statistical package (R Core Team, 2016). In order to speed up computations, the most computationally demanding sampling routines were implemented in the C programming language (Kernighan and Ritchie, 1988); a shared library was generated and the compiled routines were then called from R. The R script and the C source code are available upon request from the first author.
APPENDIX B: SETTING THE HYPER-PARAMETERS FOR THE PRIOR DISTRIBUTIONS
The hyper-parameters can be set using a set of default rules similar to those used in the BGLR software (de los Campos et al., 2013; Pérez and de los Campos, 2014). With these rules we assigned proper but weakly informative prior distributions so that we partitioned the total variance of the phenotypes into two components: (1) the error and (2) the linear predictor, that is:

$\text{Var}(y_i) = \text{Var}(\mathbf{x}_i'\boldsymbol{\beta}) + \text{Var}(\varepsilon_i),$ (B.1)

where $\text{Var}(\varepsilon_i)$ is the variance of the error term. A priori, the total genetic variance for the i-th individual is $\text{Var}(\mathbf{x}_i'\boldsymbol{\beta} \mid \sigma_\beta^2) = \sigma_\beta^2 \sum_{j=1}^{p} x_{ij}^2$, and the a priori average genetic variance is:

$V_g = \sigma_\beta^2 \times MSx,$ (B.2)

where $MSx = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{p} x_{ij}^2$. So from (B.1) and (B.2), the partition of the phenotypic variance is given by:

$V_y = \sigma_\beta^2 \times MSx + V_\varepsilon,$ (B.3)

where $V_y$ denotes the phenotypic variance and $V_\varepsilon$ the a priori error variance.
Setting the hyper-parameters for $\sigma_\beta^2$

In the parametrization used in this work, $E(\sigma_\beta^2) = \frac{S_\beta}{df_\beta - 2}$ and $\text{Mode}(\sigma_\beta^2) = \frac{S_\beta}{df_\beta + 2}$. We set $df_\beta = 5$ so that the prior distribution has a finite mean. From equation (B.2),

$\sigma_\beta^2 = \frac{V_g}{MSx},$ (B.4)

thus, if we replace the left-hand side of (B.4) with the mode of the prior of $\sigma_\beta^2$, then:

$\frac{S_\beta}{df_\beta + 2} = \frac{V_g}{MSx}.$ (B.5)

From (B.5) and solving for $S_\beta$, we obtain $S_\beta = \frac{V_g (df_\beta + 2)}{MSx}$. From the definition of heritability, $h^2 = V_g / V_y$, so that $V_g = h^2 V_y$; then:

$S_\beta = \frac{h^2 V_y (df_\beta + 2)}{MSx}.$ (B.6)

Once we set $h^2$ we can set $S_\beta$ from (B.6), and we only need to compute the phenotypic variance $V_y$, for example the sample variance of the phenotypes, and set $h^2$ as the proportion of the variance that is explained a priori by the markers. By default we set $h^2 = 0.5$.
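The default rule of equation (B.6) can be sketched numerically. The Python fragment below is illustrative (function and variable names are ours); it computes the scale hyper-parameter for the marker-effect variance from the sample phenotypic variance, the a priori proportion of variance explained by markers, and the average sum of squared marker codes:

```python
def scale_hyperparameter_markers(X, y, h2=0.5, df_beta=5):
    """S_beta = h2 * Vy * (df_beta + 2) / MSx, the default rule of equation (B.6).

    X  : marker matrix as a list of rows (one row per individual)
    y  : phenotypes
    h2 : proportion of variance explained a priori by the markers
    """
    n = len(y)
    ybar = sum(y) / n
    vy = sum((yi - ybar) ** 2 for yi in y) / (n - 1)     # sample phenotypic variance
    msx = sum(sum(x * x for x in row) for row in X) / n  # average sum of squares, MSx
    return h2 * vy * (df_beta + 2) / msx
```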
Setting the hyper-parameters for $\sigma^2$

In the parametrization used in this work, $E(\sigma^2) = \frac{S}{df - 2}$ and $\text{Mode}(\sigma^2) = \frac{S}{df + 2}$; we set $df = 5$ and $S = (1 - h^2) V_y (df + 2)$.
Setting the hyper-parameters for $\mu$

We set the variance of the normal prior assigned to $\mu$ to a very large value, so that the prior assigned to the intercept is effectively a flat one.
Setting the hyper-parameters for $\rho$

The density function of the prior assigned to $\rho$ is:

$p(\rho) = \frac{1}{2B(a, b)} \left(\frac{1+\rho}{2}\right)^{a-1} \left(\frac{1-\rho}{2}\right)^{b-1}, \quad -1 < \rho < 1,$

where $B(\cdot, \cdot)$ is the Beta function and $a, b > 0$ are shape parameters. Figure B.1 shows the density function of $\rho$ for different values of the parameters $a$ and $b$. Setting different values for those hyper-parameters could lead to a rich variety of shapes, just as in the Beta distribution. Note that if we set $a = b = 1$, then we obtain a $U(-1, 1)$ prior that corresponds to the one used in this work.
Figure B.1 Density function of a Beta-type random variable with support in the interval $(-1, 1)$ for different values of the shape parameters.
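The prior on $\rho$ is a Beta distribution stretched from $(0, 1)$ to $(-1, 1)$. A small Python check of its density (ours, for illustration); with both shape parameters equal to 1 it reduces to the constant 1/2 of a uniform prior on $(-1, 1)$:

```python
import math

def beta_type_pdf(rho, a, b):
    """Density of rho = 2u - 1, where u ~ Beta(a, b), so rho lives on (-1, 1)."""
    if not -1.0 < rho < 1.0:
        return 0.0
    u = (1.0 + rho) / 2.0
    beta_fn = math.gamma(a) * math.gamma(b) / math.gamma(a + b)  # Beta function B(a, b)
    return 0.5 * (u ** (a - 1)) * ((1.0 - u) ** (b - 1)) / beta_fn
```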
Footnotes
Communicating editor: J.-L. Jannink
Supplemental material is available online at www.g3journal.org/lookup/suppl/doi:10.1534/g3.117.300406/-/DC1.
- Received November 2, 2017.
- Accepted March 23, 2018.
- Copyright © 2018 Pérez-Rodríguez et al.
This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.