## Abstract

Whole-genome prediction (WGP) models that use single-nucleotide polymorphism marker information to predict genetic merit of animals and plants typically assume homogeneous residual variance. However, variability is often heterogeneous across agricultural production systems and may subsequently bias WGP-based inferences. This study extends classical WGP models based on normality, heavy-tailed specifications and variable selection to explicitly account for environmentally-driven residual heteroskedasticity under a hierarchical Bayesian mixed-models framework. WGP models assuming homogeneous or heterogeneous residual variances were fitted to training data generated under simulation scenarios reflecting a gradient of increasing heteroskedasticity. Model fit was based on pseudo-Bayes factors and also on prediction accuracy of genomic breeding values computed on a validation data subset one generation removed from the simulated training dataset. Homogeneous *vs.* heterogeneous residual variance WGP models were also fitted to two quantitative traits, namely 45-min postmortem carcass temperature and loin muscle pH, recorded in a swine resource population dataset prescreened for high and mild residual heteroskedasticity, respectively. Fit of competing WGP models was compared using pseudo-Bayes factors. Predictive ability, defined as the correlation between predicted and observed phenotypes in validation sets of a five-fold cross-validation was also computed. Heteroskedastic error WGP models showed improved model fit and enhanced prediction accuracy compared to homoskedastic error WGP models although the magnitude of the improvement was small (less than two percentage points net gain in prediction accuracy). Nevertheless, accounting for residual heteroskedasticity did improve accuracy of selection, especially on individuals of extreme genetic merit.

- whole-genome prediction
- heteroskedastic errors
- genomic breeding values
- hierarchical Bayesian model
- genPred
- shared data resource

Use of whole-genome prediction (WGP) models to predict individual genetic merit in complex traits is being increasingly utilized in modern animal, plant and human genetics. By incorporating genotypic information from single-nucleotide polymorphism (SNP) markers, WGP models can enhance accuracies on genetic merit prediction compared to the use of pedigree information alone (Meuwissen *et al.* 2001; de los Campos *et al.* 2013a). Currently popular WGP models include ridge-regression best linear unbiased prediction (RR-BLUP), BayesA and BayesB, all proposed by Meuwissen *et al.* (2001), and subsequently modified or extended to a wide array of models collectively dubbed as Bayesian alphabet models (Habier *et al.* 2011; Gianola 2013). Typically, these Bayesian models specify either heavy-tailed distributions (*i.e.*, BayesA), variable selection (BayesCπ), or both (BayesB) on the distribution of the SNP effects.

An often underappreciated, though pervasive, assumption underlying classical WGP models across the “Bayesian alphabet” (Gianola *et al.* 2009) is that of homogeneous residual variances, often referred to as residual homoskedasticity. Yet, heterogeneity of residual variances across environments, or residual heteroskedasticity, is a well-documented phenomenon in livestock production systems (Cardoso *et al.* 2005, 2007; Kizilkaya and Tempelman 2005; Bello *et al.* 2012; Cernicchiaro *et al.* 2013), thereby raising concern about the implications of the residual homoskedasticity assumption almost universally assumed in current WGP models (Gianola and Rosa 2015). Indeed, the use of heteroskedastic models for genetic evaluations dates back to work by Foulley and Gianola (1996), who first modeled the logarithm of residual variances as a linear function of fixed effects. SanCristobal-Gaudy *et al.* (2001) presented further extensions to incorporate random effects, including correlated genetic effects, followed by a unifying framework for the structural modeling of heterogeneous variances proposed by Kizilkaya and Tempelman (2005). With the advent of genomic selection, Yang *et al.* (2011) was among the first to propose a WGP model with heterogeneous residual variances for livestock populations, though only a genetic component was specified on both mean and residual variance.

Environmentally-driven heteroskedasticity has been shown to have practical implications for the prediction of genetic merit. Hill (1984) demonstrated that proportionally more individuals were likely to be selected from more variable groups if substantial heteroskedasticity was ignored using homoskedastic error models, especially if selection pressure was intense. Early attempts to remedy this problem included the preadjustment of phenotypes, *e.g.*, by centering and scaling (Hill 1984). More modern approaches include explicit specification and modeling of sources of heteroskedasticity. For instance, Kizilkaya and Tempelman (2005) showed improved precision in estimated sire genetic merit for a birth weight trait when residual variance was specified as a function of sex and herds, finding that estimates of residual variances differed by as much as 20 times across herds.

The objectives of this study were 1) to extend classical parametric WGP models, specifically RR-BLUP, BayesA, BayesB and BayesCπ, to explicitly account for residual heteroskedasticity, and 2) to assess potential gains in prediction accuracy by explicit modeling of residual variances as a function of various environmental or management factors in simulated and actual livestock performance data.

We first introduce extensions to classical WGP models to accommodate heteroskedasticity, including a delineation of the criteria used for model performance. We then describe and present comparisons based on a simulation study and also an application to carcass traits data from pigs.

## Materials and Methods

### Classical WGP models

The classical base WGP model expresses the phenotype *y _{i}* of an individual

*i*(

*i*

*=*1, 2,…,

*n)*as a linear regression function of SNP marker effects, as follows:(1)where

**β**is a vector of unknown fixed effects connected to the phenotypes via a known design vector ; is the unknown random effect of SNP marker

*j*

*=*1, 2,…,

*p*connected to via a known genotype coded as 0, 1 or 2 to represent the dosages of the minor allele, and is the residual for animal

*i*. Most current WGP models assume .

Differences between WGP models RR-BLUP, BayesA, BayesB and BayesCπ are based upon the specification of the distribution of . For RR-BLUP, , whereas for BayesA , *i.e.*, independently and identically distributed scaled Student-*t* with common degrees of freedom and scale parameter . This prior specification on BayesA is marginally equivalent to such that with (de los Campos *et al.* 2009; Yang and Tempelman 2012). BayesB further extends BayesA and specifies the distribution of as a mixture of with probability and a point mass at 0 with probability (Meuwissen *et al.* 2001). BayesCπ is a particular special case of BayesB with such that the nonzero component of the mixture is (Habier *et al.* 2011).

### Heteroskedastic extension of WGP models

Following Kizilkaya and Tempelman (2005), we extend WGP models to flexibly model the residual variance as a multiplicative function of both systematic and nonsystematic environmental components, thereby explicitly accounting for heteroskedasticity. Expressed in the natural logarithmic scale, this is equivalent to writing the following ,(2)where is the residual variance corresponding to the environmental or management circumstances for individual *i*; **τ** is a vector of unknown fixed effect parameters connected to via a known design vector ; and similarly, **v** is a vector of unknown random effects connected to via a known design vector . Random effects on the residual variance may include environmental effects (*i.e.*, contemporary groups), genetic effects or both. *A priori*, elements of can be assumed independently distributed as inverted gamma , >2 such that and . As a result, variation across can be characterized by defining a coefficient of variation . So specified, the magnitude of the heteroskedasticity across levels of the random effect factor **v** diminishes (*i.e.*, ) with larger values of (*i.e.*, ) (Kizilkaya and Tempelman 2005).

### Prior specifications

We specify a flat prior on **β** such that constant. Here, in the homoskedastic error model, as well as elements of **τ** in Equation (2), were specified with noninformative priors (Gelman 2006). The hyperparameter was assigned the vague, though proper, prior density , which is commonly used for strictly positive parameters (Kizilkaya and Tempelman 2005). As previously shown by Albert (1988), this prior defines a uniform prior density *U*(0,1) on the transformed variable . Then, by change of variables, where *f* denotes the probability density function.

For RR-BLUP and BayesCπ, we specify whereas for BayesA and BayesB, we specify with and . Finally, for BayesB and BayesCπ, π is assigned a Beta(10,1) prior to reflect a relatively weak assumption that most markers have null effects for any given trait.

### Simulation study

We compared the performance of classical WGP models, namely RR-BLUP, BayesA, BayesB and BayesCπ, to that of their heteroskedastic error counterparts using a simulation study.

Ten data set replicates were each generated from base populations of 150 unrelated individuals subjected to random mating for 6000 generations. Population size was kept constant until generation 6000, after which it was expanded 10 times to 1500 individuals. The genome was composed of three chromosomes, each of length 1 Morgan, and each containing a total of 10,000 equally-spaced monomorphic loci. The number of crossover events per meiosis was simulated from a Poisson distribution with mean 1 and the location of crossover was assumed uniformly distributed in a chromosome. The mutation rate for all loci was specified to be per locus per generation and to be recurrent so as to ensure biallelic loci.

In Generation 6001, loci with minor allele frequency (MAF) < 0.1 or loci that failed to meet Hardy-Weinberg equilibrium based on an exact test (Wigginton *et al.* 2005) at a significant level of 0.0001 were discarded. For each dataset replicate, 60 loci were randomly selected to serve as quantitative trait loci (QTL), and an additional 3000 different loci were randomly selected to serve as SNP markers. For each of the 60 QTL, an allelic substitution effect (*k* = 1, 2,…, 60) was drawn from a (0,0.005), *i.e.*, a Student-*t* distribution with 0 mean and a scale of 0.005 based on five degrees of freedom. Our choice of a heavy-tailed distribution for the QTL effects is consistent with current notions of the genetic architecture of quantitative traits in livestock population, by which traits seem to be controlled by many genes of small effect and few of large effects (Hayes and Goddard 2001; Goddard *et al.* 2009). The total additive genetic variance was constructed from the weighted sum of genetic variances across the QTL effects, namely , where is the MAF at QTL *k*. The true breeding value (TBV) for an individual *i* was obtained as the aggregated allelic substitution effects over the selected 60 QTL loci, each weighted by its corresponding allelic dosage , such that . Trait heritability was set at .

Within each data replicate, we considered five different simulation scenarios reflecting various degrees of residual heteroskedasticity. That is, the replicated datasets described in the previous paragraph were used as blocking factors to compare scenarios across a heteroskedastic error gradient. Simulation scenarios included the case of homoskedastic residuals whereby and for all *l* =1, 2,…, 50 levels of a random effects factor, such that . In this study, specification of represents the homoskedastic error scenario, as . In turn, other scenarios were defined by increasing levels of residual heteroskedasticity; *i.e.*, = 50, 12, 5, and 3, such that the standard deviations of the relative variances () across these random effects were , respectively. In addition, all heteroskedastic error scenarios (*i.e.*, = 50, 12, 5, and 3) further incorporated systematic sources of heterogeneity whereby and , where is a “fixed” reference residual variance. For data generation, the residual was sampled from where was obtained as a function of **τ** and **v**, as described in Equation (2). The phenotypic observation for individual *i* was generated as , with μ = 3 set as a common mean for all observations. Observations from Generation 6001 were used as a training set to fit the competing WGP models and to estimate SNP effects. For each simulated dataset, individuals from Generation 6001 were randomly mated to produce Generation 6002 consisting of additional 1500 animals. Genotypes and TBV from individuals in Generation 6002 were generated to be used for validation in the simulation study. The average level of linkage disequilibrium (LD) between adjacent markers in the simulation study ranged between 0.23 to 0.25 across all replicated datasets, to represent current livestock populations (Meuwissen *et al.* 2001; Calus *et al.* 2008; Hayes *et al.* 2009; Yang *et al.* 2011).

Each replicated dataset was fitted using homoskedastic and heteroskedastic error versions of the selected WGP models, namely RR-BLUP, BayesA, BayesB and BayesCπ. Programming code needed to implement these models is available in Supporting Information, File S1. Across models, Markov Chain Monte Carlo (MCMC) was implemented with burn-in lengths of 10,000 to 35,000, followed by subsequent saving of the next 140,000 to 480,000 cycles, depending on the WGP models and diagnostics as described subsequently.

### Application to MSU swine resource population data

Data corresponding to a three-generation Duroc × Pietrain swine resource population developed at Michigan State University (MSU) was used in this study. A detailed description of the dataset is available in Edwards *et al.* (2008a, 2008b). Briefly, a total of 19 F_{0}, 55 F_{1} and 928 F_{2} pigs were included in the pedigree. All F_{0} and F_{1} animals as well as 336 F_{2} animals were genotyped using the commercial Illumina PorcineSNP60 beadchip (GeneSeek a Neogen Co., Lincoln, NE) panel with a total of 62,163 SNP markers (Gualdron Duarte *et al.* 2014). Markers with more than 10% missing data, unknown physical positions, or with MAF < 0.01 were removed from further analyses. Quality control procedures followed those described in Badke *et al.* (2012). Genotypes for the remaining 592 F_{2} animals were obtained using a low-density panel of 9K tagSNP set referred to as the GeneSeek Genomic Profiler for Porcine LD (GGP-Porcine LD, GeneSeek a Neogen Company) consisting of a subset of the PorcineSNP60 panel. The F_{2} animals genotyped with the 9K low density panel were imputed to 60K with imputation accuracy of approximately 0.99, as previously described (Gualdron Duarte *et al.* 2013). From the 60K SNP, a subset of 6210 markers was selected for this study. The selected SNP subset matched the panel of 10K tagSNP previously described by Badke *et al.* (2014). Phenotypes corresponding to 29 growth traits and 25 carcass and meat quality traits were obtained for F_{2} animals, as described by Edwards *et al.* (2008a, 2008b). Traits were subjected to preliminary screening for heterogeneous residual variances using standard linear mixed models and approximately 80% of the traits showed some degree of residual heteroskedasticity. Two traits, namely carcass temperature at 45 min postmortem, and loin muscle pH at 45 min postmortem, were selected for further consideration based on potentially high and mild levels of heteroskedasticity, respectively, and were thus subjected to follow-up WGP analysis (see next section). Phenotypes for 921 and 908 F_{2} individuals were available for 45 min postmortem carcass temperature and for loin muscle pH, respectively. Phenotypes of the selected traits, genotypes and pedigree of the available animals were contained in the Supporting Information, File S2.

Each of the two selected traits were fitted using RR-BLUP, BayesA, BayesB and BayesCπ WGP models in both their homoskedastic and heteroskedastic error specifications. For both traits, **β** included the fixed effect of sex and a regression coefficient on carcass weight. The general WGP model in Equation (1) was further expanded to incorporate clustering effects of slaughter dates as well as polygenic effects , where **A** is a known pedigree-based additive relationship matrix. Therefore, in our data application, the genomic expected breeding value (GEBV) for individual *i* (*i* = 1,2,…,*n*) was defined as . We modeled heterogeneous residual variances as presented in Equation (2), with **τ** and specifying the fixed effects of sex and the random clustering effects of slaughter dates, respectively. Thus, the hyperparameter in reflects the magnitude of residual heteroskedasticity in the responses of interest due to slaughter dates clusters.

Prior specifications were similar to those described for the simulation study, with the following exceptions due to problems with parameter identifiability. For BayesCπ, the prior hyperparameter was set at = 3 for both traits to maximize prior uncertainty while retaining a defined mean (*i.e.*, > 2). Instead, the prior scale for BayesCπ was specified as for carcass temperature and for loin muscle pH, based on the posterior medians of obtained in BayesA. For BayesB, the hyperparameter was assumed known and set at for carcass temperature and for loin muscle pH, whereby these values were obtained based on the posterior median of under BayesCπ with a homoskedastic error assumption. Sensitivity analyses were conducted to assess the influence of specifying on posterior inference of interest. Also due to parameter identifiability issues, the variance of the polygenic effects was first estimated from traditional (*i.e.*, non-WGP) animal models that either assumed residual homo- or heteroskedasticity. These estimates of under homo- and heteroskedastic assumptions were then specified as known constants when fitting homo- and heteroskedastic WGP models, respectively. Homoskedastic and heteroskedastic error specifications of the selected WGP models were fitted to each trait. In every case, a total of 20 parallel MCMC chains were run, each consisting of 12,000 to 27,000 burn-in cycles followed by 6,000 to 14,000 saved cycles. Post burn-in samples from the 20 parallel MCMC chains run on a given model can be considered samples from the joint posterior density of interest, and were thus combined for inference. Initial values of hyperparameters in each parallel chain were dispersed by an arbitrary small value while constraining them to fall within their allowable parameter space (Gelman and Rubin 1992). Posterior inference on parameters of interest was summarized for the overall dataset.

### Model comparison

For each of the WGP models considered, namely RR-BLUP, BayesA, BayesB and BayesCπ, the performance of the homoskedastic *vs.* its heteroskedastic error model counterparts was compared in both simulated data and real data using various criteria for model fit and for prediction, as follows.

First, we compared quality of global model fit using pseudo-Bayes factor (PBF) (Gelfand 1996), defined as the ratio of the conditional likelihood function under each heteroskedastic error WGP model over its homoskedastic counterpart, expressed in logarithmic scale of base 10, as follows:(3)where the abbreviation HT and HO hereafter refer to the candidate heteroskedastic and homoskedastic models, respectively. Moreover, L(⋅) denotes the likelihood function of observation conditional on all remaining observations fitted with the corresponding WGP model. This conditional likelihood, also known as the conditional predictive ordinate (Gelfand 1996) for observation *i*, can be approximated by , where *B* is the number of post burn-in MCMC iterations; represents the posterior sample for model parameters **θ** after *b* iterations post burn-in (*b =*1,2,…,*B*). A positive value of indicates support for the heteroskedastic error model based on enhanced fit to the data relative to its homoskedastic error WGP counterpart, thereby indicating evidence for heterogeneity of residual variances.

We further compared predictive performance of breeding values between competing homoskedastic and heteroskedastic error alternatives of each WGP model. For simulated data, we assess genomic prediction accuracy using the Pearson correlation between TBV in the simulated validation set and corresponding estimates , whereby were obtained by fitting the WGP model to the simulated training set. Within each WGP model, we compared homoskedastic *vs.* heteroskedastic error specifications across the various scenarios using a multifactorial ANOVA on genomic prediction accuracy, with the simulated replicated dataset as a random blocking factor.

For the real data application, predictive performance was assessed using a five-fold cross-validation (Daetwyler *et al.* 2013), whereby animals within each slaughter dates cluster were randomly assigned to five nonoverlapping data partitions or folds of nearly equal size (175–191 animals). Each one of the five data folds was assigned to be a validation set exactly once. When a data fold was selected as a validation set, phenotypes of this particular fold were excluded from estimation of marker effects. Instead, phenotypes of the validation fold were predicted using estimates of SNP markers, polygenic and nongenetic effects obtained from fitting a model to the remaining data folds, referred to here at training folds. This procedure was repeated until each of the five data folds had served as a validation set once. Consequently, every phenotyped animal was excluded from estimation of marker effects once, in which case their phenotypes were predicted using estimates obtained from animals in corresponding training folds (Meuwissen *et al.* 2013). We defined cross-validation predictive ability as the Pearson correlation coefficient between observed phenotypes in the validation fold, and the corresponding predicted phenotypes from parameter estimates obtained from the training folds. That is, , where and are the observed and predicted phenotypes, respectively, for animal *i* in the validation fold. The predicted phenotypes included estimated marker effects (weighted by their allelic frequencies) and estimated polygenic effects, as well as the estimated fixed effects of sex and carcass weight, and the random blocking effects of slaughter dates.

We also characterized potential practical implications of heteroskedasticity in the context of breeding decisions based on WGP. More specifically, we computed the Spearman’s rank correlation coefficient between GEBV from homoskedastic *vs.* heteroskedastic error specifications for the top and bottom 10% ranked individuals. Relative ranking of top and bottom 10% individuals was assessed by fitting a linear mixed model to the estimated Spearman rank correlations obtained from data replicates, and testing for differences between simulation scenarios. For real data, rank correlations of GEBV for top and bottom 10% individuals in the validation sets were compared between homoskedastic and heteroskedastic WGP models.

### MCMC diagnostics

Convergence diagnostics were implemented using the R package CODA (Plummer *et al.* 2006). We monitored convergence using trace plots. Diagnostic tests by Raftery and Lewis (1992) and by Heidelberger and Welch (1983) were conducted on the simulation study. For the data application, the Gelman and Rubin’s diagnostic on multiple MCMC chains produced a shrinkage factor < 1.2 (Colosimo and Del Castillo 2007). We also determined effective sample size (ESS) for key hyperparameters (Kass *et al.* 1998). In each case, the number of MCMC cycles was adjusted to ensure that the ESS was greater than 100 for all hyperparameters.

## Results

### Simulation study

For each of the WGP models considered in this study, Figure 1 shows comparisons of global fit, expressed as , between homoskedastic and heteroskedastic model specifications for scenarios reflecting a gradient of increasing residual heteroskedasticity. Recall that positive values of the indicate support for the heteroskedastic, as opposed to the homoskedastic error version of the corresponding WGP model. When residual heteroskedasticity was high (= 3 or 5), was estimated to be between 12.8 and 77.3 across MC replicates fitted with any of the WGP models. This supports a strong advantage in global fit for heteroskedastic, rather than homoskedastic, error specifications, regardless of the specific WGP model. As the amount of residual heteroskedasticity decreased (= 12), so did the values of and thus the relative advantage of the heteroskedastic error WGP model over its homoskedastic error counterpart. Under scenarios of low heteroskedasticity (= 50), or of homogeneous residual variance (→ ∞), the values of under RR-BLUP and BayesA were closer to zero, thus indicating no apparent advantage of heteroskedastic WGP models over their homoskedastic error counterparts; in turn, BayesB and BayesCπ showed greater uncertainty in these conditions. Overall, we note that, when the amount of residual heteroskedasticity in the data was high (= 3 or 5), consistently selected the appropriate heteroskedastic error specification for all WGP models; however, as mentioned above, the discriminatory capability of to detect random sources of residual heteroskedasticity was partially, though incrementally, impaired when heteroskedasticity was moderate (= 12) or low (= 50).

Table 1 presents a summary of the posterior inference on the hyperparameter defining the degree of heterogeneity of residual variance across levels of the random effect factor for 10 MC replicates under each of the simulation scenarios fitted with the heteroskedastic WGP models. Coverage probabilities for , defined as proportion MC replicates for which the 95% highest posterior density (HPD) included the true parameter value, under WGP models RR-BLUP and BayesA were mutually identical at 100%, 90%, 100%,100% when true = 3, 5, 12, 50, respectively. In turn, coverage probabilities were 100%, 90%, 70%, 20% for BayesB and 100%, 90%, 80%, 20% for BayesCπ, when true = 3, 5, 12, 50, respectively. As might be expected, inferential precision on was maximized when heteroskedasticity was high, as indicated by the smallest difference between minimum and maximum values of the lower and upper boundaries of the 95% HPD on (Table 1). The reverse was also true, as inference on was most uncertain when heteroskedasticity was not present. That is, enhanced model fit of a heteroskedastic WGP model relative to its homoskedastic counterpart under conditions of heterogeneous variances may be explained by increased precision of inference on the hyperparameter, and vice versa. These results on posterior inference of the heteroskedasticity parameter are consistent with those presented for overall goodness of fit in Figure 1.

To validate inferential performance on the fixed effect parameters **τ** specified on the residual variance, we considered the posterior density of the ratio of over . Coverage probability of the 95% HPD for the true value of the parameter ratio under heteroskedastic WGP models was 92% for both BayesB and BayesCπ, and 94% for both RR-BLUP and BayesA across simulation scenarios. In all cases, the observed coverage was within probabilistic expectation.

Estimated genomic prediction accuracies (and corresponding standard errors) of heteroskedastic and homoskedastic error versions of WGP models are shown in Figure 2. For all RR-BLUP, BayesA, BayesB and BayesCπ models, the heteroskedastic specification showed a gain on genomic prediction accuracy relative to the homoskedastic WGP counterpart whenever the amount of residual heteroskedasticity in the data were high (*i.e.*, = 3 or 5, *P* < 0.001 in all cases). However, no evidence for any predictive advantage of heteroskedastic WGP specifications was apparent if the data had been generated under conditions of low or null heteroskedasticity (*i.e.*, = 50 or ∞; *P* > 0.30 in all cases). For situations of moderate heteroskedasticity (*i.e.*, = 12) fitted with RR-BLUP or BayesA WGP models, the heteroskedastic error specification yielded greater (*P* < 0.05) genomic predictive accuracy than its homoskedastic counterpart but this difference was not apparent using variable selection models like BayesB or BayesCπ. Despite the significant increase in genomic prediction accuracy by heteroskedastic WGP models when the data were highly heteroskedastic, we note that the gain in accuracy relative to the homoskedastic specification was of a relatively small magnitude (*i.e.*, range from 0.009 to 0.018 for = 3 and from 0.005 to 0.008 for = 5 across MC replicated data sets).

To further characterize potential practical implications of heteroskedastic *vs.* homoskedastic WGP models in the context of breeding programs, we explored differences in the ranking of individuals of extreme genetic merit. We computed the Spearman correlation of the top 10% individuals whose GEBV had been estimated from homoskedastic and heteroskedastic WGP models across the simulated gradient of residual heteroskedasticity (Figure 3). Results on the bottom 10% ranked individuals showed a similar pattern and are thus not discussed further. For homoskedastic scenarios (→ ∞) or scenarios of low heteroskedasticity (= 50), the Spearman rank correlation between homoskedastic- and heteroskedastic-based GEBVs from RR-BLUP, BayesA, BayesB or BayesCπ WGP models for the top 10% animals was close to 1, thus indicating minor concerns for selection purposes. However, as the amount of heteroskedasticity increased, the Spearman correlation between heteroskedastic-based GEBV and homoskedastic-assuming GEBV for the top 10% individuals decreased to an estimated value of 0.85. This result suggests nonnegligible reranking of top individuals for selection purposes. Given response to selection, this finding could potentially have direct implications for breeding programs despite the small magnitude of the overall gain on genomic prediction accuracy described before.

### MSU swine resource population data

For carcass temperature at 45 min postmortem, the variance of polygenic effects were estimated at 0.022 and 0.036 for homoskedastic and heteroskedastic error specifications, respectively. For loin muscle pH at 45 min, the corresponding estimates of were 0.006 and 0.005, respectively.

We first assessed evidence for residual heteroskedasticity on the traits carcass temperature and loin muscle pH 45 min postmortem selected for this study from the MSU resource population. Table 2 summarizes the posterior inference for in the heteroskedastic specification of WGP models on the complete dataset. Recall that the hyperparameter defines the magnitude of non-systematic heterogeneity of residual variances for each of the selected traits as a function of slaughter dates clusters. For carcass temperature at 45 min postmortem, the posterior mean of was smaller than two in all cases. Similarly, the magnitude of the upper boundaries of the 95% HPD on did not exceed three under any of the heteroskedastic WGP models considered, thereby providing evidence for strong cluster-based heterogeneity of residual variances for this trait. In fact, the range of posterior means of slaughter date-specific for carcass temperature at 45 min postmortem was greater than eight-fold, thereby indicating that the residual variance for some slaughter dates clusters was estimated to be as large as eight times greater than the residual variance in other slaughter clusters. In turn, for loin muscle pH at 45 min postmortem, the posterior mean of was below six, and the corresponding upper boundaries of its 95% HPD was approximately 10 in all cases, thus indicating milder cluster-driven residual heteroskedasticity for this trait, whereby the range of posterior means of was close to three-fold across all 33 slaughter dates clusters. These results are consistent with our findings during preliminary data screening.

We also explored the effect of sex as an additional source of heteroskedasticity on the selected traits, as represented by parameter **τ** in Equation (2). Based on the set-to-zero parameterization implemented in this study, the parameter **τ** may be interpreted as a ratio of female-to-male residual variances, whereby a ratio of one indicates homogeneous residual variances for both sexes. For carcass temperature at 45 min postmortem, the 95% HPD of **τ** fitted with any of the WGP models ranged from a lower boundary of 0.7 to an upper boundary of 1.2. For loin muscle pH, the corresponding 95% HPD range was 0.9 to 1.5. It then follows no evidence for sex-based heteroskedasticity of either trait regardless of WGP model.

Next, we consider relative global fit of homoskedastic *vs.* heteroskedastic error WGP models to the actual data using PBF (Gelfand 1996), and use a threshold value of to conclude upon a decisive difference in fit between models (Kass and Raftery 1995). For carcass temperature at 45 min postmortem, the range of across the five cross-validation folds was [45.7, 81.3] for RR-BLUP, [46.9, 81.0] for BayesA, [44.3, 77.5] for BayesB, and [43.8, 76.9] for BayesCπ WGP models. In turn, for loin muscle pH at 45 min postmortem, the range of was [9.7, 14.4], [9.8, 14.6], [8.8, 13.7] and [8.6, 13.8]. These results favor the use of heteroskedastic WGP error models for both traits, and under all WGP specifications considered here. The larger magnitude of , and hence greater evidence of residual heteroskedasticity for carcass temperature relative to loin muscle pH, is consistent both with our preliminary screening and with our posterior inference on as described previously.

We also assessed predictive performance of homoskedastic and heteroskedastic WGP models. We first conducted a sensitivity analysis to evaluate the stability of cross-validation predictive ability using BayesCπ under different choices of as , and for carcass temperature and , and for loin muscle pH trait. Similarly, sensitivity assessments were also conducted for BayesB looking at choices of as , and for carcass temperature and , and for loin muscle pH trait. As expected, changes in the specification of were compensated with changes in the estimates of the proportion of SNP markers with nonzero effects (Yang *et al.* 2015). In turn, the estimated median cross-validation prediction accuracies for carcass temperature, and its corresponding standard deviation, across cross-validation folds at any choice of were 0.86 ± 0.05 for homoskedastic error BayesCπ or BayesB and 0.86 ± 0.04 for heteroskedastic error BayesCπ or BayesB. For loin muscle pH, cross-validation prediction accuracy based on homoskedastic error BayesCπ or BayesB ranged from 0.31 ± 0.06 to 0.29 ± 0.07 across prior specifications of . For heteroskedastic error BayesCπ or BayesB, cross-validation predictive ability ranged from 0.32 ± 0.07 to 0.29 ± 0.07 across values of . Overall, sensitivity analyses assessment indicated little reason to be concerned about specification of hyperparameters for the purpose of prediction accuracy.

Figure 4 depicts estimated cross-validation predictive abilities across five folds for both carcass temperature and loin muscle pH at 45 min postmortem. Across WGP models, cross-validation predictive abilities for carcass temperature and loin muscle pH 45 min postmortem were estimated to be approximately 0.85 and 0.32, respectively. For neither trait did we find any evidence for differences in cross-validation predictive ability between homoskedastic *vs.* heteroskedastic specifications of any of the WGP models considered (*P* > 0.25 in all cases for either trait).

Often in animal breeding, greater interest is directed toward animals that exhibit extreme GEBV, as they are the ones likely to be selected as parents for the next generation. Table 3 reports Spearman’s rank correlation coefficient between homoskedastic-error *vs.* heteroskedastic-error GEBV for animals of extreme genetic merit. For both traits, and regardless of WGP model, we observed considerable reranking of the top and bottom 10% individuals, particularly as the degree of residual variance heterogeneity in the data increased. In fact, for loin muscle pH at 45 min postmortem, the corresponding estimated median rank correlations ranged from 0.52 to 0.70 (top), and from 0.64 to 0.70 (bottom) across WGP models; in turn, for the more heteroskedastic trait (*i.e.*, carcass temperature at 45 min postmortem), the median rank correlation of GEBV for top and bottom 10% animals ranged from 0.05 to 0.38 (top) and from 0.43 to 0.54 (bottom) across WGP models. Such variability in rankings of GEBV may be partially due to the relatively small sample size (*i.e.*, only 10% animals within a cross-validation fold) used to estimate the rank correlation coefficient. This is further supported by the considerable variability observed among cross-validation folds in the reranking of individuals based on homoskedastic-based GEBV relative to their heteroskedastic counterpart, though this variability was particularly noticeable for carcass temperature. Again, this may be partially explained by the relatively larger magnitude of residual heteroskedasticity detected for this trait. We further note that reranking of extreme GEBV using homoskedastic *vs.* heteroskedastic errors seemed to be particularly extreme when the BayesB WGP specification was implemented.

For further illustration, we selected a cross-validation fold and depicted a scatterplot of homoskedastic-error *vs.* heteroskedastic-error GEBV for carcass temperature (Figure 5A) and for loin muscle pH (Figure 5B) under each WGP model. For both traits, individuals that showed extreme genetic merit under homoskedastic error assumptions had their GEBV considerably attenuated when residual heteroskedasticity was accounted for (*e.g.*, Figure 5A, BayesB). In fact, an individual with extremely high GEBV inferred under a homoskedastic error model may not be considered as a viable selection candidate if its GEBV was estimated from a heteroskedastic error WGP model. This was indeed the case for the two individuals with top homoskedastic-based GEBV in the complete dataset. It is interesting to note that these top two individuals derived from one slaughter date cluster that had the largest posterior mean for the relative residual variance (data not shown). Conversely, candidate individuals with top or bottom genetic merit may be overlooked by using conventional homoskedastic error models (*e.g.*, Figure 5B, BayesB).

Taken together, the observed reranking could have practical implications from a selection point of view. In support of this observation, we note that the mean of genomic breeding values in the top 10% individuals was between 1.2× (based on BayesA) to 10× (based on BayesCπ) greater in magnitude when estimated based on the heteroskedastic WGP model relative to the homoskedastic specification for either trait. Similar results were observed for the bottom 10% individuals with extreme genetic merit based on the hetero- *vs.* homoskedastic WGP specification. It should be acknowledged, though, that these comparisons are conditioned on the models used to predict mean genomic breeding values.

## Discussion

In this study, we extend classical WGP models to account for potential heterogeneous residual variances across environments, and further assess whether explicit accounting for such heteroskedasticity may impact accuracy of prediction of genomic breeding values.

Environmental residual heteroskedasticity is a rather common phenomenon across agricultural environments in livestock production. For instance, residual variance estimates for birth weight in an Italian Piemontese cattle population differed by approximately 10-fold across herds (Kizilkaya and Tempelman 2005), and that of average daily gains in feedlot cattle from the US Midwest differed by more than 12-fold across contemporary groups (Cernicchiaro *et al.* 2013). Backfat thickness in pigs was shown to display considerable residual heteroskedasticity based on an animal model, whereby residual variances ranged by approximately eight-fold across herds (See 1998). Similarly, in this study, we found evidence for considerable environmentally-driven heterogeneity of residual variances in other swine carcass traits, as indicated by the small magnitude of posterior means of for carcass temperature and loin muscle pH. These results indicate considerable departure from the residual homoskedasticity assumption commonly invoked by standard WGP models.

Gianola and Rosa (2015) adverted that modeling heterogeneous residual variances across environments was likely to be important for reliable genomic selection, as further supported by our results. Unaccounted-for heteroskedastic errors can potentially impact breeding decisions as animals from the most diverse environments might then be disproportionally selected (Hill 1984). Indeed, previous studies have shown nonnegligible reranking of top and bottom 10% progenitors when heterogeneity of residual variances across environment or management groups is properly modeled (Cardoso *et al.* 2005; Kizilkaya and Tempelman 2005). In fact, incorrectly assuming homogeneous residual variances could cause a substantial reduction in selection efficiency, particularly under conditions of low heritability (Garrick and VanVleck 1987). Heritabilities of most technological quality traits of meat in swine, such as the ones evaluated in this study, has been reported to range from low to moderate, as the average value for many studies fall into the range 0.10–0.30 (reviewed by Sellier and Monin 1994 and Ciobanu *et al.* 2011).

Based on this evidence, we extended homoskedastic WGP models to allow for environmental heterogeneity of residual variances, and evaluated the relative performance of heteroskedastic and homoskedastic error WGP models in terms of both global fit and predictive performance. Our simulation study showed considerable improvements in global model fit when extreme residual heteroskedasticity was properly accounted for, though the advantage of heteroskedastic error WGP models seemed to dissipate quickly for even moderate amounts of environmental heterogeneity in residual variances, particularly under WGP models without variable selection (*i.e.*, RR-BLUP and BayesA). Furthermore, the observed advantage of heteroskedastic error WGP models in global fit translated into very small (∼1–2%), albeit significant, gains in genomic prediction accuracy under conditions of extreme data heteroskedasticity. As the amount of residual heteroskedasticity decreased, so did the power to detect differences in genomic prediction accuracy between heteroskedastic and homoskedastic model specification. This was particularly noticeable for WGP models with variable selection (*i.e.*, BayesB or BayesCπ) relative to those without variable selection (*i.e.*, RR-BLUP and BayesA). For the specific data application used in this study corresponding to the MSU swine resource population, there was no evidence of any gains in cross-validation predictive ability for selected carcass traits when heterogeneous residual variances across environments were explicitly modeled. This finding was somewhat unexpected given the extreme level of environmental heterogeneity of residual variances observed in at least one of those traits. The high level of environmental heteroskedasticity in the carcass temperature trait was recognized both by posterior inference on the hyperparameter and by improved global model fit of the heteroskedastic error model relative to the homoskedastic error model. Yet, it is possible that additional gains in prediction accuracy from specifying heteroskedasticity, either of environmental or genetic origin, may be difficult to observe due to the already large magnitude of “baseline” cross-validation predictive ability for this trait (∼0.85) based on standard homoskedastic error WGP models.

It is unclear whether a genetic component might have contributed to the high level of residual heteroskedasticity observed in the carcass temperature trait. A recent study by Yang *et al.* (2011) explored the use of parametric genomic models that specify genetic control of environmental variance in a swine production system. In particular, classical WGP models were extended to assess putative marker effects not only on GEBV but also on environmental variability. Consistent with our results, that study indicated enhanced fit of heteroskedastic error models. However, the gains in accuracy of prediction were either of small magnitude in simulated data or not at all apparent when applied to back fat thickness data in pigs, as was also observed in our application. Additional statistical methods for detecting genetic loci affecting phenotypic variability were recently introduced; proposed approaches range from fully-parametric (Rönnegård and Valdar 2011; Yang *et al.* 2011) through classical nonparametric (Paré *et al.* 2010; Struchalin *et al.* 2010), including a two-stage semi-parametric approximation (Hill and Mulder 2010). Following a thorough review, Rönnegård and Valdar (2012) highlighted the inferential importance of simultaneous estimation of effects on mean and variance as a strength of parametric methods for modeling variance-controlling QTL.

Ideally, heterogeneous genetic and residual variances should be modeled simultaneously, for example with multiple breed studies (de Roos *et al.* 2009) or genotype by environment interaction (Edwards and Jannink 2006; Jarquin *et al.* 2014; Lopez Cruz *et al.* 2015) studies, both of which require the specification of heterogeneous genetic variances. A study by Chen *et al.* (2014) explored multi-population genomic prediction for milk production traits on two dairy breeds using a multi-task Bayesian learning model. When error variances and marker effects variances were explicitly specified as heterogeneous across breeds (as opposed to assumed homogeneous), gains in prediction accuracy across traits ranged from 0.04 to 0.14 using a 50K SNP panel, and from 0.02 to 0.11. Also, a yield variety trial of 40 oat genotypes across 34 environments reported “stabilized” genetic predictions (*i.e.*, higher repeatability) when genetic and environmental sources of heterogeneity were explicitly specified on residual and genotype-by-environment variance components (Edwards and Jannink 2006).

It is often the case that deregressed expected breeding values are weighted and used as response variables in WGP models (Garrick *et al.* 2009) instead of actual phenotypes. In a way, modeling of weighted deregressed expected breeding values may be considered an approach to account of heterogeneous variance, in this case the variance of the expected breeding value. However, this approach may be considered *ad hoc* as its effectiveness depends on several factors such as the number of repeated measured observations, size of training data and the reliability of the breeding values (Garrick *et al.* 2009; Ostersen *et al.* 2011; Boddhireddy *et al.* 2014).

Despite the observed lack of any appreciable gain in overall accuracy of prediction of carcass trait phenotypes by heteroskedastic WGP models, the differential ranking of animals with the 10% most extreme genetic merit suggest important practical implications for the assumption of homogeneous residual variance. Substantial reranking was apparent for these candidate animals depending on whether environmental residual heteroskedasticity was explicitly accounted for in obtaining their breeding values. In fact, noticeable differences in the mean of genomic breeding values for individuals of extreme genetic merit were apparent based on heteroskedastic *vs.* homoskedastic WGP specifications. To reconcile these results, we notice that the magnitude of the difference in GEBV between most extreme individuals and the rest seems to be rather small, and thus difficult to detect, particularly given the relatively narrow range of GEBV observed for the selected traits in this population. In turn, a localized performance metric, such as the Spearman correlation coefficient among the top and bottom 10% individuals, could detect regional patterns that may not be necessarily apparent from overall performance metrics, such as cross-validation predictive ability. However, we acknowledge that, given the low-to-moderate heritabilities of the traits evaluated in this study, it is not possible to discard nonnegligible sampling variability on the estimates of Spearman correlation coefficients or other statistical reasons related to unstable behavior of correlations within extreme tails.

The very low magnitude of the posterior mean of observed in the swine data application indicates that residual variability is not homogeneous across environmental subclasses. However, extreme within-cluster residuals, for example, due to preferential treatment, may be a reasonable concern even after accounting for residual heteroskedasticity. Biased prediction of breeding values is a problem often encountered under conditions of preferential treatment (Kuhn and Freeman 1995). Our observation of substantial reranking of extreme breeding values suggests that heteroskedastic WGP models may, at least partially, offset prediction bias due to preferential treatment. Yet, preferential treatment is a concern that further motivates the need to extend heteroskedastic error WGP models to allow for outlier robustness, as advocated by Gianola and Rosa (2015). In a nongenomic application, Cardoso *et al.* (2007) extended the univariate *t*-models proposed by Stranden and Gianola (1999) to attenuate adverse effects of preferential treatment and specified Student-*t* distributed residual heteroskedasticity across environments to potentially accommodate a more robust analysis capable of muting the influence of extreme observations on inferences of cluster specific residual variances. Given the breeding objective of ranking candidates for selection, a heavy-tailed residual distribution combined with explicit modeling of environmental heteroskedasticity is likely to yield more robust genomic predictions in the sense of reducing influence from outlying identifiable clusters and extreme individual datapoints.

Most recently, WGP models have been used to predict complex human traits, such as risk of disease and life expectancy (de los Campos *et al.* 2012; Vazquez *et al.* 2012). One can surmise that environmentally-driven heteroskedasticity is likely present in this context as well, though the full extent of it remains unclear. Predictive performance of WGP models for human traits can be low, mostly due to factors unique to human populations, such as unrelatedness of individuals and short LD patterns (de los Campos *et al.* 2013b). Extending WGP models for complex human traits to explicitly model heterogeneous residual variances across environments could potentially help account for still-unexplained variance, and thus affect the extent of missing heritability in human populations.

Finally, WGP models often require estimation of a large number of SNP marker effects and considerations for computing efficiency become paramount, particularly since MCMC inference can be computationally expensive. Computational enhancements for homoskedastic WGP models have been developed based on expectation-maximization (EM) based algorithms (Hayashi and Iwata 2010), or analytically derived posterior densities of each marker effect (Meuwissen *et al.* 2009). Heteroskedastic error extensions to WGP models, such as those presented here, could also be further modified to enhance computational efficiency. For example, an EM-like algorithm such as that proposed by Gianola *et al.* (1992) may be adapted to obtain empirical Bayes estimates of environment-specific variances in a WGP context.

### Conclusions

In this study, we describe extensions to classical whole-genome prediction models that incorporate modeling of heterogeneous residual variances across environments and evaluate potential impact of specifying heteroskedastic *vs.* homoskedastic error models on the accuracy of prediction of genomic breeding values. Heteroskedastic error models were overwhelmingly supported by improved global fit to the data. The advantages of heteroskedastic error WGP models on overall predictive ability of carcass traits in pigs was of small magnitude, if at all present; however, considerable reranking of individuals with extreme genetic merit was observed when heteroskedasticity was explicitly accounted for. Heteroskedastic error WGP modeling should be carefully considered in breeding programs as environmental residual heteroskedasticity seems prevalent and, if unaccounted, can have considerable practical implications for selection of individuals of extreme genetic merit. Additional work tackling simultaneous modeling of heterogeneous genetic variances jointly with heterogeneous error variances and potential outlier robustness extensions is warranted.

## Acknowledgments

Wenzhao Yang and the MSU Department of Animal Science are gratefully acknowledged for facilitating partial R code and for preparing the data used in this study. This project was supported by Agriculture and Food Research Initiative Competitive Grant no. 2011-67015-30338 from the U.S. Department of Agriculture National Institute of Food and Agriculture. Computing for this project was performed on the Beocat Research Cluster at Kansas State University, which is funded in part by NSF grants CNS-1006860, EPS-1006860, and EPS-0919443.

## Footnotes

Supporting information is available online at www.g3journal.org/lookup/suppl/doi:10.1534/g3.115.022897/-/DC1

*Communicating editor: D. J. de Koning*

- Received May 27, 2015.
- Accepted October 14, 2015.

- Copyright © 2016 Ou
*et al.*

This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.