Abstract
Piscirickettsia salmonis is one of the main infectious diseases affecting coho salmon (Oncorhynchus kisutch) farming, and current treatments have been ineffective for the control of this disease. Genetic improvement for P. salmonis resistance has been proposed as a feasible alternative for the control of this infectious disease in farmed fish. Genotyping by sequencing (GBS) strategies allow genotyping of hundreds of individuals with thousands of single nucleotide polymorphisms (SNPs), which can be used to perform genome wide association studies (GWAS) and predict genetic values using genome-wide information. We used double-digest restriction-site associated DNA (ddRAD) sequencing to dissect the genetic architecture of resistance against P. salmonis in a farmed coho salmon population and to identify molecular markers associated with the trait. We also evaluated genomic selection (GS) models in order to determine the potential to accelerate the genetic improvement of this trait by means of using genome-wide molecular information. A total of 764 individuals from 33 full-sib families (17 highly resistant and 16 highly susceptible) were experimentally challenged against P. salmonis and their genotypes were assayed using ddRAD sequencing. A total of 9,389 SNPs markers were identified in the population. These markers were used to test genomic selection models and compare different GWAS methodologies for resistance measured as day of death (DD) and binary survival (BIN). Genomic selection models showed higher accuracies than the traditional pedigree-based best linear unbiased prediction (PBLUP) method, for both DD and BIN. The models showed an improvement of up to 95% and 155% respectively over PBLUP. One SNP related with B-cell development was identified as a potential functional candidate associated with resistance to P. salmonis defined as DD.
- selective breeding
- genotyping by sequencing
- Oncorhynchus kisutch
- disease resistance
- GWAS
- Genomic Selection
- GenPred
- Shared Data Resources
Chile is the largest producer of coho salmon (Oncorhynchus kisutch) globally, reaching about 160,000 tons in 2014, representing more than 90% of total production (FAO 2016). However, the success and sustainability of this industry is constantly threatened by infectious diseases, including Salmon Rickettsial Syndrome (SRS). This disease is caused by Piscirickettsia salmonis, a gram-negative and facultative intracellular bacteria, which was isolated for the first time in Chile in coho salmon (Cvitanich et al. 1991). Data from the Chilean National Fisheries and Aquaculture Service (Sernapesca) indicates that during the first half of 2016, 53% of the moralities ascribed to infectious diseases in coho salmon were associated with SRS (Sernapesca 2016). To date, control measures and treatments for SRS are based on antibiotics and vaccines. However, both strategies have not had the expected effectiveness under field conditions (Rozas and Enríquez 2014).
Because of this, it is necessary to develop alternative strategies for the control of this disease (Yáñez et al. 2014a). In this regard, breeding for enhanced disease resistance is a feasible and sustainable option to improve animal health, welfare and productivity (Stear et al. 2001). A primary requisite for including disease resistance into a breeding program is the presence of significant additive genetic variation for the trait (Falconer and Mackay 1996). Commonly, data to evaluate resistance comes from experimental challenges carried out using siblings of the selection candidates (Ødegård et al. 2011; Yáñez et al. 2014a). Quantitative studies have estimated significant genetic variation for resistance against different pathogens in salmonid species (Ødegård et al. 2011; Yáñez et al. 2014a). For instance, low to moderate heritabilities for resistance against P. salmonis in Atlantic salmon (Salmo salar) (h2 = 0.11 to 0.41) (Yáñez et al. 2013; Yáñez et al. 2014b) and coho salmon (h2 = 0.16) (Yáñez et al. 2016a) have been estimated.
Marker assisted selection (MAS) can improve production traits in cases where the phenotypes are difficult to measure in the selected candidates (e.g., disease resistance traits) and the total additive genetic variance explained by genetic markers is high (Hayes and Goddard 2010). This methodology has been successfully applied for the improvement of resistance against the Infectious Pancreatic Necrosis Virus (IPNV) in Atlantic salmon, which is controlled by a major quantitative trait locus (QTL) (Houston et al. 2012; Moen et al. 2015). In the case of polygenic traits, genomic selection (GS) (Meuwissen et al. 2001) can significantly improve selection accuracy of breeding values compared to traditional selection, and therefore enhance the response of selection for disease resistance in salmonid species (Tsai et al. 2016; Vallejo et al. 2016, 2017a; Bangera et al. 2017; Correa et al. 2017; Yoshida et al. 2017).
Genotyping by sequencing (GBS) is an alternative for genotyping in cases when SNP panels are not available. This approach reduces the complexity of the genome, and can be used to identify thousands of markers without prior marker discovery efforts or a reference genome. Currently, several approaches of GBS have been developed, significantly reducing the cost and labor (Baird et al. 2008; Elshire et al. 2011; Peterson et al. 2012). These methodologies have been widely used in salmonid species, to generate dense linkage maps (Brieuc et al. 2014; Gonen et al. 2014), perform association studies to identify genomic regions involved in the resistance against pathogens (Campbell et al. 2014; Liu et al. 2015; Palti et al. 2015b) and generate SNPs resources (Houston et al. 2012).
Double-digest restriction-site associated DNA (ddRAD) reduces DNA complexity by digesting DNA with two restriction enzymes (REs) simultaneously, without random shearing (Peterson et al. 2012). This approach has been widely used in genetic studies in aquaculture species (reviewed in Robledo et al. 2017).
In the present study, we used ddRAD sequencing to dissect the genetic architecture of resistance against P. salmonis in a farmed coho salmon population and identify molecular markers associated with the trait. Furthermore, GS models were used to evaluate the potential to accelerate the genetic improvement of resistance against P. salmonis in this coho salmon population by means of using genome-wide molecular information.
Materials and Methods
Coho salmon breeding population
The coho salmon population used in the present study came from a unique 2012 year-class population. This population belongs to a genetic improvement program that was established in 1997 and is owned by Pesquera Antares and managed by Aquainnovo (Puerto Montt, Chile). Further details about this breeding population, in terms of reproductive management, rearing conditions, fish tagging and breeding objectives are described by Yáñez et al. (2014c; 2016a) and Dufflocq et al. (2016).
Experimental challenge
The experimental challenge against P. salmonis was performed as it is described in detail by Correa et al. (2015a) and Yáñez et al. (2016a). Briefly, 2,606 individuals, belonging to 107 maternal full-sib families and 60 paternal half-sib families, were challenged against P. salmonis. Prior to the experiment, each fish was tagged with a passive integrated transponder (PIT-tag), placed in the abdominal cavity for genealogy traceability during the challenge test. The P. salmonis challenge was performed at Aquainnovo´s research station, located in Lenca River, X Region, Chile.
For the lethal dose 50 (LD50) calculation, a random sample of 80 fish were selected from the population. Four different dilutions from the P. salmonis inoculum were evaluated (1/10, 1/100, 1/1000 and 1/10000). Twenty fish were challenged at each dilution. The dilutions were intraperitoneally (IP) injected with a volume of 0.2ml/fish. Daily mortality was recorded. This preliminary test spanned 26 days and a dilution of 1:680 was estimated as the LD50.
For the main challenge, fish were distributed into three tanks (7m3) with a salt water concentration of 31 ppt. An average of eight individuals (ranging from 1 to 18) from each of the 107 families were distributed into each tank. The experimental challenge was performed through an intraperitoneal (IP) injection with 0.2ml/fish of the LD50 inoculum. The average weight of the fish at the inoculation was 279g (SD = 138g). To ensure that these fish were free from other pathogens, qRT-PCR was previously performed in order to control for the presence of Infectious Salmon Anemia Virus (ISAV), IPNV and Flavobacterium spp.
The P. salmonis challenge continued for up to 50 days post IP injection. Throughout the challenge, environmental parameters (pH, temperature, salinity and oxygen) were measured and controlled. Fish were removed from the tanks after death, and a sample of the anterior kidney was taken and stored at -80° in RNALater. A necropsy assay was performed in conjunction with qRT-PCR to confirm the cause of death and the presence of P. salmonis. This was also done to control for the presence of other pathogens, such as Vibrio ordalii, Renibacterium salmoninarum and IPNV.
ddRAD library preparation and sequencing
Ten ddRAD libraries were produced by multiplexing 828 individuals following the protocol described by Peterson et al. (2012). For this, 64 parents (males and females) and 764 offspring representing the 17 most resistant and 16 most susceptible families were selected. An average of 23 (ranging from 11 to 43 individuals) offspring per family were chosen. Briefly, total DNA was extracted using the commercial kit Wizard SV Genomic DNA purification System (Promega) according to the manufacturer’s protocol. Between 80 and 200 ng of DNA, from each individual was digested with two restriction enzymes (New England Biolabs, UK; NEB); 10 U of SbfI (specific for the CCTGCA|GG motif)) and MseI (specific for the T|TAA motif) in a 12 µl reaction volume, including 1 µl of SbfI and MseI adapter (8.3 pM), for 90 min at 37°. The ligation reaction was carried out by adding 1 µl of T4 ligase (NEB) diluted 1:100 in T4 buffer and incubating for 150 min at 37° and subsequently at 16° overnight.
Each ligation mix was diluted with 189 µl of dilute TE buffer (1:10). Kodak DNA Polymerase (ABM), a high-fidelity polymerase, was used to amplify DNA fragments with the correct adapters. PCR reactions (20 µl) were prepared containing 10 µl of PCR mix 2x, 1 µl of primer mix (10 µM each), 6 µl of diluted ligation mix and 3 µl of nuclease-free water. Each sample was PCR amplified using the following conditions: 95° for 2 min, followed by 17 cycles of 95° for 20s, 66° for 30s and 68° for 40s. After PCR, amplicon quality was checked by loading 5 µl on a 2% agarose gel. Subsequently, samples were pooled, so that the final concentration was similar among them within each library. Each library was concentrated through an evaporation step for 80 min in a Centrivap Mobile Console Centrifugal Evaporator (Labconco). This step was conducted until 300 µl of the generated library was obtained. Final volume of each library was loaded on a 1% agarose gel. Size of the bands selected for sequencing ranged from 750 and 1,500 bp and between 1,800 and 2,500 bp. DNA was purified through the QIAquick gel extraction kit (Qiagen) following manufacturer’s instructions. Finally, libraries were sequenced on an Illumina Hiseq2500 platform, using 150 base single-end.
SNP identification
Raw sequence reads obtained from Illumina sequencing were analyzed using STACKS v. 1.41 (Catchen et al. 2011, 2013). This software was specifically developed to analyze short-read data generated through next generation sequencing (NGS) (Davey et al. 2013).
Sample reads were trimmed to 134 bp for all subsequent analyses, demultiplexed and filtered using process_radtags. Rad-tags which passed the quality filter were aligned to the Oncorhynchus kisutch reference genome (GenBank: MPKV00000000.1) using BWA v. 0.7.12 (Li and Durbin 2009). The reference genome was indexed (using the index function) and alignments were performed using the mem algorithm; all parameters were set as default. Loci were then built using pstacks with a minimum depth of coverage of three to build a locus (-m 3).
A catalog of loci was constructed using the cstacks program using only the parents’ loci from pstacks. To build the catalog, the maximum number of mismatches allowed between sample tags was set to three (-n 3), and the matching was based on genomic location (g). After catalog construction, the sstacks program was used in order to match rad-tags against the catalog based again on genomic location (g), followed by the populations software, using defaults parameters. Loci were considered as valid if they were present in at least 75% of the individuals of the population. As a quality control step, the following parameters were used to filter low-confidence SNPs: Minor Allele Frequency (MAF) ≤ 0.05, Hardy-Weinberg Equilibrium (HWE) P < 1x10-6 and genotyping call rate < 0.80. Individuals were removed from the analyses if their genotyping call rates were below 0.70.
Trait definitions
Resistance against P. salmonis was defined as the day of death (DD) with values ranging from 1 to 50 depending on the time of death. Additionally, resistance was also evaluated as a binary (BIN) trait, either dead or alive at the end of the challenge. Values for this trait were 1 in cases where the fish died during the challenge, or 0 if the fish survived until the end of the challenge. Initial Body Weight (IW) for each fish, was measured prior to the IP injection.
Pedigree-based BLUP
All challenged individuals (n = 2,606) were used for the pedigree-based approach, PBLUP, as a control for the performance evaluation of genomic predictions. A linear univariate animal model was used to estimate variance components and predict Estimated Breeding Values (EBVs) for DD, while for BIN a univariate threshold animal model was fitted (Table 1). The model used was as follows: In the previous equation,
is a vector of phenotypes (BIN or DD),
is a vector of fixed effects (sex and tank as factors, and initial weight as covariate),
is a vector of random additive polygenic genetic effects that follows a normal distribution ∼N(0,A
),
and
are incidence matrices,
is the additive relationship matrix, and
is the random residual (Lynch and Walsh 1998). Both models were fitted using the BLUPF90 set of programs (Misztal et al. 2016) by means of the AIREMLF90 and THRGIBBS1F90 modules to analyze DD and BIN, respectively. The MCMC Gibbs sampling scheme set for running THRGIBBS1F90, included a total of 200,000 iterations. The first 20,000 were discarded as burn-in iterations, and then from the remaining 180,000 samples, one from every 50 samples was saved for analysis. This Gibbs sampling scheme collected 3,600 independent samples for analysis.
Heritabilities for PBLUP models were computed as follows: where
and
are the additive genetic and residual variances for each trait. In the case of BIN, the residual variance was set to 1.
Genomic BLUP
The SNP based variance components and GEBVs were estimated using genomic BLUP (GBLUP), similar to the PBLUP model described above, and implemented in the BLUPF90 software. The GBLUP is a modification of the PBLUP method, where g is a vector of random additive genetic polygenic effects with a distribution ∼N(0, G) and the numerator relationship matrix
is replaced by a genomic relationship matrix
, as described by (VanRaden 2008). Only genotyped animals, which passed all quality controls (n = 580) were analyzed.
Single step genomic GBLUP
The single-step GBLUP (ssGBLUP) and weighted single-step GBLUP (wssGBLUP) models were similar to the PBLUP model except for a combined genomic and pedigree relationship. The kinship matrix used was H (Aguilar et al. 2010), in which genotype and pedigree data are combined. The inverse of the matrix H is: where
is the inverse numerator relationship matrix for all animals,
is the inverse of a pedigree-based relationship matrix for genotyped animals only, and
is the inverse genomic relationship matrix. SNPs were equally weighted and given an initial value of one in the ssGBLUP method. In the wssGBLUP method, the marker variances were used as weights. The marker variances were estimated from allele frequencies, and from marker effects that were calculated in the ssGBLUP method (Wang et al. 2014). The DD trait was analyzed as a linear trait using AIREMLF90 and BLUPF90, whereas, BIN was analyzed as a threshold trait with THRGIBBS1F90 in the BLUPF90 family of programs (Misztal et al. 2016). The MCMC Gibbs scheme for the estimation of the genetic parameters for BIN for ssGBLUP and wssGBLUP, were estimated identically as described above. The ssGBLUP and wssGBLUP models included all the genotyped animals which passed quality control (n = 580), and all the phenotyped fish (n = 2,606) from 107 families.
Bayes C
The Bayes C (Habier et al. 2011) analyses were performed using GS3 software. A total of 200,000 iterations were used in the Gibbs sampling, with a burn-in period of 20,000 cycles. The results were saved every 50 cycles. The number of samples in this analysis totaled 4,000. Convergence and autocorrelation were assessed by visual inspection of trace plots of the posterior variance components. The adjusted model can be described, in matrix notation, as follows:where y is the vector of phenotypic records (DD or BIN), X is an incidence matrix of fixed effects (sex and tank as factors and IW as covariate), β is the vector of fixed effects, T is an incidence matrix of polygenic effects, p is a random vector of polygenic effects of all individuals in the pedigree,
is the vector of the genotypes for the ith SNP for each animal,
is the random allele substitution effect of the ith SNP,
is an indicator variable (0, 1) sampled from a binomial distribution with parameters determined such that 1% of the markers were included in the model, and e is a vector of residual effects. The following prior distributions were assumed for the genetic random effects: Independent and identical mixture distributions for the SNP effects; each SNP has a point mass at zero having a probability π and a univariate normal distribution with a probability of 1 – π with null mean and variance
; which in turn has a scaled inverse chi-squared prior with
degrees of freedom and scale parameter
(or
) (Fernando and Garrick 2013). The scale parameter was estimated as a function of the genetic variance population, based on the mean SNP allele frequency and number of markers assumed with nonzero effects (Fernando et al. 2007). Only genotyped animals, which passed quality control (n = 580) were used.
Genomic prediction accuracy
The different models were compared using a fivefold cross validation scheme. To reduce stochastic effects of sampling, the cross-validation analysis was replicated ten times. Briefly, all challenged individuals (genotyped, phenotyped, or both), were randomly separated into five validations sets. For each set, predictions were made by masking the animals’ phenotypes and using the remaining fish as a training set to estimate the marker effects. Thus, for each cross-validation run, the dataset was split into a training set (80%) and a validation set (20%). Accuracy was used to assess the performance of each model and was estimated as follows: where
is the correlation between the EBV of a given model (predicted for the validation set using information from the training set) and the actual phenotype, while
is the square root of the pedigree-based estimate of heritability (Correa et al. 2017; Legarra 2008; Palaiokostas et al. 2016; Tsai et al. 2016). Finally, accuracies were calculated for each model and compared to those obtained with the PBLUP model.
Genome-wide association study
In order to identify associations between genetic markers and P. salmonis resistance, as DD or BIN, four genome-wide association methodologies were performed using the BLUPF90 set of programs (Misztal et al. 2016). GBLUP, ssGBLUP, wssGBLUP and w3ssGBLUP models were used to analyze the DD and BIN traits, using a linear and a threshold model respectively (as described above in model 1). For the GBLUP model, the pedigree-based relationship matrix (A) was replaced by a genomic matrix (G). For the ssGBLUP, wssGBLUP and w3ssGBLUP GWAS models the H matrix was used as described above. SNPs were weighted equally (and given a weight of one), for the ssGBLUP model. For the wssGBLUP and w3ssGBLUP methods, weights were determined based on individual marker variances that were estimated using both the marker effects, which were calculated previously for the ssGBLUP model, and marker allele frequencies (Wang et al. 2014). For the GBLUP GWAS model, only genotyped animals, which passed quality control (n = 580), were analyzed. For the ssGBLUP, wssGBLUP and w3ssGBLUP GWAS models, all the genotyped animals passing quality control (n = 580), and all the phenotyped fish (n = 2,606) were used. Due to practical reasons, the molecular markers anchored to the different scaffolds not placed in chromosomes, were assigned as chromosome 31. The parents of the challenged individuals were not included in the GWAS analysis because they did not have associated phenotype information, as they were not submitted to the challenge experiment.
Data availability
Table S1 contains genotypic data (available at the public dryad digital repository https://doi.org/10.5061/dryad.b273q6p), Table S2 contains phenotypic data, and Table S3 contains the pedigree information. Table S4 contains the full list of genes located within the top ten 1-Mbp windows proximate to each SNP associated with P. salmonis resistance for DD and BIN identified through wssGBLUP. Table S5 contains information of the top ten markers which explain the highest percentage of the genetic variance for each method and trait. Table S6 contains results from the 10 replicated CV.
Results
Challenge test
Mortality began on the 10th day after the P. salmonis challenge, with evident symptoms of SRS and pathological lesions typical of SRS. These signs include swollen kidney, splenomegaly and yellowish liver tone and coloration (Rozas and Enríquez 2014). Challenged families showed considerable phenotypic variation for P. salmonis resistance. Average mortality of all the 107 families reached 38.53% during the 50-day challenge. The average cumulative mortality rate among the 17 best and 16 worst families, selected for genotyping, reached 19% and 63%, respectively (Figure 1).
Kaplan-Meier curves for Piscirickettsia salmonis experimental challenge in coho salmon. Average mortality curves for the 107 full-sib families, and the 17 best and 16 worst families.
ddRAD sequencing
Prior to quality control (QC), per base quality (Phred score) was evaluated. The average quality score ranged from 36 to 38 among libraries, indicating high quality of data. Illumina sequencing, including parents, yielded an average of 156,058,078 (± 16 million) of raw sequences per library. After initial QC, which included the removal of low quality sequences and reads with either missing or ambiguous barcodes, an average of 31,660,024 of the reads were removed. In parallel with QC, reads were trimmed to 134 bp. Thus, 79% of the raw reads were retained for subsequent analysis. To create a set of all possible alleles in the population, data sets of the parental samples were used to create a STACKS catalog. This catalog consisted of 106,309 unique ddRAD loci from which 20,068 markers from 757 individuals were identified. Quality filtering reduced this to 9,389 putative bi allelic SNPs (see Table S1) with an average sequencing depth of 38x ranging from 11x to 501x. These markers were identified segregating along the genome of 580 individuals (see Table S1).
Genome-wide association analysis
Four genome-wide association methodologies were performed either for DD and BIN. These approximations include GBLUP, ssGBLUP, wssGBLUP and w3ssGBLUP. For both traits, all the models showed a similar association pattern. In the case of the ssGBLUP methodology, the GWAS plots become less noisy as the iterations progress, and the peaks associated with the traits become more distinct (Figure 2 and Figure 3). For DD, a marker potentially associated with P. salmonis resistance was located on chromosome 11 (Figure 2). This marker was identified in all of the four models and was within the top ten markers explaining most of the percentage of the genetic variance (Table 2).
Genomic association analyses for resistance against Piscirickettsia salmonis in a coho salmon population (defined as day of death) for four different models; GBLUP (A), ssGBLUP (B), wssGBLUP (C) and w3ssGBLUP (D). The gray area highlights the SNPs (in green), which are among the top ten markers explaining a high percentage of the genetic variance in the four models.
Genomic association analyses for resistance against Piscirickettsia salmonis in a coho salmon population (for survival as a binary trait) for four different models; GBLUP (A), ssGBLUP (B), wssGBLUP (C) and w3ssGBLUP (D). The gray area highlights SNPs (in green) that were among the top ten markers explaining a high percentage of the genetic variance in the four models.
The availability of a high quality coho salmon reference genome (GenBank accession number MPVK00000000.1), made it possible to identify genes near this marker. Within ∼55 Kbp of this marker is the phosphoinositide-3-kinase adaptor protein 1 (pik3ap1) gene; a gene related with innate host defense through B-cells development (Aiba et al. 2008; Herzog et al. 2009).
In the case of BIN, two molecular markers, explaining most of the genetic variance, were identified in all of the four models (Table 2). One of these markers is located on chromosome 29, while the second marker was identified only at a scaffold level (Scaffold04124) (Figure 3). For BIN, host immune response related genes were not found proximate to any of the suggestive molecular markers.
However, some genes within a 1-Mb window of these markers have been suggested to be involved with P. salmonis infection. The 14 KDa Phosphohistidine phosphatase-like (PHPT1), gelsolin-like (GSN), and glutamine synthase-like (GS) genes are located on chromosome 29 and near associated markers. Claudin-10 was found on scaffold04124. These genes have been previously identified as being up-regulated in Salmo salar individuals with low susceptibility to P. salmonis (Pulgar et al. 2015). Moreover, the retinoic acid receptor RXR-alpha-A-like (RXRA) gene, located on chromosome 29, has previously been identified as a molecular biomarker for P. salmonis infections, and has been found to be down-regulated in macrophages during infection (Rise et al. 2004).
A full list of genes that are located within a 1-Mbp window proximate to the suggestive markers associated with P. salmonis resistance for Okis11, Okis29 and at a scaffold level, identified through wssGBLUP model, is shown in Table S4. In the case of the marker located on the scaffold, the surrounding sequence was blasted against the Salmo salar reference genome (NC_027300.1).
Genetic parameters and predictions
Significant additive genetic variation was estimated for both DD and BIN when using all the data from challenged individuals from the 107 maternal, full-sib families (Table 1). Using the pedigree-based model without genomic data, estimates of the narrow sense heritability for DD and BIN were equal to 0.14 (± 0.034) and 0.27 (± 0.043), respectively.
Based on a fivefold cross validation, the accuracy of the PBLUP model was slightly lower for DD (0.271) than for BIN (0.316) (Figure 4). When genomic data were included, accuracies for DD and BIN were higher than those achieved using only phenotypic data. However, there is considerable variation between models and trait definitions. The accuracies for the different models ranged from 0.299 (ssGBLUP) to 0.529 (GBLUP) for the DD trait, and from 0.314 (ssGBLUP) to 0.807 (GBLUP) for the BIN. For DD, all the models with genomic data outperformed the pedigree-based model. The relative increase in accuracy ranged from 10 (ssGBLUP) to 95% (GBLUP) (Figure 4). In the case of BIN, the relative increase in accuracy ranged from 20 (wssGBLUP) to 155% (GBLUP). However, one of the genomic models (ssGBLUP) had a similar accuracy to the PBLUP model, with a relative accuracy 1% lower than the pedigree-based model.
Comparison of predicted accuracies (R) for Piscirickettsia salmonis resistance in a coho salmon population comparing between PBLUP and models with genomic data for DD (red bars) and BIN (blue bars).
Interestingly, the accuracies obtained with GBLUP were higher than ssGBLUP and wssGBLUP for both traits. These accuracy values reached 0.529, 0.299 and 0.417 for DD and 0.807, 0.314 and 0.3797 for BIN. For both traits, the models with better performance were, GBLUP > Bayes C > wssGBLUP > ssGBLUP.
Discussion
Significant genetic variation for P. salmonis resistance was detected in the present study. Moderate heritabilities were estimated using different trait definitions, either for DD or BIN. Estimated heritabilities were higher for resistance as a binary trait when compared to DD.
Previously, Yáñez et al. (2016a), estimated a heritability of 0.16 for resistance against P. salmonis, defined as day of death (from the same coho salmon population) through a bivariate linear model. Our study estimated a similar heritability value (0.14) in the same population. Differences in the estimations are likely due to the univariate model we used instead of a bivariate model. We also estimated heritability for P. salmonis resistance using a threshold model for the binary trait, which was higher (0.27) than the value reported for DD. These results are consistent with previous findings for resistance against P. salmonis in Atlantic salmon using pedigree information (Yáñez et al. 2013).
When resistance, defined as day of death, was analyzed using a linear model, heritability was estimated as 0.18 (0.03). When using a threshold model to analyze resistance as a binary trait, a heritability of 0.24 (0.04) was calculated (Yáñez et al. 2013, 2014b). The genetic variation and heritability values for P. salmonis resistance are in accord with different studies that have also found significant genetic resistance to other bacterial diseases in salmonid species (Gjøen et al. 1997; Ødegård et al. 2006; Vallejo et al. 2016).
Bacterial disease resistance has been suggested to be a polygenic trait in aquaculture species. For example, Palaiokostas et al. (2016) suggested that resistance against Photobacterium damselae subsp. has a polygenic architecture in Gilthead Sea Bream (Sparus aurata). Using a 50K SNP genotyping array, it was possible to elucidate a moderately polygenic architecture of P. salmonis resistance in Atlantic salmon (Correa et al. 2015b). In the current study, and using 9K SNPs, a similar genetic architecture for resistance against P. salmonis in coho salmon population was found. However, it is likely that the moderate number of individuals could limit the power to detect QTL of larger effect controlling P. salmonis resistance in coho salmon.
Among the top ten genetic markers for DD and BIN, one and two markers were identified among all four models, respectively. The availability of an annotated, coho salmon genome made it possible to identify the phosphoinositide-3-kinase adaptor protein 1 (pik3ap1) (also known as the adaptor protein B-cell PI3K adaptor (BCAP)), a gene that is related with B-cell development (Herzog et al. 2009), proximate to the genetic marker found to be associated with resistance. The role of B-cells, through the humoral response, has been widely investigated and elucidated (reviewed in Janeway et al. 2001). B-cells may also have phagocytic activity in both rainbow trout (Li et al. 2006) and Atlantic salmon (Øverland et al. 2010). These cells are capable of ingesting large particles and bacteria; killing them through phagolysosome fusion (Li et al. 2006). Studies in vitro, showed that from the total phagocytic leukocytes isolated from Atlantic salmon head kidney, 37% were B-cells. Additionally, 77% were B-cells when leukocytes were isolated from peripheral blood (PB). The phagocytic ability of B-cells was three times higher than those observed in neutrophils in head kidney, while in PB no differences were observed (Øverland et al. 2010).
We hypothesize that B-cell development could help in the immune response against P. salmonis in coho salmon population through its phagocytic activity. Further studies are needed in order to have a better understanding of the role of these cells in the resistance against P. salmonis, and its ability to digest and kill bacteria. For resistance defined as the BIN trait, no genes related with immune response were identified near genetic markers associated with this trait. A possible explanation of this observation, is that the region near the genetic marker acts as a regulatory sequence. Also, it could be due to the relative small sample size for the GWAS, which ideally should be over 1000 animals (limiting the resolution of the GWAS).
The underlying genetic basis for this trait may have an important impact on the accuracy of genomic predictions. Thus, performance comparisons of different algorithms is required for either GS and GWAS when a complex trait is studied for the first time within a population (Vallejo et al. 2017b). The current study is the first to evaluate the genetic architecture of resistance against Piscirickettsia salmonis in coho salmon through different algorithms (to our knowledge). We used a genomic model that assumes that genetic variances are controlled by an infinite number of markers with minimum effects on the trait (i.e., GBLUP). The GBLUP method calculates genomic relationship (G matrix) using all the genotyped markers for this reason. An extension of this method, which combines genomic (G) and pedigree-based (A) relationship information into the H relationship matrix (Aguilar et al. 2010; Legarra et al. 2014) was also evaluated. This method, called single-step GBLUP (ssGBLUP), mimics a Bayesian selection model in that it only fits SNPs that explain moderate to large genetic variance of the trait (Wang et al. 2012). Results of these different genomic-wide association analyses suggest that resistance against Piscirickettsia salmonis has a polygenic architecture, with no major QTL (i.e., explaining >= 10% of the genetic variation) segregating in the current population (Figure 2 and Figure 3).
When resistance was defined as DD, the accuracy for the PBLUP model was slightly lower than resistance measured as BIN (0.271 and 0.316, respectively). These values are consistent with the results obtained for resistance against the bacterial disease pasteurellosis in S. aurata defined as day of death, authors reached an accuracy up to 0.30 through a pedigree-based model (Palaiokostas et al. 2016). However, both accuracy values are slightly lower when compared with values obtained for resistance against sea lice in Atlantic salmon. In this regard, Correa et al. (2017) reached an accuracy of 0.41 for resistance against Caligus rogercresseyi, while Tsai et al. (2016) reached a prediction accuracy ∼0.5 for resistance against Lepeophtheirus salmonis using PBLUP.
The results from our genomic predictions are in agreement with previous studies that showed higher estimated accuracies using GS than with PBLUP in the same half/full-sib family structure in salmon breeding programs (Nielsen et al. 2009; Lillehammer et al. 2013; Bangera et al. 2017).
The GS model that showed the best performance in terms of accuracy of predictions for DD was GBLUP. This was followed by Bayes C, wssGBLUP and ssGBLUP. The addition of genomic information allowed an improvement up to 95% in accuracy for this trait. The current study showed a relative accuracy prediction improvement of 54% when comparing PBLUP to wssGBLUP for this trait. This improvement is different to that obtained by Vallejo et al. (2016) for resistance against Bacterial Cold Water Disease (BCWD), defined as day of death, in rainbow trout (Oncorhynchus mykiss). The authors reported a reduction in the predictive ability (PA) of 20 and 26% using wssGBLUP, either with a chip array (40K) or through RAD sequencing (10K) respectively. The authors attributed this pronounced reduction to stochastic fluctuations due to a small training group of their study (n = 583). However, when the number of individuals was increased, this model outperformed the pedigree-based model, reaching a relative increase in accuracy up to 108%, for the same trait definition (Vallejo et al. 2017a).
In the case of BIN, the use of the ssGBLUP model, did not show an improvement in the accuracy prediction (with a reduction of 1% in comparison to the accuracy obtained though the pedigree-based model). However, there was an estimated relative increase in accuracy ranging from 20 to 155% when comparing PBLUP to the others genomic models. In this regard, Bayes C and GBLUP showed an increase in accuracy of 140% and 155%, respectively. These high values are similar to the improvements seen with resistance against BCWD defined as a binary trait, which reached a relative increase up to 97% (Vallejo et al. 2017a).
The relative improvements ranged in accuracy from -1 to 155% for BIN and from 10 to 95% for DD are greater to those obtained for other diseases resistance studies in Atlantic salmon; even with lower marker numbers. Using an identical random selection design as in the current study, sea lice resistance showed a relative improvement in accuracy of 22% relative to PBLUP when using 37K SNPs (Correa et al. 2017). For reliability, an improvement up to 52% with 220K SNPs was reached (Ødegård et al. 2014). Tsai et al. (2016) reported that when a non-full siblings design was used, an improvement in accuracy of 250% and 500% was reached in two different populations compared to PBLUP. However, when the methodology was changed to a random selection scheme this improvement only reached up to 27% using 33K SNPs.
In case of Piscirickettsia salmonis resistance, and using the same cross-validation scheme in our study, the relative reliability was increased by 25% and 30% for resistance, defined as day of death or as a binary trait, respectively with 50K SNPs (Bangera et al. 2017).
In the current study we evaluated a wide range of different models of GS for their potential implementation in aquaculture. The performance of each implemented model varied according to the underlying genetic architecture of the trait (Meuwissen et al. 2001; Daetwyler et al. 2010). Thus, it is valuable to perform these comparisons to identify the best performing method using real data. Similar accuracies among PBLUP and ssGLBUP models could be due to the predicted GEBV as both models are overrepresented by polygenic EBV (Bangera et al. 2017). GBLUP estimated genetic relationships using genotype and pedigree data rather than just average relationship as PBLUP (Habier et al. 2007). This allows a more accurate genetic relationship matrix and provides an increase in performance, as seen in the current data, due to the close family relationship. Moreover, GBLUP had significantly better performance when resistance was defined as a linear or binary trait compared to the other evaluated genomic models. We suggest that this could be an effect of only genotyping families from the opposite sides of the mortality distribution (i.e., most resistant and most susceptible), and not all the challenged individuals. Bangera et al. (2017), reported similar accuracies among GBLUP and Bayesian methods between GBLUP and Bayes C using 10K SNPs in Atlantic salmon. Similar results that have been seen in dairy cattle for most traits (de Roos et al. 2009; Goddard 2009)
The greater improvement in accuracy for the binary trait BIN, compared with the linear trait DD, could be due a better fit of the threshold model for BIN than the fit of the linear model for DD. It could also be due to the higher estimated heritability.
We hypothesize that the large improvement values seen in the current study are likely due to an increased level of linkage disequilibrium (LD) found within this farmed coho salmon population. Additional studies are needed to elucidate the minimum number of markers necessary for GS.
Our results, using ddRAD sequencing, are in agreement with other genomic studies, which utilize GBS techniques with aquaculture species. Using RAD sequencing, some authors have previously performed genomic-wide association studies in rainbow trout looking for associations with disease resistance. From a total of 4K identified SNPs, 31 markers were significantly associated with either BCWD or Infectious Hematopoietic Necrosis Virus (IHNV) resistance as a binary trait (Campbell et al. 2014). These authors also showed the potential of these RAD markers to predict an animal’s phenotype. In the case of BCWD resistance, defined as a linear and binary trait, Palti et al. (2015b) identified suggestive and significant SNPs in two different families and candidate genes associated with this trait using ∼5K markers per family. Similar numbers of SNPs were used by Liu et al. (2015) to significantly associate 18 SNPs.
Genomic selection predictions are in accordance with studies aimed to evaluate GS using others GBS techniques, in both, relative increase in accuracy and number of discovered SNPs. Dou et al. (2016) predicted higher accuracies for shell height and shell width using GBLUP and Bayes methods in Yesso scallop (Patinopecten yessoensis) using 2K SNPs identified by 2b-RAD. Identical methodology allowed Palaiokostas et al. (2016) to reach a relative increase in accuracy up to 53% with 12K SNPs using Bayesian methods compared to PBLUP. A study in rainbow trout using RAD sequencing identified 10K SNPs. Even then, the accuracies were similar with GS models compared to PBLUP, and the authors predicted that increasing the number of individuals could lead to a relative increase in accuracy up to 69% (Vallejo et al. 2016).
The genotyping strategy was aimed at the i) evaluation of genomic selection methods; and ii) allowing the identification of molecular markers associated with the trait by means of GWAS. The aim was maximizing the phenotypic variance within the sample while keeping a balanced representation of fish per family. Thus, genotyping strategy was not totally random, but specifically focused on the most extreme families; 17 resistant and 16 susceptible families. We aimed at genotyping all the fish belonging to each selected family. Thus, each family was represented within the sample with an average of 23 (ranging from 11 to 43) fish/family.
The availability of dense SNP arrays for coho salmon, as it is already the case for Atlantic salmon (Houston et al. 2014; Yáñez et al. 2016b) and rainbow trout (Palti et al. 2015a), may increase the accuracy for predicting genomic breeding values and the power for the determination of the genetic factors involved in economically-important traits. It is also expected that in the near future, further functional studies for a better understanding of P. salmonis resistance and other complex traits in salmonids will be facilitated by the international initiative on the Functional Annotation of All Salmonid Genomes, FAASG (Macqueen et al. 2017).
We have evaluated different GS models, and demonstrated that the use of genomic prediction is a feasible strategy for the improvement of breeding value prediction. This information could be used for the implementation of genomic information in genetic programs for Piscirickettsia salmonis resistance in farmed coho salmon populations.
Conclusions
Moderate significant genetic variation was estimated for resistance against Piscirickettsia salmonis in coho salmon, using either pedigree or genomic information. These results highlight the feasibility of including this trait into genetic improvement programs. Our study shows that genomic prediction methods, using ddRAD genotypes (including 9K SNPs), has a substantial advantage in terms of accuracy when compared to pedigree-based model for either DD or BIN. The improvement was up to 95 and 155% respectively in the current population. The association analyses were used to identify a gene related with B-cell development, which could also be involved in resistance against P. salmonis. To our knowledge, this is the first study aimed at dissecting the genetic architecture of P. salmonis resistance in a coho salmon population.
Acknowledgments
AB and KC acknowledge the National Commission of Scientific and Technologic Research (CONICYT) for the funding through the National PhD funding program. AB acknowledges the Government of Canada for the funding through the Canada-Chile Leadership Exchange Scholarship. This project was funded by the U-Inicia grant, from the Vicerrectoria de Investigación y Desarrollo, Universidad de Chile. This work has been conceived on the frame of the grant FONDEF NEWTON-PICARTE (IT14I10100), funded by CONICYT (Government of Chile) and the Newton Fund - The British Council (Government of United Kingdom). This work has been partially supported by Núcleo Milenio INVASAL from Iniciativa Científica Milenio (Ministerio de Economía, Fomento y Turismo, Gobierno de Chile). This research was carried out in conjunction with EPIC4 (Enhanced Production in Coho: Culture, Community, Catch), a project supported by the government of Canada through Genome Canada, Genome British Columbia, and Genome Quebec.
Authors’ contributions: AB performed DNA extraction, library construction, ddRAD analysis, GWAS analysis, and wrote the initial version of the manuscript. KrC performed library construction and contributed to the data analysis. AB, GY and KC performed genomic prediction analysis. AJ performed DNA extraction. JPL contributed with study design. WD contributed with analysis and discussion. JMY conceived and designed the study, supervised work of AB and contributed to the analysis, discussion and writing. All authors have reviewed and approved the manuscript.
Animal ethics approval: Challenge and sampling procedures were approved by the Comité de Bioética Animal from the Facultad de Ciencias Veterinarias y Pecuarias, Universidad de Chile (Certificate N08-2015).
Footnotes
Communicating editor: R. Houston
Supplemental Material is available online at www.g3journal.org/lookup/suppl/doi:10.1534/g3.118.200053/-/DC1
- Received September 26, 2017.
- Accepted February 1, 2018.
- Copyright © 2018 Barria et al.
This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.