Identifying Genetic Signatures of Natural Selection Using Pooled Population Sequencing in Picea abies

The joint inference of selection and past demography remain a costly and demanding task. We used next generation sequencing of two pools of 48 Norway spruce mother trees, one corresponding to the Fennoscandian domain, and the other to the Alpine domain, to assess nucleotide polymorphism at 88 nuclear genes. These genes are candidate genes for phenological traits, and most belong to the photoperiod pathway. Estimates of population genetic summary statistics from the pooled data are similar to previous estimates, suggesting that pooled sequencing is reliable. The nonsynonymous SNPs tended to have both lower frequency differences and lower FST values between the two domains than silent ones. These results suggest the presence of purifying selection. The divergence between the two domains based on synonymous changes was around 5 million yr, a time similar to a recent phylogenetic estimate of 6 million yr, but much larger than earlier estimates based on isozymes. Two approaches, one of them novel and that considers both FST and difference in allele frequencies between the two domains, were used to identify SNPs potentially under diversifying selection. SNPs from around 20 genes were detected, including genes previously identified as main target for selection, such as PaPRR3 and PaGI.


Introduction
Identifying loci under selection and inferring past species history remain two of the main challenges in evolutionary biology. To a large extent this is due to the difficulty to tease apart the effects of selection and demography at the genetic level and to the dearth of methods to estimate the effect of these two processes simultaneously (Li et al., 2012;Tiffin & Ross-Ibarra, 2014). This is especially true for methods based on the detection of outliers from a null demographic model since an incorrect demographic model can lead to a large number of false positives or false negatives (Huber et al., 2014;Tiffin & Ross-Ibarra, 2014). The difficulty is further compounded by the fact that selection can act at different spatial scales, and detection of local selection can simply be missed if one uses a sample of individuals drawn indiscriminately from a large population.
Conversely, selection can leave a signature in some genes at the species range level, but no detectable signature at a more local scale and vice versa. This is what is suggested by our recent studies in Norway spruce (Picea abies) and Siberian spruce (Picea obovata), two Eurasian conifer species with a joint distribution extending from Norway eastwards to the Sea of Okhotsk. We have identified loci putatively under selection in two types of studies. First, using populations located along latitudinal clines we combined tests of association to environmental variables, detection of FST outliers and gene expression studies to detect evidence of local adaptation in genes related to variation in phenology, a trait known to exhibit strong latitudinal variation (Ekberg et al., 1979;Savolainen et al., 2007). In both species, two photoperiod pathway related genes, PaFTL2 (Picea abies Flowering Locus T like 2) and PaGI (Picea abies Gigantea), appear to have been under local selection (Chen et al. 2012a;Chen et al., 2014).
Second, using individuals sampled across the range of Norway spruce, we looked for departure from neutrality under different demographic models in 19 genes from the photoperiodic pathway. Only the circadian clock gene PaPRR3 (Picea abies Pseudo Response Regulator 3) deviated consistently from neutrality for all tested demographic scenarios, and the signature of selection observed at PaPRR3, a positive Tajima's D value, indicative of an excess of common variants, was suggestive of local selection, at least at broad geographic scale (Källman et al., 2014). No evidence of selection at PaPRR3 was, however, detected in the aforementioned analyses of clines that considered a more local or finer spatial scale.
In comparison to these previous studies that either sampled rather densely along a cline or used a scattered sampling across the whole natural range, the present one is based on an intermediate sampling strategy. The current study also differs in its re-sequencing approach. Here we use next generation sequencing of two pools of individuals to assess polymorphism at 88 nuclear genes. Pooled sequencing reduces costs and, if analyzed carefully, can provide reliable estimates of allele frequencies (Boitard et al., 2012;Futschik & Schlotterer, 2010;Lynch et al., 2014). These polymorphism data are used to (i) infer the divergence time and migration rate between the two domains, using silent sites and (ii) test for the presence of both purifying and adaptive selection in those genes. The two pools of individuals were sampled in the two main phylogeographic domains of Norway spruce, the Alpine domain and the Fennoscandian domain (Fig. 1). These two domains are today geographically separated and genetically differentiated (Heuertz et al., 2006;Lagercrantz & Ryman, 1990), especially for mitochondrial DNA markers (Lockwood et al., 2013;Tollefsrud et al., 2009;Tsuda et al., in press). The strong divergence observed at mtDNA markers and the polyphyly of P. abies for mtDNA even led Lockwood and co-authors (2013) to suggest that the two domains may deserve to be considered as two different species. However, divergence between the two domains is not exceptionally high for 22 allozymes (Lagercrantz & Ryman, 1990), and 16 nuclear genes (Heuertz et al., 2006) and certainly lower than the divergence from Siberian spruce (P. obovata) at both types of markers (Tsuda et al., in press). With 88 nuclear loci the present dataset provides an opportunity to obtain a better estimate of divergence time and migration between the two domains compared to earlier studies. We also show that both purifying and directional selection have shaped polymorphism at some of the genes related to the photoperiodic pathway.

Sampling and sequencing
We germinated seeds from 48 maternal trees sampled in four Fennoscandian populations (coming from latitudes ranging from 61°N to 67°N), and from 48 maternal trees sampled in three Alpine populations (coming from latitudes ranging from 46°N to 47°N) ( Table 1, Fig. 1). Diploid DNA was extracted from needles individually, using DNeasy Plant Mini Kit (QIAGEN, Germantown, MD).
We PCR amplified 88 gene fragments using the proofreading enzyme Phusion (Finnzymes, Espoo, Finland). We selected genes in two ways: 1) 30 of them come from our resequencing efforts; they were chosen since they are homologous to Arabidopsis or Populus genes that belong to the photoperiodic pathway, and in particular to the circadian clock; 2) 58 of them originate from a previous unpublished microarray expression study. They had a rapid response to day length changes and were highly divergent in gene expression during the bud set period (Källman et al. unpublished Paper I in Källman 2009). Gene sequences and their best BLASTN hits in Norway spruce genome have been deposited in Dryad (see Data Accessibility). We blasted the reference sequences against the P. abies and P. glauca genomes and found them to be unique. We controlled for paralogs in two ways. First, we checked the gel bands in the lab. We also Sangersequenced genes using the megagametophytes of two individuals from the same origin as the pooled samples and this did not suggest the presence of paralogs.
Second, a BLAST search of reference sequence against P. abies and P. glauca genomes produced significant similarity differences between best hit and second hit. The bias introduced by possible paralogs depends on the percentage of paralogs distributed in individuals of both pools and the percentage of paralog reads mapped/unmapped to the reference. This remains a difficult problem in non-model organisms whose genome is incomplete since exhaustive search is impossible for homolog/paralog/gene duplication.
Amplification products of all genes were diluted to a concentration of 10ng/μl and mixed in equal proportions. Individuals from the different populations were pooled into two sequencing libraries corresponding to the Fennoscandian and Alpine domains, respectively. Mixed DNA samples were sheared into fragments of size between 150-200 base pairs using Bioruptor. Paired-end libraries were prepared and sequenced using next generation sequencing (NGS) technology on 6 Illumina HiSeq2000 platform by Istituto di Genomica Applicata (IGA), Udine (Italy), with the read length equal to 109bp. In order to identify errors arising from sequencing, two sequencing replicates were run for both pooled libraries.
The Sanger-sequenced genes using the megagametophytes of two individuals from the same origin as the pooled samples were also used as references for short-read alignment.

Read mapping and SNP calling
We trimmed sequencing reads using the program Trimmomatic v. 0.32 (Bolger et al., 2014), retaining only high-quality sequence pairs where each pair was at least 34bp long after trimming. Read alignment was performed using a two-step SNP-tolerant mapping protocol. First, the trimmed reads were aligned to reference Sanger sequences using GSNAP (Wu & Nacu, 2010;Wu & Watanabe, 2005) with default settings for genomic mapping. Aligned reads were filtered with a mapping quality ≥20. We only kept those properly paired reads that were mapped to a unique position in the same gene (i.e. concordant unique read pairs). Indel-realignment was performed using the Genome Analysis Toolkit (DePristo et al., 2011). We identified single nucleotide polymorphisms (SNPs) with Sanger sequenced references in all sequencing libraries using samtools' mpileup (Li et al., 2009). Missing sites in reference sequences were corrected by BLAST (Zhang et al., 2000) against the Spruce Genome Project (Nystedt et al., 2013), as well as the aligned sequence in both pools before the next step. In the second step, we used identified variants to perform SNP-tolerant read alignment using GSNAP. In order to have a uniform number of reads per gene, reads of each library were subjected to two steps of downsampling before and after aligning to references. We randomly picked reads without replacement to generate a downsampled library of size equal to the smallest library, which contains 1,807,545 read pairs. We then repeated the quality filtering and indel-realignment as in the first step. Before SNP calling, we performed a further downsampling of aligned reads in each gene to reduce possible bias in frequency estimation caused by uneven distribution of read depth across gene body, as well as different amplification cycles amongst genes. We randomly picked 100 reads on both strands without replacement that covered the first base at 5' end of the gene and 7 recalculated read depth in each position. We then kept downsampling at the next base toward the 3' end until the number of reads at all positions reached 100 on both strands. Thus, in total we expected the mean depth at each position to be around 200. At positions with a depth lower than 200 all reads were retained.
Finally, SNP identification and allele frequency estimation were performed in all sequencing libraries using CRISP, a multi-sample variant caller for highthroughput pooled sequence data (Bansal, 2010). To minimize the number of possible sequencing errors, the inference of an allele and its frequency was considered to be reliable only if: (i) the allele was supported by both sequencing replicates, and (ii) minor alleles could be identified in at least two reads in each replicate or five reads in total across the two pools. Allele frequencies were polarized by comparing to the P. glauca homolog (Birol et al., 2013). Since we expected the minimum allele frequency to be at least one in 96, we considered rare alleles of frequency <0.01 (which is also the 5% tail of frequency distribution of all alleles) to be biased estimates of unbalanced mixture of amplification products or possible mismatches introduced during the amplification and/or sequencing processes. However, we did not simply exclude them from our study since those estimates still represented sampling probability of rare alleles. Instead, we performed our analyses on the total dataset, as well as on subsets of SNPs with different frequency cutoffs. Two frequency cutoffs were used: ≥0.01 and ≥0.1.

Nucleotide diversity and population divergence
We estimated pairwise nucleotide diversity (π) (Nei & Li, 1979), Tajima's D (Tajima, 1989) and Wright's fixation index FST (Wright, 1931) values for each locus based on estimates of allele frequencies using custom R scripts. We chose Hudson's estimator of FST (Hudson et al., 1992;Keinan et al., 2007) for each SNP and an overall FST across all SNPs as an estimate of population divergence using the formulae derived by Bhatia et al. (2013): where n1, n2 and p1, p2 are the sample size and allele frequencies in the two samples, respectively. Hudson's estimator is a better choice for pairs of 8 populations since it has been shown to be more robust to sample size difference than other estimators and FST is not systematically overestimated. To combine information across SNPs two methods were used: we took the average of individual SNP FST values or we calculated the ratio of the expectations of the denominator and numerator across all SNPs. The first estimator is more sensitive to the presence of SNPs with low minimum allele frequency (MAF) and the second is dominated by SNPs that are on average more polymorphic (Jackson et al., 2014).

Inference of population demographic history
To infer the population divergence history we used the program fastsimcoal2, (ver. 2.5.2.8) which implements a maximum likelihood method based on composite likelihoods of the two-dimensional joint site frequency spectra (SFS) (Excoffier et al., 2013). We considered three demographic models. All three models start with an ancestral population (N0) that split into two descendant populations with effective population size NFEN and NALP, respectively, at time T before present. Asymmetrical migration is allowed between the two descendant populations after divergence with a number of migrants equal to M=2Nm where N is the effective population size of one of the two descendant populations and m is the migration rate that can differ for each descendant population. Three different demographic models were applied to the ancestral population: 1) constant population size; 2) population expansion with a growth rate α; and 3) bottleneck ( Fig. 2). In the latter, to reduce the complexity, we assumed that a bottleneck size of 0.001N0 occurred TBOT generations ago and lasted for 100 generations. We used all silent SNPs to calculate the site frequency spectrum and To assess the performance of the models, we applied a multinomial comparison between expected and observed SFS by calculating Anscombe residuals (Gutenkunst et al., 2009).

SNP divergence analysis based on FST and allele frequency difference
In this analysis, we combined Hudson's FST and allele frequency difference (AFD) as a distance measurement of SNP divergence between populations. By comparing to the expected distribution of the most likely demographic scenario, we identified outliers with higher divergence as the clue of either diversifying selection.
We first dispersed all SNPs on orthogonal coordinates of FST and AFD. The distance of each SNP to the origin (FST=0, AFD=0) was calculated as a new summary statistics for SNP divergence distance (divD) that combines FST and difference in allele frequency (AFD): We then dispersed SNPs on a third dimension by grouping them into nonsynonymous or silent (non-coding and synonymous) sites using a variable transformation based on Factor Analysis of Mixed Data (FAMD). We applied FAMD to our data using functions implemented in the R package 'FactoMineR' (Le et al., 2008). Divergence distance for a given gene was calculated by averaging values at SNPs within that gene. Simulated SNPs were also randomly grouped as if they belonged to the same genes as in the observed dataset. For each simulated dataset, we then calculated divD for SNPs and genes and calculated the 95%, 97.5%, and 99% envelopes based on the ranked SNP distance to the origin. We used the maximum values of these three envelopes of the 1000 bootstrap runs as the cutoffs for outlier selection at 5%, 2.5%, and 1% significance levels.

FST outliers
FST has been used extensively as a powerful statistic to identify loci under natural selection (e.g. Beaumont & Balding, 2004;Beaumont & Nichols, 1996;Foll & Gaggiotti, 2008 (Foll & Gaggiotti, 2008). To account for the difference in numbers of putative neutral and selected loci, we chose a prior odds ratio of 10, which should be conservative in our case. We performed 20 pilot runs with 5,000 iterations and 500,000 iterations MCMC with an additional 50,000 iterations as burn-in. We calculated Q-values (Storey, 2003) as posterior probabilities corrected for multiple tests to estimate the likelihood of the model with selection compared to the neutral model.

Sampling and sequencing
We

Population divergence
Population divergence between the two domains was estimated for each single SNP. We chose Hudson's estimator of FST since it is less sensitive to sample size difference than other estimators (Bhatia et al. 2013). More than half of the SNPs had an estimate of FST value less than 0.01 and close to 90% of the SNPs had an estimate of FST below 0.1 (Fig. 3D). Twenty-seven highly divergent SNPs had FST values above 0.3 (99% quantile). About half of these high FST SNPs were distributed over four genes, PaGI (2 SNPs), PaAP2L3 (3), and Pa02776N18 (5) and Pa02728N15

Joint demographic history inference
We simulated the joint site frequency spectrum (SFS) based on an "Isolation with migration" model, using all silent SNPs. Three ancestral population models were tested: a constant, expanding, or bottlenecked ancestral population size (Fig. 2).
Demographic inference was conducted using 1,000 bootstraps of the SFS and we reported the 95% confidence interval for each parameter (Table 3, and Fig. S2).
Model comparison using Akaike's weight of evidence (w) showed that 83.8% of bootstraps supported the constant size model (mean w=0.53) while only 8.7% and 7.5% supported the bottleneck (mean w=0.24) and growth models (mean w=0.23), respectively. Two-dimensional multinomial comparison between the constant size model and observed data also showed a good fit with the maximum Anscombe residual less than 2.8 (Fig. S3). Assuming a mutation rate of 1.1×10 -9 per site per year (Chen et al. 2012b) and an average generation time of 25 years, we obtained an effective population size around 41,000 (95%CI: 13,000-47,000) for FEN and a slightly larger one, at 51,000 for ALP (95% CI: 45,000-65,000). The ancestral population size was nearly ten times larger (~400,000). As noted by Becquet and Przeworski (2009) isolation-migration models often yield an 13 estimate of the ancestral effective population size that is much larger than that of either descendant population. Simulations indicate that this large value of the ancestral population size could reflect the fact that the ancestral population was actually structured. The two populations started to diverge around 5 million years ago (95% CI: 3.6 Mya-14 Mya, for the constant size model). We also found more migrants in the direction from north to south (M~15 per generation, 95% CI: 1.7, 19.5) than in the opposite direction (M<2 per generation; 95% CI: 0.9, 8.9).

SNP divergence analyses based on FST and AFD
FST is a fixation index and measures genetic divergence between populations, but is not an absolute measure of population differentiation: two populations can have no alleles in common and yet have a very low FST value, if each of them is highly variable (Charlesworth, 1998). Hence, part of the information present in the difference in allele frequencies between populations is not captured by FST.
We therefore calculated a new summary statistic divD from both allele frequency difference (AFD) and FST values (see Material & Methods). While over half of the SNPs with AFD <0.1 and FST<0.01 clustered tightly around the origin, SNPs spread quickly as AFD increased (Fig. 4). We also separated the SNPs into nonsynonymous and silent sites (Fig. S4). The non-synonymous SNPs in general tend to be less diverged (less spread in FST and AFD values) between FEN and ALP populations than silent ones. The distance distribution between these two groups of SNPs differed significantly  in both mean (0.87 and 1.1 for non-synonymous and silent SNPs respectively, Wilcoxon Rank Sum test p-value =7e-4) and variance (0.55 and 1.1, F-test p-value <2.2e-16). This difference could reflect the fact that non-synonymous mutations are more likely to be under purifying selection and therefore exhibit much lower divergence between geographically isolated populations than silent mutations that are primarily affected by genetic drift (Barreiro et al., 2008;Jackson et al., 2014;Maruki et al., 2012).
On the other hand, SNPs involved in local adaptation may show higher divergence level due to differences in environmental stress across populations.
To control for the effect of genetic drift, we took advantage of the simulated SFS to estimate the variation in FST and AFD that could be generated by the demographic history. We then built significance envelopes at 5%, 2.5%, and 1% to select SNP outliers based on their ranked distance to the origin (see details in Material & Methods). This resulted in 132, 68, and 15 SNPs in the 5%, 2.5%, and 1% significance level tails, respectively (see the detailed list in Supplementary file 1). Remarkably, the majority (>70%) of these outliers came from nine genes: To account for the difference between synonymous and silent SNPs, we also applied Factor Analysis with Mixed Data (FAMD) that incorporated the nonsynonymous/silent information as a categorical variable. The first three components explained 41%, 32% and 27% of the variance, respectively (Fig S4). We

Testing for local adaptation: FST ouliers
To identify candidate SNPs under local adaptation, we used BayeScan v2.1 (Foll & Gaggiotti, 2008). The method decomposes FST into a population-specific effect shared by all SNPs, and a locus-specific effect shared by all populations. The presence of selection is inferred if the locus-specific effect is necessary to explain the observed divergence. The log probabilities of choosing alternative models were calculated and corrected for multiple tests as Q-values (Fig. 5). By assuming a prior odd of neutral to selected mutations at 10, we identified 24 outlier SNPs at Q-value ≤0.05 containing four non-synonymous (PaGI_F4_233, PaPRR3_2066, PaAP2L3_1420, and Pa0279G15_694) and one synonymous (PaAP2L3_1621) SNP.

Discussion
In previous studies we have detected departures from neutrality among phenology related genes using scattered samples across the whole range of Norway spruce (Källman et al. 2014) and evidence of local adaptation by analyzing variation along latitudinal clines in Norway spruce and Siberian spruce (Chen et al., 2012a;Chen et al., 2014). In the present study, we used an intermediate sampling strategy as well as a different sequencing approach and sequenced 88 genes in two pools of individuals drawn from the two main historical domains of Norway spruce, the Alpine and Fennoscandian domains.
We then used silent polymorphisms to infer the demographic history of the two domains and all polymorphic sites to test for the presence of selection. Different approaches, including a novel one, were used for the latter. The main conclusions are that, based on the silent variation in this sample of 88 genes, the two domains diverged some 5 Mya and that purifying and positive selection both played an important role in the divergence of the two domains. In particular, we confirmed the importance of selection at a large geographic scale on genes related to the circadian clock such as PaPRR3 and PaGI. Below we shall discuss those results in turn, but we start by addressing some of the technical issues encountered when analyzing pooled sequence data.

Estimate population genetic parameters from pooled ngs data
Pooled sequencing technology has proven to be not only cost-effective but, more importantly, a fast and reliable access to estimates of the site frequency spectrum and population divergence if a proper sampling and statistical strategy is used (e.g. Boitard et al., 2012;Futschik & Schlotterer, 2010;Lynch et al., 2014;Fracassetti et al. 2015). To minimize the error rate in this study, we took a number of precautionary steps at both the data collection and analysis phases of the project. We first sequenced both pools twice in order to increase the detection power of rare alleles while securing a lower false positive rate (Bansal, 2010). Since the distribution of short-read coverage could be a major source of error in estimation of allele frequency and population divergence (Futschik & Schlotterer, 2010;Lynch et al., 2014), we then applied a two-step down-sampling  FST~0.12 in comparisons between populations of 66°N and populations of 46-47°N (Heuertz et al., 2006)) as the ones obtained in this study (π=0.005, D=-1.0, and FST=0.047). The smaller nucleotide diversity and larger FST estimates obtained in Heuertz et al. (2006) could simply be due to the difference in sample sizes and choice of genes since the present study and studies that are of more similar sizes (Chen et al., 2012a;Larsson et al., 2013) gave highly similar results.

Population divergence history
According to the analysis of the joint frequency spectrum of synonymous sites the two domains diverged around 5 Mya and gene flow between them has been extensive and mostly in a north to south direction. This divergence estimate is much larger than the one proposed earlier on by Lagercrantz and Ryman (Lagercrantz & Ryman, 1990) who, based on Nei's genetic distance calculated from 22 allozyme loci, obtained an approximate divergence time of 40,000 years.
Our estimate of 5 Mya, however, is of the same order of magnitude as the 6 million years obtained by Lockwood et al. (2013) from plastid (matK and trnk and rbcL), mitochondrial (nad1 and nad5) and nuclear sequences (4CL). Given that the latter primarily reflects divergence at cytoplasmic loci one would expect to obtain shorter estimates of divergence since the effective population sizes are twice as large for nuclear as for cytoplasmic markers in monoecious species. On the other hand, neither Lockwood et al. (2013) nor Lagercrantz and Ryman (1990) considered migration when estimating divergence time and hence their estimates and the one obtained here are difficult to compare. Finally, one major caveat of the present estimate is the fact that synonymous sites come from genes that might have been under selection and thereby could have been affected by nearby selection on non-synonymous sites. Although, we could not estimate linkage disequilibrium in these pooled data, a rapid decay of LD within genes has generally been found in previous studies (Heuertz et al., 2006;Larsson et al., 2013) so polymorphism at the silent sites is unlikely to have been strongly affected by selection across the 88 genes. Also, when using Bayescan a single synonymous SNP was detected. So, altogether, linked selection is unlikely to have had a major impact on demographic inferences.

Evidence of selection
Previous studies of genetic differentiation have either focused on quantifying population divergence and past demography from variation at neutral sites or on detecting genomic region under diversifying selection by using genome scans.
Despite overwhelming evidence of its importance in evolution the effect of purifying selection on population divergence has seldom been considered explicitly. Recently Jackson et al. (2014)  for example loci underlying a quantitative trait controlled by a large number of loci, each of which is weakly affected by selection (Latta, 1998;Kremer and LeCorre, 2012), gene flow will keep introducing new variants and prevent the locus to go to fixation. In such case AFD would be high but neither alleles will be fixed (lower-right or upper-left areas of Fig 5). These loci will be identified by the divergence distance method as putatively under selection, but not by Bayescan, which is more stringent. Additionally for pooled samples, to be able to select Bayescan outliers large sample sizes are required as the outliers usually have a frequency close to fixation or loss in one of the populations (Lynch et al., 2014).

Many outliers of our divergence analysis have intermediate frequency and thus
can lower the requirement for the number of individuals. Finally, one caveat is that, as for many outlier methods, the present method can lead to false positives and false negatives if the wrong demographic model is used it the analysis..
The two most interesting loci detected in the present study, are the circadian clock genes PaGI (Gigantea) and PaPRR3, since they were also identified as top candidate genes for local adaptation in previous studies. Gigantea homologs consistently show evidence of non-neutral evolution, both in conifers (Chen et al., 2012a;Chen et al., 2014;Holliday et al., 2010) and in angiosperms (Keller et al., 2012;Olson et al., 2013). Its pattern of polymorphism within spruce species is intriguing, but has been hard to analyze because of the lack of silent polymorphism in the parts of the gene that were sequenced so far. Repeated selective sweeps have been proposed for Populus balsamifera where a similar pattern of polymorphism is observed (Keller et al., 2012) but this hypothesis remains to be confirmed. PaPRR3 has not been included as often in candidate gene panels and less is known about it. In Norway spruce PaPRR3 departed from neutrality when departure from neutrality was tested from a range-wide sample (Källman et al., 2014), but PaPRR3 did not exhibit latitudinal cline in allele frequency as one might have expected from a circadian clock gene if it were directly involved in adaptation to local conditions (Chen et al., 2012a). Finally, it is also worth mentioning AP2L3 as a gene of special interest. While genetic variation in this gene has not been studied as extensively as that of GI, PRR3 or FTL2, AP2L3 exhibits an intriguing pattern in the present study and its involvement in development in Norway spruce (Nilsson et al., 2007) makes it an interesting candidate gene.
In conclusion, this study shows that combining PCR with a pooled sequencing strategy is a cost-effective and reliable approach for generating multilocus population genetic data sets in non-model organisms. The inferred pattern of diversity and population divergence clearly shows that the Alpine and Fennoscandian domains of Norway spruce remain genetically closely related although they diverged 5 Mya. This is probably due to pollen flow as the two domains are clearly differentiated for mitochondrial DNA, which is maternally inherited (Lockwood et al., 2013;Tsuda, et al., in press). In addition we detected a set of genes deviating from neutral expectations. Together with earlier studies of candidate genes for adaptation to local light conditions, both PaPRR3 and PaGI are emerging as central genes that would certainly deserve further attention. In particular it would be very informative to re-sequence a larger part of the PaGI gene in a larger sample in order to test for selection, and to conduct more extensive functional studies in both genes.      a: Mean and 95% confidence intervals are shown Table 3 Posterior distribution modes and 95% confidence intervals of demographic parameters estimated with fastsimcoal2.

Tables
See text for definition of the parameters. The models are described in Fig. 2