Genome-Wide Linkage Disequilibrium in Nine-Spined Stickleback Populations

Variation in the extent and magnitude of genome-wide linkage disequilibrium (LD) among populations residing in different habitats has seldom been studied in wild vertebrates. We used a total of 109 microsatellite markers to quantify the level and patterns of genome-wide LD in 13 Fennoscandian nine-spined stickleback (Pungitius pungitius) populations from four (viz. marine, lake, pond, and river) different habitat types. In general, high magnitude (D’ > 0.5) of LD was found both in freshwater and marine populations, and the magnitude of LD was significantly greater in inland freshwater than in marine populations. Interestingly, three coastal freshwater populations located in close geographic proximity to the marine populations exhibited similar LD patterns and genetic diversity as their marine neighbors. The greater levels of LD in inland freshwater compared with marine and costal freshwater populations can be explained in terms of their contrasting demographic histories: founder events, long-term isolation, small effective sizes, and population bottlenecks are factors likely to have contributed to the high levels of LD in the inland freshwater populations. In general, these findings shed new light on the patterns and extent of variation in genome-wide LD, as well as the ecological and evolutionary factors driving them.

During the processes of population differentiation and local adaptation, evolutionary forces of selection, drift, gene flow, and mutation jointly influence the structure and patterning of genetic variation in the genome. Ultimately, this influences the extent and strength of associations among different parts of the genome. Such genetic associations are reflected in nonrandom coinheritance of alleles at different loci, a phenomenon known as linkage disequilibrium (LD; Lewontin and Kojima 1960). Interest toward LD recently has been fueled by its fundamental role in determining the required marker density and feasibility of gene mapping approaches (Jorde 2000;Zondervan and Cardon 2004). Knowledge about the extent and magnitude of LD also has the potential to provide valuable insights into an organism's evolutionary past (Nordborg and Tavaré 2002;Slatkin 2008). For instance, the degree and extent of genome-wide LD can help to identify population substructuring and demographic events such as bottlenecks and admixture (e.g., Nei and Li 1973;Golding and Strobeck 1980). Similarly, patterns of local LD can help to uncover the history of mutation, gene conversion, and selection (e.g., Karlin and Feldman 1970;Frisse et al. 2001). In this perspective, studies of LD also can be viewed as bridging evolutionary biology to genomics.
During the past few years, molecular markers across the whole genome have become available in many species, facilitating progress in quantifying the magnitude and patterns of genome-wide LD, for example in human (e.g., Reich et al. 2001;Shifman et al. 2003), livestock (e.g., Corbin et al. 2010;Badke et al. 2012;García-Gámez et al. 2012;Espigolan et al. 2013), crop (e.g., Hao et al. 2011;Van Inghelandt et al. 2011;Delourme et al. 2013;Fang et al. 2013), and model species (e.g., Mukai et al. 1971;Branca et al. 2011). However, the information about genome-wide LD in wild vertebrate populations remains limited to a few studies of mammals (e.g., Hernandez et al. 2007;Laurie et al. 2007), birds (e.g., Backström et al. 2006;Li and Merilä 2010;Kawakami et al. 2014), and fishes (e.g., Hohenlohe et al. 2012). Yet, studies of LD in the wild are important, because they can address biological questions that are not approachable by use of laboratory or domestic populations. These include, for instance, mapping quantitative trait loci (QTL) or candidate genes (e.g., Slate 2005Slate , 2013Laurie et al. 2007;Ellegren and Sheldon 2008;Gratten et al. 2008;Slate et al. 2009), and disclosing the relative contributions of different factors like natural selection and demography shaping organism's genome (e.g., Cutter 2006). Furthermore, knowledge about interpopulation and interhabitat variation in genomic LD can be helpful in advancing our understanding of evolutionary processes in nature (Gould and Johnston 1972;Roesti et al. 2013). Several earlier studies have described differences in the degree and extent of LD among populations of humans (e.g., Service et al. 2006), domestic animals (e.g., Sutter et al. 2004;Badke et al. 2012), and cultivated plants (Hao et al. 2011;Fang et al. 2013). However, interpopulation comparisons of LD in wild vertebrates are scarce (but see: Li and Merilä 2011;Miller et al. 2011;Hohenlohe et al. 2012). Hence, more empirical studies are needed to advance our understanding of variation in the extent and magnitude of LD in the wild.
The nine-spined stickleback (Pungitius pungitius) is a small coldwater adapted fish with a circumpolar distribution in the northern hemisphere (Wootton 1976). Fennoscandian nine-spined stickleback populations have been derived from a common ancestral population and became established after the last glacial maximum (Shikano et al. 2010a;Teacher et al. 2011). They occur in both freshwater and marine habitats along the coastal areas of the White Sea and the Baltic Sea (Shikano et al. 2010a;Defaveri et al. 2012). Due to differing selection pressures among habitats, the species has undergone marked adaptive differentiation and, thus, shows pronounced morphological, physiological, and behavioral differentiation across habitat types (Merilä 2013). For instance, freshwater populations display reduced body armor (e.g., Herczeg et al. 2010;Shikano et al. 2013), gigantism (e.g., Herczeg et al. 2009), increased aggression (e.g., Herczeg and Välimäki 2011), and divergent brain architecture (e.g., Gonda et al. 2012) compared with marine populations. Earlier population genetic and phylogeographic studies (Shikano et al. 2010a;Teacher et al. 2011;Bruneaux et al. 2013) also suggest that postglacial recolonization and associated founder events have strongly affected the genetic variability and structure of current populations. Despite this progress in understanding local adaptation and differentiation among nine-spined stickleback populations (see also : Karhunen et al. 2014), possible differences in the extent and levels of genome-wide LD among populations and habitat types remain unknown.
The main aim of this study was to quantify and compare the patterns and extent of genome-wide LD in nine-spined stickleback populations from different habitats (viz. marine, river, lake, and pond). To this end, we used genotypic data on 109 microsatellite loci from 13 different nine-spined stickleback populations. Because isolated freshwater populations have very low levels of genetic variability (Shikano et al. 2010a;Bruneaux et al. 2013) and thus, are likely to have smaller effective population sizes and be more susceptible to stochastic demographic events than open and more genetically variable marine populations, we expected to find greater levels of genomic LD in freshwater compared with marine populations.

Study populations and samples
A total of 312 nine-spined stickleback individuals (24 per population) from three marine and 10 freshwater populations were included in the analyses. The sampling sites covered a large part of the Fennoscandian area and encompassed a diverse array of habitats (viz. marine, river, lake, and pond populations; Figure 1 and Table 1). Marine fish were collected from the White Sea (Lev) and the Baltic Sea (Sbol and Hel),  Table 1. The letter in brackets stands for habitat type (M = marine; R = river; L = lake; P = pond). Asterisks indicate coastal freshwater populations.

Molecular analyses
Total genomic DNA for the samples was extracted from fin clips using the phenol-chloroform method (Taggart et al. 1992) following proteinase K digestion. The same panel of 112 microsatellites as used by Shikano et al. (2010b) was used in all analyses. The genotyping data of the microsatellite markers for eight populations (Lev, Sbol, Hel, Mat, L1, Kro, Rbol, and Pyo) were taken from Shikano et al. (2010b,c), whereas the data for other five populations (Rah, Por, Ska, Ryt, and Byn) were produced in the present study (Supporting Information, File S1). Polymerase chain reactions (PCRs) were carried out using the QIAGEN multiplex PCR Kit (QIAGEN) in a reaction volume of 10 mL containing 1· QIAGEN multiplex PCR Master Mix, 0.5· Q-Solution, 2 pmol of each primer, and 10-20 ng of genomic DNA. The PCR amplifications were performed using the following cycle: initial activation at 95°for 15 min, followed by 30 s at 94°, 90 s at 53 or 55°, and 60 s at 72°for 30 cycles, ending with a final extension at 60°for 5 min. PCR products were resolved on a MegaBACE 1000 automated sequencer (Amersham Biosciences), and their sizes were determined with ET-ROX 550 size standard (Amersham Biosciences). Alleles were scored using FRAGMENT PROFILER 1.2 (Amersham Biosciences) with visual inspection and manual corrections.

Population genetic analyses
Within-population observed heterozygosities (H O ), expected heterozygosities (H E ), inbreeding coefficient (F IS ), and allele frequencies were calculated with FSTAT v2.9.3.2 (Goudet 2002). The proportion of rare alleles (allele frequency ,5%) in each population was estimated using Microsoft Excel. Measures of allelic richness and private allelic richness for each population were calculated using HP-RARE (Kalinowski 2005), accounting for rarefaction.
Three approaches were used to investigate population genetic structure. First, pairwise F ST among populations was calculated using GENETIX v4.03 (Belkhir et al. 2004), and the significance of F ST values was evaluated via 10,000 permutations. Second, principal component analysis was performed at the individual level using the program GenAlex 6.501 Smouse 2006, 2012). Third, to assess the relative contributions of potential factors to population differentiation, a hierarchical analysis of molecular variance was performed using the program Arlequin v3.5 (Excoffier and Lischer 2010), based on three different grouping patterns of populations: habitat type I (marine, lake, pond, and river), habitat type II (marine and freshwater) and geographic proximity (Hel and Mat; Sbol and Kro; Ska and Byn; Por, Pyo, and Ryt; Rbol and Lev; L1; and Rah; see Figure 1). Statistical significance was assessed with 10,000 permutations. As population substructure tends to inflate LD (Nei and Li 1973;Pritchard and Przeworski 2001), we performed Bayesian clustering analyses in STRUCTURE v2.3.4 (Pritchard et al. 2000) to examine whether the observed high levels of LD (see the section Results) were due to within-population substructuring. We conducted three independent runs for each K-value ranging from 1 to 20. The admixture model and correlated allele frequencies model (Falush et al. 2003;Excoffier et al. 2005) were used, with 500,000 iterations after a 100,000 burn-in for each run. Also hidden family structure could amplify LD, and thus, we used Queller and Goodnight's method (Queller and Goodnight 1989) implemented in program IDENTIX v1.1.5 (Belkhir et al. 2002) to estimate pairwise relatedness coefficient between individuals within each population.
Signatures of genetic bottlenecks were tested for each population using two methods. First, we used the heterozygosity excess method (Luikart et al. 1998) as implemented in the program Bottleneck v1.2.02 (Piry et al. 1999) to test for recent reductions in population size. We ran the program under the two-phased mutation model (TPM) with 90% single-step mutations. Statistical significance of the results was evaluated by 1000 iterations with a one-tailed Wilcoxon signed-rank test. Second, we used the M-ratio method (Garza and Williamson 2001) to detect historical population contractions (Garza and Williamson 2001;Williamson-Natesan 2005). Population-specific values of M (the number of alleles / the allele size range) and M c (the critical value of M) were estimated using the programs M_P_VAL and CRITICAL_M (Garza and Williamson 2001), respectively. For each run, the simulations consisted of 10,000 iterations with the average mutation rate (m) of 1.5 · 10 24 per generation (Shimoda et al. 1999), a TPM with 10% multistate change and 3.5 base steps for the mean size of multistep mutations (Garza and Williamson 2001). We tested three conservative values of theta (u = 4N e m) that equate to a prebottleneck effective population size (N e ) of 1000, 5000, and 10,000 for the three marine and three coastal freshwater populations, and a prebottleneck n N e of 100, 500, and 1000 for the seven inland freshwater populations. The observed value of M was compared with the corresponding M c , and a lower value of M relative to M c indicated a historical population bottleneck (Garza and Williamson 2001).

Linkage map and haplotype phasing
Since nine-spined and three-spined sticklebacks (Gasterosteus aculeatus) have the same number (n = 21) of chromosomes (Chen and Reisman 1970) and syntenic locations of microsatellite loci are conserved between these two closely related species (Shapiro et al. 2009;Shikano et al. 2010cShikano et al. , 2013, we built the genomic distance-based (Mb) linkage map for the nine-spined stickleback through its homology with the three-spined genome assembly (http://www.ensembl.org/Gasterosteus_ aculeatus/index.html). BLAST searches were performed to locate the 112 nine-spined stickleback microsatellite markers in the threespined stickleback genome using the BLASTN tool in the Ensembl database. Initial searches were performed with the default conditions, and a locus was assigned to a genomic location if it provided a unique hit at E # 1e 210 . When a locus provided multiple matches at E # 1e 210 , it was unassigned unless the best hit had an E value at least 10 decimal places lower than the next best one. For ease of comparison, we numbered linkage groups (LGs) for the nine-spined stickleback linkage map in accordance with the syntenic LGs in the three-spined stickleback ( Figure 2). The gametic phase of haplotypes and missing genotypes were inferred from genotype data for each LG in each population and habitat type using a Bayesian statistical method as implemented in PHASE v2.1 (Stephens et al. 2001;Stephens and Scheet 2005). In each run, we chose the original model defined in Stephens et al. (2001), and set the number of iterations to 1000, thinning interval to 1 and a burn-in to 100. Ten independent runs were performed with different seeds to check for consistency between the results. We considered the PHASE results to be consistent when no less than eight runs gave the same inferred haplotypes, and in such case the consistent haplotypes were used in the subsequent calculations; otherwise, the haplotypes from the run with the highest average value for the goodness of fit statistics were used for the subsequent analyses (Stephens et al. 2001).

LD analyses
Two different gametic LD measures, multiallelic D' and r 2 , were used. The two LD estimates were derived from the standard measure of LD between two alleles at two different loci: Multiallelic D' was estimated as (Lewontin 1964;Hedrick 1987): where k and l were the number of alleles for markers A and B, respectively, and Multiallelic r 2 was estimated as (Hill and Robertson 1968): We computed D' and r 2 for all pairwise syntenic markers in each population and habitat type using the program PowerMarker v3.25 (Liu and Muse 2005). Pearson's and Kendall's correlation tests were performed to investigate the correlation between D' and r 2 values within population or habitat. Because the measure D' commonly has been used in studies of wild vertebrates (e.g., Backström et al. 2006;Hohenlohe et al. 2012) and has more power to detect LD (Devlin and Risch 1995), it was used in the following analyses to facilitate comparison of our results with those of other studies. Logarithmic regression plots of D' values of all syntenic pairwise markers against genomic distances (Mb) in each population and habitat type were generated in Microsoft Excel. The half-length of LD (Reich et al. 2001), i.e., the distance at which it falls to 0.5, was evaluated.
Mann-Whitney U-tests (Mann and Whitney 1947) were used to assess the statistical significance of differences in D' values between habitat types. Kruskal-Wallis tests (Kruskal and Wallis 1952) were used to assess the significance of differences in D' values across all of the populations or among populations within the same habitat type. Partly different polymorphic markers were involved in different population-specific LD analyses (Table 1), hence the variation in marker distance between populations could potentially influence statistical significance tests of D' values. In order to control for this, we used analysis of covariance (ANCOVA) in which population and habitat were treated as random and fixed factors, respectively, and associated D' values were regarded as dependent variables, with physical distance between markers as a covariate. Furthermore, there were differences in marker density in different LGs. In order to corroborate the LD patterns observed in the genome-wide analyses, we examined LD patterns in four LGs with the greatest marker densities (i.e., LGs 9, 11, 19, and 21; Figure 2) for each population. All statistical tests were conducted in SPSS 16.0 (SPSS Inc, Chicago, IL), and Bonferroni corrections (Rice 1989) were applied to adjust significance levels when multiple testing was involved.
To examine whether observed high levels of LD could be an artifact due to haplotype phasing, we also estimated the composite LD measure (Weir 1996) based on unphased genotypic data using the method described in Zaykin et al. (2008). In addition, to examine the effect of rare alleles (allele frequency ,5%) on the levels of LD, we recalculated both haplotypic and composite LD measures with rare alleles excluded.

Population genetic analyses
One-hundred nine microsatellite markers were successfully mapped to the three-spined stickleback genome. The basic indices of withinpopulation genetic variability are given in Table 1. The number of polymorphic loci ranged from 33 (in Pyo) to 106 (in Hel) depending on the population. Allelic richness and expected heterozygosities (H E ) estimated across all loci ranged from 1.50 (in Pyo) to 7.02 (in Lev), and from 0.085 (in Pyo) to 0.574 (in Kro), respectively (Table 1). Private allelic richness for each population ranged from 0.03 (in Pyo) to 0.72 (in Hel; Table 1). The marine (Hel, Sbol, and Lev) and coastal freshwater populations (Mat, Kro, and Rbol) had much greater genetic diversities (H E = 0.530-0.574; Table 1) than the seven inland freshwater populations (Ska, Byn, Por, Pyo, Ryt, L1, and Rah; H E = 0.085-0.368; Table 1). F IS values and their 95% confidence intervals did not deviate significantly from zero in any of the populations (Table 1). A high proportion of rare alleles was observed within populations, ranging from 0.15 in Pyo to 0.53 in Hel (Table S1).
The extent of population differentiation as measured by F ST among population pairs varied greatly (F ST = 0.00320.724), most of which were significant (52/78, P , 0.05/78 = 0.000641; Table S2). In general, F ST values between inland freshwater populations were always greater than those between marine or coastal freshwater populations (Table S2). Principal component analysis revealed that the first and second axes accounted for 13.7% and 10.1% of variation in allele frequencies, respectively ( Figure S1). The individuals from the inland freshwater populations clustered more tightly than those from the coastal freshwater and marine populations ( Figure S1). Analysis of molecular variance analyses suggested that 7.4% of the total genetic variation was explained by geographic proximity (P , 0.001), whereas the factors of habitat type (marine vs. lake vs. pond vs. river, 21.9%, P . 0.05; marine vs. freshwater, 22.1%, P . 0.05; see Table 2) did not contribute to the patterns of genetic differentiation. Based on the value of DK (Evanno et al. 2005), STRUCTURE analyses indicated that the most probable K was nine ( Figure S2). No substructure was found within any of the populations at both the optimal K value (i.e., 9) and the maximum tested K value (i.e., 20; Figure S2). Thus, population substructuring was unlikely to account for the observed high levels of LD. The estimated pairwise relatedness coefficients were generally small (e.g., , 0.2) for 12 populations except Pyo (File S2), suggesting that most individuals should be unrelated; hence, family structure was not an explanation for the high LD values.
A signal of recent population bottleneck was detected in only one population (L1; P = 0.03) under the TPM using the heterozygosity excess method. However, all populations except Pyo showed strong evidence for historical population bottlenecks using the M-ratio method, despite the differences in pre-bottleneck N e (Table S3). Observed population-specific M-ratio values ranged from 0.670 to 0.898, and most (12/13, except Pyo) were lower than the corresponding M c values (Table S3). It was unexpected that no bottleneck was detected in Pyo because this population had the lowest genetic diversity of all populations in this study (Table 1). However, this could be due to a small number of polymorphic markers (n = 33; Table 1) segregating in the population. n

Linkage map
Based on homologous positions in the three-spined stickleback genome, the 109 mapped microsatellites defined a total number of 20 LGs of the nine-spined stickleback (Figure 2). Two to 13markers were mapped to each of the LGs, but none of the markers mapped to LG6 of the three-spined stickleback (Figure 2). Based on the threespined stickleback genome assembly, the average interval between adjacent markers was 2.738 Mb, with the smallest spacing of 0.001 Mb and the largest of 11.496 Mb. The median distance between adjacent markers was 2.004 Mb. With regard to different LGs, the average inter-marker distance ranged from 1.19 Mb in LG11 to 6.227 Mb in LG5. Inferred haplotypes from the program PHASE were largely consistent across the ten replicate runs, and approximately 90% of the total number of loci had phase probabilities of more than 0.8, indicating that the results were reliable.
Comparison of the patterns of LD decay as a function of genomic distance revealed very weak and statistically nonsignificant (R 2 , 0.01, P . 0.05; Table S6 and Figure 3) correlations between D' and genomic distance. With regard to LD decay in different habitats, the dataset of all marine populations combined or all freshwater populations combined showed higher correlations and shorter LD half-length compared with the combined lake or pond datasets ( Figure 4 and Table S6). Interestingly, we found that the three coastal freshwater populations (Mat, Kro, Rbol; Figure 3B), which were geographically close to the marine populations (Hel, Sbol, Lev; Figure 3A), exhibited similar LD patterns as their marine neighbors, but deviated from the typical LD pattern in the inland freshwater populations ( Figure 3C and Table 3). In addition, LD values increased slightly with genomic distance in three inland freshwater populations (Ska, Byn, Pyo; Figure 3D and Table S6), and the level of LD in Por was independent of genomic n Table 3 Linkage disequilibrium estimate (D') and associated estimation error for syntenic markers in 13 nine-spined stickleback populations and five habitat types (marine, lake, pond, river, and coastal freshwater) using 109 microsatellite markers  Table 1. The value in the brackets is the estimation error associated to the mean D' value, obtained by dividing the SD of D' value by the square root of the number of marker pairs used to measure LD in each distance bin (Table S7). a D9 value is directly obtained from the averaged D9 value of relevant populations. b D9 value is calculated from the combined original haplotype data of relevant populations. distance ( Figure 3D and Table S6). This finding could be ascribable to stochasticity caused by the small number of marker pairs used to measure LD in each distance bin in these highly homozygous populations (Table S7).
The composite D' and r 2 values were relatively high (Table S8) and comparable with the levels of haplotypic LD values (Table 3 and Table S4), indicating that observed high levels of LD were unlikely to be explainable as an effect of haplotype phasing. When the rare alleles  were excluded, both haplotypic and composite D' values were smaller, but the overall syntenic D' value was still above 0.4 in almost all the populations (Table S8). On the contrary, both haplotypic and composite r 2 values became larger without the rare alleles (Table S8). Notably, irrespective of whether inferred haplotypic data or unphased genotypic data were used and whether the rare alleles were involved in the analyses or not, the findings about the LD patterns among habitat types (i.e., Pond . Lake . Marine; Coastal freshwater is similar to Marine) based on combined data remained largely unchanged (Table 3,  Table S4, and Table S8).

DISCUSSION
In general, low-to-moderate genetic diversity, strong genetic differentiation, and high levels of genome-wide LD were observed in Fennoscandian nine-spined stickleback populations. The extent and patterns of LD varied among populations and habitat types. Isolated and small freshwater populations tended to have greater LD compared with open marine populations. In the following, we will discuss these findings and their implications to our understanding of the factors influencing levels and extent of genomic LD in the wild.
Several recent studies have focused on fine-scale LD in commercially important fishes (e.g., Hayes et al. 2006), whereas genome-wide levels of LD in wild fish populations remain largely unexplored, with few exceptions (e.g., Hohenlohe et al. 2012;Roesti et al. 2013). We found high levels of LD in the studied nine-spined stickleback populations, and in this respect the results are comparable with those from the closely related three-spined stickleback (Mattern 2004;Mattern and Mclennan 2004), in which high magnitudes of LD were observed in both freshwater and marine populations (Hohenlohe et al. 2012). The high degree of LD in nine-spine sticklebacks did not come as a surprise in the view that earlier population genetic studies of this species (Shikano et al. 2010a;Teacher et al. 2011;Bruneaux et al. 2013) have suggested limited gene flow and low effective population sizes, both of which are factors expected to amplify genetic drift and thus the accumulation of LD (Service et al. 2006;Slatkin 2008;Charlesworth 2009). Likewise, demographic events such as founder effects and population bottlenecks can create high LD (e.g., Nei and Li 1973;Zhang et al. 2004). In our case, the evidence for genetic bottlenecks in 12 of the 13 populations using M-ratio tests indicated that historical bottlenecks most probably have contributed to the high magnitude of genome-wide LD. Given that the stickleback populations studied here have been colonized after the last glacial maximum (,10,000 years ago), founder effects associated with postglacial recolonization also may account for the high LD. It should be noted that we have not taken recombination into account in our LD estimation due to its heterogeneity across the genome. Nevertheless, this should not affect the observed habitat or population differences in LD if the recombination hotspots are congruent in different populations, as has been reported for human populations (Conrad et al. 2006). One should also note that marker type can influence observed levels and extent of LD. For instance, microsatellite markers have more alleles per locus than SNP markers, and hence, they generally show higher levels of LD than SNPs (Chapman and Wijsman 1998). Consequently, the strong LD found here could partly be attributed to the high information content of microsatellites (Pritchard and Przeworski 2001). However, it is unlikely that this would be the sole explanation for the high levels of LD in nine-spined sticklebacks, especially in the view that this explanation cannot account for observed habitat or population differences in levels of LD. Other factors such as gene conversion, inversions and chromosome rearrangement could also have influenced the levels of LD in nine-spined sticklebacks, but the role of these factors remains to be investigated in future studies.
Despite the generally high magnitude of LD within populations, we also found significant differences in the levels and extent of LD between habitat types. The greatest levels of LD were observed in the seven inland freshwater populations, which was not unexpected as these are all population isolates that have been subject to substantial genetic drift due to initial founder effects, subsequent isolation and small effective population sizes. This drift has also led to reduced allelic diversity as reflected by low heterozygosities, low allelic richness, and overrepresentation of monomorphic microsatellite loci and rare alleles in these populations. This finding aligns well with those of earlier studies, which have shown that population isolates typically are characterized by low levels of genetic variation and high levels of LD (e.g., Arcos-Burgos and Muenke 2002; Li and Merilä 2010). Interestingly, the patterns of LD and genetic variation in the three coastal freshwater populations were similar to those in the adjacent marine populations. Similar observations also were reported in an earlier study of Swedish nine-spined sticklebacks, which showed little genetic and morphological differentiation between marine and coastal lake populations in the Baltic Sea region (Herczeg et al. 2009;Mobley et al. 2011). One plausible explanation for these observations is that the coastal freshwater populations are influenced by admixture/gene flow from adjacent marine populations, or that they have only recently become isolated from the marine populations (Herczeg et al. 2009;Mobley et al. 2011).
Different metrics have been developed to measure the degree of LD, and we employed both D' and r 2 estimators in this study. We found that the former yielded consistently higher values than the latter; such differences have also been reported in previous LD studies (e.g., Shifman et al. 2003;García-Gámez et al. 2012;Espigolan et al. 2013). Several possible underlying factors could account for such differences, including large allele frequency differences between markers (e.g., Ardlie et al. 2002;Wray et al. 2011) as was observed in this study (File S1). Likewise, the high proportion of rare alleles (allele frequency ,5%; Table S1) and consequent loss of haplotypes in the populations may also yield high D' values yet low r 2 values (Slatkin 2008;Purcell et al. 2009). Despite this discrepancy in absolute values of D' and r 2 , the two estimators were positively correlated in our data (Table S5), and gave consistent LD patterns in inter-habitat comparisons (Table 3 and Table S4). Thus, conclusions drawn from D' values are qualitatively similar to those obtained using r 2 values in respect to patterns of LD across habitat types.
Rare alleles (allele frequency ,5%) tend to elevate D' values (Teare et al. 2002); hence, they have often been eliminated from LD analyses. In our study, rare alleles were frequent in many populations, and this partly explains the high D' values in this study. We believe that the inclusion of rare alleles in our LD analyses was reasonable on the following grounds: First, the overall syntenic D' values remained relatively high (.0.4) in all of the 13 populations when the rare alleles were excluded. The differences in LD among habitat types (i.e., Pond . Lake . Marine) remained unchanged even if the rare alleles were excluded. Second, rare variants can convey important information in genome-wide genetic studies (Dickson et al. 2010). Thus, given that the high proportion of rare alleles is an inherent characteristic of the nine-spined stickleback populations investigated here, ignoring them might bias the results. Third, given the demographic history of these populations, a high frequency of rare alleles is to be expected. Population genetics theory suggests that rare variants are likely to be recently derived alleles (Watterson and Guess 1977), and a large number of rare variants could derive from recent population expansions (Pritchard 2001;Gorlov et al. 2008). As for Fennoscandian ninespined sticklebacks, earlier studies (Shikano et al. 2010a;Teacher et al. 2011;Bruneaux et al. 2013) indicated that populations inhabiting this region derived from ancestors in refugia from which the recolonization occurred approximately 10,000 years ago. Population expansions are very likely to have been involved in this re-establishment process, and thus, result in the large number of rare alleles in marine and coastal freshwater populations observed here. Previous studies have also indicated that inland freshwater populations have been established from marine populations recurrently (Teacher et al. 2011;Bruneaux et al. 2013). This finding, coupled with the fact that much genetic variation including rare alleles has been lost due to drift in inland isolates may explain why fewer rare alleles were observed in inland as compared to marine populations. In fact, within the same geographic region, an excess of rare alleles have also been observed in human (Reich et al. 2001) and Norway spruce (Picea abies) populations (Larsson et al. 2013).
Our findings of genomic LD and genetic variability have several important implications for gene mapping studies in nine-spined sticklebacks. First, given the high level of LD, a relatively small number of markers are required to cover a relatively large genomic region in QTLmapping studies. Second, given the previous consideration, the mapping resolution will be relatively low because large genomic regions are likely to be inherited as linked clusters. Third, given the high frequency of rare alleles, nine-spined stickleback populations might prove to be suitable for rare variant mapping of complex traits. Nevertheless, although this study provides some preliminary insight on variation in LD across the ninespined stickleback genome, one should bear in mind that the relatively low number of markers and their non-uniform distribution over the LGs and populations limit the inferences. Further exploration based on a larger number of markers, together with a high-density linkage map would pave the road for more refined inferences.
To sum up, the results provide the first investigation of genomewide LD patterns in the nine-spined stickleback, and also one of the most extensive studies exploring patterns of habitat related variation in LD in wild vertebrates. In general, high levels of LD were observed in most of the analyzed populations, and more interestingly, higher levels of LD were detected in inland freshwater than in costal populations. This habitat patterning in the levels of LD matches what we discovered-and what has been known from earlier studiesabout habitat-specific differences in demographic history and effective population size in these populations. The levels of LD uncovered in present study also suggest that studies seeking to disclose the genetic basis of phenotypic traits using QTL-mapping approaches may face challenges, especially in inland freshwater populations which are low in genetic variability and exhibit high levels of LD: the few polymorphic markers segregating in those populations are likely to be associated for long stretches of linked genes.