Tandem-Repeat Patterns and Mutation Rates in Microsatellites of the Nematode Model Organism Pristionchus pacificus

Modern evolutionary biology requires integrative approaches that combine life history, population structure, ecology, and development. The nematode Pristionchus pacificus has been established as a model system in which these aspects can be studied in one organism. P. pacificus has well-developed genetic, genomic, and transgenic tools, and its ecological association with scarab beetles is well described. A recent study provided the first mutation rate estimates, based on mitochondrial genome sequencing and mutation accumulation line experiments, that help resolve rather ancient evolutionary branches. Here, we analyzed the tandem-repeat pattern of the P. pacificus genome and studied spontaneous mutation rates at microsatellite markers using the previously generated mutation accumulation lines. We found that 0.59%–3.83% of the genome is composed of short tandem repeats. We developed 41 microsatellite markers, randomly chosen throughout the genome, and analyzed them in 82 mutation accumulation lines after 142 generations. A total of 31 mutations were identified in these lines. There was a strong correlation between allele size and mutation rate in P. pacificus, similar to Caenorhabditis elegans. In contrast to C. elegans, however, there is no evidence for a bias toward multistep mutations. The mutation spectrum of microsatellite loci in P. pacificus shows more insertions than deletions, indicating a tendency toward lengthening, a process that might have contributed to the increase in genome size. The mutation rates obtained for individual microsatellite markers provide guidelines for divergence time estimates that can be applied to next-generation sequencing studies of wild P. pacificus isolates.

The nematode Pristionchus pacificus is a model organism increasingly used for integrative approaches in evolutionary biology, through interdisciplinary studies in evo-devo, population genetics, and ecology (Hong and Sommer 2006; Sommer 2009). P. pacificus has a generation time of 4 days in standard laboratory cultures (Sommer et al. 1996), well-developed tools for forward and reverse genetic analysis, DNA-mediated transformation (Schlager et al. 2009), and a fully sequenced genome (Dieterich et al. 2008). Pristionchus nematodes are unique among model organisms in their well-described necromenic association with scarab beetles; e.g., P. pacificus has been found on Exomala orientalis in Japan and on Oryctes borbonicus on La Réunion Island in the Indian Ocean (Herrmann et al. , 2010; Morgan et al. 2012). The Pristionchus–beetle association represents a robust platform for the isolation and characterization of new Pristionchus isolates on a global scale. Close to 30 Pristionchus species and more than 400 P. pacificus strains were isolated between 2004 and 2011 in worldwide samplings, and a molecular phylogenetic framework has been generated (Mayer et al. , 2009; Morgan et al. 2012).
Estimates of the rate and pattern of microsatellite mutations are usually indirect, based on allele frequency distributions (Chakraborty et al. 1997; Primmer and Ellegren 1998) or phylogenetic analyses (Jin et al. 1996; Dettman and Taylor 2004). The first studies of the mutation mechanisms of microsatellites were made possible by direct observation of mutations in artificial constructs carrying expressed microsatellite sequences in bacterial and fungal systems (Levinson and Gutman 1987; Strand et al. 1993). Direct estimates of microsatellite mutation rates have also been derived from mutation accumulation (MA) line experiments in Drosophila melanogaster (Schug et al. 1997), Caenorhabditis elegans and Daphnia pulex (Seyfert et al. 2008), and Arabidopsis thaliana (Marriage et al. 2009). Under ideal conditions, MA line-based mutation rate estimates can be combined with genomic analysis of natural isolates of a given species and its close relatives to provide robust divergence time estimates (Molnar et al. 2011).
Here, we evaluate the genomic composition of microsatellites for P. pacificus and make use of MA lines to provide robust estimates of the rate and size spectra of microsatellite mutations. We found that 0.59%–3.83% of the genome is composed of short tandem repeats. By analyzing 82 MA lines after 142 generations, we found a total of 31 mutations in 41 microsatellite markers. There is a correlation between allele size and mutation rate, but no bias toward multistep mutations. We use these findings to suggest general guidelines for the selection of microsatellite markers in future genome-wide association studies of the evolutionary history of P. pacificus.

MA lines
The propagation of the MA lines has been described in detail previously (Molnar et al. 2011). Briefly, 100 MA lines were initiated from the F3 descendants of a single, inbred, wild-type P. pacificus PS312 laboratory strain. MA lines were propagated by transferring a single, randomly chosen offspring from the middle of the reproduction period. Worms were cultured at 20° on nematode growth medium seeded with Escherichia coli as a food source, as originally described for P. pacificus (Sommer et al. 1996). Backup cultures were kept for two generations at 15° to prevent the accidental loss of lines. Of the original 100 MA lines, 82 survived the 142 generations. Because all lines were started from a single homozygous progenitor, each generation was founded by a single random offspring, and P. pacificus is a self-fertilizing hermaphrodite, the variation between the lines is caused by the accumulation of mutations.

Microsatellite pattern
Microsatellite loci were identified in the 'Freeze 1' assembly of the P. pacificus genome (available at www.pristionchus.org) using Tandem Repeats Finder (TRF) (Benson 1999) with two different sets of parameters: a threshold alignment score of 20 and alignment weights {2,7,7} ({match,mismatch,indels}) for the TRF-loose method, and a threshold alignment score of 50 and alignment weights {2,3,5} for the TRF-strict method (Leclercq et al. 2007). The mismatch and indel weights can be 3, 5, or 7, with 3 being more permissive and 7 more restrictive. Leclercq et al. have previously shown that both the weights and the threshold criteria influence the composition of the microsatellites detected (see also supporting information, Figure S1, for the differential screening results from the present study). Specifically, Leclercq et al. have shown that increasing the TRF alignment score allows the detection of smaller and more perfect microsatellites, whereas decreasing the TRF weights allows longer and more imperfect microsatellites to be detected. We therefore created two data sets of microsatellite loci, both of which include perfect and imperfect repeats. The frequency of microsatellites was calculated as (number of microsatellite loci)/(number of nucleotides in megabases), considering the assembled P. pacificus genome of 169 Mb and loci of at least three repeat units. As a baseline for comparison, we screened the C. elegans genome with the same methods as for P. pacificus.
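The two TRF parameter sets and the frequency measure described above can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline: it assumes TRF's positional command line (trf File Match Mismatch Delta PM PI Minscore MaxPeriod), and the match/indel probabilities (PM = 80, PI = 10), the maximum period of 6, the output flags, and the file name are our assumptions, since the text gives only the weights and score thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrfParams:
    match: int
    mismatch: int
    indel: int
    min_score: int


# Weights and score thresholds as stated in the text.
TRF_LOOSE = TrfParams(match=2, mismatch=7, indel=7, min_score=20)
TRF_STRICT = TrfParams(match=2, mismatch=3, indel=5, min_score=50)


def trf_command(fasta: str, p: TrfParams, pm: int = 80, pi: int = 10,
                max_period: int = 6) -> list[str]:
    """Argument list for: trf File Match Mismatch Delta PM PI Minscore MaxPeriod.

    pm, pi and max_period are assumed values (not given in the text);
    -d writes a data file and -h suppresses HTML output.
    """
    numbers = [p.match, p.mismatch, p.indel, pm, pi, p.min_score, max_period]
    return ["trf", fasta, *map(str, numbers), "-d", "-h"]


def loci_per_mb(n_loci: int, genome_bp: int) -> float:
    """Frequency = number of loci / genome size in megabases."""
    return n_loci / (genome_bp / 1_000_000)


if __name__ == "__main__":
    # Hypothetical file name for the 'Freeze 1' assembly.
    print(" ".join(trf_command("pristionchus_freeze1.fa", TRF_STRICT)))
    # 730 TRF-strict perfect loci in the 169 Mb assembly
    print(f"{loci_per_mb(730, 169_000_000):.2f} loci/Mb")  # 4.32 loci/Mb
```

Keeping the parameter sets as frozen records makes it easy to rerun the same screen on the C. elegans genome for the baseline comparison.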

Microsatellite markers for molecular analysis
We randomly chose 32 microsatellites and developed suitable primers (Table S1). These markers cover a range of three to 57 repeat units and have percentage match scores between 51 and 100. To ensure that the mutation rate was also evaluated for the long microsatellites present in the genome, we chose nine additional markers for perfect tri-, tetra-, penta-, and hexanucleotide repeats with a pattern repeat count greater than 30 (except for one hexanucleotide locus, which had a pattern repeat count of 17 but an overall length of more than 100 bp). In total, we had 41 markers, comprising 19 perfect and 22 imperfect repeat loci.
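The two-part marker choice (a random draw plus long perfect loci) can be expressed as a simple filter. In this sketch the Locus record, its fields, and the seed are hypothetical; only the selection rules come from the text above, and the cap on extra markers stands in for the nine loci that were actually picked.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    """Hypothetical record for one TRF hit."""
    name: str
    period: int         # repeat unit size (bp)
    copies: float       # pattern repeat count
    percent_match: int  # 100 = perfect repeat

    @property
    def length_bp(self) -> float:
        return self.period * self.copies


def is_long_perfect(locus: Locus) -> bool:
    """Perfect tri- to hexanucleotide locus with >30 repeat units
    (or, for a hexanucleotide, an overall length above 100 bp)."""
    if locus.percent_match != 100 or not 3 <= locus.period <= 6:
        return False
    return locus.copies > 30 or (locus.period == 6 and locus.length_bp > 100)


def choose_markers(pool: list[Locus], n_random: int, n_extra: int,
                   seed: int = 1) -> list[Locus]:
    """Random draw from the pool, topped up with long perfect loci not yet chosen."""
    rng = random.Random(seed)
    chosen = rng.sample(pool, n_random)
    extra = [loc for loc in pool if is_long_perfect(loc) and loc not in chosen]
    return chosen + extra[:n_extra]
```

A fixed seed makes the "random" part reproducible, which matters if the same marker panel has to be regenerated later.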

DNA extraction and amplification
For each MA line, genomic DNA was prepared from two full plates of worms using worm lysis buffer (50 mM KCl; 10 mM Tris-HCl, pH 8.3; 2.5 mM MgCl2; 0.45% NP-40; 0.45% Tween-20; 5 mg/mL proteinase K). The suspension was incubated for 2 hr at 65°, followed by inactivation of the proteinase K at 95° for 10 min. All forward primers had an M13 tail (5′-CACGACGTTGTAAAACGAC-3′) attached at the 5′ end, labeled with 6-FAM, VIC, NED, or PET (Applied Biosystems) for genotyping (Table S1). We performed individual polymerase chain reactions in a final volume of 20 µL for each marker, using an annealing temperature of 55°.
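Because the labeled M13 tail becomes part of every amplicon, fragment sizes read off the sequencer are longer than the locus itself. The helper below is a hypothetical illustration of that bookkeeping; the tail sequence is the one given above, while the example primer and product size are invented.

```python
M13_TAIL = "CACGACGTTGTAAAACGAC"  # 5'->3'; carries the fluorescent label


def tailed_forward_primer(locus_specific: str) -> str:
    """Attach the M13 tail to the 5' end of a locus-specific forward primer."""
    return M13_TAIL + locus_specific.upper()


def tailed_product_size(untailed_bp: int) -> int:
    """The tail is copied into the amplicon, so every allele grows by its length."""
    return untailed_bp + len(M13_TAIL)


if __name__ == "__main__":
    fwd = tailed_forward_primer("ttgacgatcgatcagtgca")  # hypothetical primer
    print(fwd, len(fwd))
    print(tailed_product_size(180))  # 199
```

The constant 19-bp offset cancels when comparing MA lines with their ancestor at the same marker, but it has to be subtracted before converting absolute fragment sizes into repeat counts.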

Genotyping
Genotyping was performed on an ABI 3730xl using ABI GeneMapper version 4.0 (Applied Biosystems) analysis software with the internal size standard GS500LIZ or GS1200LIZ. Markers M01 to M47 were multiplexed before genotyping. Markers M74 to M88 were labeled

Mutation rate estimates
The mutation rates (per allele per generation) were calculated using the formula from Seyfert et al. (2008): $\mu = -\ln(1 - n/l)/t$, where n is the number of mutations, t is the number of generations, and l is the number of lines. Note that the number of lines assessed may differ slightly between markers.
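The estimator can be written as a one-line function. The log term converts the fraction of lines that still carry the ancestral allele into an expected number of mutations per line, so it allows for lines that may have been hit more than once; for small n/l it is close to the naive n/(l·t). This sketch assumes that n counts mutated lines (at most one observed change per line at a locus).

```python
import math


def ma_mutation_rate(n: int, lines: int, generations: int) -> float:
    """Per-allele, per-generation rate: mu = -ln(1 - n/l) / t (Seyfert et al. 2008)."""
    if not 0 <= n < lines:
        raise ValueError("require 0 <= n < number of lines")
    # Fraction of lines still carrying the ancestral allele is exp(-mu * t).
    return -math.log1p(-n / lines) / generations


if __name__ == "__main__":
    for n in (1, 2, 5):
        mu = ma_mutation_rate(n, 82, 142)
        naive = n / (82 * 142)  # uncorrected rate, for comparison
        print(f"n={n}: mu={mu:.2e}, naive={naive:.2e}")
```

With l = 82 and t = 142, a single mutation gives roughly 8.6 × 10⁻⁵; a marker scored in fewer lines gives a slightly higher value, which is why the per-marker line counts matter.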

Tandem repeat pattern in P. pacificus
We screened the genome of P. pacificus for short tandem repeats consisting of di-, tri-, tetra-, penta-, and hexanucleotides with at least three repeat units, using the Tandem Repeats Finder software (Benson 1999). Data presented below always refer to duplex DNA, even if we show only the sequence of the repeated motif on one strand for simplicity; i.e., notations like (AC)n and (AC)n:(GT)n are equivalent. The two sets of parameters, involving different thresholds and distinguishing between perfect and imperfect repeats (see Materials and Methods), yielded very different results. Specifically, in the genome of P. pacificus, the TRF-loose method counted 70,543 perfect loci, whereas the TRF-strict method identified only 730 loci. Of the 730 perfect loci identified by the TRF-strict method, dinucleotide repeats are by far the most common repeat type (383/730). Of the four possible unique dimer combinations (AC, AG, AT, CG), three are present in the genome, with AG repeats representing the greatest number (265/383 loci; Table 1). In contrast, perfect CG repeats are not found in the P. pacificus genome. The dinucleotide repeats of the C. elegans genome have a different repeat unit distribution, but perfect CG repeats are missing there as well (Figure 1A).
A total of 133 trinucleotide repeat loci were found. Trimers rich in AT are the most common, with [(A/T)2, X1] combinations accounting for 92% of the instances found (Table 1). Again, one of the 10 possible unique trimer combinations is not found: the pure CCG repeat (Table 1, Figure 1B). The 'nematode' telomeric repeat motif, (TTAGGC)n (Niedermaier and Moritz 2000), is the most abundant hexamer repeat (8 pure loci in the TRF-strict data set, the most loci of any hexamer repeat type and the longest repeats per locus), but no functional conclusion can be drawn from this.
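The counts of "possible unique combinations" (four dimers, 10 trimers) follow from treating a motif, its rotations, and its reverse complement as one repeat class on duplex DNA. A minimal sketch of that canonicalization, using the lexicographically smallest rotation as the class label (our choice of convention):

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def rotations(seq: str) -> list[str]:
    return [seq[i:] + seq[:i] for i in range(len(seq))]


def canonical_motif(motif: str) -> str:
    """Smallest rotation of the motif or of its reverse complement.

    (AC)n, (CA)n, (GT)n and (TG)n all map to 'AC', matching the duplex convention.
    """
    m = motif.upper()
    return min(rotations(m) + rotations(reverse_complement(m)))


def is_primitive(motif: str) -> bool:
    """False for motifs that repeat a shorter unit (e.g. 'AA', 'ACAC')."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def motif_classes(k: int) -> set[str]:
    """All unique repeat classes with a k-bp unit on duplex DNA."""
    return {canonical_motif("".join(p)) for p in product("ACGT", repeat=k)
            if is_primitive("".join(p))}


if __name__ == "__main__":
    print(sorted(motif_classes(2)))  # ['AC', 'AG', 'AT', 'CG']
    print(len(motif_classes(3)))     # 10
    print(canonical_motif("TTAGGC"))
```

Mapping every TRF hit through canonical_motif before counting ensures that, for example, (CT)n and (AG)n hits are pooled into the same AG class.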
In general, the TRF-loose method yielded the same trends for the different types of repeats, with AT-rich loci representing more than 50% of the loci found for each repeat unit size. CG-rich pentamer and hexamer repeats are rare or missing (Table 1).

Mutation rates
We randomly selected 32 markers from the loci found with the TRF-loose method (the larger data set). To these we added nine markers selected for a total repeat count of more than 30, so that the longest loci were represented in the analysis. Among the 41 loci assayed in the 82 MA lines, we found 31 mutation events at 11 loci (Table 2). At four of these 11 loci, only a single mutation was observed, which is insufficient for deriving reliable mutation frequencies. Therefore, most of the conclusions below are based on markers with more than one mutation. Several general trends can be observed in the mutation patterns. First, M17 was the only locus smaller than 17 units that contained a mutation. In general, microsatellite loci that contain more than 30 repeat units show more mutations than shorter ones, an observation similar to the C. elegans MA line-based analysis of microsatellite mutation rates (Seyfert et al. 2008). Second, 26 of the 31 mutations involved the gain or loss of a single repeat unit, with insertions outnumbering deletions (21 insertions and 5 deletions). Only five of the 31 mutations involved a change of more than one repeat unit, and only one large deletion (29 repeats at M78) occurred, at a locus with more than 30 repeat units. Third, most of the mutations were found in perfect, long microsatellite loci (29 of 31 mutations; Table 2).
Fourth, in the overall MA pattern, deletions do not balance out insertions (8 deletion vs. 23 insertion events). This observation suggests that the microsatellite loci have a tendency toward lengthening. Further analysis shows that seven of the analyzed loci (M34, M77, M79, M80, M82, M83, and M84) show accumulated growth, two (M17, M78) show an accumulated decrease in length, and two (M74, M88) show no accumulated change in size. Finally, for the markers at which mutations occurred, the mutation rate per locus per generation in P. pacificus ranges from 8.9 × 10⁻⁵ to 7.5 × 10⁻⁴.
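The tallies above (insertions vs. deletions, single-step vs. multistep events, net change per locus) can be reproduced from fragment sizes with a short script. The sketch assumes one allele call per MA line and that size differences are whole multiples of the repeat unit; the example sizes are hypothetical and are not the Table 2 data.

```python
from collections import Counter


def repeat_change(ancestral_bp: int, observed_bp: int, period: int) -> int:
    """Change in repeat units implied by a fragment-size difference."""
    diff = observed_bp - ancestral_bp
    if diff % period:
        raise ValueError(f"{diff} bp is not a whole number of {period}-bp units")
    return diff // period


def summarize(changes: list[int]) -> dict[str, int]:
    """Count insertions/deletions and single-step/multistep events; sum net change."""
    tally = Counter(insertions=0, deletions=0, single_step=0, multistep=0)
    for c in changes:
        if c == 0:
            continue  # line still carries the ancestral allele
        tally["insertions" if c > 0 else "deletions"] += 1
        tally["single_step" if abs(c) == 1 else "multistep"] += 1
    tally["net_units"] = sum(changes)
    return dict(tally)


if __name__ == "__main__":
    # Hypothetical dinucleotide locus: ancestral allele 200 bp, five MA lines.
    sizes = [202, 202, 200, 198, 206]
    changes = [repeat_change(200, s, period=2) for s in sizes]
    print(changes)  # [1, 1, 0, -1, 3]
    print(summarize(changes))
```

Rejecting size differences that are not multiples of the unit is a cheap guard against miscalled peaks or indels in the flanking sequence.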

DISCUSSION
This is the first analysis of mutational processes and mutation rates in the nuclear genome of P. pacificus; previous knowledge was based solely on the mitochondrial genome (Molnar et al. 2011). We have analyzed the tandem repeat pattern of the P. pacificus genome and studied the spontaneous mutation rates of microsatellite markers. From the mutation patterns and mutation rates obtained for individual microsatellite markers, we derive guidelines for the properties that make microsatellite markers useful for divergence time estimates in future genome-wide sequencing projects (see below).
P. pacificus and C. elegans belong to the same nematode clade, but they are only distantly related, representing members of different nematode families (Dieterich et al. 2008). Sequence turnover over these evolutionary distances has resulted in unrelated microsatellite patterns in the two species. Therefore, the microsatellites are not homologous and cannot be directly compared, which unfortunately prevents the meaningful use of statistical methods. The average AT content of the P. pacificus genome is 58% (Dieterich et al. 2008); therefore, we would expect AT dinucleotide repeats to be the most common, followed by AC and AG in approximately equal numbers, and fewer CG loci. The dimer repeat composition of the P. pacificus genome, however, does not follow this pattern, arguing against the expectation of cryptic simplicity. In contrast, the tri- to hexanucleotide repeat loci do follow this expectation, with AT-rich repeats being more abundant than the others. The most striking finding, however, is the absence or near absence of CG loci. Direct and indirect observations tend to support the stepwise mutation model at microsatellite loci (Schlötterer and Tautz 1992; Weber and Wong 1993; Thuillet et al. 2002), by which their sequence is altered by the addition or deletion of one repeat at a time. An alternative model holds that the sequence of microsatellites can be altered by large deletions, due to secondary structures that certain types of repeats can form (Di Rienzo et al. 1994). The absence of perfect CG dimer repeats, together with the presence of impure and CG-rich loci, might support the latter model of microsatellite evolution.
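The expected ranking of dimer classes can be made explicit with a simple null model. Here we assume independent bases with strand symmetry (A = T = 29%, C = G = 21% for 58% AT); this model is ours, not the authors'. Each class pools its member motifs, so AC and AG each sum four motifs while AT and CG sum two.

```python
def dimer_class_weights(at_content: float, units: int) -> dict[str, float]:
    """Relative probability of a perfect (XY)^units stretch, summed per duplex class.

    Null model (an assumption): independent bases, A = T = at/2 and C = G = gc/2.
    """
    a = t = at_content / 2
    c = g = (1 - at_content) / 2
    members = {
        "AT": [(a, t), (t, a)],
        "AC": [(a, c), (c, a), (g, t), (t, g)],  # AC, CA, GT, TG
        "AG": [(a, g), (g, a), (c, t), (t, c)],  # AG, GA, CT, TC
        "CG": [(c, g), (g, c)],
    }
    return {cls: sum((x * y) ** units for x, y in pairs)
            for cls, pairs in members.items()}


if __name__ == "__main__":
    for units in (1, 2, 3):
        w = dimer_class_weights(0.58, units)
        ranking = sorted(w, key=w.get, reverse=True)
        print(units, ranking)
```

For three repeat units, the minimum length considered here, the model ranks AT above AC and AG (which tie) and CG last, matching the expectation stated above; for one or two units, AC and AG would lead because each pools twice as many motifs.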
A comparison with the C. elegans microsatellite data set reveals that P. pacificus has an overall greater frequency of perfect microsatellite loci, although dinucleotide repeats dominate the landscape in both genomes (Figure 2), a finding also made by Castagnone-Sereno et al. (2010) using different algorithms. In a second step, we evaluated the mutation rate at microsatellite loci ranging from di- to hexanucleotide repeats, randomly chosen in the noncoding genome of P. pacificus. The random choice avoids the bias of assaying only certain types of repeats. However, it does not allow us to draw conclusions about how the mutation rate is influenced by repeat unit size, nucleotide composition, or the overall length of the locus. Although the loci were chosen randomly, eight of 10 dinucleotide loci are of the type (AG)n. This reflects the general composition of the P. pacificus genome, which has more AG repeats than other dinucleotide repeats.
The mutation patterns observed for P. pacificus in this study support the idea that mutational processes are length dependent. Specifically, large loci have, on average, more mutations than small ones (three mutations in an (X)3 locus vs. seven mutations in an (X)64 locus). A second general trend, supported by the P. pacificus data, is that imperfect loci are less prone to accumulate mutations than perfect ones. Of the 22 imperfect loci tested, which span diverse percentage match scores and a wide range of repeat unit sizes, only one showed a mutation (M34; Table 2). A comparison of the mutation rates at microsatellite loci with the same range of repeat numbers in C. elegans (Seyfert et al. 2008) and P. pacificus shows a similar effect of repeat number on mutation rate in both organisms.
A third major trend in the P. pacificus data set is that the mutation process is upwardly biased, in that loci gain units more frequently than they lose them. Similar trends have been shown previously in other organisms (Primmer et al. 1996; Ellegren 2000). At the same time, long alleles tend to contract upon mutation (Harr and Schlötterer 2000; Xu et al. 2000). The mutations observed in P. pacificus comprise 21 insertions and 5 deletions, showing that microsatellites in P. pacificus have a tendency toward lengthening. Interestingly, the largest P. pacificus microsatellites detected in the genome assembly are substantially smaller than the largest C. elegans microsatellites, which have more than 68 repeat units (Seyfert et al. 2008). In P. pacificus, loci M80 and M83 both show insertions, indicating that they are still in the growth phase. We speculate that P. pacificus microsatellites are, on average, still in the expansion phase, a process that might have substantially contributed to the increase of the P. pacificus genome size relative to C. elegans. A final aspect of our analysis is that the P. pacificus genome shows no evidence for a bias toward multistep mutations. Specifically, all but five mutations are single-step insertions or deletions. This pattern is clearly distinct from what has been observed in C. elegans, indicating, again, the species- and locus-specificity of mutational processes.
The rate and pattern of mutations observed in the MA lines have implications for the use of microsatellites to infer genetic history. It is critical to recognize that the evolutionary rate of a single locus changes with the size of the allele. Thus, choosing a microsatellite locus with an evolutionary rate appropriate for a specific evolutionary time frame requires careful consideration of allele size ranges. The data presented in this study provide guidelines for the selection of adequate markers for studying recent and ancient evolutionary branches of P. pacificus. Specifically, the absence of mutations at many short and/or imperfect loci, as well as the fact that only one mutation was identified for five markers, does not allow us to use these results to derive a mean mutation frequency. Interestingly, however, all but one of the perfect repeats with more than 30 repeat units recovered multiple mutation events, resulting in a fairly stable mutation rate of 2.5–7.5 × 10⁻⁴ (Table 2). Therefore, we suggest using only microsatellite markers with a minimum length of 30 repeat units in studies that aim to reconstruct the evolutionary history of wild isolates. Furthermore, given the relatively stable mutation frequencies obtained in this study, we suggest that such studies use an average mutation frequency of 5 × 10⁻⁴. Taken together, this study provides useful information for future genome-wide studies that investigate the evolutionary history of P. pacificus.
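To illustrate how the recommended average rate might enter a divergence time estimate, the sketch below uses the (δμ)² distance of Goldstein and colleagues under a strict one-step stepwise mutation model, for which the expected squared difference in repeat number grows as 2μT. The choice of estimator is our assumption; the paper does not specify one, although the scarcity of multistep mutations in P. pacificus makes a one-step model a reasonable first approximation. The result is in generations; converting it to years would require an estimate of generation time in the wild.

```python
def divergence_generations(repeats_a: list[int], repeats_b: list[int],
                           mu: float = 5e-4) -> float:
    """Generations since two homozygous isolates split, T = mean (delta mu)^2 / (2 mu).

    Assumes a strict one-step stepwise mutation model, the same rate mu at every
    locus, and no ancestral variance (Goldstein-style (delta mu)^2 distance).
    """
    if not repeats_a or len(repeats_a) != len(repeats_b):
        raise ValueError("need matched, non-empty lists of repeat counts")
    d2 = sum((x - y) ** 2 for x, y in zip(repeats_a, repeats_b)) / len(repeats_a)
    return d2 / (2 * mu)


if __name__ == "__main__":
    # Hypothetical repeat counts at four long (>30 unit) perfect loci.
    isolate_a = [31, 40, 35, 50]
    isolate_b = [30, 41, 35, 48]
    print(round(divergence_generations(isolate_a, isolate_b)))  # 1500
```

Averaging over many long perfect loci matters here: with only a handful of markers, the variance of (δμ)² is large and single-locus outliers dominate the estimate.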