A Recent Global Selective Sweep on the age-1 Phosphatidylinositol 3-OH Kinase Regulator of the Insulin-Like Signaling Pathway Within Caenorhabditis remanei

The discovery that genetic pathways can be manipulated to extend lifespan has revolutionized our understanding of aging, yet their function within natural populations remains poorly characterized. In particular, evolutionary theories of aging predict tradeoffs in resource investment toward somatic maintenance vs. reproductive output that should impose strong natural selection on genetic components that influence this balance. To explore such selective pressure at the molecular level, we examine population genetic variation in the insulin-like signaling pathway of the nematode Caenorhabditis remanei. We document a recent global selective sweep on the phosphoinositide-3-kinase pathway regulator, age-1, the first life-extension gene to have been identified. In particular, we find that age-1 has 5−20 times less genetic variation than any other insulin-like signaling pathway component and that evolutionary signatures of selection center on the age-1 locus within its genomic environment. These results demonstrate that critical components of aging-related pathways can be subject to shifting patterns of strong selection, as predicted by theory. This highly polymorphic outcrossing species offers high-resolution, population-level analyses of molecular variation as a complement to functional genetic studies within the self-reproducing C. elegans model system.

It is clearly advantageous for organisms to live and continue reproducing for as long as possible. The evolutionary explanation for why organisms instead tend to age and die derives from the fact that the high reproductive value of offspring produced early in life weakens the relative strength of selection against deleterious mutations acting later in life. This can result either in the accumulation of mutations with late-onset, age-specific effects (mutation accumulation; Medawar 1952) or in the preferential fixation of alleles with favorable effects early in life, even if they have negative consequences later in life (antagonistic pleiotropy; Williams 1957). Under either of these scenarios, we might expect aging to result from the accumulation of genetic problems in a diverse set of biological systems. It was therefore somewhat surprising when Friedman and Johnson (1988) described age-1, the first mutation shown to extend life span (in this case in the nematode Caenorhabditis elegans). Even more surprising was the fact that age-1 is part of the larger genetic pathway controlling insulin signaling (Figure 1), in which disruption of multiple components, most notably the daf-2 insulin receptor (Kenyon et al. 1993), can also lead to life extension in nematodes and a wide variety of other animals (Garofalo 2002; Barbieri et al. 2003; Kenyon 2005; Broughton and Partridge 2009). The most likely explanation for the conserved effects of this pathway on longevity is that it mediates a physiological switch point that governs a trade-off between investment in reproduction and investment in the response to stress (e.g., starvation) (Kirkwood 2002). Indeed, the insulin-signaling pathway satisfies the structural expectations of the antagonistic pleiotropy model of aging, as longevity mutations in age-1 and daf-2 show a fitness cost under nutrient stress (Walker et al. 2000; Jenkins et al. 2004).
As such, we would expect the pattern of selection on the regulation of the insulin-signaling pathway to vary over time with shifts in the environment and with changes in the demographic structure of populations. This expectation is further motivated by the pattern of selection on the longevity gene methuselah (mth) in Drosophila. Mutants for the G-protein coupled receptor mth have increased lifespan but also show a trade-off between longevity and reproduction under some circumstances (Mockett and Sohal 2006). Moreover, the gene mth is adaptively evolving (Schmidt et al. 2000), and allelic diversity in mth among populations coincides with clinal variation in longevity (Schmidt et al. 2000; Duvernell et al. 2003) and contributes to genetic differences in lifespan (Paaby and Schmidt 2008), further implicating natural selection acting on lifespan and on genetic variation at this locus. However, the correlation between variation in lifespan and allelic variation at mth could differ among populations and/or depend on specific environments (Sgro et al. 2013).
Nevertheless, the number of studies investigating patterns of selection in genes involved in trade-offs between lifespan and reproduction is limited. A rational aim would be to look for evidence of this kind of selection within C. elegans, the species in which the majority of the aging-related mutations have been isolated. However, the natural ecology of C. elegans is not well defined, and its population genomic structure makes it difficult to use DNA sequence variation to make inferences about the evolutionary forces generating phenotypic variation. In particular, linkage disequilibrium (LD) spans whole chromosomes (Cutter 2006; Rockman and Kruglyak 2009; Andersen et al. 2012), suggesting that both background selection and selective sweeps are likely to perturb genetic variation at nucleotide sites far away from the site under selection (Gaertner and Phillips 2010). For example, the vast majority of variation in gene transcript level within the species appears to be well described as a function of background selection operating in genomic regions of low recombination (Rockman et al. 2010). Moreover, the total amount of genetic variation within this species, which appears to be largely tied up within a few dozen haplotypes (Rockman and Kruglyak 2009; Andersen et al. 2012), is very low and does not reflect geographic structure, perhaps reflecting fairly recent dispersal of C. elegans around the world (Phillips 2006).
The pattern of nucleotide variation within C. elegans differs starkly from that of the gonochoristic, or obligately outcrossing, species of the genus. For example, C. remanei is a temperate species that lives in association with terrestrial isopods and displays ~20-fold more sequence polymorphism than C. elegans (reviewed in Cutter et al. 2013). LD also breaks down very rapidly within the species (on the order of a few hundred base pairs; Cutter et al. 2006; Dey et al. 2012), making it ideal for high-resolution mapping of recent evolutionary changes. The recent discovery of a near outgroup for C. remanei, C. sp. 23 (Dey et al. 2012), is particularly valuable in this regard because it is now possible to analyze patterns of genetic divergence more accurately, which has heretofore been problematic in Caenorhabditis because the large degree of divergence among currently sequenced species tends to lead to saturation of neutral sites in the genome. Here, we build on the functional knowledge generated within C. elegans and take advantage of the population genetic strengths of C. remanei to examine patterns of sequence variation across the entire insulin-like signaling (IS) pathway. We find a clear genomic footprint of a recent selective sweep on one pathway component (age-1), suggesting that the shifting pattern of natural selection on genes influencing the balance between investment in early and late life function predicted by theory can be observed within this species.

MATERIAL AND METHODS
Identification of orthologs
C. remanei orthologs of the C. elegans insulin-signaling genes (highlighted in Figure 1) were identified from the current C. remanei genome assembly (version 15.0.1; Genome Sequencing Center, Washington University, St Louis, unpublished data) using the TBLASTN program (Altschul et al. 1990). Intron/exon boundaries were predicted with respect to the C. elegans protein sequence. No ortholog of akt-2 could be identified, as it appears to be a gene duplication within the C. elegans lineage (Jovelin and Phillips 2011). Although some conserved exons could be identified, no clear ortholog of daf-18 could be found, presumably because of extensive divergence at this locus (see also Alvarez-Ponce et al. 2009). This procedure also was applied to identify the orthologs of genes immediately flanking age-1, which show conserved synteny between C. elegans and C. remanei (Figure 2A). Orthologs of age-1 and its immediate neighbors were identified in Caenorhabditis sp. 23 through direct sequencing using C. remanei-specific primers. We obtained the full sequence for the C. sp. 23 orthologs of age-1, srh-44, mdt-8, CRE01736, CRE02129, and CRE02131 and partial sequence (27%) for the C. sp. 23 ortholog of CRE01735.

Strains, amplification, and sequencing
The C. remanei strains used in this study are isofemale lines derived from individuals collected from isopods or decaying vegetal matter and sampled from three different populations in Dayton, Ohio; Kiel, Germany; and King City, Ontario, Canada (Cutter 2008;Jovelin et al. 2009;Dey et al. 2012). We also used a strain of the closely related species Caenorhabditis sp. 23, isolated from Wuhan City, China, as an outgroup (Dey et al. 2012). All strains were maintained on agar plates seeded with Escherichia coli OP50 following standard protocols (Brenner 1974).
For the C. remanei strains from Ohio, total RNA was extracted from plates containing individuals at all stages of development using the TRI Reagent protocol (Molecular Research Center) and subsequently used to synthesize double-stranded complementary DNA with the Retroscript kit (Ambion). Primers designed from the C. remanei genomic sequence were then used to amplify and sequence the coding region of the insulin signaling genes. DNA also was amplified from a single individual using the manufacturer's protocol of the Repli-G kit (QIAGEN) for each strain of C. remanei from Ohio, C. remanei from Germany, and C. sp. 23. Genomic DNA isolated from a single individual was diluted 20 times before undergoing polymerase chain reaction. For each strain of C. remanei from Ontario, DNA was isolated from large populations of worms using the DNeasy Blood and Tissue kit (QIAGEN). Genomic DNA was then used as a template to amplify and sequence the coding and intronic regions of age-1 and its three immediate neighbors in the 5′ and 3′ flanking regions (Figure 2). Amplifications were performed in 50-μL reaction volumes with 2.5 μL of dimethyl sulfoxide, 5 μL of 10X Buffer (Fermentas), 4 μL of MgCl2, 0.6 μL of each primer (50 μM), 0.3 μL of TrueStart Taq polymerase (Fermentas), and 1 μL of template complementary DNA or 2 μL of genomic DNA. Cycling conditions were 95° for 4 min followed by 35 cycles of 95° for 1 min, 55° or 58° for 1 min, and 72° for 3 min. Amplifications were sequenced using automated sequencers at the University of Oregon and University of Arizona sequencing facilities. All sequence changes were rechecked visually against sequencing chromatograms. Heterozygous sites were coded according to the International Union of Pure and Applied Chemistry nomenclature. Haplotypes were resolved using the program PHASE 2.1 (Stephens et al. 2001), implemented in DnaSP 5.10 (Librado and Rozas 2009). Both haplotypes were used for each strain in subsequent analyses.

Relationships among strains and sampling scheme
We examined the relationships among strains by using neighbor networks generated with a Jukes-Cantor distance in the program SplitsTree 4.10 (Huson and Bryant 2006). We performed all population genetic analyses under two sampling schemes: first, considering each population separately, and second, grouping all strains together.
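For readers unfamiliar with the distance underlying the neighbor networks, the Jukes-Cantor correction can be sketched in a few lines of Python. This is an illustration only, assuming aligned sequences and pairwise deletion of sites with gaps or ambiguous bases; it is not SplitsTree's code, the strain names and sequences are hypothetical, and the network construction itself is not shown.

```python
import math
from itertools import combinations

VALID_BASES = set("ACGT")


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or a non-ACGT symbol are skipped
    (pairwise deletion; an assumption, not necessarily SplitsTree's default).
    """
    compared = diffs = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in VALID_BASES and b in VALID_BASES:
            compared += 1
            diffs += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def jukes_cantor(p: float) -> float:
    """JC69-corrected distance: d = -(3/4) * ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("JC69 correction is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jc_distance_matrix(seqs: dict) -> dict:
    """Pairwise JC69 distances for a {strain_name: aligned_sequence} mapping."""
    return {
        (x, y): jukes_cantor(p_distance(seqs[x], seqs[y]))
        for x, y in combinations(seqs, 2)
    }


# Hypothetical strain names and toy sequences, for illustration only.
toy = {"strainA": "ACGTACGTAC", "strainB": "ACGTACGTAA", "strainC": "ACGTTCGTAA"}
for pair, d in jc_distance_matrix(toy).items():
    print(pair, round(d, 4))
```

The correction always returns a distance at least as large as the raw proportion of differences, which matters most for the more divergent comparisons, such as those involving the C. sp. 23 outgroup.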

Nucleotide diversity and tests of neutrality
Insertions and deletions were excluded in all analyses. Estimates of nucleotide diversity (π; Nei 1987) were computed for different categories of sites with DnaSP 5.10 (Librado and Rozas 2009). The sliding window analysis of nucleotide diversity across the 17-kb genomic region was performed using 673 windows, each 150 bp long, with a 25-bp step size.
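A minimal sketch of a per-site π computation and the sliding-window layout described above, written in Python. It assumes phased haplotypes supplied as equal-length aligned strings and drops any site carrying a gap or ambiguous base, in the spirit of excluding indels; DnaSP's own handling of such sites may differ. With 150-bp windows and a 25-bp step, 673 windows span (673 − 1) × 25 + 150 = 16,950 bp, consistent with the ~17-kb region.

```python
from math import comb
from typing import List, Optional, Tuple

BASES = "ACGT"


def site_pi(column: str) -> Optional[float]:
    """Average pairwise difference at one aligned site.

    Returns None if any haplotype carries a gap or ambiguous base, so the
    site is excluded from the window average (an assumption of this sketch).
    """
    column = column.upper()
    if len(column) < 2 or any(c not in BASES for c in column):
        return None
    identical_pairs = sum(comb(column.count(b), 2) for b in BASES)
    return 1.0 - identical_pairs / comb(len(column), 2)


def n_windows(length: int, width: int = 150, step: int = 25) -> int:
    """Number of full windows of `width` bp placed every `step` bp."""
    return 0 if length < width else (length - width) // step + 1


def sliding_window_pi(haplotypes: List[str], width: int = 150,
                      step: int = 25) -> List[Tuple[int, float]]:
    """(window midpoint, per-site pi) for each full window along the alignment."""
    length = len(haplotypes[0])
    per_site = [site_pi("".join(h[i] for h in haplotypes)) for i in range(length)]
    results = []
    for start in range(0, length - width + 1, step):
        usable = [v for v in per_site[start:start + width] if v is not None]
        pi = sum(usable) / len(usable) if usable else float("nan")
        results.append((start + width // 2, pi))
    return results


print(n_windows(16_950))  # 673
```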
We tested deviation from neutrality by using Tajima's D (Tajima 1989) computed using either synonymous or silent (synonymous + intronic) sites. The significance of Tajima's D was determined by coalescent simulations using DnaSP 5.10 with 50,000 replicates, making the conservative assumption of no intragenic recombination (Tajima 1989; Wall 1999) and conditioning on the number of segregating silent sites S. We combined our data with published data on polymorphism in the coding sequence of 87 genes with various functions, sampled in the Ohio population, to plot the empirical distribution of Tajima's D (Jovelin et al. 2003; Cutter et al. 2006; Cutter 2008; Jovelin 2009; Jovelin et al. 2009). We used C. sp. 23 as an outgroup to determine ancestral and derived alleles within our C. remanei samples (Dey et al. 2012). We then further tested deviations from neutral expectations by using the normalized Fay and Wu's H statistic (Fay and Wu 2000; Zeng et al. 2006) and assessed significance by coalescent simulations with 10,000 replicates using the program DH (Zeng et al. 2006). Because the H test is sensitive to misidentification of ancestral and derived states, we estimated the probability of misorientation following the method of Baudry and Depaulis (2003).
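To make the statistic concrete, Tajima's D can be computed directly from the number of segregating sites S and the mean pairwise difference π, using Tajima's (1989) normalizing constants. The Python sketch below is illustrative only: it takes per-site allele counts as input (an assumed data layout) and does not reproduce the coalescent simulations that DnaSP uses to assess significance.

```python
from math import comb, sqrt
from typing import Sequence


def tajimas_d(allele_counts: Sequence[int], n: int) -> float:
    """Tajima's D from per-site allele counts.

    allele_counts: for each segregating site, the number of the n sampled
    sequences carrying one chosen allele (1..n-1). Either allele can be
    counted, because D depends only on S and pi.
    """
    S = len(allele_counts)
    if S == 0 or n < 4:
        raise ValueError("need at least one segregating site and n >= 4")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1, e2 = c1 / a1, c2 / (a1**2 + a2)
    # pi: mean number of pairwise differences across the sample
    pi = sum(i * (n - i) for i in allele_counts) / comb(n, 2)
    return (pi - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))


print(round(tajimas_d([1] * 10, n=20), 3))  # all singletons give a negative D
```

An excess of rare variants (here, all singletons) pulls π below Watterson's S/a1 and yields negative D, the pattern expected after a sweep or under population growth.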
We used pairwise Hudson-Kreitman-Aguadé (HKA) tests (Hudson et al. 1987) and coalescent simulations with 10,000 replicates using the program HKA (J. Hey, unpublished data) to examine the significance of silent site nucleotide differences between age-1 and its neighbors (Obbard et al. 2011). We also used maximum likelihood HKA tests (Wright and Charlesworth 2004) to further investigate patterns of selection at age-1 and test the significance of the observed low level of neutral site nucleotide diversity. For this analysis, we combined our data with published polymorphisms at synonymous sites for 20 loci sampled in the same populations and for which the C. sp. 23 ortholog is available (Dey et al. 2012). Maximum likelihood estimates of θ and k, the selection parameter, were generated using 200,000 chains and with starting values of the parameters T and θ obtained by analyzing the data with the program HKA as described previously. We repeated this procedure three times to ensure that parameter estimates were similar. We performed a likelihood ratio test between the null hypothesis of neutral evolution and the alternative hypothesis of selection at age-1, and obtained the significance of the likelihood ratio statistic 2ΔL by comparison with the χ2 distribution with 1 degree of freedom (Wright and Charlesworth 2004).
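The final step of the maximum likelihood HKA analysis, comparing 2ΔL with a χ2 distribution on 1 degree of freedom, reduces to a one-line tail probability. The Python sketch below assumes the two log-likelihoods have already been produced by the likelihood fits; the numbers in the usage line are hypothetical and are not taken from Table 3.

```python
import math


def lrt_chi2_1df(loglik_neutral: float, loglik_selection: float):
    """Likelihood ratio statistic 2*deltaL and its chi-square (1 df) P-value.

    For 1 df, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    # Guard against tiny negative values from numerical noise in the fits.
    stat = max(0.0, 2.0 * (loglik_selection - loglik_neutral))
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Hypothetical log-likelihoods, for illustration only.
stat, p = lrt_chi2_1df(loglik_neutral=-120.4, loglik_selection=-116.9)
print(f"2dL = {stat:.2f}, P = {p:.4f}")
```

Using the complementary error function avoids a dependency on a statistics library, because a χ2 variable with 1 degree of freedom is the square of a standard normal variable.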

Scans of selective sweep
We used the program SweepFinder (Nielsen et al. 2005) to test for a selective sweep in the vicinity of age-1. This method computes a likelihood ratio test between a model of a selective sweep and a null model obtained from the background frequency spectrum in the data. The grid size parameter was set to 125. We used the unfolded site frequency spectrum (SFS), with derived alleles determined by comparison with C. sp. 23, and used the folded SFS at sites where data are missing in C. sp. 23 or where the C. sp. 23 allele was distinct from the C. remanei alleles. To evaluate how missing data in intergenic regions between the genes of interest might affect our results, we resequenced the entire 17-kb region, including intergenic sequence, in 15 individual worms from the population in Ohio. For this analysis, we performed the sweep scan with SweepFinder using the folded SFS. In addition, we performed another sweep scan using patterns of LD with this dataset. This method identifies selected regions that are flanked by high LD but with low LD across the region (Kim and Nielsen 2004). We used the program OmegaPlus (Alachiotis et al. 2012) to compute the ω statistic describing this LD pattern under a selective sweep. The grid size parameter was set to 125, and the minwin and maxwin parameters were set to 1000 bp and 2000 bp, respectively. For each analysis, the 1% cutoff value of the composite likelihood ratio (CLR) test and of the ω statistic was obtained by coalescent simulations under the standard neutral equilibrium model with 10,000 replicates using the program ms (Hudson 2002). The standard neutral model provides a conservative test (Nielsen et al. 2005), and the pattern of polymorphism in C. remanei suggests demographic equilibrium, in particular in the populations from Ohio and Ontario (Dey et al. 2012).
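The LD-based signature can be illustrated with the ω statistic of Kim and Nielsen (2004): the mean r2 among SNPs within the left block and within the right block, divided by the mean r2 between SNPs on opposite sides of a candidate split. The Python sketch below assumes phased haplotypes coded 0/1 per SNP and simply scans every admissible split among the SNPs; it is not OmegaPlus code and ignores its grid and minwin/maxwin limits. A helper also shows how a 1% cutoff could be read off neutral replicates such as ms output (the simulations themselves are not shown).

```python
from itertools import combinations
from math import comb
from typing import List, Sequence, Tuple


def r_squared(a: Sequence[int], b: Sequence[int]) -> float:
    """Squared correlation (r^2) between two biallelic SNPs coded 0/1 per haplotype."""
    n = len(a)
    pa, pb = sum(a) / n, sum(b) / n
    pab = sum(x & y for x, y in zip(a, b)) / n
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:
        raise ValueError("monomorphic SNP")
    d = pab - pa * pb
    return d * d / denom


def omega(snps: List[Sequence[int]], split: int) -> float:
    """Kim and Nielsen (2004) omega for SNPs ordered by position, split into
    a left block snps[:split] and a right block snps[split:]."""
    left, right = snps[:split], snps[split:]
    l, r = len(left), len(right)
    if l < 2 or r < 2:
        raise ValueError("each side needs at least two SNPs")
    within = sum(r_squared(x, y) for x, y in combinations(left, 2))
    within += sum(r_squared(x, y) for x, y in combinations(right, 2))
    between = sum(r_squared(x, y) for x in left for y in right)
    within_mean = within / (comb(l, 2) + comb(r, 2))
    between_mean = between / (l * r)
    return float("inf") if between_mean == 0 else within_mean / between_mean


def max_omega(snps: List[Sequence[int]]) -> Tuple[float, int]:
    """Largest omega over every admissible split point, with that split."""
    return max((omega(snps, s), s) for s in range(2, len(snps) - 1))


def empirical_cutoff(null_values: Sequence[float], alpha: float = 0.01) -> float:
    """Value exceeded by roughly a fraction alpha of neutral replicates."""
    ordered = sorted(null_values)
    k = max(1, round(alpha * len(ordered)))
    if len(ordered) <= k:
        raise ValueError("too few replicates for this alpha")
    return ordered[len(ordered) - k - 1]


left_snp = [0, 0, 0, 0, 1, 1, 1, 1]   # toy haplotype states
right_snp = [0, 0, 0, 1, 1, 1, 1, 0]  # toy haplotype states
print(max_omega([left_snp, left_snp, right_snp, right_snp]))  # (4.0, 2)
```

In the toy example, SNPs are perfectly associated within each side (r2 = 1) but only weakly associated across the split (r2 = 0.25), so ω = 1 / 0.25 = 4.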

Protein sequence divergence
The protein sequences of C. remanei and C. sp. 23 orthologs of each gene within the age-1 genomic region were aligned by eye using BioEdit (Hall 1999) and subsequently used to generate codon-based DNA sequence alignments. Maximum likelihood estimates of the rates of nonsynonymous (dN) and synonymous (dS) substitutions were then computed between C. remanei and C. sp. 23 with the CODEML program in PAML 3.14 (Yang 1997). We examined adaptive evolution in the protein sequences of age-1 and its neighbors by contrasting polymorphism and divergence in their coding sequence using the McDonald-Kreitman test (McDonald and Kreitman 1991).
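The McDonald-Kreitman test compares a 2 × 2 table of nonsynonymous and synonymous polymorphisms within C. remanei against fixed differences from C. sp. 23. The Python sketch below evaluates that table with a two-sided Fisher's exact test and also reports the neutrality index; the choice of Fisher's test and the counts in the usage line are assumptions for illustration, not the values reported in Table 4.

```python
from math import comb


def fisher_exact_two_sided(table) -> float:
    """Two-sided Fisher's exact test for a 2x2 table ((a, b), (c, d)).

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more probable than the observed one.
    """
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def mcdonald_kreitman(pn: int, ps: int, dn: int, ds: int):
    """MK test on polymorphisms (pn, ps) vs fixed differences (dn, ds).

    Returns (Fisher P-value, neutrality index NI = (pn/ps)/(dn/ds)).
    NI < 1: excess nonsynonymous divergence (consistent with adaptive fixation);
    NI > 1: deficit of nonsynonymous divergence (consistent with purifying selection).
    """
    p = fisher_exact_two_sided(((pn, dn), (ps, ds)))
    ni = (pn / ps) / (dn / ds) if ps and dn and ds else float("nan")
    return p, ni


# Hypothetical counts, for illustration only.
print(mcdonald_kreitman(pn=3, ps=30, dn=20, ds=25))
```

Under the neutral theory the two ratios are expected to be equal (NI = 1); a significant excess of nonsynonymous divergence, as reported for age-1, corresponds to NI < 1.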

RESULTS
Patterns of variation across the IS pathway
We quantified nucleotide variation in the coding sequence of the IS genes in a population of C. remanei from Ohio to investigate the microevolution of insulin signaling (Table 1). Overall levels of nucleotide variability are similar to previous reports in this species (Graustein et al. 2002; Jovelin et al. 2003; Haag and Ackerman 2005; Cutter et al. 2006; Cutter 2008; Jovelin 2009; Jovelin et al. 2009; Dey et al. 2012), with the key exception of the pattern of polymorphism at the age-1 locus. There is no evidence that expression level (Spearman's ρ = -0.071, P = 0.879) or pathway position (Spearman's ρ = -0.132, P = 0.754) affects synonymous site diversity across the pathway as a whole (see also Jovelin and Phillips 2011). Nucleotide diversity at age-1 is 20-fold lower than that of the most polymorphic IS gene, aap-1, and age-1 has only 34 polymorphisms in 3564 bp of coding sequence (Table 1). More intriguing is the unusually low variation at age-1 synonymous sites (π_s = 0.257%) relative to the other 7 IS genes (average π_s = 3.93%) and to other loci sampled in the same population (n = 91, average π_s = 3.75%). This low nucleotide diversity could result from a selective sweep linked to age-1 or from strong purifying selection at synonymous sites.

A recent selective sweep at the age-1 locus
Natural selection can be uncovered because of the signatures it leaves in the genomic sequence around the sites under selection. A selective sweep reduces nucleotide diversity because linked neutral variants hitchhike with the selected allele (Maynard Smith and Haigh 1974). To test for such an effect on age-1, we collected polymorphism data in the coding and intronic regions of age-1 and its three upstream and three downstream immediate neighbors, located within a 17-kb region, from three populations of C. remanei (Ohio, Ontario, and Germany; Table 2 and Figure 2).
In addition, we sequenced the orthologs of these seven genes in the closely related species Caenorhabditis sp. 23 (Dey et al. 2012) to measure interspecific sequence divergence and to polarize the ancestry of polymorphisms within C. remanei. There is a clear reduction of nucleotide diversity centered directly on age-1 in the populations from Ohio and Ontario and centered on CRE02129 in the population from Germany (Figure 3). We combined data from the three populations to examine global patterns of nucleotide variation within C. remanei. Similarly, nucleotide polymorphism is lowest for age-1 and CRE02129 in the pooled sample and increases as a function of the distance from these two genes (Figure 3). We then performed pairwise HKA tests between age-1 and each of its neighbors to determine the significance of the reduction of nucleotide diversity at age-1 (Hudson et al. 1987). In all population samples, the nucleotide diversity at silent sites is significantly reduced at age-1 relative to its two most distant neighbors, and in the Ontario population all genes but CRE02129 have significantly higher silent site nucleotide variation than age-1 (Figure 3). To further explore selection at age-1, we contrasted multilocus polymorphism and divergence by combining our data with a larger set of genes (Dey et al. 2012) and used the maximum likelihood HKA framework (Wright and Charlesworth 2004). Synonymous site variation in age-1 is significantly reduced relative to the neutral model in all samples but the German population, consistent with the action of positive selection (Table 3).
A selective sweep perturbs the SFS, producing an excess of low-frequency variants at linked sites (Tajima 1989). Thus, we first quantified the SFS by using Tajima's D (D_Taj; Tajima 1989). In the Ohio population and in the pooled sample, D_Taj is significantly negative for age-1 but not for its neighbors (Figure 3). Moreover, the number of rare alleles decreases as a function of the distance from age-1, suggesting that age-1 is the focal point of a selective sweep (Figure 3). D_Taj values also form a valley in the populations from Germany and Ontario, with negative values for age-1 and its closest neighbors, although the genes with the most negative values are the immediate neighbors CRE02129 in the German population and mdt-8 in the Ontario population (Figure 3). Demographic factors, such as population growth, also can result in an excess of low-frequency alleles and significant D_Taj values across the entire genome. However, we found that the value of D_Taj for age-1 is the most negative among 92 protein-coding genes, indicating that demographic history is insufficient to explain the strong skew in the SFS for age-1 (synonymous sites, D_Taj = -2.08, P < 0.01, Figure 4). Alternatively, the reduction of nucleotide diversity we observed around age-1 could be the result of background selection, the removal of neutral variants linked to deleterious mutations (Charlesworth et al. 1993), because negative D_Taj values can also reflect purifying selection (Tajima 1989). However, another signature of a selective sweep, not expected under background selection, is an excess of derived high-frequency variants (Fay and Wu 2000). Fay and Wu's H (H_FW) is significantly negative for age-1 in the Ohio, German, and pooled samples, indicating that age-1 has an excess of derived high-frequency single-nucleotide polymorphisms relative to neutral expectations (Figure 3).
However, other genes also have significant negative values of H_FW, depending on the sampling scheme, suggesting that the SFS at these genes is somewhat perturbed by the sweep (Figure 3). A potential issue with the H test is the misidentification of ancestral and derived states, as the test is very sensitive to homoplasy (Baudry and Depaulis 2003). Nevertheless, our results are unlikely to be an artifact of misorientation because the inferred probability of misorientation in our data is 0.078% (0.062% for age-1) (Baudry and Depaulis 2003). Altogether, the patterns of polymorphism and SFS suggest that age-1 is the direct target of, or is tightly linked to the target of, a selective sweep that affects C. remanei on a global spatial scale.
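The logic of the H test can be sketched by contrasting two estimators of θ computed from the unfolded SFS: θπ, which weights intermediate-frequency variants most, and θL, which grows with the frequency of derived alleles. An excess of high-frequency derived variants makes θπ − θL negative. The Python sketch below normalizes this difference following our reading of the variance expression in Zeng et al. (2006), which should be checked against the DH program before relying on it; it assumes every site has been polarized with the outgroup, and it neither models misorientation nor performs the coalescent simulations used for significance.

```python
from math import comb, sqrt
from typing import Sequence


def normalized_fay_wu_h(derived_counts: Sequence[int], n: int) -> float:
    """Normalized Fay and Wu's H from an unfolded SFS.

    derived_counts: for each segregating site, the number of sampled
    sequences (1..n-1) carrying the derived allele, as polarized with an
    outgroup. Negative values indicate an excess of high-frequency derived
    variants.
    """
    S = len(derived_counts)
    if S == 0 or n < 3:
        raise ValueError("need at least one segregating site and n >= 3")
    a_n = sum(1.0 / i for i in range(1, n))
    b_n = sum(1.0 / i**2 for i in range(1, n))
    b_n1 = b_n + 1.0 / n**2                   # b_{n+1}
    theta_w = S / a_n                         # Watterson's estimator
    theta_sq = S * (S - 1) / (a_n**2 + b_n)   # estimator of theta^2
    theta_pi = sum(i * (n - i) for i in derived_counts) / comb(n, 2)
    theta_l = sum(derived_counts) / (n - 1)
    var = ((n - 2) / (6.0 * (n - 1))) * theta_w + (
        (18 * n**2 * (3 * n + 2) * b_n1 - (88 * n**3 + 9 * n**2 - 13 * n + 6))
        / (9.0 * n * (n - 1) ** 2)
    ) * theta_sq
    return (theta_pi - theta_l) / sqrt(var)


print(normalized_fay_wu_h([9, 9, 8, 1, 2], n=10))  # toy SFS dominated by high-frequency derived alleles
```

Sites at which the outgroup allele matches neither C. remanei allele cannot be polarized and would have to be dropped before calling this function.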

Selective sweep scans
We used the method of Nielsen et al. (2005) to scan for a selective sweep within the age-1 genomic region. This method performs a likelihood ratio test between a model of selective sweep and a null model derived directly from the observed SFS in the data. The CLR is maximized and is significant (P < 0.01) at age-1 in all three populations and in the pooled sample, although the exact position of CLR_max and the shape of the likelihood ratio surface vary between samples (Figure 5). For the Ontario population, the CLR also is significant for mdt-8, consistent with the analyses of the SFS based on Tajima's D and Fay and Wu's H (Figure 3). Thus, these analyses further implicate age-1 as the target of a global selective sweep.

Figure 3 The nucleotide diversity around age-1 is reduced in three populations of C. remanei and at a global spatial scale. The site frequency spectrum shows an excess of rare alleles (Tajima's D) and an excess of derived high-frequency variants (Fay and Wu's H) localizing directly at age-1 and/or on its close neighbors. Significance of the difference in silent site nucleotide diversity between age-1 and each of its neighbors was assessed using pairwise HKA tests. Significance of the Tajima's D and Fay and Wu's H statistics was determined by coalescent simulations. * P < 0.05, ** P < 0.01, *** P < 0.001.
k, selection parameter (k < 1 indicates a reduction in diversity due to selection); L, log-likelihood of the hypothesis; 2ΔL, likelihood ratio statistic.
All the aforementioned results for the 17-kb region encompassing age-1 and its neighbors are based on polymorphisms collected in the coding and intronic sequences of these genes. Although intergenic sequence comprises only ~11% of this genomic region and is thus unlikely to affect our results, we nevertheless sequenced the entire 17-kb region, including intergenic sequence, in 15 individuals from the Ohio population and re-examined signatures of a selective sweep with these data. First, consistent with the pattern of diversity at individual loci, a sliding window analysis shows a clear reduction in nucleotide diversity within a ~6-kb region spanning from the end of srh-44 through CRE02129 to age-1 (Figure 6A). Second, the CLR along the genomic region is maximized within age-1 at position +3391 (relative to the start codon) in exon 7 (Figure 6B; CLR_max = 6.84, P < 0.01). Third, we further examined the occurrence of a selective sweep in the age-1 genomic region using patterns of LD. Another signature of a selective sweep is elevated LD on each side of the selected site but low LD across it (Kim and Nielsen 2004; Pavlidis et al. 2010; Alachiotis et al. 2012). The ω statistic, which measures this LD pattern, is maximized within CRE02129 at position +913 (relative to the start codon) in the last exon (Figure 6B; ω_max = 12.19, P < 0.01). Both methods of selective sweep detection, based on the SFS and on LD, identify a narrow selected region, as CLR_max and ω_max are only ~1 kb apart (Figure 6B). Because we targeted the age-1 region for further analysis based on our findings for the different components of the IS pathway, there is a possible concern of statistical ascertainment bias (Thornton and Jensen 2007).
However, such an issue should be less pronounced for our a priori selected pathway scan than for a full genome scan (which leads to numerous post hoc tests), and the P-values associated with our analysis of the age-1 region suggest that the statistical significance of our findings will be robust to moderate adjustment of the significance threshold (Supporting Information, Table S1).

Protein sequence divergence of age-1 and its neighbors
The models of selective sweep based on LD and the SFS are most powerful in detecting recent hitchhiking events (Nielsen et al. 2005; Pavlidis et al. 2010). To investigate selection over longer evolutionary time scales in the coding sequence of age-1 and its neighbors, we contrasted patterns of polymorphism within species with sequence divergence between species using the McDonald-Kreitman test (McDonald and Kreitman 1991). First, we note that CRE02136 and age-1 have the highest dN/dS values among the 7 genes tested, indicating relatively rapid protein sequence divergence (Table 4). Second, we found that for CRE02129 and age-1 the ratio of nonsynonymous to synonymous polymorphisms differs from the ratio of nonsynonymous to synonymous substitutions, contrary to the equality expected under the neutral theory (Table 4). CRE02129 exhibits long-term purifying selection, with a significant deficit of sequence divergence (P = 0.001). In contrast, age-1 shows a significant excess of sequence divergence relative to polymorphism (P = 0.037), implicating repeated fixation of adaptive mutations by positive selection in its coding sequence. Altogether, our results strongly support age-1 as the focal point of positive directional selection and a global selective sweep.

DISCUSSION
Evolutionary theories of aging predict that senescence evolves as a result of a trade-off between maintenance and repair of the soma and investment in reproduction (Kirkwood 2002). In most circumstances, reproduction that occurs earlier in life will have a larger effect on fitness and on the rate of population growth than reproduction that occurs later in life (Rose et al. 2008). Thus, under the antagonistic pleiotropy theory of aging, mutations that are beneficial early in life will be favored even if they cause deleterious effects late in life (Williams 1957). If existing genetic systems have evolved under these conditions, then we would expect mutations that increase lifespan to have negative effects on reproduction (and vice versa). Both the insulin-like receptor daf-2 and the phosphatidylinositol 3-OH kinase (PI3K) catalytic subunit age-1, whose mutation is known to increase lifespan (Kimura et al. 1997; Ayyadevara et al. 2008), exhibit a fitness cost under nutrient stress, as predicted by the antagonistic pleiotropy model (Walker et al. 2000; Jenkins et al. 2004). However, all of these studies have been conducted with induced mutations whose effects have been examined under laboratory conditions (although see Van Voorhies et al. 2005). In nature we might expect the optimal balance between reproduction and somatic maintenance to shift depending on environmental conditions and local demography. Further, natural allelic variation may not represent well the severe effects displayed by mutations isolated and studied in the laboratory (Anderson et al. 2011). How, then, does natural selection shape variation in these genetic pathways in nature?

The PI3K catalytic subunit age-1 is the target of a recent selective sweep in C. remanei
Our analysis of DNA sequence variation in the IS pathway shows that polymorphism at most loci is high and very similar to that observed in other genes with a wide range of biological functions (Jovelin et al. 2003; Cutter et al. 2006; Cutter 2008; Jovelin 2009; Jovelin et al. 2009). However, variation in one gene, the age-1 PI3K, is much lower than in any other gene in the pathway and, indeed, is lower than in any other previously examined locus within this species. Analysis of a broader distribution of polymorphism in multiple populations clearly demonstrates that this region of the genome has recently undergone a global selective sweep that appears to be centered directly at the age-1 locus.
Although a comparative analysis among species within the Caenorhabditis genus has shown that divergence among IS pathway components appears to be largely driven by differences in gene expression (Jovelin and Phillips 2011), we do not see this pattern reflected in within-population variation. In C. elegans, age-1 is part of an operon that includes the genes mdt-8 and Y62F5A.12. More generally, age-1 is located in a highly compact genomic region in which the distance between two neighboring genes is only a few hundred base pairs (Figure 2). We detected strong purifying selection on CRE02129, the closest downstream neighbor of age-1. However, the pattern of diversity at age-1 does not result from linked negative selection at CRE02129: explicit models consistently localize age-1 as a target of a selective sweep. Moreover, the abundance of high-frequency derived single-nucleotide polymorphisms and the rapid protein sequence divergence in age-1 are not compatible with background selection shaping diversity within this gene. Nevertheless, the short distance between age-1 and CRE02129 invites the question of how positive and negative selection interfere in this genomic region (Hill and Robertson 1966).
Figure 4 Empirical distribution of Tajima's D from 92 protein-coding genes sequenced from the same population in Ohio. Cre-age-1 has the most negative Tajima's D value (black bar), suggesting that the excess of rare variants at Cre-age-1 is not the result of genome-wide demographic effects. Tajima's D was computed using synonymous site diversity.
The function of AGE-1 is to convert phosphatidylinositol(4,5)P2 into phosphatidylinositol(3,4,5)P3. Membrane-bound phosphatidylinositol(3,4,5)P3 then recruits the IS kinases PDK-1, AKT-1, and SGK-1, as well as presumably many other signal-transduction proteins that possess a pleckstrin-homology domain. Loss-of-function mutations in age-1 not only affect overall kinase activity but also down-regulate the transcription of several genes in the IS pathway as well as in other signaling pathways. Mutations in age-1 thus have high potential to induce broad regulatory effects that affect fitness even beyond its well-studied role in stress response and aging. Thus, although age-1 appears to be an ideal example of a gene in which a direct connection can be made between the mode of selection in natural populations and the trade-off between increased fitness and senescence predicted by the antagonistic pleiotropy model, tests of specific allelic function are needed to establish whether or not the pattern of selection detected here can be directly attributed to a trade-off between lifespan and reproduction. Interestingly, in a comprehensive comparative analysis of differences in gene expression over development between C. elegans and C. briggsae, Grün et al. (2014) found that expression of genes involved in the insulin-signaling pathway displayed the strongest signal of divergence across the entire genome, potentially indicating adaptive divergence within these species as well.
We did not detect unusual nucleotide diversity for daf-2 (Table 1), although we surveyed only 35% of the daf-2 coding sequence. However, it is noteworthy that positive selection has been detected in the Drosophila daf-2 ortholog, InR (Guirao-Rico and Aguadé 2009). Thus, daf-2/InR might provide another example of a gene with antagonistic pleiotropic effects on aging and reproduction that evolves by positive selection.

Caenorhabditis as a model system for population genomics
This is the first report of a recent selective sweep localized to a targeted gene in Caenorhabditis. Low nucleotide variation and extensive LD make investigation of selected targets difficult in C. elegans (Cutter 2006; Jovelin et al. 2009; Rockman and Kruglyak 2009; Rockman et al. 2010; Andersen et al. 2012), particularly if the aim is to tie molecular variation to a specific evolutionary context. High nucleotide diversity in C. remanei (Cutter et al. 2013), coupled with the rapid decay of LD, suggests that genome-wide scans will be successful in localizing targets of adaptive evolution in this species. Furthermore, C. remanei displays a great deal of genetic variation for a variety of phenotypes, including those associated with stress resistance and longevity (Reynolds and Phillips 2013). With the wealth of information on genetics, development, and cell biology obtained from decades of research in C. elegans and the increasing availability of genomic resources from a number of different species, Caenorhabditis is rapidly joining Drosophila as an excellent model clade for evolutionary genetic analyses. Overall, then, we are now at a stage in which general theories regarding the evolution of biological systems as seemingly complex as aging can be directly tested by combining our rapidly expanding knowledge of the molecular function of critical pathways with comprehensive population genetic analyses of pathway components.