Mutation Rate Inferred From Synonymous Substitutions in a Long-Term Evolution Experiment With Escherichia coli

The quantification of spontaneous mutation rates is crucial for a mechanistic understanding of the evolutionary process. In bacteria, traditional estimates using experimental or comparative genetic methods are prone to statistical uncertainty, and consequently estimates vary by over one order of magnitude. With the advent of next-generation sequencing, more accurate estimates are now possible. We sequenced 19 Escherichia coli genomes from a 40,000-generation evolution experiment and directly inferred the point-mutation rate based on the accumulation of synonymous substitutions. The resulting estimate was 8.9 × 10⁻¹¹ per base pair per generation, and there was a significant bias toward increased AT-content. We also compared our results with published genome sequence datasets for other bacterial evolution experiments. Given the power of our approach, our estimate represents the most accurate measure of bacterial base-substitution rates available to date.

Keywords: genomic base-substitution rate; experimental evolution; molecular evolution; mutation pressure; next-generation sequencing

Mutations and genetic recombination provide the variation that fuels adaptation. Knowledge of mutation rates is therefore an important component of a quantitative evolutionary theory (Lynch 2010). In bacteria, spontaneous base-substitution rates have been estimated by Luria-Delbrück fluctuation tests using selective conditions (Drake 1991; Lynch 2006, 2010 and references therein) and by comparing DNA sequences from lineages with approximately known divergence times (Ochman et al. 1999). Both methods have limitations. The former requires knowledge of the mutational target size for the relevant phenotype and makes assumptions concerning growth and selection that do not always hold in practice (Sniegowski and Lenski 1995). The latter assumes that synonymous substitutions are selectively neutral, requires estimates of generation times in nature, and is subject to additional uncertainty when there is recombination or selection on codon usage and GC-content (Balbi et al. 2009; Sharp et al. 2010; Touchon et al. 2009). Given these uncertainties, it is not surprising that the mutation rates estimated for E. coli using these two approaches differ by more than an order of magnitude (Drake 1991; Ochman et al. 1999).
More direct measurements of mutation rates are now possible using whole-genome sequences of isolates sampled from evolution experiments. We have previously applied this approach to one population from the long-term evolution experiment with E. coli (Barrick and Lenski 2009), in which 12 populations have been propagated independently for over 40,000 generations (Lenski 2004; Philippe et al. 2007). Here, we resequenced genomes of 19 clones that were sampled from eight populations (Table 1 and supporting information, Table S1) that did not evolve elevated mutation rates early in the experiment (Cooper and Lenski 2000; Sniegowski et al. 1997).

MATERIALS AND METHODS
Mutation identification
Genomes were resequenced on the Illumina Genome Analyzer platform using one lane of single-end 36-bp reads per genome. Candidate point mutations were identified by comparison with the ancestral genome of REL606 [GenBank:NC_012967.1] using three computational approaches: (i) the SNiPer pipeline (Marchetti et al. 2010); (ii) the breseq pipeline (freely available online at http://barricklab.org/breseq); and (iii) an unpublished algorithm (O. Tenaillon). All candidates were then examined manually to account for local misalignment errors relative to the reference genome that resulted from gene conversion events, mobile element insertions, and large insertions and deletions. Table S1 presents the resulting consensus list of all synonymous substitutions arranged by population and clone. The dN/dS ratios were calculated for each clone according to Comeron (1995) as implemented in the libsequence library (Thornton 2003).

Synonymous target site calculations
For whole-genome studies of mutations in bacterial evolution experiments, we used in-house scripts to calculate the exact number of possible synonymous base substitutions at protein-coding sites in the ancestral genome according to gene annotations. The effective number of synonymous target sites was approximated as one-third of this number, as three mutational changes are possible from any ancestral base. This analysis does not take into account base composition effects or the small changes in genome size during these experiments. The sequence records used for other published studies were downloaded from GenBank (Accessions: NC_000913.2, AC_000091.1, NC_008095.1, and NC_003197.1). For our dataset, we used the GenBank sequence record for E. coli B strain REL606 (Accession: NC_012967.1) with updated gene annotations. Data files and Perl scripts for performing this analysis are available on J.E.B.'s web site (http://barricklab.org/amr).
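The synonymous target-site calculation can be sketched in a few lines. This is a minimal Python illustration, not the authors' Perl scripts. It assumes each coding sequence is supplied as an in-frame string of A/C/G/T and uses the standard genetic code, whose amino-acid assignments are shared by bacterial translation table 11. The function names are hypothetical. The sketch counts every single-base change that leaves the encoded residue unchanged, then divides the total by three.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
# Bacterial translation table 11 uses the same amino-acid assignments.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_synonymous_changes(cds):
    """Count single-base changes in an in-frame CDS that keep the encoded residue.

    Stop-to-stop changes (e.g. TAA -> TAG) are counted as synonymous here,
    and codons containing non-ACGT characters are skipped (both assumptions).
    """
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("CDS length must be a multiple of 3")
    count = 0
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        residue = CODON_TABLE.get(codon)
        if residue is None:
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == residue:
                    count += 1
    return count


def effective_synonymous_sites(cds_list):
    """Possible synonymous changes divided by 3 (three alternatives per base)."""
    return sum(count_synonymous_changes(cds) for cds in cds_list) / 3
```

For example, `effective_synonymous_sites(["GCTATGCTA"])` returns 7/3. The alanine codon GCT contributes three synonymous changes at its third position, ATG contributes none, and CTA contributes four: three at the third position plus CTA→TTA.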
Table 1 Description of 35 synonymous mutations observed in 19 genomes sampled from eight evolving populations

Mutation rate estimate
We used a maximum-likelihood approach to estimate the rates of all six possible types of base-pair substitution mutations. This approach assumed that synonymous substitutions of a given type accumulated as a Poisson process with an expected number equal to the mutation rate multiplied by the number of generations elapsed and the total number of genomic sites at risk for synonymous substitutions of that type. This last factor corrected for regions of the ancestral genome where mutations could not be called in an evolved genome due to deletions, low coverage, or repetitive sequences, as output by the breseq pipeline. We corrected for pseudo-replication due to shared evolutionary history by averaging the calculated log likelihoods for genomes within population blocks. The overall point-mutation rate was then calculated by weighting the separately estimated rates for each type of mutation by the frequency of corresponding sites in the ancestral genome. Tukey's jackknife method was used to estimate overall confidence limits from the statistics of resampled (delete-1) datasets that each dropped all genomes from a single population. Data files and Perl and R scripts for performing this analysis are available on J.E.B.'s web site (http://barricklab.org/amr).
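The per-type likelihood and weighting steps can be sketched as follows. This is a minimal sketch, not the authors' R scripts. It assumes each genome is a dictionary with the hypothetical keys `population`, `generations`, `counts`, and `sites`, where `sites` gives the per-type synonymous sites at risk after masking. The block-averaged Poisson log-likelihood has a closed-form maximum, μ̂ = Σ_p mean(k) / Σ_p mean(t·S), obtained by setting its derivative to zero. The code computes this directly and keeps the log-likelihood for checking.

```python
import math
from collections import defaultdict

# Six substitution types mapped to their ancestral source pair. A given source
# site has exactly one possible change of each type, so per-type sites at risk
# are simple site counts.
SOURCE_PAIR = {
    "AT>GC": "AT", "AT>CG": "AT", "AT>TA": "AT",
    "GC>AT": "GC", "GC>TA": "GC", "GC>CG": "GC",
}


def _by_population(genomes):
    blocks = defaultdict(list)
    for g in genomes:
        blocks[g["population"]].append(g)
    return blocks


def block_log_likelihood(mu, genomes, mtype):
    """Poisson log-likelihood for one type, averaged within population blocks.

    Requires mu > 0 and positive site counts.
    """
    total = 0.0
    for members in _by_population(genomes).values():
        lls = []
        for g in members:
            lam = mu * g["generations"] * g["sites"][mtype]
            k = g["counts"][mtype]
            lls.append(k * math.log(lam) - lam - math.lgamma(k + 1))
        total += sum(lls) / len(lls)
    return total


def mle_rate(genomes, mtype):
    """Closed-form maximizer of block_log_likelihood (d/dmu set to zero)."""
    num = den = 0.0
    for members in _by_population(genomes).values():
        n = len(members)
        num += sum(g["counts"][mtype] for g in members) / n
        den += sum(g["generations"] * g["sites"][mtype] for g in members) / n
    return num / den


def overall_rate(genomes, pair_freq):
    """Weight each per-type rate by the ancestral frequency of its source pair."""
    return sum(pair_freq[SOURCE_PAIR[t]] * mle_rate(genomes, t) for t in SOURCE_PAIR)
```

Averaging within a population before summing across populations means that several clones from one population contribute roughly one population's worth of information to the estimate, rather than one unit per clone.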

RESULTS AND DISCUSSION
We analyzed synonymous substitutions because, when examining all mutations in the 19 clones, we found dN/dS ratios higher than 1.0 for all but one (Table S1). This observation supports pervasive ongoing positive selection through 40,000 generations in these experimental populations. Therefore, non-synonymous mutations are inappropriate for estimating the point-mutation rate.
From population genetics theory, the expected number of synonymous mutations in an evolved clone relative to its ancestor is equal to the product of the intrinsic base-substitution rate, the number of genomic sites at risk for synonymous mutations, and the number of elapsed generations (Kimura 1983). The only requisite assumption is that most synonymous mutations are selectively neutral. Importantly, the expected rate of accumulation of neutral mutations in the lineage leading to any particular clone is not affected by selection at other sites in the genome, because an asexual lineage simply represents a chain of replication events spanning the specified number of generations (Kimura 1983).
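The neutral expectation and the single-lineage Poisson estimate can be written out symbolically. The notation is introduced here for illustration: μ is the per-bp, per-generation base-substitution rate, S is the number of sites at risk for synonymous mutations, t is the number of generations elapsed, and K is the number of synonymous mutations observed. The Poisson form follows the assumption stated in Materials and Methods.

```latex
\mathbb{E}[K] = \mu\, S\, t, \qquad
\Pr(K = k) = \frac{(\mu S t)^{k}\, e^{-\mu S t}}{k!}

\ell(\mu) = k \ln(\mu S t) - \mu S t - \ln k!, \qquad
\frac{d\ell}{d\mu} = \frac{k}{\mu} - S t = 0
\;\Longrightarrow\; \hat{\mu} = \frac{k}{S\, t}
```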
We observed a total of 52 synonymous substitutions in the 19 resequenced genomes (Table S1). However, multiple genomes sampled from the same population are not independent because they share some portion of their history; thus, there were only 35 mutational events (Table 1). We used a resampling procedure to account for this pseudo-replication of multiple genomes isolated from a single population (see supporting information). The resulting estimate of the point-mutation rate is 8.9 × 10⁻¹¹ per bp per generation (Tukey's jackknife 95% confidence interval, 4.0-14 × 10⁻¹¹ per bp per generation). This estimate corresponds to a total genomic rate of 0.00041 per generation given the ancestral genome size of 4.6 × 10⁶ bp.
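A delete-1 jackknife over populations can be sketched generically. In this illustration, the function and argument names are hypothetical. Each block holds all genomes from one population, and `estimator` recomputes the point-mutation rate from the retained blocks. The interval is built from the pseudovalues using a Student t critical value on n - 1 degrees of freedom, which is about 2.365 for eight populations. The source does not state which critical value or scale was used, so treat this as one standard form of Tukey's jackknife rather than the published computation.

```python
import math


def tukey_jackknife(estimator, blocks, t_crit):
    """Delete-1 jackknife estimate and confidence interval over blocks.

    estimator: maps a list of blocks to a scalar estimate.
    blocks:    list of per-population data (one block per population).
    t_crit:    two-sided Student t critical value with len(blocks) - 1 df.
    """
    n = len(blocks)
    if n < 2:
        raise ValueError("need at least two blocks")
    full = estimator(blocks)
    # Pseudovalue for block i: n * full - (n - 1) * estimate without block i.
    pseudo = [
        n * full - (n - 1) * estimator(blocks[:i] + blocks[i + 1:])
        for i in range(n)
    ]
    mean = sum(pseudo) / n
    var = sum((p - mean) ** 2 for p in pseudo) / (n - 1)
    se = math.sqrt(var / n)
    return mean, (mean - t_crit * se, mean + t_crit * se)
```

When the estimator is a simple mean, the pseudovalues reduce to the original values, which provides a quick sanity check. For example, `tukey_jackknife(lambda b: sum(b) / len(b), [1.0, 2.0, 3.0, 4.0], 3.182)` returns 2.5 as the estimate.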
Our inferred point-mutation rate is intermediate between previous estimates based on experimental (Drake 1991) and comparative methods (Ochman et al. 1999). These earlier studies yielded estimates of 5.4 × 10⁻¹⁰ per bp per generation and 1.5 to 4.5 × 10⁻¹¹ per bp per generation, respectively. Given the limitations of these approaches noted above, our estimate is probably more accurate. This greater accuracy derives from the accumulation of mutational events across 300,000 generations (summed over the eight replicate populations) and over the entire genome, coupled with precise knowledge of the number of elapsed generations and the reasonable presumption of selective neutrality or near-neutrality for most synonymous mutations. At the same time, it must be emphasized that mutation rates may differ between strains and species, and they may change depending on the environmental conditions experienced by the cells (Bjedov et al. 2003).
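The cumulative generation count and the genome-wide rate quoted above can both be checked with simple arithmetic. The per-population time points are taken from the Table 2 notes: 20,000 generations for Ara-1 and 40,000 for each of the other seven populations. Their sum is consistent with the 300,000-generation total. G denotes the ancestral genome size.

```latex
T_{\text{total}} = 20{,}000 + 40{,}000 + 6 \times 40{,}000 = 300{,}000 \ \text{generations}

U = \mu\, G \approx (8.9 \times 10^{-11}) \times (4.6 \times 10^{6}) \approx 4.1 \times 10^{-4} \ \text{per genome per generation}
```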
To put our estimate into context, we performed a similar analysis of all other published whole-genome datasets for bacterial evolution experiments with known numbers of generations (Table 2). Taking the other experiments together, we found 10 synonymous SNPs in 18 independently evolved (nonmutator) clones in a total of 30,550 generations. These other datasets combined thus provide only about 10% of the power, in terms of cumulative generations, of the long-term dataset that we have generated and analyzed. As a consequence, the estimated point-mutation rates for these other experimental systems are subject to much greater statistical uncertainty.
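The per-dataset rates in Table 2 can be sketched as μ = k / (T × S), with 95% limits from a binomial distribution. The notes do not say which binomial interval was used. This sketch assumes an exact Clopper-Pearson interval with n = T × S Bernoulli trials, rounded to an integer. The function names are hypothetical. The CDF is summed in log space so that n can be very large, and the loop is intended for rare events where k is much smaller than n. With only about ten mutations, the interval spans roughly a factor of four, which illustrates the uncertainty described above.

```python
import math


def _binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p), summed term by term in log space."""
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if x >= n else 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_comb, total = 0.0, 0.0
    for k in range(0, min(x, n) + 1):
        if k > 0:  # log C(n, k) updated incrementally from log C(n, k - 1)
            log_comb += math.log(n - k + 1) - math.log(k)
        total += math.exp(log_comb + k * log_p + (n - k) * log_q)
    return min(total, 1.0)


def _solve(f, target, increasing):
    """Bisection on log10(p) in (-300, 0) for f(p) = target; f is monotone."""
    lo, hi = -300.0, 0.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if (f(10.0 ** mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return 10.0 ** (0.5 * (lo + hi))


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) binomial confidence interval for p."""
    lower = 0.0 if x == 0 else _solve(
        lambda p: 1.0 - _binom_cdf(x - 1, n, p), alpha / 2, True)
    upper = 1.0 if x == n else _solve(
        lambda p: _binom_cdf(x, n, p), alpha / 2, False)
    return lower, upper


def rate_with_ci(mutations, generations, synonymous_sites, alpha=0.05):
    """Point estimate k / (T * S) and a Clopper-Pearson interval (assumption)."""
    n = int(round(generations * synonymous_sites))
    return mutations / (generations * synonymous_sites), clopper_pearson(mutations, n, alpha)
```

Because n is enormous and p is tiny, the limits are effectively the exact Poisson limits divided by T × S.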
Notes to Table 2: For these calculations, we used only independently evolved end-point clones, and we pooled data from replicate lineages started from the same ancestral strain.
a The effective synonymous target size was calculated from the ancestral genome sequences (see Materials and Methods).
b The mutation rate μ (per bp per generation) was calculated as the number of observed synonymous mutations divided by the product of the total number of generations and the effective number of synonymous target sites. Brackets indicate 95% confidence limits estimated from a binomial distribution. These estimates do not take into account base composition or changes in genome size.
c For comparison with the other datasets, we used only the first clone sequenced at the latest nonmutator time point from each of the eight long-term populations: 20K-A for Ara-1, 40K for Ara-3, and 40K-A for the other six populations (Table 1). There were 25 synonymous mutations in these clones and 52 overall in the dataset. A more accurate estimate of μ and its uncertainty for the long-term lines takes into account the multiple clones sequenced from the same population, the pseudo-replication of clones from the same population, the base signatures of the mutations, and changes in genome size. That comprehensive analysis yields 8.9 [4.0-14] × 10⁻¹¹ per bp per generation (see text).

With 35 independent synonymous mutations, we were also able to examine the mutational spectrum of base substitutions (Figure 1). After correcting for the sequence composition of genomic sites at risk for synonymous mutations in the ancestral genome, the observed transition-to-transversion ratio of 1:1.99 did not differ significantly from the 1:2 ratio expected if all six base-substitution mutations were equally probable (two-tailed binomial test, P = 0.61). However, the direction of transitions was highly skewed: mutations from C:G to T:A were 14.5 times as likely as A:T to G:C mutations after accounting for sequence composition (two-tailed binomial test, P = 0.00027). This finding is consistent with other recent studies that found a strong mutational bias toward increased AT composition in bacteria (Balbi et al. 2009; Hershberg and Petrov 2010; Hildebrand et al. 2010). This bias in mutation pressure explains the pattern of synonymous mutations seen in our study, and it also implies that selection or gene conversion must account for the characteristic GC-contents observed in divergent groups of bacteria over much longer evolutionary timescales (Rocha and Feil 2010).
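A short exact test shows the general form of the two-tailed binomial tests reported above, though it cannot reproduce them in detail. This sketch assumes the minimum-likelihood definition of a two-sided P value, which is the convention of R's `binom.test`. The source does not state which convention was used, nor the exact counts and expected proportions behind P = 0.61 and P = 0.00027, so the function name and any arguments are hypothetical. For the transition bias, one natural setup would let k be the number of C:G→T:A changes among all transitions, with p the proportion expected from the composition of sites at risk alone.

```python
import math


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def binom_test_two_sided(k, n, p, rel_tol=1e-7):
    """Exact two-sided binomial test (minimum-likelihood convention).

    Sums the probabilities of every outcome that is no more likely than the
    observed count k, with a small relative tolerance so ties are included.
    """
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    threshold = binom_pmf(k, n, p) * (1 + rel_tol)
    p_value = sum(binom_pmf(i, n, p) for i in range(n + 1)
                  if binom_pmf(i, n, p) <= threshold)
    return min(1.0, p_value)
```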
Communicating editor: Brenda J. Andrews

Figure 1 Expected and observed mutational spectra for synonymous point mutations. White and black bars show the expected and observed base-pair changes, respectively. The expected values reflect the actual base-pair frequencies in the genome and the probability that a particular base-pair mutation (e.g., from C:G to T:A) produces a synonymous change.