Evolution of the Insertion-Deletion Mutation Rate Across the Tree of Life

Table 1

Effective genome size (G_e), indel events per site per generation (u_id), base-substitution mutation rate per generation (u_bs), θ_s (or π_s, denoted by *) measurements for population mutation rate (Watterson 1975; Tajima 1989; Fu 1995), and estimated effective population size (N_e) for seven prokaryotic and eight eukaryotic organisms (see File S1 for details)

Species	Label	G_e (× 10⁷ Sites)	G_c + G_nc (× 10⁷ Sites)	u_id (× 10⁻¹⁰ Events per Site per Generation)	u_bs (× 10⁻¹⁰ Events per Site per Generation)	θ_s or π_s	N_e (× 10⁶)
Prokaryotes
Agrobacterium tumefaciens	Agt	0.50	0.57	0.30	2.92	0.200*	342.47
Bacillus subtilis	Bs	0.36	0.43	1.20^d	3.35^d	0.041	61.19
Escherichia coli	Ec	0.39	0.46	0.37^e	2.00^e	0.071	179.60
Mesoplasma florum	Mf	0.07	0.08	23.10^f	97.80^f	0.021	1.07
Pseudomonas aeruginosa	Pa	0.59	0.67	0.14^g	0.79^g	0.033*	210.70
Staphylococcus epidermidis	Se	0.21	0.26	1.13	7.40	0.052	35.14
Vibrio cholerae	Vc	0.34	0.39	0.18	1.15	0.110	478.26
Eukaryotes
Arabidopsis thaliana	At	4.21	5.55^a	11.20^h	69.50^h,p	0.008	0.29
Caenorhabditis elegans	Ce	2.50	6.37^b	6.69^i	14.50^q	0.003	0.54
Chlamydomonas reinhardtii	Cr	3.92	5.51	0.44^j	3.80^j	0.032	43.31
Drosophila melanogaster	Dm	2.32	8.86^c	4.61^k	51.65^k	0.018	0.86
Homo sapiens	Hs	3.65	21.75^b	18.20^l	135.13^l	0.001	0.02
Mus musculus	Mm	3.55	27.17^b	3.10^m	54.00^m	0.004*	1.77
Paramecium tetraurelia	Pt	5.68	7.28	0.04^n	0.19^n	0.008	101.80
Saccharomyces cerevisiae	Sc	0.87	1.02^b	0.92^o	2.63^o	0.004	7.78

G_c + G_nc is the effective genome size when including the total amount of coding (G_c) and noncoding DNA (G_nc) that is estimated to be under purifying selection. Footnotes in u_id and u_bs indicate data sources (rates pooled when multiple data sources are available), and, when absent, indicate data generated in this study (see Materials and Methods).

a Haudry et al. (2013).
b Siepel et al. (2005).
c
d Sung et al. (2015).
e Lee et al. (2012).
f Sung et al. (2012a).
g Ossowski et al. (2010).
h
i Schrider et al. (2013).
j Sung et al. (2012a); Ness et al. (2015).
k
l Conrad et al. (2011); O’Roak et al. (2011, 2012); Kong et al. (2012); Campbell and Eichler (2013); Wang and Zhu (2014); The 1000 Genomes Project Consortium (2015).
m Uchimura et al. (2015).
n
o Lynch et al. (2008); Zhu et al. (2014).
p Ossowski et al. (2010); Yang et al. (2015).
q Halligan et al. (2004).

To provide additional data for testing whether the power of genetic drift constrains the lower limit of indel mutation-rate evolution, we performed MA experiments in A. tumefaciens str. C58, S. epidermidis ATCC 12228, and V. cholerae 2740-80. Each bacterial MA experiment was initiated from multiple lines derived from a single progenitor colony, each of which was repeatedly bottlenecked to accumulate mutations for an average of 5819, 7170, and 6453 generations, respectively (see Materials and Methods; harmonic mean population sizes between transfers were 13.4 (0.1), 12.6 (0.3), and 14.9 (0.2), respectively). Then, 101-bp paired-end WGS was applied to randomly selected MA lines (47 A. tumefaciens, 22 S. epidermidis, and 46 V. cholerae MA lines, Dataset S1). The average sequencing coverage depth was greater than 20× per site across all MA lines surveyed in these organisms (Figure S1), and greater than 50× per site for 93.75% (150/160) of the MA lines, providing high accuracy for measurement of u_bs and u_id. Mutations were called and categorized for each of the three species (Dataset S3 and Dataset S4), with u_bs and u_id shown in Table 1.

To test the DBH, we combined u_bs and u_id from the three bacterial species analyzed in this study with u_bs and u_id from four bacterial and eight eukaryotic MA WGS studies (Table 1, Dataset S1, Dataset S2, Dataset S3, and Dataset S4), and also included the same estimates for human derived from WGS of parent-offspring trios. u_id includes all indel events in each of the 15 study species (see File S1). Because eukaryotic genomes contain highly repetitive DNA, the number of large indel events (> 9 bp) detected in eukaryotes by WGS methods may be biased downward. Therefore, our estimate of the number of large indel events also includes events identified by comparative genome hybridization arrays for organisms where data were available (Lynch et al. 2008; Lipinski et al. 2011). Large indel events account for only 15.0% of total indel events across the study bacteria (76/506, Dataset S4), suggesting that any underestimation of the number of large indel events should have only a small effect on u_id.

To determine the genome-wide deleterious burden in each organism associated with indel mutations, we multiplied u_id by G_e, approximating the latter by the proteome size of that organism. Plotting log₁₀(u_idG_e) against log₁₀(N_e) yields a strong negative correlation across all of cellular life (Figure 1A, r² = 0.89). Because the power of genetic drift is inversely proportional to N_e, this observation is consistent with the idea that selection operates to reduce mutation rates to a barrier imposed by random genetic drift. Phylogenetic nonindependence may complicate observed relationships between genomic attributes and N_e (Whitney and Garland 2010). However, the relationship between N_e and u_idG_e remains robust even after phylogenetic correction (Figure 2, A and B, r² = 0.83), indicating that the correlation between N_e and u_idG_e reflects a true biological phenomenon across the Tree of Life.
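The regression described above can be sketched directly from the Table 1 values. The following is a minimal illustration, not the authors' analysis pipeline: it assumes the rounded u_id, G_e, and N_e entries of Table 1, uses an ordinary least-squares fit written by hand, and ignores phylogenetic correction. The names `TABLE1` and `ols` are hypothetical helpers introduced here. Because the inputs are rounded, the fitted values should land near, but not necessarily exactly on, those reported in the Figure 1A caption.

```python
import math

# (label, u_id [x 1e-10 events/site/generation], G_e [x 1e7 sites], N_e [x 1e6])
# Values copied from Table 1; rounding in the table limits precision.
TABLE1 = [
    ("Agt", 0.30, 0.50, 342.47), ("Bs", 1.20, 0.36, 61.19),
    ("Ec", 0.37, 0.39, 179.60), ("Mf", 23.10, 0.07, 1.07),
    ("Pa", 0.14, 0.59, 210.70), ("Se", 1.13, 0.21, 35.14),
    ("Vc", 0.18, 0.34, 478.26), ("At", 11.20, 4.21, 0.29),
    ("Ce", 6.69, 2.50, 0.54), ("Cr", 0.44, 3.92, 43.31),
    ("Dm", 4.61, 2.32, 0.86), ("Hs", 18.20, 3.65, 0.02),
    ("Mm", 3.10, 3.55, 1.77), ("Pt", 0.04, 5.68, 101.80),
    ("Sc", 0.92, 0.87, 7.78),
]


def ols(xs, ys):
    """Ordinary least squares for y = intercept + slope * x; returns (slope, intercept, r2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = sxy ** 2 / (sxx * syy)
    return slope, intercept, r2


# Convert table units to absolute values before taking logs.
log_ne = [math.log10(ne * 1e6) for _, _, _, ne in TABLE1]
log_burden = [math.log10(u * 1e-10 * g * 1e7) for _, u, g, _ in TABLE1]

slope, intercept, r2 = ols(log_ne, log_burden)
print(f"slope={slope:.2f} intercept={intercept:.2f} r2={r2:.2f}")
```

A negative slope on the log-log scale is the pattern the DBH predicts: the genome-wide indel burden u_idG_e falls as N_e rises.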

Figure 1

Relationship between the rate of indel events per generation per effective genome (u_idG_e) and effective population size (N_e). (A) Regression: log₁₀(u_idG_e) = 2.23(0.48) – 0.73(0.07)log₁₀N_e (r² = 0.89, P = 6.81 × 10⁻⁸, d.f. = 13), with SE of parameter estimates shown in parentheses. Blue circles represent bacteria, red circles multicellular eukaryotes, and black circles unicellular eukaryotes, with all data summarized in Table 1. The full list of indel events for analyzed organisms is presented in Dataset S4. Chromosomal distributions of indel events at each site across all mutation-accumulation experiments are shown in Figure S1, A and B. (B) Relationship when adding the number of estimated noncoding sites under purifying selection into the effective genome size (G_c + G_nc) for eukaryotic organisms. Regression: log₁₀[u_id(G_c + G_nc)] = 3.49(0.66) – 0.87(0.09)log₁₀N_e (r² = 0.87, P = 3.13 × 10⁻⁷, d.f. = 13).

Figure 2

Relationship between the rate of indel events per generation per effective genome (u_idG_e) and effective population size (N_e) after phylogenetic correction. (A) Standardized phylogenetically independent contrasts performed using Compare (Martins 2004), and the PDAP module in Mesquite (Garland et al. 1993), with branch lengths of 1.0. The regression equation of the contrasts through the origin is: u_idG_e = –0.60(0.07)N_e (r² = 0.83, P = 1.28 × 10⁻⁶, d.f. = 13), with SE in parentheses. (B) Phylogenetic tree showing the relationship between organisms.

Discussion

Because the DBH makes general predictions about the pattern of molecular and cellular evolution across the Tree of Life, because our focus is on one of the central determining factors in the evolutionary process (the mutation rate), and because the patterns appear so strong, it is essential to consider the range of factors that might give rise to the observed statistical relationships, as well as alternative evolutionary hypotheses for them. We first consider three issues concerning the estimation of the key parameters N_e, u_bs, u_id, and G_e, and then elaborate on the significance and implications of the relationship between u_idG_e and N_e for our understanding of molecular evolution.

First, we address the estimation of N_e, one of the most difficult issues in empirical population genetics. Because populations fluctuate in density over time, any estimate of N_e must reflect a long-term average, presumably approximating a harmonic mean, not the immediate population state. Because evolution is a long-term process, however, the mean is most relevant to the issues being examined herein. Recent selective sweeps or population bottlenecks can transiently modify levels of genetic variation at individual loci (Charlesworth 2009; Karasov et al. 2010), introducing noise into any estimates of N_e derived from limited numbers of genetic loci, but this would reduce the strength of any true underlying correlation between the rate of mutation (u_idG_e), and long-term N_e, i.e., would operate against our ability to detect the expected signal of the DBH.

Such effects are especially likely in asexual species, where the possibility of reduced recombination might subject many neutral nucleotide sites to the effects of selection on nearby, linked sites. Thus, to minimize sampling error, wherever possible, we have relied upon genome-wide sampling of the number of segregating sites to obtain a low-variance estimator of N_eu from observations on silent sites (Watterson 1975). The utilization of an average θ_s across a large number of nucleotide sites and individual isolates reduces the effects of evolutionary sampling variance associated with chromosomally localized and population-specific sweeps arising within individual species (Fu and Li 1993). Using available genomic data, we calculated θ_s across a large number of within-species genotypic isolates, excluding nearly identical lab strains that originated from the same individual (see Materials and Methods). Although no estimates of silent-site diversity (the source of N_e estimates) are without error, estimates derived from segregating polymorphic sites across large-scale genomic data sets appear quite robust (Figure S2). Moreover, should the levels of variation sampled in our various study species reflect recent events, to which mutation-rate evolution has not had adequate time to respond (Brandvain and Wright 2016), this would only introduce noise into the relationship between effective population size and mutation rates.

Second, as we have noted earlier, there is some concern that correlations between estimates of mutation rates and N_e could, in part, be spurious artifacts resulting from the use of estimates of N_e obtained by dividing measures of standing variation at silent sites by u_bs (Sung et al. 2012a). If the sampling variance of u_bs is substantial enough, this could lead to a negative correlation between the observed u_bs and extrapolated N_e estimates, and, if there were a sampling covariance between u_bs and u_id, this could carry over into the current study. In the Supplemental Material (File S1, Figure S4, Figure S5, Figure S6 and Figure S7), we provide complementary analyses to that in Sung et al. (2012a), indicating that the sampling variance of u_bs from WGS-MA studies is not large enough to explain the negative correlation previously seen between u_bs and N_e estimates. Because u_bs and u_id are measured by different methods, the sampling covariance between these two measures is expected to be negligible. We emphasize that it is the sampling variance, not the evolutionary variance, that is of concern here. The variance of the log-scaled values of u_bs would have to exceed that of the log-scaled values of N_e by approximately two orders of magnitude in order to create the negative correlations that we observe (File S1). As an extreme way of looking at the situation, if silent-site variation were constant across all taxa, and the parametric values of mutation rates and N_e were obtained without error, the only explanation for the data would be a true underlying negative evolutionary covariance between the two features. In fact, there is a marginal negative correlation between estimates of π_s and u_bs (Figure S3, Figure S4, Figure S5, Figure S6, Figure S7, and Table S2), further bolstering the idea that u_bs and u_id decline evolutionarily as N_e increases.
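To make the dependence of N_e on u_bs concrete, the extrapolation described above can be sketched as follows. This is a hedged illustration rather than the authors' procedure (which is detailed in File S1): it assumes the standard neutral expectations θ_s = 2N_e·u_bs for haploids and θ_s = 4N_e·u_bs for diploids, and the function name `effective_population_size` and its `ploidy` argument are hypothetical. Inputs are the rounded Table 1 values.

```python
def effective_population_size(theta_s, u_bs, ploidy):
    """Estimate N_e from silent-site diversity under neutrality.

    Assumes theta_s = 2 * ploidy * N_e * u_bs, with ploidy = 1 (haploid)
    or 2 (diploid). u_bs is per site per generation.
    """
    if ploidy not in (1, 2):
        raise ValueError("ploidy must be 1 or 2")
    return theta_s / (2 * ploidy * u_bs)


# A. tumefaciens (treated as haploid): theta_s = 0.200, u_bs = 2.92e-10
ne_agt = effective_population_size(0.200, 2.92e-10, ploidy=1)

# H. sapiens (diploid): theta_s = 0.001, u_bs = 135.13e-10
ne_hs = effective_population_size(0.001, 135.13e-10, ploidy=2)

print(f"Agt N_e ~ {ne_agt / 1e6:.2f} x 10^6")
print(f"Hs  N_e ~ {ne_hs / 1e6:.3f} x 10^6")
```

Because u_bs sits in the denominator, any sampling error in u_bs propagates inversely into the N_e estimate, which is the source of the potential artifact discussed in this paragraph.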

Third, the DBH proposes that the strength of selection operating to reduce the indel mutation rate is based upon the total indel deleterious mutational load, i.e., the product of the mutational rate of appearance of indels at individual nucleotide sites (u_id), and the number of sites under selective constraint in the genome (G_e, approximated by the proteome size of the organism). However, some noncoding DNA (e.g., noncoding functional RNAs, and cis-regulatory units in untranslated regions or introns) is certainly under selective constraint, with mutations at these sites increasing the deleterious mutational load. Thus, it can be argued that the estimated number of nucleotides affecting fitness (G_e) scales differently than the protein-coding region of the genome, particularly in larger eukaryotic genomes with a considerable number of noncoding sites (Halligan et al. 2004; Siepel et al. 2005; Halligan and Keightley 2006). Difficulties can arise when estimating the proportion of noncoding DNA that is under selective constraint (G_nc), as the estimated number of such sites can vary greatly depending on the model used to define noncoding DNA, and the identification of conserved noncoding DNA is highly sensitive to the available phylogeny (Siepel et al. 2005). Nevertheless, if we sum the estimated total amount of noncoding DNA under selective constraint (G_nc, see File S1) with that of coding DNA (G_c), we find that u_id(G_c + G_nc) and N_e remain highly correlated (Figure 1B, r² = 0.87), simply because the fraction of functional noncoding DNA increases with the total amount of coding DNA.

We currently adhere to the DBH as an explanation for the phylogenetic pattern of mutation-rate variation primarily because it has been difficult to reconcile the patterns with alternative hypotheses. In the introduction, we provided arguments as to why selection for replication speed appears to be unlikely to explain a negative correlation between mutation rates and population size in unicellular species, and, in multicellular species, the simultaneous deployment of hundreds to thousands of origins of replication makes such an explanation even more unlikely. Nor does a general constraint on replication fidelity explain the data.

A second potential explanation for variation in the per-generation mutation rate is that it is driven largely by variation in numbers of germline cell divisions (Ness et al. 2012), but this cannot be reconciled with the fact that the base-substitution mutation rate scales negatively with N_e in analyses entirely restricted to unicellular species (Sung et al. 2012a). In all such species, there is one cell division per generation, and yet the base-substitution mutation rate per site per cell division ranges from ∼10⁻¹¹ in Paramecium tetraurelia (Sung et al. 2012b) to ∼10⁻⁸ in M. florum (Sung et al. 2012a). Similarly, the number of indel mutational events per site per cell division differs by over two orders of magnitude across unicellular organisms (Table 1 and Figure 3), and the negative regression with N_e remains significant when confined to unicellular species (Figure 1, r² = 0.66, P = 0.003).

Figure 3

Relationship between the rate of indel events per site per generation (u_id), and the base-substitution mutation rate per site per generation (u_bs). Regression: log₁₀(u_id) = –1.56(0.74) + 0.91(0.08) log₁₀u_bs (r² = 0.90, P = 4.13 × 10⁻⁸, d.f. = 13). SE measurements are shown in parentheses. Blue circles represent eubacteria, red circles multicellular eukaryotes, and black circles unicellular eukaryotes, with all data summarized in Table 1.

A third hypothesis for mutation-rate evolution is that selection is effective enough to reduce the error rate to the point at which the physical laws of thermodynamics take over (Kimura 1967). However, it is difficult to reconcile this argument with the data now showing that mutation rates vary by three orders of magnitude, as there are no known mechanisms by which basic biophysical features (such as diffusion coefficients and stochastic molecular motion) would vary by this degree among the cytoplasms of different taxa. There is, of course, the issue of evolved differences in the biochemical features and efficiency of operation of the proteins involved in replication and repair. However, this type of variation is in the explanatory domain of the DBH. The DBH postulates that replication fidelity is typically not at the maximum possible level of refinement, but just the lowest level possible under the prevailing level of random genetic drift, which varies substantially among lineages.

That replication fidelity should decline with decreasing effective population size appears to be a unique prediction of the DBH. Although other theoretical work has been done on mutation-rate evolution, in no case is this type of scaling obviously predicted (acknowledging that this has not been a central focus of such work). For example, allowing for a role of beneficial mutations, Kimura (1967) and Leigh (1970) suggested that the long-term rate of adaptation is maximized when the genome-wide mutation rate equals the rate of population fixation of beneficial mutations. The precise predictions of this hypothesis are not entirely clear, but because mutations arise at a higher rate in large populations, and, if beneficial, fix with higher probabilities, a positive association between the mutation rate and N_e seems to be implied. A rather different model argues that populations should evolve genome-wide mutation rates equal to the average effect of a deleterious mutation (Orr 2000; Johnson and Barton 2002), which seems to imply an optimal mutation rate independent of population size (unless one wishes to postulate an association between average mutational effect and N_e, for which we are unaware of any evidence).

The DBH proposes that new alleles that reduce the genome-wide indel mutation rate (i.e., antimutators) can be promoted by selection only if they provide a significant enough advantage to offset the power of genetic drift. The average selective effect of an antimutator or mutator allele (which operate opposite to each other) can be approximated by st·∆U_id, with ∆U_id representing the change in the genome-wide indel mutation rate with respect to the population mean rate, s being the average reduction in fitness per mutation (Lynch 2010), and t being the number of generations a mutation remains associated with its mutator genetic background (Lynch 2011). ∆U_id can be approximated by the change in the indel mutation rate over the effective genome, or ∆u_idG_e (Lynch 2011). By setting st∆u_idG_e equal to the power of random genetic drift [1/N_e for haploids, 1/(2N_e) for diploids], we can acquire some sense of the average reduction in the indel mutation rate that is required for the power of selection to exceed the power of genetic drift. Using estimates of an average value of the selective coefficient (s = 0.01) (Lynch et al. 1999; Eyre-Walker and Keightley 2007), and assuming that free recombination unlinks mutation-rate modifier alleles from their background every ∼2 generations in sexually outcrossing species (t = 2) (Lynch 2010), solving st∆u_idG_e = 1/N_e [= 1/(2N_e) for diploids] for ∆u_id suggests that the average antimutator must reduce the indel mutation rate by more than ∼0.1–1% in most organisms (Table S1) in order to be promoted by selection. One major limitation of this kind of analysis is that values of s and t are not well known, and are likely to vary across organisms. A second and equally important caveat is that the prior analysis assumes that mutator and antimutator alleles arise with equal frequency. Owing to the high level of refinement of the replication and repair machinery, it seems much more likely that mutations involving the components of such machinery will increase rather than decrease the mutation rate. This will push the equilibrium mutation rate to higher levels than expected (Lynch 2008), although without quantitative information on such bias, it is difficult to determine the exact position at which the mutation rate will stall.
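The threshold calculation above can be worked through numerically. The sketch below solves st∆u_idG_e = 1/N_e (or 1/(2N_e) for diploids) for ∆u_id and expresses it as a fraction of the observed u_id. It is an illustration under stated assumptions, not a reproduction of Table S1: the function name `min_antimutator_effect` is hypothetical, the defaults s = 0.01 and t = 2 are the values quoted in the text for sexually outcrossing species, and the example inputs are the rounded Table 1 entries for D. melanogaster.

```python
def min_antimutator_effect(n_e, g_e, s=0.01, t=2.0, diploid=False):
    """Smallest per-site reduction in u_id that selection can promote.

    Solves s * t * delta_u_id * G_e = drift for delta_u_id, where
    drift = 1/N_e for haploids and 1/(2 * N_e) for diploids.
    """
    drift = 1.0 / (2.0 * n_e) if diploid else 1.0 / n_e
    return drift / (s * t * g_e)


# D. melanogaster, rounded Table 1 values (assumed diploid, outcrossing):
n_e = 0.86e6     # effective population size
g_e = 2.32e7     # effective genome size, sites
u_id = 4.61e-10  # indel events per site per generation

delta = min_antimutator_effect(n_e, g_e, diploid=True)
fraction = delta / u_id
print(f"delta u_id = {delta:.3e}, i.e. {100 * fraction:.2f}% of u_id")
```

Because the threshold scales as 1/(N_e·G_e), species with smaller N_e require proportionally larger antimutator effects before selection can act, which is the core of the drift-barrier argument. As the text cautions, s and t are poorly known, so the result is best read as an order-of-magnitude guide.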

Finally, we note that because recombination unlinks alleles from their genetic background, the capacity of selection to enhance replication fidelity is ultimately a function of the recombination rate (Kimura 1967; Lynch 2008). Thus, it may be viewed as surprising that bacteria, which do not undergo meiotic recombination, exhibit a relationship between u_id and N_e similar to that in eukaryotic species engaging in periodic to regular meiosis (Figure 1, A and B). It should be noted, however, that bacterial recombination occurs through multiple mechanisms (transformation, conjugation, and/or transduction). Many bacterial species are known to naturally undergo high rates of recombination, with ratios of recombination to mutation rates frequently being comparable to those in multicellular eukaryotes (Feil and Spratt 2001; Lynch 2007; Doroghazi et al. 2014; Lassalle et al. 2015), so, in this sense, comparable behavior of bacterial and eukaryotic species is not unexpected.

In summary, as in our previous work on the base-substitution mutation rate (Sung et al. 2012a), the strong correlation between the genome-wide indel rate and N_e appears not to be a statistical artifact. Moreover, among the various hypotheses that have been suggested for mutation-rate evolution, the DBH appears to provide the most compatible explanation for the ∼1000-fold range of variation of this trait across the Tree of Life. As noted above, the molecular mechanisms that generate and resolve base-substitution and indel mutations differ in a number of ways, and the rates of occurrence of these two types of mutations differ by one to two orders of magnitude (with u_id ranging from 1.8 to 11.9% of u_bs, presumably because of the elevated deleterious effects of indel mutations). Yet, despite these differences, both u_bs and u_id scale similarly with changes in N_e (Figure 3, r² = 0.89). Because the forces of mutation, selection, and drift apply to all biological traits, the maximum achievable level of refinement for other fundamental cellular traits may also be influenced by the drift barrier.

Acknowledgments

Support was provided by Multidisciplinary University Research Initiative Award W911NF-09-1-0444 from the US Army Research Office to M.L., P. Foster, H. Tang, and S. Finkel, and W911NF-14-1-0411 to M.L., P. Foster, J. McKinlay, and J. T. Lennon, by CAREER award DEB-0845851 from the National Science Foundation to V.C., and by National Institutes of Health Awards F32-GM103164 to W.S., and R01-GM036827 to M.L. and W. K. Thomas. This material is based upon work supported by the National Science Foundation under grant nos. CNS-0521433, CNS-0723054, and ABI-1062432 to Indiana University.

Author contributions: W.S., C.F., V.C., and M.L. designed the research; W.S., M.A., M.D., and T.P. performed the research; W.S. and M.A. analyzed the data; and W.S., M.A., and M.L. wrote the paper.

Footnotes

Supplemental material is available online at www.g3journal.org/lookup/suppl/doi:10.1534/g3.116.030890/-/DC1

Communicating editor: S. I. Wright

Literature Cited

Brandvain

Y

,

Wright

S I

,

2016

The limits of natural selection in a nonequilibrium world.

Trends Genet.

32

:

201

–

210

.

Campbell

C D

,

Eichler

E E

,

2013

Properties and rates of germline mutations in humans.

Trends Genet.

29

:

575

–

584

.

Charlesworth

B

,

2009

Fundamental concepts in genetics: effective population size and patterns of molecular evolution and variation.

Nat. Rev. Genet.

10

:

195

–

205

.

Conrad

D F

,

Keebler

J E

,

DePristo

M A

,

Lindsay

S J

,

Zhang

Y

et al. ,

2011

Variation in genome-wide mutation rates within and between human families.

Nat. Genet.

43

:

712

–

714

.

Denver

D R

,

Morris

K

,

Lynch

M

,

Thomas

W K

,

2004

High mutation rate and predominance of insertions in the Caenorhabditis elegans nuclear genome.

Nature

430

:

679

–

682

.

Denver

D R

,

Feinberg

S

,

Estes

S

,

Thomas

W K

,

Lynch

M

,

2005

Mutation rates, spectra and hotspots in mismatch repair-deficient Caenorhabditis elegans.

Genetics

170

:

107

–

113

.

Denver

D R

,

Dolan

P C

,

Wilhelm

L J

,

Sung

W

,

Lucas-Lledo

J I

et al. ,

2009

A genome-wide view of Caenorhabditis elegans base-substitution mutation processes.

Proc. Natl. Acad. Sci. USA

106

:

16310

–

16314

.

Doroghazi

J R

,

Buckley

D H

,

2014

Intraspecies comparison of Streptomyces pratensis genomes reveals high levels of recombination and gene conservation between strains of disparate geographic origin.

BMC Genomics

15

:

970

.

Drake

J W

,

1991

A constant rate of spontaneous mutation in DNA-based microbes.

Proc. Natl. Acad. Sci. USA

88

:

7160

–

7164

.

Drake

J W

,

Charlesworth

B

,

Charlesworth

D

,

Crow

J F

,

1998

Rates of spontaneous mutation.

Genetics

148

:

1667

–

1686

.

Eyre-Walker

A

,

Keightley

P D

,

2007

The distribution of fitness effects of new mutations.

Nat. Rev. Genet.

8

:

610

–

618

.

Feil

E J

,

Spratt

B G

,

2011

Recombination and the population structures of bacterial pathogens.

Annu. Rev. Microbiol.

55

:

561

–

590

.

Fu

Y X

,

1995

Statistical properties of segregating sites.

Theor. Popul. Biol.

48

:

172

–

197

.

Fu

Y X

,

Li

W H

,

1993

Statistical tests of neutrality of mutations.

Genetics

133

:

693

–

709

.

Garland

T

,

Dickerman

A W

,

Janis

C M

,

Jones

J A

,

1993

Phylogenetic analysis of covariance by computer-simulation.

Syst. Biol.

42

:

265

–

292

.

Halligan

D L

,

Keightley

P D

,

2006

Ubiquitous selective constraints in the Drosophila genome revealed by a genome-wide interspecies comparison.

Genome Res.

16

:

875

–

884

.

Halligan

D L

,

Eyre-Walker

A

,

Andolfatto

P

,

Keightley

P D

,

2004

Patterns of evolutionary constraints in intronic and intergenic DNA of Drosophila.

Genome Res.

14

:

273

–

279

.

Haudry

A

,

Platts

A E

,

Vello

E

,

Hoen

D R

,

Leclercq

M

et al. ,

2013

An atlas of over 90,000 conserved noncoding sequences provides insight into crucifer regulatory regions.

Nat. Genet.

45

:

891

–

898

.

Johnson

T

,

Barton

N H

,

2002

The effect of deleterious alleles on adaptation in asexual populations.

Genetics

162

:

395

–

411

.

Karasov

T

,

Messer

P W

,

Petrov

D A

,

2010

Evidence that adaptation in Drosophila is not limited by mutation at single sites.

PLoS Genet.

6

:

e1000924

.

Kibota

T T

,

Lynch

M

,

1996

Estimate of the genomic mutation rate deleterious to overall fitness in E. coli.

Nature

381

:

694

–

696

.

Kimura

M

,

1967

On the evolutionary adjustment of spontaneous mutation rates.

Genet. Res.

9

:

23

–

24

.

Kimura

M

,

1983

The Neutral Theory of Molecular Evolution

,

Cambridge University Press

,

Cambridge, UK.

Kong

A

,

Frigge

M L

,

Masson

G

,

Besenbacher

S

,

Sulem

P

et al. ,

2012

Rate of de novo mutations and the importance of father’s age to disease risk.

Nature

488

:

471

–

475

.

Krokan

H E

,

Bjoras

M

,

2013

Base excision repair.

Cold Spring Harb. Perspect. Biol.

5

:

a012583

.

Kunkel

T A

,

2009

Evolving views of DNA replication (in)fidelity.

Cold Spring Harb. Symp. Quant. Biol.

74

:

91

–

101

.

Lassalle

F

,

Perian

S

,

Bataillon

T

,

Nesme

X

,

Duret

L

et al. ,

2015

GC-Content evolution in bacterial genomes: the biased gene conversion hypothesis expands.

PLoS Genet.

11

:

e1004941

.

Lee

H

,

Popodi

E

,

Tang

H

,

Foster

P L

,

2012

Rate and molecular spectrum of spontaneous mutations in the bacterium Escherichia coli as determined by whole-genome sequencing.

Proc. Natl. Acad. Sci. USA

109

:

e2774

–

e2783

.

Leigh

E G

Jr.,

1970

Natural selection and mutability.

Am. Nat.

104

:

301

–

305

.

Li

H

,

Durbin

R

,

2009

Fast and accurate short read alignment with Burrows-Wheeler transform.

Bioinformatics

25

:

1754

–

1760

.

Li

H

,

Handsaker

B

,

Wysoker

A

,

Fennell

T

,

Ruan

J

et al. ,

2009

The sequence alignment/map format and SAMtools.

Bioinformatics

25

:

2078

–

2079

.

Lipinski

K J

,

Farslow

J C

,

Fitzpatrick

K A

,

Lynch

M

,

Katju

V

et al. ,

2011

High spontaneous rate of gene duplication in Caenorhabditis elegans.

Curr. Biol.

21

:

306

–

310

.

Loh

E

,

Salk

J J

,

Loeb

L A

,

2010

Optimization of DNA polymerase mutation rates during bacterial evolution.

Proc. Natl. Acad. Sci. USA

107

:

1154

–

1159

.

Lynch

M

,

2007

The Origins of Genome Architecture

,

Sinauer Associates

,

Sunderland, Massachusetts.

Google Preview

Lynch

M

,

2008

The cellular, developmental and population-genetic determinants of mutation-rate evolution.

Genetics

180

:

933

–

943

.

Lynch

M

,

2010

Evolution of the mutation rate.

Trends Genet.

26

:

345

–

352

.

Lynch

M

,

2011

The lower bound to the evolution of mutation rates.

Genome Biol. Evol.

3

:

1107

–

1118

.

Lynch

M

,

Marinov

G K

,

2015

The bioenergetic costs of a gene.

Proc. Natl. Acad. Sci. USA

112

:

15690

–

15695

.

Lynch

M

,

Blanchard

J

,

Houle

D

,

Kibota

T

,

Schultz

S

et al. ,

1999

Spontaneous deleterious mutation.

Evolution

53

:

645

–

663

.

Lynch M, Sung W, Morris K, Coffey N, Landry C R et al., 2008 A genome-wide view of the spectrum of spontaneous mutations in yeast. Proc. Natl. Acad. Sci. USA 105: 9272–9277.

Martins, E. P., 2004 Compare, Version 4.6b. Computer Programs for the Statistical Analysis of Comparative Data. Department of Biology, Indiana University, Bloomington, IN. Available at: http://compare.bio.indiana.edu.

McCulloch S D, Kunkel T A, 2008 The fidelity of DNA synthesis by eukaryotic replicative and translesion synthesis polymerases. Cell Res. 18: 148–161.

Mira A, Ochman H, Moran N A, 2001 Deletional bias and the evolution of bacterial genomes. Trends Genet. 17: 589–596.

Morita R, Nakane S, Shimada A, Inoue M, Iino H et al., 2010 Molecular mechanisms of the whole DNA repair system: a comparison of bacterial and eukaryotic systems. J. Nucleic Acids 2010: 179594.

Ness R W, Morgan A D, Colegrave N, Keightley P D, 2012 Estimate of the spontaneous mutation rate in Chlamydomonas reinhardtii. Genetics 192: 1447–1454.

Ness R W, Kraemer S A, Colegrave N, Keightley P D, 2015 Direct estimate of the spontaneous mutation rate uncovers the effects of drift and recombination in the Chlamydomonas reinhardtii plastid genome. Mol. Biol. Evol. 33: 800–808.

O’Roak B J, Deriziotis P, Lee C, Vives L, Schwartz J J et al., 2011 Exome sequencing in sporadic autism spectrum disorders identifies severe de novo mutations. Nat. Genet. 43: 585–589.

O’Roak B J, Vives L, Girirajan S, Karakoc E, Krumm N et al., 2012 Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations. Nature 485: 246–250.

Orr H A, 2000 The rate of adaptation in asexuals. Genetics 155: 961–968.

Ossowski S, Schneeberger K, Lucas-Lledo J I, Warthmann N, Clark R M et al., 2010 The rate and molecular spectrum of spontaneous mutations in Arabidopsis thaliana. Science 327: 92–94.

Sancar A, Lindsey-Boltz L A, Unsal-Kacmaz K, Linn S, 2004 Molecular mechanisms of mammalian DNA repair and the DNA damage checkpoints. Annu. Rev. Biochem. 73: 39–85.

Schrider D R, Houle D, Lynch M, Hahn M W, 2013 Rates and genomic consequences of spontaneous mutational events in Drosophila melanogaster. Genetics 194: 937–954.

Siepel A, Bejerano G, Pedersen J S, Hinrichs A S, Hou M et al., 2005 Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15: 1034–1050.

Sniegowski P, Raynes Y, 2013 Mutation rates: how low can you go? Curr. Biol. 23: R147–R149.

Sniegowski P D, Gerrish P J, Johnson T, Shaver A, 2000 The evolution of mutation rates: separating causes from consequences. BioEssays 22: 1057–1066.

Sung W, Ackerman M S, Miller S F, Doak T G, Lynch M, 2012a Drift-barrier hypothesis and mutation-rate evolution. Proc. Natl. Acad. Sci. USA 109: 18488–18492.

Sung W, Tucker A E, Doak T G, Choi E, Thomas W K et al., 2012b Extraordinary genome stability in the ciliate Paramecium tetraurelia. Proc. Natl. Acad. Sci. USA 109: 19339–19344.

Sung W, Ackerman M S, Gout J F, Miller S F, Williams E et al., 2015 Asymmetric context-dependent mutation patterns revealed through mutation-accumulation experiments. Mol. Biol. Evol. 32: 1672–1683.

Tajima F, 1989 Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123: 585–595.

The 1000 Genomes Project Consortium, 2015 A global reference for human genetic variation. Nature 526: 68–74.

Uchimura A, Higuchi M, Minakuchi Y, Ohno M, Toyoda A et al., 2015 Germline mutation rates and the long-term phenotypic effects of mutation accumulation in wild-type laboratory mice and mutator mice. Genome Res. 25: 1125–1134.

Vieira-Silva S, Touchon M, Rocha E P, 2010 No evidence for elemental-based streamlining of prokaryotic genomes. Trends Ecol. Evol. 25: 319–320; author reply 320–321.

Wang H, Zhu X, 2014 De novo mutations discovered in 8 Mexican American families through whole genome sequencing. BMC Proc. 8: S24.

Watterson G A, 1975 On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7: 256–276.

Whitney K D, Garland T Jr., 2010 Did genetic drift drive increases in genome complexity? PLoS Genet. 6: e1001080.

Yang S, Wang L, Huang J, Zhang X, Yuan Y et al., 2015 Parent-progeny sequencing indicates higher mutation rates in heterozygotes. Nature 523: 463–467.

Zhu Y O, Siegal M L, Hall D W, Petrov D A, 2014 Precise estimates of mutation rate and spectrum in yeast. Proc. Natl. Acad. Sci. USA 111: e2310–e2318.