The Rate and Spectrum of Spontaneous Mutations in Mycobacterium smegmatis, a Bacterium Naturally Devoid of the Postreplicative Mismatch Repair Pathway

Mycobacterium smegmatis is a bacterium that is naturally devoid of known postreplicative DNA mismatch repair (MMR) homologs, mutS and mutL, providing an opportunity to investigate how the mutation rate and spectrum have evolved in the absence of a highly conserved primary repair pathway. Mutation accumulation experiments of M. smegmatis yielded a base-substitution mutation rate of 5.27 × 10⁻¹⁰ per site per generation, or 0.0036 per genome per generation, which is surprisingly similar to the mutation rate in MMR-functional unicellular organisms. Transitions were found more frequently than transversions, with the A:T→G:C transition rate significantly higher than the G:C→A:T transition rate, opposite to what is observed in most studied bacteria. We also found that the transition-mutation rate of M. smegmatis is significantly lower than that of other naturally MMR-devoid or MMR-knockout organisms. Two possible candidates that could be responsible for maintaining high DNA fidelity in this MMR-deficient organism are the ancestral-like DNA polymerase DnaE1, which contains a highly efficient DNA proofreading histidinol phosphatase (PHP) domain, and/or the existence of a uracil-DNA glycosylase B (UdgB) homolog that might protect the GC-rich M. smegmatis genome against DNA damage arising from oxidation or deamination. Our results suggest that M. smegmatis has a noncanonical Dam (DNA adenine methylase) methylation system, with target motifs differing from those previously reported. The mutation features of M. smegmatis provide further evidence that genomes harbor alternative routes for improving replication fidelity, even in the absence of major repair pathways.

spectrum of spontaneous mutations in an organism (Lynch et al. 2008; Halligan and Keightley 2009), allowing us to examine the forces driving the mutation process.
The general strategy of a bacterial MA experiment is to repeatedly bottleneck parallel lineages originating from a single cell for hundreds to thousands of generations. In this process, the strong bottlenecks minimize the efficacy of selection, enabling all but the most severely deleterious mutations to accumulate in an effectively neutral fashion (Muller 1927, 1928; Bateman 1959; Mukai 1964; Kibota and Lynch 1996). Through the MA process, unbiased estimates of genome-wide spontaneous mutation rates and spectra have been characterized for a number of eukaryotic and prokaryotic organisms (Lynch et al. 2008; Denver et al. 2009; Keightley et al. 2009, 2014, 2015; Ossowski et al. 2010; Lee et al. 2012; Sung et al. 2012a, 2012b; Behringer and Hall 2015; Dillon et al. 2015; Farlow et al. 2015; Long et al. 2015b; Ness et al. 2015), and have led to a general hypothesis explaining how mutation rates have evolved (Lynch 2010, 2011; Sung et al. 2012a). However, it remains unclear how DNA replication and repair interact to ultimately determine the mutation rate. Thus, further comparative work is needed to understand alternative evolutionary solutions to setting cellular mutation rates.
The Mycobacterium genus consists of a biologically diverse group of bacteria, with 120 or more described species, including both obligate human pathogens, such as M. tuberculosis and M. leprae, and free-living saprophytes, such as M. smegmatis (Smith et al. 2009). M. smegmatis is relatively fast-growing, nonpathogenic, and genetically facile, so it provides an accessible model to study Mycobacteria in general (Snapper et al. 1990; Shiloh and DiGiuseppe Champion 2010). M. smegmatis has a genome of 7 Mb (Mohan et al. 2015), which is larger than those of most other Mycobacterium strains, including members of the pathogenic M. tuberculosis complex (MTB complex, 4.4 Mb), and M. leprae (3.3 Mb) (Brosch et al. 2001). The reduction in genome size of these mycobacterial pathogens is attributed to pathogenicity evolution (Brosch et al. 2001). Mycobacteria are classified as Actinomycetales, and some members of this group, such as Nocardia and Corynebacterium, have unusually high genomic GC-contents when compared to other bacteria (65.6%-GC in M. smegmatis). Genome sequencing has revealed that Mycobacteria, like all Actinomycetales, do not have any identifiable genes encoding the widely conserved mutLS-based postreplicative mismatch repair (MMR) system (Cole et al. 1998; Ford et al. 2013), suggesting that Mycobacteria lack canonical MMR, and thus might have unusual mutation features.
MMR maintains the fidelity of genomes by removing a fraction of the replication errors (Kunkel and Erie 2005; Lee et al. 2012). Previous studies showed that mutation rates of some MMR-knockout organisms are 10-100× higher than in MMR-functional organisms (Lee et al. 2012; Lang et al. 2013). In addition, given that MMR deficiency is common in some species (Garcia-Gonzales et al. 2012), an organism may be able to significantly reduce DNA damage by using other repair or prevention pathways. Thus, because M. smegmatis is naturally devoid of known MMR genes, it may have compensatory mechanisms for efficient protection against mutations.

Mutation accumulation
Eighty independent M. smegmatis MC2 155 (ATCC 700084) MA lines were initiated from a single colony. 7H10 agar medium with 0.5% glycerol and 10% OADC enrichment, as recommended by ATCC, was used for the mutation-accumulation line transfers.
Every 2 d, a single isolated colony from each MA line was transferred by streaking to a new plate, ensuring that each line regularly passed through a single-cell bottleneck (Kibota and Lynch 1996). Each line passed through 4900 cell divisions (Supplemental Material, Table S1). The bottlenecking procedure used for this experiment ensures that mutations accumulate in an effectively neutral fashion. MA lines were incubated at 37° under aerobic conditions. Frozen stocks of all lineages were prepared by growing a final colony per isolate in 1 ml 7H9 broth medium with 0.2% glycerol and 10% ADC enrichment, incubated overnight at 37°, and frozen in 20% glycerol at −80°.

DNA extraction and sequencing
The 75 lines that survived through the end of MA were prepared for whole genome sequencing. DNA was extracted with the Wizard Genomic DNA Purification kit (Promega, Madison, WI). DNA libraries for Illumina HiSeq 2500 sequencing (insert size 300 bp) were constructed using the Nextera DNA Sample Preparation kit (Illumina, San Diego, CA). Paired-end 150-nt read sequencing of MA lines was done by the Hubbard Center for Genome Studies, University of New Hampshire, with an average sequencing depth of 126× across all lines (Table S1).

Mutation identification and analyses
A consensus approach for identifying fixed base substitutions and small indels in the MA lines was modified from Sung et al. (2015). Briefly, paired-end reads from each MA line were mapped to the reference genome (GenBank accession number: NC_018289.1) using BWA 0.6.2 (Li and Durbin 2009), and read alignment and duplicate-read removal around indels were performed using GATK (McKenna et al. 2010; DePristo 2011). The output was parsed with SAMTOOLS; mapped reads needed to pass filters for sequencing/PCR/mismapping errors; 26 lines were removed from the final analysis due to library construction failure or cross-line contamination (Table S1). Candidate mutations were called if they differed from the consensus sequence of all MA lines. Using the BAM and SAM formatted files from the BWA pipeline, BreakDancer 1.1.2 (Chen et al. 2009) and Pindel 0.2.4w (Ye et al. 2009) were also used to realign reads and identify small indels. Both the consensus pipelines and the realignment programs support the final reported indels.

Statistics and calculations
We used R v3.1.0 (R Development Core Team 2014) for all statistical tests and calculations; 95% Poisson confidence intervals were calculated using a χ² estimation (Johnson and Kemp 1993).
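As an illustration of the interval described above, the following is a minimal Python sketch (not the authors' R code) of an exact 95% Poisson confidence interval for a mutation count. It solves the Poisson tail equations by bisection, which is mathematically equivalent to the usual χ² formulation (lower bound = half the α/2 quantile of χ² with 2k degrees of freedom; upper bound = half the 1 − α/2 quantile with 2k + 2 degrees of freedom). The function names `poisson_cdf`, `_bisect`, and `poisson_ci` are hypothetical.

```python
import math

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed in log space for stability."""
    if lam <= 0:
        return 1.0
    log_terms = [i * math.log(lam) - lam - math.lgamma(i + 1) for i in range(k + 1)]
    m = max(log_terms)
    return min(1.0, math.exp(m) * sum(math.exp(t - m) for t in log_terms))

def _bisect(g, lo, hi, iters=200):
    """Root of g on [lo, hi], assuming g(lo) and g(hi) have opposite signs."""
    g_lo = g(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def poisson_ci(k, conf=0.95):
    """Exact two-sided confidence interval for a Poisson mean, given count k."""
    alpha = 1.0 - conf
    if k == 0:
        lower = 0.0
    else:
        # Lower bound L solves P(X >= k | L) = alpha / 2.
        lower = _bisect(lambda lam: (1.0 - poisson_cdf(k - 1, lam)) - alpha / 2, 0.0, float(k))
    # Upper bound U solves P(X <= k | U) = alpha / 2.
    upper = _bisect(lambda lam: poisson_cdf(k, lam) - alpha / 2, float(k), 2.0 * k + 20.0)
    return lower, upper

if __name__ == "__main__":
    low, high = poisson_ci(856)  # 856 = base substitutions reported in this study
    print(f"95% CI on the count: {low:.1f} to {high:.1f}")
```

Dividing both bounds by the total number of site-generations surveyed converts the interval on the count into an interval on the per-site rate.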

Data availability
Raw sequence data reported in this study have been deposited in NCBI SRA (Bioproject No.: PRJNA320082; Study No.: SRP074205).

RESULTS
To estimate the mutation rate in M. smegmatis MC2 155, a mutation-accumulation experiment was carried out for 381 d (4900 generations) with 80 independent lineages, all derived from the same ancestral colony of M. smegmatis. Every 2 d, a single colony from each line was restreaked onto a fresh plate, minimizing the effective population size. We analyzed the mutation rate and spectrum across 49 MA lines that were successfully sequenced and not contaminated by other MA lines.
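The relationship between the duration of the experiment and the number of generations can be checked with a short calculation. The sketch below assumes, as is customary for colony-based MA experiments but not stated explicitly in this section, that a colony founded by a single cell and containing N cells has undergone log2(N) generations; the function name `generations_per_transfer` is hypothetical.

```python
def generations_per_transfer(total_generations, total_days, days_per_transfer):
    """Average number of generations elapsed between consecutive transfers."""
    transfers = total_days / days_per_transfer
    return total_generations / transfers

# Values taken from the text: 4900 generations over 381 d, one transfer every 2 d.
g = generations_per_transfer(4900, 381, 2)

# Assumption: one founding cell doubling g times gives 2**g cells per colony.
implied_cells_per_colony = 2 ** g

print(f"generations per transfer: {g:.1f}")
print(f"implied cells per colony: {implied_cells_per_colony:.2e}")
```

Under this assumption, each 2-d colony corresponds to roughly 26 generations, that is, a colony on the order of 10⁷ to 10⁸ cells.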

Mutation rates
Across the 49 sequenced M. smegmatis MA lines (with an average of 6.77 Mb analyzable sequence per line, 97% of the total genome), we identified 856 base-substitution changes (Table S1 and Table S2), yielding an overall base-substitution mutation rate of 5.27 × 10⁻¹⁰ (SE = 1.93 × 10⁻¹¹) per site per generation, or 0.0036 per genome per generation. Our analysis also reveals 207 short insertions and deletions 1-27 bp in length (141 insertions and 66 deletions), yielding an insertion/deletion rate of 1.27 × 10⁻¹⁰ (SE = 1.08 × 10⁻¹¹) per site per generation (Table S1 and Table S3). Although the insertion rate is 2.1× greater than the deletion rate, the total size of all insertions is 206 bp while the deletions total 225 bp, resulting in a net loss of 19 bp in DNA sequence across all lines, consistent with the universal prokaryotic deletion bias hypothesis (Mira et al. 2001). Of the small indels, 78.74% occur in simple sequence repeats (SSRs), e.g., homopolymer runs (Table S3), and these SSR-associated indels comprise 15.33% of all mutations.
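The reported rates follow from dividing the mutation counts by the total number of site-generations surveyed. The sketch below reproduces that arithmetic from the summary figures given in the text; it is a simplification that uses the average analyzable sequence and a uniform 4900 generations for every line, whereas the per-line values are in Table S1. The function name `per_site_rate` is hypothetical.

```python
def per_site_rate(n_mutations, n_lines, sites_per_line, generations):
    """Mutations per site per generation, pooled across MA lines."""
    return n_mutations / (n_lines * sites_per_line * generations)

LINES = 49          # sequenced MA lines
SITES = 6.77e6      # average analyzable sites per line
GENERATIONS = 4900  # generations per line

bs_rate = per_site_rate(856, LINES, SITES, GENERATIONS)     # base substitutions
indel_rate = per_site_rate(207, LINES, SITES, GENERATIONS)  # small indels

# Per-genome rate, here scaled by the analyzable sites (an assumption about
# which genome size underlies the reported 0.0036).
per_genome = bs_rate * SITES

print(f"base-substitution rate: {bs_rate:.3e} per site per generation")
print(f"indel rate:             {indel_rate:.3e} per site per generation")
print(f"per-genome rate:        {per_genome:.4f} per generation")
```

With these inputs the pooled estimates agree with the reported 5.27 × 10⁻¹⁰ and 1.27 × 10⁻¹⁰ per site per generation to within rounding.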
Using the annotated M. smegmatis MC2 155 genome (NCBI accession: NC_018289.1), we identified the functional context of each base substitution (Table S2). Across the 49 lines, 716 of the 856 (83.64%) substitutions are in coding regions (90% of the genome represents coding regions), while the remaining 140 are found at noncoding sites (Table S2). To test for the absence of selection in our experiment, we asked whether the ratio of nonsynonymous to synonymous mutations is significantly different from the random expectation. Given the codon usage and the transition/transversion ratio (see below) in M. smegmatis, the expected ratio of nonsynonymous to synonymous mutations is 2.60, which is not significantly different from the observed ratio of 2.11 (486/230) (χ² = 3.00, P > 0.01). Thus, selection does not appear to have had a significant influence on the distribution of mutations in this experiment.

Comparison of mutation rates with various bacteria
Mutation rates in MMR-deficient genome backgrounds in several prokaryotic and eukaryotic organisms have been investigated by using whole-genome sequencing of mutation-accumulation lines. Most of these studies have found that MMR deficiency results in a >100-fold increase in the mutation rate compared to wild-type lines. In striking contrast, MMR-devoid M. smegmatis has a mutation rate comparable to that of other naturally MMR-proficient wild-type organisms (Table 1). These results suggest that M. smegmatis employs mechanisms that compensate for the absence of MMR in order to reach the same mutation rate as other organisms that harbor the essential MMR enzymes.
Previous studies have observed low levels of nucleotide diversity in M. tuberculosis and M. leprae populations (Sreevatsan et al. 1997; Monot et al. 2009), and have proposed that this is a result of recent population bottlenecks (Smith et al. 2009). However, low levels of nucleotide diversity can also be explained by low mutation rates (Lynch 2010). Low mutation rates observed in pathogenic strains of M. tuberculosis (2- to 7-fold lower than that observed in M. smegmatis in this study) are consistent with the latter explanation (Ford et al. 2011). However, the mutation rate difference between M. smegmatis and M. tuberculosis could also result from different experimental systems.
(Table S1), which is significantly higher than the μ(G:C→A:T) rate (95% Poisson confidence intervals for μ(G:C→A:T) 3.96–4.74 × 10⁻¹⁰, for μ(A:T→G:C) 5.40–6.75 × 10⁻¹⁰). Given these conditional A/T↔G/C mutation rates, the expected GC content from mutation alone is 58.2% (SE = 4.67%), significantly lower than the actual chromosomal GC content of 65.6%.
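The expected GC content under mutation pressure alone is conventionally computed as the equilibrium of a two-state model, in which the GC fraction settles where gains of G/C balance losses. The following sketch applies that standard formula. Because the point estimates of the two rates are not given in this passage, it uses the midpoints of the reported 95% confidence intervals as stand-ins, which is an approximation; the function name `equilibrium_gc` is hypothetical.

```python
def equilibrium_gc(rate_at_to_gc, rate_gc_to_at):
    """Equilibrium GC fraction when only mutation acts (two-state model)."""
    return rate_at_to_gc / (rate_at_to_gc + rate_gc_to_at)

# Assumption: midpoints of the reported 95% intervals approximate the estimates.
at_to_gc = (5.40e-10 + 6.75e-10) / 2
gc_to_at = (3.96e-10 + 4.74e-10) / 2

gc_eq = equilibrium_gc(at_to_gc, gc_to_at)
print(f"expected GC content from mutation alone: {gc_eq:.1%}")
```

The value obtained this way is close to the reported 58.2% and well below the observed 65.6%, which is the comparison the text draws.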
Methylated bases are mutational hotspots in bacteria (Schaaper and Dunn 1991; Lee et al. 2012). Previous studies have found that mycobacterial species contain methyltransferases that are not canonical Dam or Dcm DNA methyltransferases (Shell et al. 2013; Sharma et al. 2015; Zhu et al. 2016), but are associated with the presence of 6-methyladenine in their genomes (Shell et al. 2013). We examined mutation rates at noncanonical Dam target sites (Schlagman and Hattman 1989; Clark et al. 2012; Shell et al. 2013), and previously suggested noncanonical methylation sites in other bacteria (Long et al. 2015a), and found that 45% of the A:T→C:G transversions (50 of 111) fall in motifs of 5′GACC3′ (30) and 5′CACC3′ (20), a 6.8-fold elevation from the transversion rate of A:T sites not falling in these motifs. The mutation hotspots at noncanonical Dam target sites suggest methylation at these sites (Table S4). Surprisingly, the reported Mycobacterial Adenine Methyltransferase sites 5′GAATTC3′ (Nikolaskaya et al. 1985) and 5′CTGGAG3′ (Shell et al. 2013) are not enriched for A:T→C:G transversions, suggesting that these sites are not routinely methylated in the M. smegmatis genome.
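Assigning a mutated A:T site to a motif requires checking both strands, because the adenine may lie on the strand complementary to the reference. The sketch below shows one way to perform that lookup for the 5′GACC3′ and 5′CACC3′ motifs; it is an illustration rather than the pipeline used in this study, and the names `MOTIFS`, `revcomp`, and `adenine_in_motif` are hypothetical.

```python
MOTIFS = ("GACC", "CACC")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def adenine_in_motif(ref, pos, motifs=MOTIFS):
    """True if the A:T pair at 0-based `pos` is the adenine of a motif on either strand.

    Both motifs carry their single adenine at the second position.
    """
    base = ref[pos]
    if base == "A":
        # Adenine on the reference strand: motif spans pos-1 .. pos+2.
        window = ref[pos - 1:pos + 3] if pos >= 1 else ""
        return window in motifs
    if base == "T":
        # Adenine on the complementary strand: the motif, read 5'->3' on that
        # strand, is the reverse complement of reference positions pos-2 .. pos+1.
        window = ref[pos - 2:pos + 2] if pos >= 2 else ""
        return revcomp(window) in motifs
    return False

if __name__ == "__main__":
    print(adenine_in_motif("TGACCT", 2))  # adenine of GACC on the reference strand
    print(adenine_in_motif("AGGTGA", 3))  # adenine of CACC on the complementary strand
```

Counting the mutated sites for which this returns true, and normalizing by the number of such sites in the genome, gives the kind of in-motif versus out-of-motif rate comparison described above.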
Although the presence of 5-methylcytosines was previously reported in M. tuberculosis and M. smegmatis genomes (Srivastava et al. 1981; Hemavathy and Nagaraja 1995), recent studies have found no 5-methylcytosine modification in the genomes of M. tuberculosis complex strains (Shell et al. 2013; Zhu et al. 2016). In our study, 21% (65 of 302) of the G:C→A:T transitions fall in the motifs of 5′CCGC3′, 5′CGCC3′, 5′CGCG3′, and 5′CGGC3′, which were not reported previously, and 48% (145 of 302) fall in 5′CpG3′ sites, 3.5-fold elevated from cytosines not in these sites (Table S5). As shown in yeasts, cytosines at 5′CpG3′ sites may have an elevated mutation rate even without methylation (Zhu et al. 2014; Behringer and Hall 2015; Farlow et al. 2015).

DISCUSSION
Because most mutations have slightly deleterious fitness effects (Baer et al. 2007; Eyre-Walker and Keightley 2007), natural selection is thought to operate to minimize replication errors and maximize DNA repair efficiency (Kimura 2009). It has been proposed that the efficacy of selection in reducing mutation rates is determined by the power of random genetic drift, which is inversely proportional to the effective population size (Lynch 2010, 2011). Given this theoretical framework, because population sizes in free-living bacteria are expected to be large (on the order of 10⁷-10⁹), we expect different species to have roughly similar per-genome mutation rates if they have similar population sizes (Sung et al. 2012a; Sniegowski and Raynes 2013). Consistent with this idea, M. smegmatis has roughly the same mutation rate as other free-living bacteria (Lee et al. 2012; Long et al. 2015b; Sung et al. 2015). Yet M. smegmatis lacks critical MMR enzymes, suggesting that either Mycobacterium pre-MMR replication fidelity is higher than that of other prokaryotes, or that alternative biochemical mechanisms are used to arrive at the equivalent mutation rates.
Alternative pathways for replication fidelity
Three main processes influence DNA-replication fidelity: nucleotide insertion fidelity of the DNA polymerase, removal of mispaired nucleotides by the DNA proofreading exonuclease, and MMR. Sequential action of these three steps is responsible for the typically low bacterial error rate of 10⁻¹⁰ per base replicated (Schaaper 1993; Kunkel 2004). However, it remains possible that a deficiency in any one of these processes may be compensated for by increased fidelity in the others (Lynch 2012): in M. smegmatis, as in the case of Deinococcus radiodurans (Long et al. 2015a), it appears that a mechanism arising from such evolutionary layering must compensate for MMR deficiency.
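The layering argument can be written schematically. The expression below is a sketch that assumes the three steps act sequentially and independently; the symbols $e_{\mathrm{ins}}$ (misinsertion rate of the polymerase), $f_{\mathrm{proof}}$ (fraction of misinsertions escaping proofreading), and $f_{\mathrm{MMR}}$ (fraction of remaining errors escaping MMR) are introduced here for illustration and do not appear in the original analysis.

```latex
\mu \;\approx\; e_{\mathrm{ins}} \times f_{\mathrm{proof}} \times f_{\mathrm{MMR}} \;\sim\; 10^{-10}
\quad \text{per base replicated}.
% In an organism without MMR, f_MMR = 1, so the same overall rate requires
%   e_ins \times f_proof \sim 10^{-10},
% i.e. the insertion and proofreading steps (or other repair pathways not
% represented in this product) must carry the burden otherwise borne by MMR.
```

Under this view, a typical mutation rate in an MMR-devoid genome implies that the product of the remaining factors is correspondingly smaller than in MMR-proficient species.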
Replication of the Escherichia coli chromosome is performed by the DNA polymerase III holoenzyme, which replicates the leading and lagging strands simultaneously (Kelman and O'Donnell 1995). The alpha (α) and epsilon (ε) subunits have a major effect on fidelity of the DNA polymerase III holoenzyme, allowing DNA synthesis to proceed with 10⁻⁷ errors/bp replicated (prior to proofreading) (Schaaper 1993; Kunkel and Erie 2005). The proofreading subunit of the DNA polymerase, the epsilon (ε) exonuclease, is also essential for high-fidelity DNA replication in E. coli, with inactivation increasing the mutation rate up to 200-fold (Schaaper 1993). Surprisingly, however, Rock et al. (2015) found that although the proofreading exonuclease in M. tuberculosis is present, it is completely dispensable for fidelity, and an alternative exonuclease contributes to replicative fidelity in Mycobacteria. They found that the Mycobacterial DNA polymerase DnaE1 performs DNA proofreading with a polymerase and histidinol phosphatase (PHP) domain; inactivation of the PHP domain increased the mutation rate by more than 3000-fold (Rock et al. 2015). This strong dependence of fidelity on the PHP domain suggests that the burden of DNA repair placed on MMR in other species may instead be placed onto the DnaE1 proofreader in Mycobacteria.

Role of DNA methylation in biased mutation spectrum
In the M. smegmatis genome, a subset of adenine and cytosine sites have an elevated mutation rate. These sites are associated with specific sequence motifs: 45% of the A:T→C:G transversions (50 of 111) occur at adenines in the motifs 5′GACC3′ (30) and 5′CACC3′ (20), which are known noncanonical Dam methylation sites. In addition, 21% (65 of 302) of the G:C→A:T transitions occur at cytosines in the motifs 5′CCGC3′, 5′CGCC3′, 5′CGCG3′, and 5′CGGC3′, and overall 48% (145 of 302) of the G:C→A:T transitions fall in 5′CpG3′ sites. These two classes of mutation account for nearly a quarter of all base-substitution changes that we observed, and on balance they sum to a strongly biased G:C→A:T transition, making the overall A:T→C:G rate dependent on the mutation spectrum at unmethylated sites. While adenine methylation has been reported in M. smegmatis, cytosine methylation has not been seen (Shell et al. 2013; Zhu et al. 2016), although our results suggest this should be reexamined.
Alternative forms of DNA repair
Different DNA repair processes may generate the unusual mutation spectrum observed in M. smegmatis. A near-universal mutation bias toward A/T has been observed in most species (Hershberg and Petrov 2010), but we find a bias to G/C in M. smegmatis. Notably, M. smegmatis is GC-rich, suggesting that mutation bias may have a role in determining the GC content in this genome. For the species in which a GC mutation bias is observed (Dillon et al. 2015; Long et al. 2015a), this may be a product of methylation, deamination, and/or repair. For example, Mycobacteria have a high level of redundancy in the base excision repair (BER) pathway (van der Veen and Tang 2015), which could reduce the number of G:C→T:A transversions (Wallace 2002) associated with oxidative damage (David et al. 2007). In both Bacillus subtilis and E. coli, MutY can compensate for MMR enzymes by removing adenines that are mispaired with cytosines, and preventing G:C→A:T mutations (Kim et al. 2003; Bai and Lu 2007; Debora et al. 2011).
GC-rich genomes may deploy additional enzymes to survey the fidelity of GC-sites, which are highly susceptible to cytosine deamination (Dos Vultos et al. 2009). The UdgB enzyme plays a more important role in removing uracils in M. smegmatis than in bacteria with known MMR activities (Wanner et al. 2009; Malshetty et al. 2010). Based on conserved sequences, six Udg families have been identified in various eubacteria, with different substrate specificities (Pearl 2000; Sartori et al. 2002; Srinath et al. 2007; Lee et al. 2011). Mycobacteria encode one family 1 Ung, and one family 5 UdgB (Sartori et al. 2002). The latter has only been characterized in a few organisms such as hyperthermophilic archaea and M. tuberculosis, and eukaryotes do not have this enzyme (Sartori et al. 2002; Starkuviene and Fritz 2002; Hoseki et al. 2003; Srinath et al. 2007). In vitro assays show that UdgB removes uracil from both ssDNA and dsDNA (Sartori et al. 2002), and excises hypoxanthine (Hx) from oligonucleotide substrates (Sartori et al. 2002; Srinath et al. 2007). Wanner et al. (2009) found that the mutation frequency in a udg knockout strain of M. smegmatis is 8-fold higher relative to wild type. Furthermore, M. smegmatis Ung is more efficient at excising uracils from hairpin-loop substrates than that of E. coli (Purnapatre and Varshney 1998), and the frequency of mutations in double udgB ung mutants is 56-fold higher than in wild-type M. smegmatis (Wanner et al. 2009). Similar to these results, Malshetty et al. (2010) showed synergistic effects of UdgB and Ung in mutation prevention in M. smegmatis: the mutation rate of a udgB knockout is 2.1-fold higher, and the rate of a ung knockout is 8.4-fold higher, than that of wild-type M. smegmatis, whereas the double knockout (udgB⁻ ung⁻) shows a 19.6-fold increase in mutation rate (Malshetty et al. 2010). By contrast, uracil DNA glycosylase (ung) mutants increase mutation frequency by only 2-fold in B. subtilis (López-Olmos et al. 2012), and 5-fold in E. coli (Duncan and Weiss 1982).
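Whether two knockouts act synergistically depends on the null model against which the double mutant is compared. The sketch below contrasts two common expectations, additive and multiplicative, using the fold increases quoted from Malshetty et al. (2010); the choice of null models and the function name `expected_double_knockout` are illustrative assumptions, not part of the cited analysis.

```python
def expected_double_knockout(fold_a, fold_b):
    """Expected fold increase of a double knockout under two simple null models.

    additive:       the excess mutations of each single knockout simply add up
    multiplicative: each knockout scales the mutation rate independently
    """
    additive = 1.0 + (fold_a - 1.0) + (fold_b - 1.0)
    multiplicative = fold_a * fold_b
    return additive, multiplicative

# Fold increases relative to wild type quoted in the text.
UDGB_FOLD = 2.1
UNG_FOLD = 8.4
OBSERVED_DOUBLE = 19.6

additive, multiplicative = expected_double_knockout(UDGB_FOLD, UNG_FOLD)
print(f"additive expectation:       {additive:.1f}-fold")
print(f"multiplicative expectation: {multiplicative:.1f}-fold")
print(f"observed double knockout:   {OBSERVED_DOUBLE:.1f}-fold")
```

The observed double-knockout increase is about twice the additive expectation and slightly above the multiplicative one, which is in line with the synergy described in the text.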
In conclusion, we have shown that M. smegmatis has a typical bacterial mutation rate, even though it lacks the near-universal MMR system, has an unusual A:T→C:G biased mutation spectrum, and has motifs for both probable adenine and cytosine methylation, which act as genomic mutational hotspots. We have discussed possible mechanisms that allow M. smegmatis to evolve a low mutation rate despite the apparent absence of MMR. Consistent with the drift-barrier hypothesis (Lynch 2010; Sung et al. 2012a), M. smegmatis has evolved to a summed replication fidelity and repair rate equal to that observed in most free-living bacteria. However, the lack of MMR in M. smegmatis necessitates compensatory selection to improve alternative enzymatic pathways that limit the mutation rate expected for its population size. Further biochemical assays are required to determine whether the discussed pathways replace the role of MMR in DNA replication fidelity, or if novel repair pathways exist.