Whole-Genome Sequence of the C57L/J Mouse Inbred Strain

We sequenced the complete genome of the widely used C57L/J mouse inbred strain. With 40× average coverage, we compared the C57L/J sequence with that of C57BL/6J and identified many known and novel private variants. This genome sequence adds another strain to the growing number of mouse inbred strains with complete genome sequences and is a valuable resource for the scientific community.

Structural variant (SV) calling: Using the mapped reads, structural variants (insertions, deletions, and inversions) were called with BreakDancerMax and Pindel (Ye et al. 2009). The two call sets were merged using SVmerge (Wong et al. 2010), and only SVs at least 100 bp long were retained.
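The idea of merging two call sets and then applying the 100-bp length filter can be sketched in Python. This is a simplified stand-in, not SVmerge's actual algorithm: it assumes two calls of the same type on the same chromosome describe one event when they reach 50% reciprocal overlap (an assumed threshold), and it handles only interval-type calls such as deletions and inversions (insertions would need breakpoint-distance matching instead). The `SV` record and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """Hypothetical minimal SV record (0-based, half-open coordinates)."""
    chrom: str
    start: int
    end: int
    svtype: str  # e.g. "DEL" or "INV"

    @property
    def length(self) -> int:
        return self.end - self.start


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Smaller of the two overlap fractions; 0.0 if type, chromosome, or span differ."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / a.length, overlap / b.length)


def merge_calls(set_a, set_b, min_overlap=0.5, min_len=100):
    """Union of two call sets, then a length filter.

    A call from set_b is dropped if it reciprocally overlaps (>= min_overlap)
    a call already kept; surviving calls shorter than min_len bp are removed.
    """
    merged = list(set_a)
    for call in set_b:
        if not any(reciprocal_overlap(call, kept) >= min_overlap for kept in merged):
            merged.append(call)
    return [sv for sv in merged if sv.length >= min_len]
```

Keeping the first call set's coordinates when two calls collapse is an arbitrary choice in this sketch; a dedicated merger would typically reconcile breakpoints from both callers.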
Genotyping using the MegaMUGA: DNA from the same mouse was submitted to GeneSeek (Lincoln, NE) for hybridization on the Mega Mouse Universal Genotyping Array (MegaMUGA), which provides 77,800 SNP markers built on the Illumina Infinium platform. Data were analyzed using BEDtools v2.17.0 (Quinlan and Hall 2010).
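BEDtools performs interval operations such as intersecting marker positions with variant calls. The following pure-Python sketch shows one such comparison. It assumes both data sets have already been reduced to (chromosome, position) → genotype mappings and reports how often the sequence-based genotype agrees with the array genotype at markers present in both. The concordance metric, input format, and function name are assumptions for illustration; this section does not state exactly how the array data were compared with the sequence.

```python
from typing import Dict, Tuple

Locus = Tuple[str, int]  # (chromosome, 1-based position); hypothetical key format


def genotype_concordance(array_calls: Dict[Locus, str],
                         seq_calls: Dict[Locus, str]) -> Tuple[int, int, float]:
    """Compare genotypes at loci present in both call sets.

    Returns (number of shared loci, number of matching genotypes, fraction matching).
    """
    shared = array_calls.keys() & seq_calls.keys()  # loci genotyped by both methods
    matches = sum(1 for locus in shared if array_calls[locus] == seq_calls[locus])
    fraction = matches / len(shared) if shared else float("nan")
    return len(shared), matches, fraction
```

For an inbred strain, genotypes would be expected to be homozygous (e.g. "AA"), so a simple string comparison is enough in this toy form.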
Data accession code: The BAM files (containing aligned and unaligned reads) can be accessed from the NCBI Sequence Read Archive (SRA) using the following SRA accession code: SRS635099.

Alignment and coverage
Sequencing of C57L/J yielded 1,744,197,122 reads, of which 1,237,576,596 (71%) were of high enough quality for alignment. Reads were aligned to the published C57BL/6 genome [December 2011 release of the mouse reference genome (mm10) from UCSC]. Approximately 95.8% of the reference genome was covered by at least five reads, with a mean genome-wide coverage of 39.2× (mean n
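The arithmetic behind these alignment figures can be checked with a short Python sketch. `expected_mean_coverage` uses the Lander-Waterman idealization C = N·L/G (reads × read length / genome size), and `fraction_at_least` takes per-base read depths such as those reported by a depth-calculation tool. The function names are hypothetical, and because the read length is not stated in this section, the coverage function is exercised with placeholder values rather than used to reproduce the 39.2× figure.

```python
def percent_passing(passed: int, total: int) -> float:
    """Percentage of reads retained after quality filtering."""
    return 100.0 * passed / total


def expected_mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Lander-Waterman expected depth: C = N * L / G."""
    return n_reads * read_length / genome_size


def fraction_at_least(depths, min_depth=5):
    """Fraction of positions whose read depth is >= min_depth."""
    depths = list(depths)
    return sum(d >= min_depth for d in depths) / len(depths)


# Reads retained out of all reads sequenced (values from the text).
print(round(percent_passing(1_237_576_596, 1_744_197_122), 2))  # about 70.95
```

The realized mean depth is usually lower than the idealized N·L/G, because not every retained read aligns uniquely and coverage is uneven across the genome.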

Early stop codons and frame shifts
We categorized the identified intragenic SNPs and indels as having high (0.05%), moderate (0.38%), or low (0.32%) impact and focused on the 114 high-impact variants unique to C57L/J. These variants cause splice site changes, frame shifts, loss of the start or stop codon, or gain of a premature stop codon. Among them is the SNP in Mlph (p.R31*) that introduces a premature stop codon and gives C57L/J its distinct coat color. We performed Sanger sequencing on all 114 variants, which provides an estimate of the false discovery rate (FDR). Of the 69 tested variants with an allele frequency below 0.8, only 12 were confirmed (FDR = 0.83); the others were false positives. Of the 45 variants with an allele frequency of 1.0, 36 were confirmed (FDR = 0.20). The data for the confirmed high-impact variants are summarized in Table 1.
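The FDR values above are the fraction of Sanger-tested calls that failed to confirm, computed separately for each allele-frequency group: 57/69 ≈ 0.83 and 9/45 = 0.20. A Python sketch of that calculation, together with a high-impact filter, is shown below. The Sequence Ontology effect terms in `HIGH_IMPACT` are an assumption chosen to match the categories listed in the text; the annotation tool actually used is not named in this section, and all function names are hypothetical.

```python
# Sequence Ontology effect terms treated as "high impact" in this sketch
# (an assumed list mirroring the categories named in the text).
HIGH_IMPACT = {
    "splice_acceptor_variant", "splice_donor_variant", "frameshift_variant",
    "start_lost", "stop_lost", "stop_gained",
}


def is_high_impact(effect: str) -> bool:
    return effect in HIGH_IMPACT


def false_discovery_rate(tested: int, confirmed: int) -> float:
    """FDR estimated from validation: unconfirmed calls / all tested calls."""
    return (tested - confirmed) / tested


def fdr_by_allele_frequency(results, low_cutoff=0.8):
    """results: iterable of (allele_frequency, confirmed) pairs.

    Groups calls into AF < low_cutoff ("low") and AF == 1.0 ("fixed"),
    ignores any other AF, and returns the FDR of each non-empty group.
    """
    groups = {"low": [0, 0], "fixed": [0, 0]}  # [tested, confirmed]
    for af, confirmed in results:
        if af < low_cutoff:
            key = "low"
        elif af == 1.0:
            key = "fixed"
        else:
            continue
        groups[key][0] += 1
        groups[key][1] += int(confirmed)
    return {k: false_discovery_rate(t, c) for k, (t, c) in groups.items() if t}
```

Feeding in 12 confirmed and 57 unconfirmed low-frequency calls plus 36 confirmed and 9 unconfirmed fixed calls reproduces the two FDR values reported above.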

DISCUSSION
Because the C57L/J strain is used regularly in mapping quantitative traits such as physical activity (Leamy et al. 2010), obesity (Taylor and Phillips 1997), and gallstones (Paigen et al. 2000), and as a mapping strain for ENU mutants (Aljakna et al. 2012), obtaining its full genome sequence and comparing it with the related C57BL/6J is beneficial to the research community. The sequence provides SNPs for denser genetic mapping and enables rapid identification of possible causal variants in candidate genes. We sequenced the genome of a male C57L/J mouse and compared the sequence with the published genomes of 18 inbred strains (https://www.sanger.ac.uk/resources/mouse/genomes/) (Keane et al. 2011) and with the male and female C57BL/6J genomes. The ~40× average coverage of the 2.7-billion-base-pair reference genome confirms the known SNPs between C57BL/6J and C57L/J and reveals a large number of novel SNPs.
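Taking a private variant to mean one found in C57L/J but in none of the comparison genomes, selecting private variants reduces to a set difference. The sketch below assumes each genome's calls have been normalized to (chromosome, position, ref, alt) tuples, which is a hypothetical representation; the strain sets in the example are invented toy data.

```python
from typing import Dict, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt); hypothetical key


def private_variants(target: Set[Variant],
                     others: Dict[str, Set[Variant]]) -> Set[Variant]:
    """Variants in the target strain that no other strain carries."""
    seen_elsewhere: Set[Variant] = set().union(*others.values())
    return target - seen_elsewhere


c57l = {("chr1", 10, "A", "G"), ("chr1", 20, "C", "T"), ("chr2", 5, "G", "A")}
others = {
    "strain_X": {("chr1", 10, "A", "G")},  # shares one variant with C57L/J
    "strain_Y": {("chr2", 5, "G", "C")},   # same site, different alt allele
}
print(sorted(private_variants(c57l, others)))
```

Matching on the alternate allele as well as the position means that a different allele at the same site (as in `strain_Y`) does not remove the C57L/J variant from the private set; a position-only comparison would be stricter.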
We selected the C57L/J private variants and focused on those with high impact. Among these is the variant leading to a premature stop codon in Mlph (p.R31*), which causes the distinct leaden coat color. Striking in this list of 49 variants is the number of genes associated with susceptibility to viral infections (Klra17, H2-D1, and H2-T3). Several of these lie within the confidence interval of a QTL for resistance to murine cytomegalovirus identified in a cross between C57L/J and MA/My (Stadnisky et al. 2010). A previous study showed that disruption of Ankrd17 is embryonic lethal (Hou et al. 2009). We were therefore surprised to find a frame shift mutation and a premature stop codon in Ankrd17, which would be predicted to cause a similar phenotype, yet C57L/J mice are viable. Another interesting finding is that C57L/J carries a unique variant in Oplah that causes a frame shift in the C-terminal part of 5-oxoprolinase. Mutations in this gene cause 5-oxoprolinuria in humans (Calpena et al. 2013).
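Checking whether candidate genes fall within a QTL confidence interval is a simple interval query. The sketch below uses hypothetical gene symbols and invented coordinates, because the actual gene positions and the QTL interval boundaries are not given in this section.

```python
from typing import Iterable, List, Tuple

Gene = Tuple[str, str, int]  # (symbol, chromosome, position); hypothetical format


def genes_in_interval(genes: Iterable[Gene], chrom: str,
                      start: int, end: int) -> List[str]:
    """Symbols of genes whose position lies in the closed interval [start, end] on chrom."""
    return [sym for sym, c, pos in genes if c == chrom and start <= pos <= end]


# Invented example data: three genes, one on a different chromosome.
genes = [
    ("GeneA", "chr17", 35_000_000),
    ("GeneB", "chr17", 36_500_000),
    ("GeneC", "chr6", 130_000_000),
]
print(genes_in_interval(genes, "chr17", 34_000_000, 37_000_000))
```

Representing each gene by a single position keeps the sketch short; with full gene spans, an overlap test (as in interval tools such as BEDtools) would be the more faithful check.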
Despite the close relatedness of C57L/J to C57BL/6J, the Mouse Phenome Database shows large phenotypic differences between the two strains. For example, the two strains lie at opposite extremes of the strain distribution for plasma sodium levels in 18-month-old female mice. Genetic mapping identified Nalcn as a candidate gene underlying this difference (Sinke et al. 2011), and comparison of the coding sequences reveals a nonsynonymous SNP in exon 44 that causes a p.T1699S amino acid change.
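To make the protein-change notation concrete, a single-base substitution within a codon is nonsynonymous when it changes the encoded amino acid (as in Nalcn p.T1699S) and is a stop gain when it creates a stop codon (as in Mlph p.R31*). The sketch below classifies codon changes with the standard genetic code. The codons in the example are illustrative only, since the actual Nalcn and Mlph nucleotide changes are not given here, and the function name is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_codon_change(ref_codon: str, alt_codon: str):
    """Return (ref amino acid, alt amino acid, effect) for a codon substitution."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        effect = "synonymous"
    elif alt_aa == "*":
        effect = "stop_gained"
    elif ref_aa == "*":
        effect = "stop_lost"
    else:
        effect = "nonsynonymous"
    return ref_aa, alt_aa, effect


print(classify_codon_change("ACC", "TCC"))  # a Thr -> Ser change
print(classify_codon_change("CGA", "TGA"))  # an Arg -> stop change
```

Threonine (ACN) can become serine through a single A→T change at the first codon position (ACN → TCN), which is one way a T→S substitution such as p.T1699S can arise from one SNP.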
In conclusion, we present a high-quality genome sequence of the C57L/J mouse inbred strain, further expanding the number of strains with complete genome sequences. These data allow finer genetic mapping and the identification of QTL genes when using the C57L/J strain. In addition, some of the variants unique to C57L/J may make this strain a novel model for human phenotypes such as 5-oxoprolinuria and plasma sodium levels.