Neo-sex Chromosomes in the Monarch Butterfly, Danaus plexippus

We report the discovery of a neo-sex chromosome in the monarch butterfly, Danaus plexippus, and several of its close relatives. Z-linked scaffolds in the D. plexippus genome assembly were identified via sex-specific differences in Illumina sequencing coverage. Additionally, a majority of the D. plexippus genome assembly was assigned to chromosomes based on counts of one-to-one orthologs relative to the butterfly Melitaea cinxia (with replication using two other lepidopteran species), in which genome scaffolds have been mapped to linkage groups. Sequencing coverage-based assessments of Z linkage combined with homology-based chromosomal assignments provided strong evidence for a Z-autosome fusion in the Danaus lineage, involving the autosome homologous to chromosome 21 in M. cinxia. Coverage analysis also identified three notable assembly errors resulting in chimeric Z-autosome scaffolds. Cytogenetic analysis further revealed a large W chromosome that is partially euchromatic, consistent with being a neo-W chromosome. The discovery of a neo-Z and the provisional assignment of chromosome linkage for >90% of D. plexippus genes lay the foundation for novel insights concerning sex chromosome evolution in this female-heterogametic model species for functional and evolutionary genomics.

The initial effort to identify Z-linked sequences via sequencing coverage analysis applied filters that seemed likely to remove any W-linked scaffolds. Specifically, we excluded short scaffolds (i.e. length < N90) and any scaffolds where samples were missing data (i.e. where median coverage equaled zero for any sample). Since W-linked scaffolds may well be short (given the known complexities of assembling heterochromatic and repetitive sequences) and also have few or no male reads aligning to them, such scaffolds would not have been included in our initial screen for Z-linked scaffolds.
We therefore re-examined the normalized coverage data without a size limit and including all scaffolds, regardless of average coverage values. We identified 12 scaffolds with strong female bias (e.g. more than 2-fold greater average coverage in females than males; Supplemental Text Table 1). Scaffolds shorter than 9 kbp were not analyzed further, given their small size, inconsistent coverage patterns, and paucity of genes.
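The screen described above can be sketched in a few lines of Python. This is an illustration only, not the pipeline used in the study: the function name, the input layout (per-sample normalized coverage grouped by sex), and the scaffold names are all hypothetical, and the 2-fold and 9 kbp thresholds are simply taken from the text.

```python
from statistics import mean

def female_biased_scaffolds(coverage, lengths, fold=2.0, min_length=9000):
    """Flag scaffolds whose mean normalized female coverage exceeds
    `fold` times the mean male coverage.

    coverage: {scaffold: {"female": [...], "male": [...]}} (hypothetical layout)
    lengths:  {scaffold: length in bp}
    """
    flagged = []
    for scaffold, by_sex in coverage.items():
        # Very short scaffolds are set aside, as in the text.
        if lengths[scaffold] < min_length:
            continue
        female = mean(by_sex["female"])
        male = mean(by_sex["male"])
        if female == 0:
            continue  # no female reads: cannot be a W candidate
        # Scaffolds with no male reads at all are kept (infinite ratio),
        # since that is the pattern expected for W-linked sequence.
        ratio = float("inf") if male == 0 else female / male
        if ratio > fold:
            flagged.append(scaffold)
    return sorted(flagged)

# Toy data with made-up scaffold names and coverage values.
coverage = {
    "scaf_w_like": {"female": [1.0, 0.9, 1.1], "male": [0.0, 0.0, 0.1]},
    "scaf_autosomal": {"female": [1.0, 1.0, 1.0], "male": [1.0, 1.1, 0.9]},
    "scaf_z_like": {"female": [0.5, 0.5, 0.5], "male": [1.0, 1.0, 1.0]},
    "scaf_short": {"female": [1.0, 1.0, 1.0], "male": [0.0, 0.0, 0.0]},
}
lengths = {"scaf_w_like": 25000, "scaf_autosomal": 400000,
           "scaf_z_like": 300000, "scaf_short": 4000}
print(female_biased_scaffolds(coverage, lengths))
```

Unlike the initial Z-linkage screen, this sketch deliberately does not drop scaffolds with zero coverage in some samples, because absent male coverage is the signal of interest here.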
We first attempted to identify any broad-scale patterns of homology between these potentially W-linked scaffolds and the Z-linked scaffolds already identified. To do so, we used the PROmer algorithm in MUMmer to query the potentially W-linked scaffolds against a reference of Z-linked scaffolds. No obvious, broad-scale patterns of similarity were detected, as is evident from the PROmer alignments (Supplemental Text Figure 1). Despite this lack of similarity at the level of entire scaffolds, we further investigated the possibility of homology remaining at the level of genes (i.e. that there were gametologs present on the potential W-scaffolds with homology to Z-linked loci). To do so, we performed a tBLASTn search of protein sequences from potential W-linked genes against the entirety of D. plexippus genome assembly scaffolds, summarized in Supplemental Text Table 2. Only 6 of the 17 queried proteins had any significant (E-value < 1e-5) similarity to Z-linked scaffolds. However, in each case, the proteins also had at least one and typically multiple hits with much greater significance to autosomal scaffolds. Three of these appear to be related to transposable elements, while the other three may function directly in gene regulation or immunity. Overall, these BLAST results do not suggest that any obvious genic homology exists between Z-linked genes and these potentially W-linked sequences.
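The logic used to interpret the tBLASTn results can be illustrated with a short Python sketch. It assumes the hits have already been reduced to (query, subject scaffold, E-value) tuples and that hits of each protein to its own scaffold have been removed; the function name, input layout, and identifiers are hypothetical. As a simplification, every scaffold not in the Z-linked set is treated as "elsewhere" rather than being confirmed as autosomal.

```python
def summarize_z_hits(hits, z_scaffolds, max_evalue=1e-5):
    """For each query with a significant hit to a Z-linked scaffold, report
    whether a more significant hit exists on a non-Z scaffold.

    hits: iterable of (query, subject_scaffold, evalue) tuples (hypothetical layout)
    z_scaffolds: set of scaffold names classified as Z-linked
    """
    best = {}  # query -> {"Z": lowest E-value, "other": lowest E-value}
    for query, subject, evalue in hits:
        if evalue >= max_evalue:
            continue  # not significant at the threshold used in the text
        kind = "Z" if subject in z_scaffolds else "other"
        entry = best.setdefault(query, {})
        if kind not in entry or evalue < entry[kind]:
            entry[kind] = evalue

    summary = {}
    for query, entry in best.items():
        if "Z" not in entry:
            continue  # no significant similarity to any Z-linked scaffold
        summary[query] = {
            "best_z_evalue": entry["Z"],
            "best_other_evalue": entry.get("other"),
            "stronger_hit_elsewhere": "other" in entry and entry["other"] < entry["Z"],
        }
    return summary

# Toy hits with made-up protein and scaffold names.
hits = [
    ("prot1", "scaf_Z1", 1e-8), ("prot1", "scaf_A1", 1e-40),
    ("prot2", "scaf_A2", 1e-20),
    ("prot3", "scaf_Z1", 1e-3),
    ("prot4", "scaf_Z2", 1e-30), ("prot4", "scaf_A1", 1e-10),
]
result = summarize_z_hits(hits, {"scaf_Z1", "scaf_Z2"})
for query, info in sorted(result.items()):
    print(query, info)
```

A query flagged with `stronger_hit_elsewhere` matches the pattern reported for all six proteins: some similarity to the Z, but a better match on a non-Z scaffold, which argues against a gametolog relationship.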

DPOGS200002
Hits multiple scaffolds in the genome, many with much greater significance than any Z scaffolds. Hits many proteins labeled as "toll precursor".

DPOGS203463
Hits one other autosomal scaffold with greater significance. Only 35 aa long. No GenBank hits.

DPOGS203467
Hits multiple scaffolds in the genome, many with much greater significance than any Z scaffolds. Several hits to Pol-like protein from various insects; likely a TE.

DPOGS203471
Hits multiple scaffolds in the genome, many with much greater significance than any Z scaffolds. Homology to proteins predicted to regulate Hox genes (e.g. Jim Lovell, Tramtrack). Likely has DNA-binding activity, hence the BLAST hits.

DPOGS213849
Hits multiple scaffolds in the genome, many with much greater significance than any Z scaffolds. Several hits to Pol-like protein from various insects; likely a TE.

DPOGS213850
Hits multiple scaffolds in the genome, many with much greater significance than any Z scaffolds. Several hits to Pol-like protein from various insects; likely a TE.

In conclusion, analysis of sequence homology between Z-linked and potentially W-linked scaffolds in the D. plexippus genome assembly does not provide support for the presence of a neo-W. However, these analyses are far from sufficient to exclude the possibility that a neo-W exists. Successful sequencing and assembly of degenerate sex chromosomes like the Y and W is a notoriously difficult task, and it is a distinct possibility that much W sequence (neo or otherwise) is not represented in the current genome assembly. Thus, it is likely that much further focused effort and novel data will be required to robustly assess the possibility that D. plexippus harbors a neo-W chromosome.
The identification here of several potentially W-linked sequences harboring functional protein-coding genes is noteworthy, given the paucity of known W-linked protein-coding loci in Lepidoptera (Sahara et al. 2011; Van't Hof et al. 2013). This presents an interesting opportunity for future study, but given the lack of homology to the Z, it is not of immediate consequence to the question of a neo-W in D. plexippus.