A Hybrid Genetic Linkage Map of Two Ecologically and Morphologically Divergent Midas Cichlid Fishes (Amphilophus spp.) Obtained by Massively Parallel DNA Sequencing (ddRADSeq)

Cichlid fishes are an excellent model system for studying speciation and the formation of adaptive radiations because of their tremendous species richness and astonishing phenotypic diversity. Most research has focused on African rift lake fishes, although Neotropical cichlid species display much variability as well. Almost one dozen species of the Midas cichlid species complex (Amphilophus spp.) have been described so far and have formed repeated adaptive radiations in several Nicaraguan crater lakes. Here we apply double-digest restriction-site associated DNA sequencing to obtain a high-density linkage map of an interspecific cross between the benthic Amphilophus astorquii and the limnetic Amphilophus zaliosus, which are sympatric species endemic to Crater Lake Apoyo, Nicaragua. A total of 755 RAD markers were genotyped in 343 F2 hybrids. The map resolved 25 linkage groups and spans a total distance of 1427 cM with an average marker spacing distance of 1.95 cM, almost matching the total number of chromosomes (n = 24) in these species. Regions of segregation distortion were identified in five linkage groups. Based on the pedigree of parents to F2 offspring, we calculated a genome-wide mutation rate of 6.6 × 10−8 mutations per nucleotide per generation. This genetic map will facilitate the mapping of ecomorphologically relevant adaptive traits in the repeated phenotypes that evolved within the Midas cichlid lineage and, as the first linkage map of a Neotropical cichlid, facilitate comparative genomic analyses between African cichlids, Neotropical cichlids and other teleost fishes.


Midas cichlid double-digest
RADSeq synteny segregation distortion RAD markers mutation rate Cichlid fishes are a well-known evolutionary model system for studying the mechanisms of speciation and the formation of adaptive radiations. African Rift Lake cichlids in particular display an astonishing variability in phenotypes and comprise an enormous number of species, which makes Cichlidae one of the most species-rich families in vertebrates (Fryer and Iles 1972;Salzburger and Meyer 2004;Berra 2007). Because of this mega-diversity, Rift Lake African cichlids have been the primary focus of genomic research in this family to date (reviewed in Fan et al. 2012), but the .60 million year divergent (Matschiner et al. 2011) Neotropical lineage of cichlids is pertinent if we are to understand cichlid diversity at the family level and in a phylogenetic context. For example, the Neotropical Midas cichlid lineage alone features some of the key traits that characterize African cichlid fish diversity: polymorphisms in color, jaw, lip, tooth, and body shape (Meyer 1990a,b;Elmer et al. 2010b). Because of their geographic distribution, which includes the two Nicaraguan Great Lakes and the independent colonizations of at least eight crater lakes of Nicaragua, Midas cichlid communities comprise repeatedly evolved populations at different evolutionary stages of local adaptation and speciation (Wilson et al. 2000; Barluenga and Meyer 2010). The two oldest crater lakes house most of the described species (9 of 11). Interestingly, some of the younger crater lakes harbor notably different ecotypes in sympatry that are genetically not differentiated from each other (e.g., Elmer et al. 2010c). What makes this system so interesting is that similar morphologies, such as elongated body shapes or hypertrophied lips, have evolved independently and repeatedly in different crater lakes (Elmer et al. 2010b;Manousaki et al. 2013). These two key aspects, (1) the crater lakes as "natural experiments" that contain replicates of cichlid evolution at various stages of diversification, and (2) the ecomorphological variability that evolved in different environments in parallel, facilitates the investigation of mechanisms and discovery of principles of how and which genetic material underlies adaptation and the formation of new species . For example, knowing the genetics that lead to phenotypic changes via quantitative trait loci (QTL) mapping and association studies in closely and more distantly related species will ultimately help to elucidate how selection has led to the astonishing biodiversity of African and Neotropical cichlids observed today.
In at least two instances, Midas cichlid radiations have formed repeatedly in Nicaraguan crater lakes; specifically the independent formation of several high-bodied endemic species that occupy benthic niche and one elongated limnetic species each in two lakes (Vivas and McKaye 2001;Barluenga et al. 2006;Elmer et al. 2010b). Apart from body shape, limnetic and benthic species also differ in diet, pharyngeal jaw morphology, behavior, and coloration (Barlow and Munsey 1976;Baylis 1976a,b;Vivas and McKaye 2001;Barluenga et al. 2006;Elmer et al. 2010b;Lehtonen et al. 2011). In Lake Apoyo, which is the oldest of all Nicaraguan crater lakes at 22,000 years of age, speciation into these two distinct ecotypes has occurred in situ in the form of sympatric species by ecological speciation (Barluenga et al. 2006). Evidence suggests that benthic-limnetic divergence in Xiloa was also in sympatry (Kautt et al. 2012). The linkage mapping analysis in this study is based on a cross of the sympatrically occurring limnetic Amphilophus zaliosus and benthic Amphilophus astorquii from crater Lake Apoyo.
There is a paucity of evidence about the genetic basis of early divergence between two lineages as they emerge from a common ancestor (reviewed in Coyne and Orr 2004;Nosil and Schluter 2011). For species that arose through allopatric speciation without exchanging genetic material, the random accumulation of genetic incompatibilities by time might explain their reproductive isolation. However, especially in stages of speciation with gene flow, it is of great interest to ask (1) whether few or many genes and where they are located, e.g., clumped, in the genome; and (2) whether the same type of genes are involved in the process of speciation Feder et al. 2012). In only a few wild vertebrate populations have these questions been addressed via gene mapping and QTL mapping studies; thus, only very few crucial genes responsible for ecological divergence (e.g., Colosimo et al. 2005;Chan et al. 2010) and speciation (Coyne and Orr 2004;Nosil and Schluter 2011) have been identified so far. The development of a Midas cichlid linkage map will aid substantially in identifying the genes that underlie phenotypically and ecologically important traits in fishes early during their adaptations to local conditions and the process of incipient speciation.
Researchers have long questioned why some cichlid fishes evolve in such a diverse and fast manner compared with other vertebrate lineages (Meyer 1990a;Meyer 1993;Wittbrodt et al. 1998), or even other fishes in the same environment (Elmer et al. in press). Behavioral and morphological characters that have been proposed to be responsible for the elevated speciation rate of cichlids include parental care, sexual selection, functional design, and phenotypic plasticity and polymorphisms (Meyer 1987;Stiassny and Meyer 1999;Salzburger 2009). A particular genomic architecture, perhaps derived from whole-genome duplications, has been proposed as a driver of rapid response to selection and higher diversification rates in fishes in general and perhaps cichlids in particular (Meyer and Schartl 1999;Kuraku and Meyer 2008). The evolution of a cichlid-specific physical order of particular genes along the chromosomes, and certain interactions between these genes, might have resulted in a genomic architecture in cichlid lineages that favored diversification (Salzburger 2009). As an alternative mechanism hypermutability of the genome or specific genomic regions might allow organisms to adapt at a higher pace to changing selection pressures. Through a greater neutral mutation rate, the chance that beneficial mutations might emerge is enhanced, therefore allowing neofunctionalization to occur faster, as has been proposed for the facilitation of morphological diversity in teleosts (Ravi and Venkatesh 2008).
The mechanisms responsible for the incomparable diversity of cichlids are unknown. Only an accumulation and comparison of more and more analyzed genomic resources, such as transcriptomes, whole genomes, and linkage maps of cichlids and related organisms can shed light on these unresolved questions. Some progress has already been made in this direction (Albertson et al. 2003;Lee et al. 2005;Sanetra et al. 2009;Elmer et al. 2010a;Fan et al. 2012;Cichlid Genome Consortium, 2006). The present study seeks to contribute to this much-needed resource and analysis with a Neotropical lineage and, based on the pedigree of parents to F 2 offspring, give an estimate of a genome-wide mutation rate.
For decades, the analysis of genomic data were restricted to a few model organisms and organisms of economical importance. Next-generation sequencing platforms allow for the cost-effective sequencing of whole or partial genomes in a relatively short time; this is greatly accelerating genomic research in non-model organisms (Stapley et al. 2010;Elmer and Meyer 2011). Apart from whole genome sequencing, with restriction site-associated DNA sequencing (RADSeq) a powerful new tool to examine the genomes of organisms has become available (Baird et al. 2008;reviewed in Davey and Blaxter 2010). The advantage of RADSeq is the reduction of the genomic information of an organism to a usable and comparable fraction of homologous single nucleotide polymorphisms across a set of individuals. By choosing enzymes that differ in their cutting frequency, RADSeq allows for the development of variable numbers of repeatable genetic markers (RAD markers) to be sequenced by Illumina next-generation sequencing technology (Baird et al. 2008). RAD markers are polymorphic loci (i.e., sequences with at least one single-nucleotide polymorphism [SNP]) within or between individuals. Here, we apply a newly developed double-digest RADSeq (ddRADSeq) protocol (based on Peterson et al. 2012) to create a linkage map of various species of Midas cichlids.
In the present study, we construct a linkage map with the F 2 progeny of an interspecific cross between the benthic A. astorquii and the limnetic A. zaliosus Midas cichlids as a step toward uncovering the genetic basis of adaptive traits in this species complex. The linkage map will improve genomic resources for cichlid fishes in general, and Neotropical cichlids in particular, and along with whole-genome sequencing, contribute to insights into how cichlids achieved their astonishing phenotypic variability.

Pedigree
A wild-caught male A. zaliosus (2007, Lake Apoyo, Nicaragua) and a wild-caught female A. astorquii (2007, Lake Apoyo, Nicaragua) were reared separately in the laboratory until sexual maturity and then crossed to obtain F 1 hybrid progeny. The offspring were raised to sexual maturity in a community tank to allow full-sib pairing. When a pair formed, they were moved to a separate tank and after spawning naturally their eggs were harvested and reared in another tank. F 2 offspring were killed when they reached approximately 2 cm total length to get enough tissue for DNA extraction. The F 1 pair that had the greatest number of F 2 offspring was chosen for construction of the linkage map.
ddRADSeq library preparation and sequencing Genomic DNA was extracted from the two parents ("P generation"), the F 1 hybrid pair and 347 F 2 individuals using the QIAGEN DNeasy Blood & Tissue Kit following the manufacturer's instructions including RNase A treatment. Then, 1 mg of DNA template from each individual was double-digested using the restriction enzymes PstI-HF (20 units/reaction) and MspI (20 units/reaction) in one combined reaction for 3 hr at 37°. Subsequently, each fragmented sample was purified using the QIAGEN MinElute Reaction Cleanup Kit, eluted in 20 mL of Elution Buffer (EB), and quantified. Fragments were then ligated to P1 adapters that bind to PstI-HF created restriction sites and P2 adapters binding to overhangs generated by MspI (see Supporting Information, Table S1 for adapter sequences). In each reaction,~0.4 mg of DNA, 1 mL of P1 adapter (10 mM), 1 mL of P2 adapter (10 mM), 1 mL of T4 ligase (1,000 U/mL), 4 mL of 10· T4 ligation buffer, and double-distilled water were combined to a total volume of 40 mL. The ligation was processed on a polymerase chain reaction (PCR) machine using the following conditions: 37°for 30 min, 65°for 10 min, followed by a decrease in temperature by 1.3°per minute until reaching 20°.
After we ligated each individual's DNA to the adapters, samples were pooled and size-selected from an agarose gel. A preliminary Illumina sequencing run containing the parental and F 1 DNA revealed that the pilot analysis gel size range of 2502500 bp resulted in sequencing too many fragments and therefore a range of~3302400 bp was chosen for subsequent libraries. Manual size selection from agarose gel electrophoresis was performed on the library containing the parental and F 1 samples and samples were then cleaned using the QIAGEN MinElute Gel Purification Kit and eluted in 10 mL of EB. Fifty F 2 individuals were pooled per library and size selected using Pippin Prep technology (Sage Science, Beverly, MA). Seven PCRs (several are necessary to minimize PCR bias) were performed in replicate on size-selected fragments [(Peterson et al. 2012) Figure 1]. Each PCR contained 10220 ng of library DNA template, 4 mL of dNTPs (100 mM), 4.0 mL of 5· Phusion HF buffer (NEB), 0.2 mL of Phusion Taq polymerase (NEB), 1.0 mL of each RAD primer (10 mM), and filled up to 20 mL with double-distilled water. PCR conditions were performed as follows 98°(30 sec), [98°(10 sec), 65°(30 sec), 72°( 30 sec)] · 10, 72°(300 sec). Gel electrophoresis was performed on pooled products to remove remaining oligo dimers and other contaminants. Fragments ranging from 330 to 400 bp were excised from a gel after electrophoresis and cleaned using the QIAGEN MinElute Gel Extraction Kit and eluted in 10 mL of EB. The eight libraries (one containing the parental and F 1 DNA and the remainder seven each containing 50 F 2 individuals) were diluted to 1 nM and sequenced single end to 150 bp length with the Illumina TruSeq SBS Kit v5 in eight lanes on an Illumina Genome Analyzer IIx (GeCKo: Genomics Center University of Konstanz) over two consecutive runs, each containing a single PhiX control lane.

Linkage mapping analysis
Bioinformatic processing of the reads (i.e., RADSeq tags) was performed using Stacks . Reads were trimmed to a length of 110 bp, cleaned from erroneous and low-quality reads, and sorted by individual. Processing of the 347 F 2 -offspring individuals was performed using the denvo_map.pl script, which creates stacks of loci (i.e., RADSeq stacks) within individuals, followed by building a catalog of loci present in the two parents and then mapping all stacks from the offspring to the parental catalog and finally creating the offspring genotypes for those loci. Multiple SNPs in one RADSeq stack were treated as a single polymorphic marker (i.e., one RAD marker). The parameters were optimized and set as follows: the minimum amount of identical reads to create a stack in each parent and progeny was set to three (2m 3, 2P 3) the number of mismatches allowed between loci from the parents was set to three (2n 3), and highly repetitive RAD tags were removed or disjointed (2t), other parameters were set to default.
JoinMap 4 (Van Ooijen 2006) was used to perform linkage-mapping analysis. Genotypes were exported from Stacks and were filtered to only include those markers for which there were genotype calls in at least 90 F 2 individuals with a coverage of at least 5·. Linkage analyses were performed using all markers (of the type aa/bb, aa/ab, ab/aa, ab/ ab, ab/ac, ab/cc in the P generation as defined by Stacks) containing alleles from the F 1 pair that could be matched with confidence to each parental unit and did not deviate from Mendelian segregation ratios. Markers under segregation distortion (P = 0.0520.001) were added step by step from the less significant to the most and their effect on the map was checked. Marker groupings were calculated using an independent logarithm of odds (LOD) score of 6.0, but the number of calculated linkage groups did not change between LOD scores of 6.0 to a more stringent LOD score of 10.0. Each linkage group was calculated using the regression mapping algorithm and Kosambi's mapping function (Kosambi 1943) with the following thresholds: recombination frequency of 0.400, LOD value of 1.0, and a goodness-of-fit jump of 5.00. Marker order was optimized by performing a ripple function after each added locus.

Comparative analysis
All RAD markers used in the linkage map were indexed with the respective linkage group and position and compared with the genomes of the fishes tilapia (Oreochromis niloticus; version from 25.01.2012), stickleback (Gasterosteus aculeatus; version BROADS1.61), and medaka (Oryzias latipes; version MEDAKA1.48) using blastn searches and an e-value cut-off of 10 215 for tilapia (a basal cichlid) and 10 210 for stickleback and medaka. Tilapia was used as a representative of the African cichlid lineage, whereas stickleback and medaka were used as phylogenetically close teleost outgroups to the cichlids. If a marker mapped to more than one locus it was either excluded or, if the difference in the bit score between best hit and second-best hit was more than 20, the best hit was retained. Markers that mapped to unlinked scaffolds were excluded from the synteny analyses. Synteny was calculated in percentage of markers mapping to a single orthologous linkage group in contrast to those mapping to different linkage groups. Markers that solely map to a single linkage group, i.e., if no other markers mapped to that same linkage group, were excluded.

Mutation rate estimation
De novo mutations are expected to be recognized as high-confidence SNPs present in one or a few offspring but not in the parents. Loci from two F 1 individuals and eight randomly chosen F 2 individuals with high coverage were compared to the parental (P) loci and checked for SNPs only present in the progeny. The coverage for a locus to be included in the analysis was set to at least 8· (minimally 4· per allele to reduce the chance of candidate new mutations being sequencing errors). In addition, an empirical distribution of allele-to-allele coverage ratio was estimated from all heterozygous RAD markers in the F 2 generation. New mutation candidates in the progeny were checked against this distribution, and only those mutations that had an allele ratio within the 95% confidence interval of the loci present in the empirically estimated distribution were retained. This step yielded loci with a ratio lower than 0.5 or greater than 2.0 to be discarded and was performed to exclude candidate loci with odd allele-to-allele ratios (e.g., repetitive sequences confounded with alleles). For example, a locus with a mutation candidate and 10 reads in total, including seven reads of the original allele and three reads of the allele with the new mutation, would be excluded because the allele-to-allele ratio is lower than 0.5.

SNP and marker detection
A total of 254 million reads were used, with an average of 31.8 million reads per library (SD: 3.1 million reads). On average, approximately 588,000 reads were obtained per individual (SD: 201,454 reads) with 15· coverage per read (SD: 5.1·). Within the parental A. astorquii individual, a SNP frequency of 0.80 SNPs/kb (11,172 SNPs in 139,109 loci) and for A. zaliosus 0.89 SNPs/kb (12,764 SNPs in 142,740 loci) were detected (Table S2). Of 118,213 loci that were scored in both parents, 12,429 were polymorphic with one to three SNPs, with a total of 15,035 SNPs and a frequency of 1.2 SNPs/kb. Of those loci that were polymorphic, 18.5% were fixed between parents. A total of 38,203 loci were detected in the parents and were present in at least 90 of the 343 F 2 individuals. The number of loci shared between parents and F 2 progeny was smaller than that between parents and F 1 because the F 2 progeny gel size range was narrower and therefore included fewer fragments. A total of 3,184 loci that were shared between parents, F 1 and F 2 s were polymorphic and retained for further analyses. Of these loci, 755 were informative (alleles that segregated in a 1:2:1 segregation ratio in the F 2 generation) and could serve for the construction of the linkage map ( Figure 2). Because different libraries did not exactly contain the same loci and some of the loci did not pass the coverage threshold, we obtained~25% missing data. Almost 50% of the loci were present in more than 300 individuals ( Figure S1).

Linkage groups
Of the 347 F 2 individuals genotyped with ddRADSeq, four were excluded because of low coverage (,6· per individual; see Table S3 for genotype data). From linkage mapping analysis, 25 linkage groups were identified with a total size of 1427 cM and an average marker spacing distance of 1.95 cM (Figure 2, Table S4). Linkage groups were between 51.5 and 77.1 cM in length except for the two small linkage groups LG24 and LG25 with 2.5 and 2.0 cM, respectively. With the exception of two small linkage groups with only two and three linked markers, all linkage groups consisted of at least 10 markers (minimum: 10 markers, maximum: 54 markers). On average, a typical linkage group comprised 30 markers (Table S4). A total of 17 markers (2.3% of all markers) were completely linked (no recombination was observed between them). The maximum distance between two markers on a single linkage group was 30.7 cM on LG20.

Segregation distortion
Almost 10% of all markers (74 of 755 markers) deviated from Mendelian segregation ratios with a P-value between 0.05 and 0.001 (Table S3). Markers that were distorted significantly with P , 0.0001 were not considered because ratios that are too skewed might create inaccurate distances and false linkages (Cervera et al. 2001). Closely linked markers with similar patterns of segregation distortion probably can be attributed to biological causes, rather than genotyping error. Of these 74 markers, in fact 19 could be found on LG1 (39% Figure 1 Schematic of the ddRADSeq protocol. Arrows indicate restriction sites for a rare-cutting enzyme in green and a frequent-cutting enzyme in orange. P1 adapters specifically bind to the rare-cutting enzyme (green) and P2 adapters bind to the frequent-cutter (orange) restriction sites. Each individual contains a different barcode of five nucleotide bases specified by the P1 adapter. Sequenced RAD tags start with the barcode (individual one: red barcode, individual two: blue barcode). The bioinformatic software Stacks allows for the detection of single nucleotide polymorphisms between individuals (outlined by red box in sequence alignment).
distorted markers on LG1) with most of them directly strung together. In all these markers the ratio was skewed toward a higher frequency of the paternal allele. Apart from the large segregation distortion block on LG1, four other smaller blocks of segregation distortion were identified: five adjacent markers on LG6, three markers in close proximity at the distal part of LG15, seven markers on LG23 spanning a distance of~15 cM and all three markers on LG24.
LG6 and LG24 distorted markers exhibit a higher ratio of the maternally inherited allele, whereas distorted markers both on LG7 and LG23 display an excess of heterozygotes. Thus, half of all distorted markers (37 of 74) are placed on these five blocks (Figure 2, Table S3).

Comparative genomics
The vast majority of markers present in any given particular Midas cichlid linkage group also map to a single orthologous linkage group in tilapia (92 of 107 markers that uniquely mapped to the tilapia genome, excluding those markers that mapped to a single linkage group with only this one syntenic marker), a basal African cichlid that last shared a common ancestor at least 60290 million years ago (Salzburger and Meyer 2004). Four linkage groups in tilapia (LGs 1, 4, 10, and 21) and six linkage groups in Midas (LGs 8,12,16,23,24,and 25) are not represented by more than two markers that map to any orthologous linkage group. Midas cichlid linkage group 23 mapped to 14 loci of different scaffolds in the tilapia genome. For the other five Midas linkage groups that did not map to orthologous linkage groups in tilapia, this is caused by an insufficient number of markers used to anchor the tilapia genome (Table S5) or small linkage group sizes (due to insufficient data in tilapia genomic resources) in tilapia. In addition to the linkage groups that share homologous markers (conserved synteny), the respective position of these markers (linkage order) is mostly the same, suggesting retained syntenic relationships among those loci (Figure 3). However, divergent marker orders, presumably caused by intrachromosomal rearrangements, can be detected to some extent between the Midas cichlid and tilapia (e.g., LGs 13, 15, and 21 in the Midas cichlid). In general, however, it is clear that overall levels of synteny between these two cichlid lineages are quite highly conserved, with 86% of all markers mapping to a single orthologous linkage group when those linkage groups containing two or more markers are considered.
As expected, fewer homologous markers were found for stickleback (44 markers) and medaka (38 markers) compared with tilapia and the Midas cichlid (Figure 3). The stickleback shows many markers of conserved synteny (78%) with the Midas cichlid. In contrast, fewer markers show conserved synteny (38%) between medaka and Midas cichlids.

Mutation rate
Two new mutations in the F 1 progeny and five new mutations in the F 2 progeny were found. The same mutation was observed in two different F 2 individuals and was therefore only counted as one mutation. The F 1 mutation rate was calculated to 7.4 · 10 28 and the F 2 5.8 · 10 28 , leading to an average mutation rate of 6.6 · 10 28 mutations per nucleotide per generation.

DISCUSSION
Here, we report the first linkage map of a Neotropical cichlid species. It will serve as important addition to the genomic resources for cichlid fishes and facilitate in searching for genes that contribute to ecological and phenotypic diversity in Midas cichlids in particular and cichlids more generally. With a total of 25 linkage groups, the present linkage map closely matches the number of chromosomes (n = 24) in the Midas cichlid (Feldberg et al. 2003). The small groups LG24 and LG25 might represent larger linkage groups or collapse with another linkage group when further markers can be incorporated. Alternatively, in the case of LG24, markers in greater distance might not have been identified because of highly distorted segregation ratios rather than missing genotypes. Significant parts of a linkage group might be completely missing if it contains distorted regions (Cervera et al. 2001;Doucleff et al. 2004). Alternatively, the distorted markers constitute false linkages (e.g., Nelson et al. 2010), though at least in the case of LG24 this seems improbable given that all three markers are in very close proximity and a small number of recombination events have occurred between them.
Several genes associated with phenotypic divergence between cichlid species have been identified to date (Kijimoto et al. 2005;Kobayashi et al. 2006;Albertson and Kocher 2006;Salzburger et al. 2007;Roberts et al. 2009;Fan et al. 2011;Roberts et al. 2011). In African cichlids, Figure 2 Linkage groups constructed with 755 RAD markers and ordered by size. Blocks of segregation distortion are indicated by brackets. Markers used for synteny analyses are shown in orange. For further information on marker IDs, position, segregation distortion and markers used in synteny analyses see, Table S3. a specific linkage group (LG5) has been proposed to constitute a "speciation chromosome" containing genes responsible for sex determination, tooth shape, visual sensitivity, and color pattern (Streelman and Albertson 2006). This linkage group is orthologous to the Midas LG4. It will be interesting to test whether these genes and this particular linkage group are also important for traits under selection in Neotropical cichlids.
As the number of genetic markers that can be quickly generated has exploded with new sequencing technology, the limiting factor for constructing a high-resolution linkage map is the biological resources: the number of individuals and recombination events. Thousands of markers can be produced in a short time with ddRADSeq but highresolution mapping can only be achieved by maximizing the number of recombinations, i.e., the individuals. Under laboratory conditions, Midas cichlids can spawn naturally almost once per month with more than a 300 offspring per clutch, which allows for harvesting a high number of individuals. Because of the Midas cichlids' amenable life history, we were able to create a genetic map based on more than 340 F 2 individuals from one F 1 family. The number of markers that can be generated and the amount of missing data that is accepted constitute a trade-off. When minimizing missing data, markers that are not present in all individuals (e.g., through library effects or coverage thresholds) will be lost. Accordingly, while maximizing the number of loci, more missing data are accepted, which might result in incorrect distances in linkage map construction and in QTL analyses (Hackett and Broadfoot 2003).
To perform synteny analyses and achieve a high marker density, we allowed more missing data for the purpose of this study. Compared with the five other linkage maps that have been constructed with RADSeq technology so far Baxter et al. 2011;Chutimanitsakun et al. 2011;Pfender et al. 2011;Parnell et al. 2012), the map of the present study has the second highest number of progeny genotyped for markers and hence a high number of recombinations. The gar linkage map consisted of more markers (~5500), although it was based on fewer individuals . The average spanning distance between two markers is 1.95 cM in our study, which is the lowest value published to date for a RADSeq study. In comparison with the published cichlid linkage maps from A. burtoni (Sanetra et al. 2009), tilapia (Lee et al. 2005), and Lake Malawi cichlids (Albertson et al. 2003;Parnell et al. 2012), the present Midas cichlid linkage map has the second highest number of individuals, the highest number of markers, and lowest average distance between two markers. This result highlights the promise of next-generation sequencing based genotyping, and especially the ddRADSeq protocol, for linkage mapping experiments when biological material is sufficient. Although A. astorquii and A. zaliosus have clearly distinct ecologies, their divergence time is very recent [should be less than 10,000 years (Barluenga et al. 2006) and maximally 20,000 years (Kutterolf et al. 2007)]. For this reason, only 10.5% of our markers were polymorphic and of those 18.5% were fixed between species. Given this young divergence, it is of great interest how these species developed and maintain reproductive isolation. Besides premating isolation, chromosomal rearrangements or genetic incompatibilities might act as isolating barriers to gene flow in postzygotic reproduction (reviewed in Coyne and Orr 2004). As the species are young and can be crossed easily, incompatible loci seems more probable than large genomic rearrangements (Moyle and Graham 2005). Genetic incompatibilities between species detected on the basis of segregation distortion have been suggested in many cases, including lake whitefish , Drosophila (Phadnis and Orr 2009), Nasonia (Gadau et al. 1999;Niehuis et al. 2008), and some plant species (Foolad et al. 1995;Whitkus 1998;Myburg et al. 2004).
In our linkage mapping analysis, we found that 10% of the markers deviated (P , 0.05) from expected Mendelian segregation ratio. Single markers deviating from the expected ratio might result from genotyping errors, though blocks of deviating linked markers as on LGs1, 6, 15, 23, and 24 probably have a biological basis and might be caused by segregation distortion genes (e.g., Lyttle 1993) or genetic incompatibilities by differential fixation of alleles between parents (Dobzhansky 1937;Muller 1942;. If genomic incompatibilities between species exist, chromosomal regions should consistently exhibit segregation distortion. As our data only originates from a single family, further testing in laboratory bred or wild A. astorquii and A. zaliosus hybrids will be necessary (Danzmann and Gharbi 2001) to reveal whether the five linkage groups exhibiting blocks of distorted markers result from interspecific genetic incompatibilities.
Genomic architecture has been suspected of being a main factor contributing to the cichlids' evolutionary success in terms of biodiversity (Salzburger 2009). The comparative genomic analysis between the Midas cichlid and the tilapia reveals a highly conserved synteny with 86% of the markers mapping to a single orthologous linkage group. This finding points toward a high level of conserved synteny between Neotropical and African cichlids, in spite of their relatively ancient split (60 mya according to Vences et al. 2001;Matschiner et al. 2011;but see Farias et al. 1999;Salzburger and Meyer 2004;Azuma et al. 2008). A relatively high level of conservatism in number of chromosomes across cichlid genera and species is observed (see Feldberg et al. 2003), indicating that interchromosomal rearrangements are rare. This observation and our results suggest that at least large-scale chromosomal rearrangements are probably not a main factor contributing to the species richness in cichlids and also agree with previous studies demonstrating a high degree of conserved synteny between cichlid genomes (Sanetra et al. 2009). As a comparison, the human and lemur lineages diverged~74 mya (e.g., Huchon et al. 2007), although their number of chromosomes differs considerably (Homo sapiens: 23 chromosomes; Microcebus murinus: 33 chromosomes), including several interchromosomal rearrangements that occurred after their split (Dutrillaux 1979).
Medaka and stickleback show less conserved synteny with the Midas cichlid (medaka: 38%, stickleback: 78% markers with conserved synteny). Although the data for genomic comparisons between the Midas cichlid and the noncichlid fish species stickleback and medaka are limited because of the large phylogenetic distance of more than 100 million years since these lineages last shared a common ancestor (Matschiner et al. 2011), it allows for two conclusions: (1) the degree of conserved synteny with these species is lower than compared with tilapia and (2) medaka shares considerably less conserved synteny with the Midas cichlid compared with stickleback. As chromosomal rearrangements leading to linkage and synteny disruptions accumulate with time, stickleback (divergence time~140 mya, Matschiner et al. 2011) and medaka (divergence time~100 mya, Matschiner et al. 2011) display less conserved synteny to the Midas cichlid, as expected, compared with the younger split of African vs. Neotropical cichlids. However, synteny disruption seems to be greater between medaka and the Midas cichlid compared with stickleback, which was unexpected because most phylogenies place the stickleback basal to cichlids and medaka (Chen et al. 2003;Mabuchi et al. 2007;Azuma et al. 2008, Matschiner et al. 2011). Incorrect homology determination or mapping errors might account for a lower degree of conserved synteny between the Midas cichlid and medaka. Alternatively, the medaka lineage has undergone considerable more rearrangements compared with cichlids and stickleback, and/or phylogenetic relationships are incorrect, as relationships between these teleosts remain controversial (Chen et al. 2003;Steinke et al. 2006).
The mutation rate (u) estimation of 6.6 · 10 28 per site per generation in Midas cichlids indicates a somewhat elevated mutation rate compared to estimates from other vertebrates [mouse: 3.8 · 10 28 , human: 3.0 · 10 28 (Lynch 2010)]. A mutation rate based on the synonymous substitution rate and UTRs from pooled Midas cichlid expressed sequence tags yielded an estimate (~1.25 · 10 26 per site per year) higher than that of the present study, although it was thought to be elevated by population-level polymorphism (Elmer et al. 2010a). For mitochondrial DNA control region, the substitution rate has been estimated as 7.1 · 10 28 per site per year (Barluenga and Meyer 2004), which, despite the fact that control region should have a high mutation rate, is fairly well in agreement with our current genome-wide mutation rate estimate. However, all calculations of mutation or substitution in recently diverged populations differ because of the role of lineage sorting in decreasing rates with time (i.e., pedigree vs. population vs. phylogenetic rates) [see e.g., (Ho et al. 2005;Subramanian et al. 2009)]. Although vertebrate estimates only include model species such as human, mouse, and rat, this finding is congruent with interspecific comparisons of mutation rates in teleost compared with mammalian lineages that revealed that teleosts exhibit a higher neutral mutation rate (Ravi and Venkatesh 2008). It might be argued that an elevated mutation rate might explain the teleost morphological diversity and species richness, since the chance that a beneficial mutation arises on which selection can act is increased. More estimates of vertebrate mutation rates are needed in order to ascertain whether the cichlid lineage, and species-rich lineages in general, exhibit higher mutation rates.
Cichlid fishes represent wonderful examples for phenotypic diversity and convergence, rapid speciation, and adaptive radiation (Fan et al. 2012). Yet, we have still to understand how the underlying genetics of this diversity operate to translate into the ubiquitous phenotypic richness of this evolutionary lineage. The present linkage map adds valuable genomic information to tackle these fundamental questions in evolutionary biology. Future genome-wide analyses on the level of populations, species, and among distantly related lineages in cichlid fishes will profit from this resource and it will help to shed light on the genomics of adaptation in this extraordinarily diverse evolutionary lineage. P. Franchini and S. Fan for bioinformatic support and the GeCKo for sequencing. We are grateful for helpful suggestions from J. Weber during the ddRADSeq library preparation. This work was supported by the University of Konstanz, the European Research Council (ERC) advanced grant 293700 Genadapt and Deutsche Forschungsgemeinschaft (DFG) grants to A.M.