Pattern of Mutation Rates in the Germline of Drosophila melanogaster Males from a Large-Scale Mutation Screening Experiment

The sperm or eggs of sexual organisms go through a series of cell divisions from the fertilized egg; mutations can occur at each division. Mutations in the lineage of cells leading to the sperm or eggs are of particular importance because many such mutations may be shared by somatic tissues and also may be inherited, thus having a lasting consequence. For decades, little has been known about the pattern of the mutation rates along the germline development. Recently it was shown from a small portion of data that resulted from a large-scale mutation screening experiment that the rates of recessive lethal or nearly lethal mutations differ dramatically during the germline development of Drosophila melanogaster males. In this paper the full data set from the experiment and its analysis are reported by taking advantage of a recent methodologic advance. By analyzing the mutation patterns with different levels of recessive lethality, earlier published conclusions based on partial data are found to remain valid. Furthermore, it is found that for most nearly lethal mutations, the mutation rate at the first cell division is even greater than previous thought compared with those at other divisions. There is also some evidence that the mutation rate at the second division decreases rapidly but is still appreciably greater than those for the rest of the cleavage stage. The mutation rate at spermatogenesis is greater than late cleavage and stem-cell stages, but there is no evidence that rates are different among the five cell divisions of the spermatogenesis. We also found that a modestly biased sampling, leading to slightly more primordial germ cells after the eighth division than those reported in the literature, provides the best fit to the data. These findings provide conceptual and numerical basis for exploring the consequences of differential mutation rates during individual development.

. Differential mutation rates during development also provide new insights on male-driven evolution (Gao et al. 2011).
Until recently, there has been little progress on dissecting mutation rates during germline development at the level of individual cell divisions. Although next-generation sequencing may hold great promise for providing rich information for such purposes, welldeveloped classic experiments can still be a powerful and cost-effective approach, particularly for some model organisms. Gao et al. (2011) reported an analysis of the mutation patterns from a large-scale experiment for screening recessive lethal or nearly lethal mutations and found that mutation rates differ substantially in the germline lineage. In particular, the first cell division harbors the greatest mutation rate, followed by the divisions in spermatogenesis, whereas cell divisions in between have at least a magnitude smaller mutation rate. Gao et al. (2011) analyzed only those mutations of extreme recessive lethality, which are only a small fraction of the available data, due to a technical difficulty that also limited the analysis to families of exactly 20 offspring with at most two mutations per family. Fu (2013) extended the previous inference framework (Gao et al. 2011) with a new method for approximating the probability of a mutation pattern and a refined coalescent algorithm for simulating sample genealogies which are necessary for deriving coefficients used in the analysis. The refined inference framework not only removes the limitation of at most two mutations per families but also allows families of different sizes. This framework also established the confidence for conclusions derived from likelihood ratios based statistical tests for mutation screening data.
Taking advantage of the aforementioned methodologic progress, we report in this paper the analysis of the complete data set from the experiment, which consists of 9872 families of various sizes and which contains three times more mutations than previously reported. The greater resolution of the data as well as the contrast of results from analyzing mutations with different recessive lethalities allows us to obtain more accurate/stable estimates of mutation rates, to explore some aspects of the mutation process, and to test hypotheses that were unattainable previously. As a result, a deeper understanding of the mutation rate patterns along germline development is achieved. Furthermore, new hypotheses are presented and the male-driven hypothesis is reevaluated in light of the new results from the current analysis.

Materials
Drosophila melanogaster stocks from Woodruff's laboratory in Bowling Green State University were used. These flies were maintained by taking advantage of the balancer chromosomes that were pioneered by H. J. Muller (Muller 1928) for the purpose of maintaining newly isolated mutations, including recessive lethals, without selection (Muller and Oster 1963;Abrahamson and Lewis 1971;Ashburner 1989;Greenspan 1997). Balancers for each of the major chromosomes of these D. melanogaster contain multiple inversions and one or more dominant visible mutations. The inversions, which were mapped by using giant polytene chromosomes, act as crossover suppressors and the clearly visible dominant mutations allows for the identification of heterozygotes. With these chromosome stocks, new lethal or nearly lethal mutations are maintained in the heterozygous state against the balancer chromosomes without the concern of being lost due to recombination. The experiment employed three types of autosomal haploid chromosomes (genomes), which are denoted by b, g, and z. More specifically they are as follows: b ¼ Tð2; 3ÞA1 2 W; Cy L Ubx g ¼ Tð2; 3ÞB18; Pm Sb z ¼ þ; þ: The b type balancer is homozygous lethal and is marked with the dominant visible, and recessive lethal mutations, Curly (Cy) wings, Lobe (L) eye, and Ultrabithorax (Ubx) enlarged halteres. This balancer segregates as a unit and suppresses crossing over on both the second and third chromosomes effectively (Lindsley and Zimm 1992;Thompson 1977). Similarly, the g chromosome is also homozygous lethal and carries dominant visible markers. Type z represents a haploid genome with wild-type second and third chromosomes that are free of lethal mutations at the start of experiment. Recessive lethal or nearly lethal mutations in z are the screening target of the experiment. Graf et al.'s (1992) methods for culturing the flies were followed with some modifications. Drosophila medium containing water, glucose, agar, corn meal, and the antifungal agent methyl-p-hydroxybenzoate was cooked and dispensed into glass culture vials (10 cm in height and 3.5 cm in diameter) via a self-made dispenser. Also, self-made vial holders (designed to hold vials in a 10 · 10 array to match the dispenser) were used to facilitate the work. After drying and cooling the medium, a small piece of sterilized filter paper was folded and inserted into the medium with forceps to increase the surface and to regulate humidity within the vial, and then a small amount of live baker's yeast was seeded into the vial. The culture vials, with cotton plugs added, were used to start the cultures. The flies were reared in a chamber, which allowed simultaneous culturing of more than 20,000 culture vials of flies under standard conditions (25°, approximately 60% relative humidity and 16-hr light:8-hr dark). The temperature in the culturing chamber was adjusted with air-conditioners and a self-regulating electric heating system, and humidity was manually controlled with a humidifier.

Experiment
As briefly described in Gao et al. (2011), the mutation screening experiment, which is similar to protocols that have been used in various laboratories (Thompson and Woodruff 1980;Woodruff et al. 1984;Mason et al. 1985;Brodberg et al. 1987;Woodruff et al. 1996), consists of two parts. The first is to employ a three-generation assay to identify autosomal-recessive lethal or nearly lethal mutations in approximately 1200 genes on the second and third chromosomes in D. melanogaster. The second component of the experiment is known as the allelism test, which is to delineate the identity of mutations leading to different mutants.
The first component of the experiment was designed to screen b/z male offspring of crosses between single b/z males and multiple b/g virgin females to see whether a new lethal or nearly lethal mutation occurred in the z chromosomes during germline development of the father. In essence, the mating scheme within each family derived from a single b/z male is as follows: Parental : Multiple b=g virgin ♀ · single b=z ♂ ð20 2 35 b=z ♂ offspring were each subjected to the following assayÞ F 1 : Multiple b=g virgin ♀ · single b=z ♂ ðmultiple b=z ♀ and ♂ were obtained and were used for the F 2 Þ F 2 : Multiple b=z virgin ♀ · multiple b=z ♂ F 3 : Identify the genotype of each surviving offspring; which is either b=z or z=z: The intention here was to follow 20 offspring (lines) per family, but because some lines would not succeed for a variety of reasons, including death and failing quality control, up to 35 lines were initiated per family. As a result, the number of offspring in a family ranged from 2 to 35. Parental flies were removed from culture vials well before we examined the progeny and collected virgin females. The progeny adult flies were etherized using the Drosophila Fly Anesthetizer (Burco) and examined for phenotype and sex under a stereo microscope. Virgin b/g or b/z females were collected within about 8 hr after we removed parental flies and then collection was repeated every 8 hr (usually at about 8:00 AM, 4:00 PM, and midnight) during the eclosion.
Care was taken to ensure that b/g females were virgin during the experiment. As a quality control measure, if b/g offspring in the F 3 were observed, the line was deemed disqualified because such an event can only result from nonvirgin b/z females. Also, to make sure the degree d of recessive lethality or simply lethality (Fu 2013), which is one minus the percent of z/z homozygote among all offspring, is adequately estimated, 40 offspring were set to be the minimum number of offspring examined in F 3 . A line is declared to be tentatively a mutant with lethality d if the percentage of its z/z homozygotes among all surviving offspring is not larger than d.
The b/z males used in the Parental stage were selected from families of F 3 in which no mutation of detectable lethality was found (i.e., the percentage of z/z offspring is normal). In addition, the selection favored vigorous young males with distinct phenotypes. The process ensured that the chromosome z in the b/z male was devoid of recessive lethal mutation in general, but newly arisen recessive lethal mutation in F3 may escape such surveillance. The counting of F 3 offspring was performed when there were a sufficient number of matured offspring. By the time a line was judged to be devoid of the targeted mutations, it would be typically 224 d after the minimum age for a matured adult.
The second component of the experiment is the allelism test, which is to delineate the identity of the mutation leading to each mutant. This was achieved by a series of crosses between mutant lines, with one line contributing b/z males and another virgin b/z females. There are usually many ways the crosses can be arranged, but each mutant line must be involved in at least one cross. The percentage of z/z individuals among offspring of a cross is expected to be similar to those of the parental lines if the mutations in the two different lines are the same and significantly higher when they are different (if mutations are indeed recessive). When there are multiple mutant lines in a family, a series of interconnected crosses between different mutant lines are necessary to resolve ambiguities.
Statistical method Using the same notation as Fu (2013), the information conveyed by mutation(s) in a family can be represented by a mutation pattern in which each element represents a mutation and its value is the number of offspring carrying the mutation, or simply the size of the mutation. After a line obtains a mutation of lethality $ d%, a further mutation will likely do nothing or increase the level of lethality, but the effect is usually difficult to distinguish from the first one under our experimental setting. This often led to masking or nondetection of the second mutation (Fu 2013). As a result, each identified mutant offspring is associated with one and only one mutation in the mutation pattern. Therefore, there is a natural constraint that i + j + k . . . is not larger than the family size. The aggregation of mutation patterns for families of size f can be concisely represented as where n(k i ) is the number of occurrences of pattern k i and for brevity can be omitted if its value is 1. The statistical analysis of the experimental results consists of determining the mutation pattern in each family, estimating mutation rates, and testing hypotheses.
Determination of the mutation pattern in a family: The loglikelihood ratio test is used to test the null hypothesis that two mutant lines share the same causal mutation against the alternative that they result from independent mutations. Under the null hypothesis, the likelihood is the product of three binomial distributions all having the same frequency for z/z offspring, while under the alternative the binomial distribution for the cross has different frequencies for z/z offspring. Suppose z i and n i are the numbers of z/z and total offspring for parental line i(i = 1, 2) respectively, and z 3 and n 3 are the corresponding numbers for the cross. Then the test statistic is where p 123 = (z 1 + z 2 + z 3 )/(n 1 + n 2 + n 3 ), p 12 = (z 1 + z 2 )/(n 1 + n 2 ) and p 3 = z 3 /n 3 . The test statistic follows asymptotically a x 2 distribution with one degree of freedom under the null hypothesis that the cross results in the same z/z percentage as the two parental lines. Therefore, a significant test result indicates that the two lines result from different mutations. Often, multiple crosses are performed among lines within a family. If the overall significance level is a, and there are m crosses, then the level of significance for each test should be set to about a/m. In general, larger values of a will result in more significant tests and thus more independent mutations will be inferred, but the number of mutants for a given lethality does not change significantly, which are mostly determined by the percentage of z/z offspring in the F 3 . The results presented in this paper corresponds to a = 0.10.
Estimating mutation rate and hypothesis testing: Suppose the development from the fertilized egg to the sperms are divided into I consecutive time intervals [t i 2 1 + 1, t i ] (i = 1,. . .I) (t 0 = 0 and t I = maximum number of cell divisions), and let u i be the mutation rate per cell division within the ith interval. The estimate of m = (u 1 , . . . u I ) and statistical tests about them were performed through the maximum likelihood approach developed in Gao et al. (2011) and further refined in Fu (2013). The inference makes use of the information about population dynamics, intervals of cell divisions, and coalescent structure of the sample genealogy ( Figure 1), but in essence, the inference framework is based on evaluating the likelihood function where f represents family size, k is a mutation pattern for a given family size f, n f (k) is the number of occurrences of pattern k in families of size f, and p f (k) is the probability of mutation pattern k, which is a function of m and coefficients that are derived from the dynamics of the germline population. The probability can be computed through a combined approach of both analytic derivation and coalescent simulation of samples. The first product enumerates over all the different family sizes, while for each family size the second product enumerates over all observed patterns. The maximum likelihood estimatem is the value of mthat maximizes the above likelihood function.
The overall mutation rate m is defined as the sum of mutation rates over all divisions in the development. Thus the maximum likelihood estimate of m ism Alternatively, an unbiased estimate of m can be obtained by a classical approach (Fu and Huai 2003), which is This estimate has the advantage of being unbiased regardless of the assumptions made on the dynamics of the germline lineage development.

Data and summary
More than 10,000 families were screened in a 4-yr period (200422008), from which 9872 succeeded with at least one survival line at F 3 . To obtain positional information about observed mutation (s), we further require at least two completed lines for a family. To make sure b/g females in F 2 are virgin, we further require that no b/g offspring among the F 3 was observed. Furthermore, when the recessive lethality of a line exceeds a given threshold and the identity of the mutation cannot be determined, the line was also removed. This occurs when such a line was not used in crosses with other mutant lines (if they exist). After the cleanup, 9594 families passed the quality control, which resulted in a total of 271,794 lines, which is 90% of the 300,737 examined lines. The mean number of lines per families is 28.3 and the mean number of offspring in the qualified lines in F 3 is 117.9. The distribution of the family sizes is given in Figure 2. The histogram of the percentage of z/z offspring in the 271,794 lines is given in Figure 3. There are two obvious modes in the histogram, one at 1% and another at 18%. The first one corresponds to those with relatively high lethality mutations, and the latter corresponds to the normal z/z homozygotes. Note that because b/b is lethal, only b/z and z/z genotypes can potentially survive. If the genotypes are of the same fitness, their proportions will be 2/3 and 1/3, respectively. Figure 3 shows that z/z appears to be less fit than the b/z genotype, resulting in an average of approximately 17.5% only. One possible reason may be that the normal z chromosome used in the initial experiment contained some mildly deleterious recessive mutations. The reduced fitness of the z/z homozygotes does not affect significantly the identification of recessive mutations of high lethality but makes the dissection of those of lower lethality more challenging.
The identification of mutations starts with lines having a low percentage of z/z offspring in the F 3 and proceeds with cross experiments and inference. One example is presented to illustrate the process. Table 1 shows an example of the F 3 results of a family early in the experiment. The family starts with 30 lines but after initial quality control (one with b/goffspring and two with too few total offspring and a few lines did not survive), a total of 17 lines are recorded.
It is clear that all lines, except for line 5, are suspects of new mutations of relatively high lethality. For a line to be declared as a mutant line for lethality level d, at minimum it should have a percentage of z/z offspring not larger than 1 2 d and it is crossed with at least one other line if available. We were not sure how low the d can be so it was decided to perform as many crosses as feasible. As a result, all the lines except 5 and 2 are used for cross experiments. The exclusion of line 2 was not intentional but was due to the late completion of F 3 for that line.
There are multiple ways to carry out allelism tests but the principle is that ambiguity needs to be resolved. Figure 4 shows the diagram of crosses used and their results. It is clear that there is no evidence that the six lines, represented by white circles, resulted from different mutations because the percentages of z/z offspring from crosses among them are similar to their parental lines. The crosses between lines 1 and 3, lines 19 and 30, and lines 12 and 24, resulted in significant test results, but after adjusting multiple tests as described in the Statistical Method section, only the cross between 1 and 3 is significant at the 10% level. Therefore, two mutations are identified, the first one includes line 3 with the z/z percentage being 0, and the second one includes the other 13 lines tested with an overall z/z percentage of 0.0171.
The number of identifiable mutations of relatively high lethality depends on the extensiveness of lines used in the allelism test. Although we were interested in determining as many mutations as feasible, it becomes too laborious when the number of lines to be crossed is large. Therefore, a compromise had to be made, which was achieved progressively. Figure 5 shows the mean minimum z/z percentage for lines in a family that are not subjected to the allelism crossing experiment. It can be seen that after the initial 3000 families, the minimum was raised from approximately 8% to approximately 12% for a short period of time. The practice was deemed impractical, and the minimum then dropped back to approximately 7%. The overall average is 8.2%.
Despite the fact that lines with a recessive lethality of more than 92% were generally subjected to the allelism test, there is an increasing level of difficulties and ambiguity in determining the identity of mutations with decreasing lethality. As a result, we are cautious about those mutations identified with a recessive lethality lower than 95%. To be conservative, yet allowing for sufficient contrast, we will focus on those with lethality of 97% or greater. The mutation patterns identified through the allelism test depends on how stringent the criteria is, which is controlled by the value of a as described in the Materials and Methods section. Because a is the overall error rate, the larger it is, there will be more independent mutations declared. Table 2 provides the summary of mutations identified under different lethality intervals under two values of a. Both the number of mutations and the number of mutants are greatest in the lethality interval [97%, 98%), followed by the lethality interval [98%, 99%). Overall, the pattern appears to agree with that of the z/z percentages shown in Figure  3, where the second percentile is the most frequent mutant type and is followed by the third. As predicted previously, the number of mutations under a = 0.5 is larger in most cases, particularly for greater lethalities. However, the overall mutation rate does not change significantly in any of the cases. It turns out that subsequent analysis of mutations from different a values also leads to a relatively small difference. Therefore, we will focus on the presentation of analysis resulting from the mutations identified with a = 0.10.
The data analyzed by Gao et al. (2011) is a subset of those with lethality [99%,100%] (or $99%). However, a 5% significance level without adjustment for multiple tests was used, so their data are not strictly a subset of column 1 of Table 2 from either a = 0.10 or a = 0.50. Because the number of crosses in a family increases with the number of likely mutants, the major effect of adjusting for multiple tests is that it makes the declaration of different mutations easier for families with a smaller number of likely mutants and harder for families with many likely mutants. Comparatively, because the mean number of crosses for families with a cross is approximately 17 (data not shown), this translates into about a 3% significance level on average per family, even a = 0.5 still appears slightly more stringent than the criteria used by Gao et al. (2011). It is reassuring that the subsequent inference is rather robust and as a result no qualitative conclusions made previously need to be revoked, which will be seen from the inference in a later section.
It is also useful to consider mutations for given minimum lethality levels. Table 3 provides the distribution under four minimum lethalities (notice that lethality $99% is the same as [99%, 100%]). It should be pointed out that mutation patterns for a given minimum lethality, say 97%, cannot be obtained as a simple aggregation of those in the    Table 3 in which there are families with four mutations, whereas there are none when considering lethality intervals separately.
There are 111 families of size 20 ( Figure 2) and under the lethality level [99%,100%] the collection of mutation patterns is 20 : h1i 11 h2ih3ih17ih1; 1i 2 which indicates that 11 have one mutation of pattern ,1., one each for patterns of ,2., ,3., and ,17., respectively and two have a mutation of pattern ,1, 1.. The number of mutant families is 11 + 1 + 1 + 1 + 2 = 16 and, thus, the number of nonmutant families is 111 2 16 = 95. Similarly, the mutation patterns for a family of size 20 under lethality level [98%, 99%) and [97%,98%) are, respectively, 20 : h1i 6 h2i 2 h3i 2 and 20 : h1i 8 h2ih3i 2 h14ih19ih2; 10i: The complete mutation patterns of the data corresponding to different lethality levels are given Table A1, Table A2, and Table A3 in the Appendix. For the sake of space, mutation patterns for given minimum lethalities will not be listed but can be obtained from the authors.

Inference about mutation rates
Although it is desirable to make an inference about the mutation rate for every cell division along germline development, the lack of sufficient resolution in the experimental data and the prohibiting computational burden limit our inference to six or fewer intervals. As described in the Statistical Method section, the intervals can be represented by a series of integers defining the boundary locations. For example, the sequence 1, 2, 14, and 31 means that there are five intervals: [1, 1], [2, 2], [3,14], [15, 31], and [32, n] where n is the last cell division (e.g., 36). However, because different sperm may experience different numbers of divisions (but at least 36), while regardless of the number, the last five are spermatogenesis, it is more logical to put the last five divisions into the last interval and divisions from the 15th to just prior to spermatogenesis as the fourth. This interval definitions is conveniently represented by 1, 2, 14, 25. indicates the numbers of z/z and total offspring of the cross, respectively. Three crosses, between lines 19 and 30, between 12 and 24, and between lines 1 and 3 are individually significant (with Ã and ÃÃÃ representing, respectively, significance at 5 and 1% level). After adjusting for multiple tests, all three crosses are significant at the 50% significance level but only the cross between lines 1 and 3 remains significant at the 10% significance level.  n The evaluation of the likelihood function makes use of coefficients computed from simulated samples from the germline population, which is dependent on the assumptions about the population dynamics. Let N(i) be the population size of germline lineage after the ith cell division. The assumptions about N(i) used in Gao et al. (2011) are provided in Table 1 in Fu (2013) and one assumption that has significant impact on the analysis is about N(8). Previous knowledge Lee 1995, 1998) suggests that after the eighth division, about four to six cells move to the posterior region and become the primordial germ cells (PGCs). For a relatively small number of PGCs, are the PGCs a random sample out of the 256 cells available after the eighth division, or are they more closely related being descendants of an ancestral cell a few divisions earlier, or something in between? We will refer to this factor as sampling bias leading to N(8). This issue can be investigated effectively by examining the value of N(5), because after three divisions a cell in N(5) will have eight descendant cells in N(8), making it possible that all the PGCs are derived from a single cell in N(5). On the other hand, if N(5) is set to 32, it is equivalent to assuming that N(8) is a random sample out of the 356 cells.
The maximum number D of cell divisions is one factor that impacts the likelihood analysis. Because 36 cell divisions is the minimum and young mature males were used in the experiment, D was set to 36 as in Gao et al. (2011), although large values also were examined to some extent. As pointed out in the Experiment section, the likely range of cell divisions for sperm leading to F 1 offspring is between 36 to about 40. Hence, we conducted likelihood analysis with D from 36 to 42, and found that D = 38 yields the overall best fit, which is significantly better than the case with D = 36, but goodness of fit declines gradually after D . 40. Even so, the pattern of likelihood estimates differ only marginally for D in the range between 36 and 42 and the impact on the interpretation of inference is relatively low. Because we are reasonably confident that the majority of sperm used in the experiments are from young males, we shall report results throughout the paper with D = 38.
Assumptions on the dynamics of the germline population Although we are primarily interested in the distribution of mutation rates during germline development, it is important to evaluate the impact of the assumptions about the dynamics of the population on the inference. One such assumption concerns how the four to six cells are selected after the eighth cell division. The primary analysis in Gao et al. (2011) assumed that these cells are randomly drawn from 256 cells which resulted from the eighth cell division. Although this does seem consistent with the Drosophila embryonic development literature (Sonnenblick 1965), the impact of biased sampling were evaluated to some extent. We carried out more extensive analysis using the full data set.
We first investigated the impact of the values of N(5) and N(8) on the value of the likelihood function. To make the computation manageable, we divided the range (1232) of N(5) into groups each containing two consecutive integers, and similarly the range (12256) of N(8) into groups each containing three consecutive integers. Because the dynamics of the germline population are properties of the germline and thus should be more or less independent of types of mutations, one uses as much data as possible for judging its goodness of fit. Therefore, we examine the effect of gradually adding more data into the analysis on the strength of conclusions. Table 4 shows the loglikelihood values for a number of combinations of N(5) and N(8) for data under three lethality levels. Table 4 shows that regardless of recessive lethality the maximum likelihood is achieved at the combination of N(5) = 5~6 and N(8) = 7~9. Because twice the difference is the value of the log-likelihood ratio test (with one degree of freedom when one of the N(5) and N(8) is fixed), it follows that for 98% and 97% recessive lethality, any other combination of N(5) and N(8) can be rejected at the 1% significance level. For 99% recessive lethality, the resolution is less, but all other combinations except N(5) = 5~6 and N(8) = 10~12 can be rejected at the 5% level (including the random sampling). These results imply that N(5) is about 526, which represents a sampling for the eighth population that is far more restrictive than random sampling, which corresponds to N(5) = 32. Note that Gao et al. (2011) categorically defined N(5) = 4 as mild sampling bias largely due to mistaking the value as the number of ancestral sequences after the fifth cell division. Given the amount of reduction from 32 to 526, its classification as modest bias (at least) appears to be in order. These results also suggest that N(8) = 4 2 6, based on various experimental observations, appears to be conservative. Hence, a slightly larger number for N(8) may be more the norm.

Estimates of mutation rates
In the previous section, we established that N(5) = 5~6 and N(8) = 7~9 is the best assumption for the germline population dynamics; therefore, we will use this assumption for subsequent analysis. Because we investigate the mutation rates for different lethality levels, it is desirable to know whether a slight deviation of the optimal parameters will lead to a significant difference in the subsequent mutation rate estimation and hypothesis testing. We performed full likelihood analysis for several lethalities around N(5) = 5 2 6 and N(8) = 7 2 9 and found that indeed the impact is rather marginal unless the deviation leads to a substantially smaller likelihood value. Table 5 lists, as examples, the maximum likelihood estimates of mutation rates for combinations of N(5) and N(8) that differs no more than one step from the n optimal. As can be seen, overall the estimates are rather similar to those under the optimal, but there are some notable differences in u 2 for cases two steps away from the optimal; but then the difference of likelihood to that of the optimal is substantial. Therefore, we are confident that conclusions based on the optimal N(5) and N(8) will be robust. Table 6 gives the estimates of u for three recessive lethalities for 38 cell divisions and the optimal combination of N(5) and N(8). The most striking pattern is that the mutation rate for the first division varies considerably, but regardless of how the data are examined, it is markedly larger than those for subsequent divisions. Similar to the pattern observed in Gao et al. (2011), the spermatogenesis has a relatively higher mutation rate than the interval divisions. With data aggregation, there appears to be a trend of appreciable mutation rate for the second division, and a similar pattern is observed for the stem cell stage although the magnitude is less appreciable. The cleavage stage, excluding the first (and perhaps also the second), harbors the smallest mutation rate. The ratio of the first division mutation rate to the mean mutation rate of the internal divisions are all larger than 100, with the highest ratio of 620 for lethality [97%, 98%).

Mutation rate hypotheses testing
As defined in Materials and Methods, m = (u 1 , u 2 , . . . u I ) with u i being the mutation rate per cell division in the ith interval. Then different constraints on the rates will affect the estimates of m and the associated log-likelihood values are used as the basis for testing the hypotheses about the mutation rates. The following nine hypotheses will be considered: All the hypotheses except H 4b are labeled the same as those in Gao et al. (2011) andFu (2013). Table 7 gives the values of these tests against the H 8 . As in Gao et al. (2011), the assumption of equal mutation rate during germline development is soundly rejected and the evidence is stronger with data aggregation. As mentioned previously, with data aggregation there appears to be a trend that the second division and the stem-cell stage also have appreciable mutations (Table 6), but the log-likelihood ratio tests in Table 7 does not provide significant support for the trend. In fact, there is no significant evidence to reject the hypothesis that all internal divisions, that is, from the second until right before gametogenesis, share the same mutation rate.
To investigate the sensitivity of our inference with regard to possible sporadic inclusions of families with pre-existing mutations, we can exclude a certain fraction of families with a high percentages of mutants among offspring. However, it is not easy to decide the proper fractions to use, so a conservative approach was taken by removing all families with a percentage of mutants exceeding a given threshold value. Table 8 shows the results of the likelihood ratio tests for two different thresholds. At a 90% threshold value, for example, families of size 20 with 18 or more mutants were excluded, and families of size 30 with 27 or more mutants were excluded. As expected, the number of families excluded increases with the width of the lethality interval for a given threshold value, and decreasing the threshold from 90% to 85% results in doubling the number of families being removed. For the latter threshold value, as high as 20% of mutant families were removed. Even with such extreme exclusions, all the cases with significant test results shown in Table 7 remain the same.
So far we have grouped the five cell divisions in the spermatogenesis as one interval and thus assumed that mutation rates are constant within the interval. This hypothesis can be investigated by n dividing the gametogenesis into two intervals. Table 9 lists the maximum likelihood estimates for different partitions of the five cell divisions in gametogenesis (to reduce the amount of computation, the third cell division and the remaining cleavage divisions are combined into one interval, since there is little evidence to suggest that their rates are different). It follows that the maximum difference in the loglikelihood value to that for the case of equal rate (i = 0) is 0.7. Therefore, there is little evidence to suggest different mutation rates in the process of gametogenesis, despite the trend that early cell divisions have a greater estimated mutation rate than the later divisions.

DISCUSSION
The full data set from our large-scale mutation screening experiment provides rich information for exploration in much more detail than previously possible for various aspects of the mutation process during germline development. The inference, taking advantage of the improved framework, leads to the following conclusions: (1) mutation rates during germline development are not equal, (2) the first division harbors the greatest mutation rate, (3) gametogenesis has mutation rates greater than other divisions except the first (and perhaps the second as well), (4) after the first cell division, the rate drops rapidly, and after the second division the rate becomes flat throughout the cleavage stage, (5) no evidence of rate difference during the gametogenesis, and (6) the number of PGCs after the eighth division is likely greater than that reported in the literature and they are not derived at random from the 256 cells at that stage nor from one or two ancestral cells at the fifth division. Because of reduced DNA repair efficiency during gametogenesis, greater mutation rate at gametogenesis is expected. Although it is generally known that zygotic control starts toward the end of the cleavage stage which for Drosophila is around the 10214th divisions, current knowledge of Drosophila development does not provide an adequate explanation of why the first division (or with the second division) harbors a much elevated mutation rate compared with the rest of the cell divisions during the cleavage stage, which was pointed out earlier by Gao et al. (2011). However, it is now known that some mutations in the sperm may be repaired in the early cleavage after fertilization (Rathke et al. 2014), so a portion of observed mutations in the early cleavage may be the result of incomplete repairing of preexisting mutations. If all early cleavage mutations are derived as such, one would expect that the mutation rate for the first cell division would be similar to or smaller than that of gametogenesis; therefore, the much elevated mutation rate for the first cell division remains to be illuminated biologically.
To safeguard our conclusions against artifacts in both the experiment and inference, we also carried out analysis with combinations of parameters deviated from the reported optimal set. Examining the impact of assumptions on the dynamics of population size is one such effort, another is to examine the impact of sporadic pre-existing mutations of high lethality that had managed to escape our surveillance. Such mutations would be identified by the experiment as ones that lead to 100% mutant offspring (if the experiment was perfect) or close to 100% mutant offspring if the z/z percentages in a few lines fluctuate upward to escape both the initial screening and subsequent allelism tests. Our analysis (Table 8) shows that the aggressive removal of families with a high percentage of mutants does not change the main conclusions. Therefore, the possibility of some sporadic pre-existing mutations escaping our surveillance cannot be the primary cause of the sharp contrast of mutation rates along the germline development. The experiment and subsequent dissection of mutations was not perfect, which led to our investigation of the impact of the overall significance level on assigning mutants into mutational groups and subsequent inference. Again, all the main conclusions remains intact. One can conclude that the consistency of conclusions under various adjustments come from overwhelming information (both quantity and quality) in the data.
The greater resolutions in data from 98% and 97% recessive lethality have resolved some ambiguity previously encountered but also reveals a striking pattern. Although the rate for gametogenesis increases from 1 to 1.5 and 1.9, the rate increase of the first cell division is much more profound, from 4 to 30 for 98% lethality and 60 for 97% lethality. Although some bias might have been introduced in the delineation of mutations from cross data, the impact of such factors is likely modest at best. This is because the same analysis was conducted for mutation patterns generated by using two different a values (0.5 and 0.95) and the conclusions remain mostly intact except for some relatively small changes in numerical values. Therefore, we are confident that the elevated mutation rate for the first cleavage with decreasing lethality is beyond reasonable doubt. However, there is an increasing discrepancy between the overall mutation rate estimated n from likelihood analysis and that from the classical estimator when the rate is significantly higher than 1%. We are not certain about the cause of this discrepancy but again different criteria for delineating cross data are not the major cause. This is an issue that deserves further investigation and one possible reason may be due to increasing inaccuracy of approximating the larger probabilities of mutation patterns used in the likelihood analysis. If this is true, it implies that the mutation rate for the first cell division might have been underestimated. Nevertheless, the qualitative conclusions from the current analysis are unlikely to be altered significantly. The pattern of the ratio of the mutation rate of the first cell division and the mean rate of the interval divisions also deserve to be examined carefully. The lower ratio for greater lethality level ($99%) than those of lower lethality, say [97%,98%) for example, is apparent. One possible cause may be that a significant number of mutations, which are completely lethal or nearly lethal, have a dominant effect and the degree of dominance increases with the lethality in general. When the lethality is reduced, the second division starts to harbor a greater mutation rate, which also helps to lower the ratio. Another way to look at the phenomenon is to examine the average cluster size of mutations for different levels of lethality. Table 2 shows that on average (i.e., n m /m t ), mutations of lethality $ 99 is 2.2 whereas the average for lethalities of level [98%, 99%) and [97%, 98%) are 8.4 and 12.8, respectively. In general, mutations leading to smaller clusters tend to occur later than those to larger clusters and thus the selective disadvantage of mutations of high lethality may be one reason which prevents them from reaching a high frequency in the cell population within a host.
It is clear that our experimental data have the resolution to test the validity of some assumptions about the dynamics of the population. The number of cell divisions for the sperm in the parental stage was initially thought to be 36, but after examining the experimental procedure carefully, we realized that by the time the b/z males were introduced into the parental stage, additional two to three stem cell divisions might have occurred because on average it takes 32 hr for a stem-cell cycle (Wallenfang et al. 2006). Furthermore, each male can only mate a limited number of times during 24 hr; thus, the offspring in the F 1 may result from sperm of an even wider range of ages. Therefore, sperm in the Parental stage likely range from a minimal of 36 cell divisions to approximately 40 divisions. Indeed, the likelihood analysis suggests that 38 cell divisions provide the best overall fit. The population size after the eighth division was thought to be in the range of four to six as reported in the literature Lee 1995, 1998), but the best fit suggests that may be a slight underestimate, and a more appropriate range should be 729. Although the aforementioned two quantities have previous knowledge to judge their validity, little can be found from the literature about how the PGCs after the eighth division are derived or selected among the 256 available cells. Constraints of physical space and the tendency that recently derived cells tend to cluster near each other suggests that the PGCs may not be a random sample from the 256 cells as assumed by default. This analysis provides strong evidence that they are derived from a relatively small number of ancestral cells at the fifth division, but the assumption that their common ancestor is a cell at the fifth generation can be soundly rejected.
Regardless of the mechanism leading to the increased mutation rate ratio with decreasing lethality between the first division and gametogenesis, the trend is clear, and one can envisage that for neutral or nearly neutral mutations, the ratio may be even greater. This prospect lends stronger support to the explanation in Gao et al. (2011) concerning a lower ratio of male vs. female mutation rate than expected due to the difference in the number of cell divisions between sperm and eggs. This study further suggests that the rate differential between the first one or two with the remaining cell divisions can differ for mutations of different nature (here different lethalities). It is conceivable that this may be true for different genes/regions in the genome as well. Human genetic disease cases (Vogel and Motulsky 1997) as well as DNA sequence data (Miyata et al. 1987;Shimmin et al. 1993;Li 1997;Crow 2006) has led to the conclusion known as male-driven evolution, i.e., males dominate females in generating inheritable mutations in evolution. For 30-yr-old human males and females, there are roughly 400 and 30 cell divisions leading to sperm and eggs, respectively (Drost and Lee 1995), so the expected ratio of male-to-female mutation rate is larger than 10; however, estimates from sequence data vary and in general are lower than this ratio. Several possible causes have been put forward for this apparent discrepancy, but if the first (or first and second) cell division has a mutation rate many-fold larger than the average mutation rates for the remaining cell division, the phenomenon becomes easily explainable. For example, suppose the ratio of the mutation rate of the first cell division and the mean of the internal divisions is 100, then the male to female ratio would be expected to be (100 + 399)/(100 + 29) % 3.9 and if the ratio is 500, then the male to female mutation ratio would be about 1.7. Because different mutation types or regions may exhibit rather different ratios between the first division and the average, the ratio between male-to-female mutation rates can be expected to be quite variable.