How Good Are Indirect Tests at Detecting Recombination in Human mtDNA?

Empirical proof of human mitochondrial DNA (mtDNA) recombination in somatic tissues was obtained in 2004; however, irrefutable evidence for recombination in human mtDNA at the population level is still lacking. Our inability to demonstrate a convincing signal of recombination in population data sets of human mtDNA sequence may be due, in part, to the ineffectiveness of current indirect tests. Previously, we applied some well-established indirect tests of recombination (linkage disequilibrium vs. distance using D′ and r², Homoplasy Test, Pairwise Homoplasy Index, Neighborhood Similarity Score, and Max χ²) to sequence data derived from the only empirically confirmed case of human mtDNA recombination thus far and demonstrated that some methods were unable to detect recombination. Here, we assess the performance of these six well-established tests and explore which characteristics specific to human mtDNA sequence may affect their efficacy by simulating sequence under various parameters, with levels of recombination (ρ) that vary around an empirically derived estimate for human mtDNA (population parameter ρ = 5.492). No test performed infallibly under any of our scenarios, and error rates varied across tests, whereas detection rates increased substantially with ρ values > 5.492. Under a model of evolution that incorporates parameters specific to human mtDNA, including rate heterogeneity, population expansion, and ρ = 5.492, successful detection rates are limited to a range of 7−70% across tests with an acceptable level of false-positive results; the neighborhood similarity score incompatibility test performed best overall under these parameters. Population growth seems to have the greatest impact on recombination detection probabilities across all models tested, likely due to its impact on sequence diversity. The implications of our findings for our current understanding of mtDNA recombination in humans are discussed.


LD vs distance
In the presence of recombination, the strength of linkage disequilibrium (LD) between alleles at two sites should decrease with the physical distance between the sites. A significant negative correlation between LD and distance could, theoretically, indicate recombination. There are several measures of LD, and two have been used for assessing recombination in mtDNA: $r^2$ and $D'$ (Hill and Robertson 1968; Lewontin 1964). Both depend on allele frequencies but have slightly different properties. They are described by $D' = D / D_{\max}$ (Lewontin 1964) and $r^2 = D^2 / (p_A p_a p_B p_b)$ (Hill and Robertson 1968), where $D$ is the linkage disequilibrium coefficient, $D = p_{AB} - p_A p_B$; A, B, a and b are alleles; AB is a haplotype composed of alleles A and B; and $p$ is population frequency. For both measures of LD, Pearson's correlation coefficient was used, and the statistical significance of the correlation was estimated after 1000 random permutations of the data using a Mantel test, all implemented in the stand-alone version of RecombiTEST (Piganeau et al. 2004).
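As an illustration of the two LD measures defined above, the following minimal Python sketch computes D, D' and r² for a single pair of biallelic sites. It is not the RecombiTEST implementation: the function name `ld_measures`, the two-character haplotype encoding, and the choice to report the absolute value of D' (with D_max taken as the largest magnitude D can reach given the allele frequencies) are assumptions made for the example.

```python
from collections import Counter


def ld_measures(haplotypes):
    """Return (D, |D'|, r^2) for two biallelic sites.

    `haplotypes` is a list of two-character strings: 'A' or 'a' at the
    first site, 'B' or 'b' at the second (hypothetical encoding).
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)

    # Haplotype and allele frequencies.
    p_AB = counts["AB"] / n
    p_A = (counts["AB"] + counts["Ab"]) / n
    p_B = (counts["AB"] + counts["aB"]) / n
    p_a, p_b = 1 - p_A, 1 - p_B

    denom = p_A * p_a * p_B * p_b
    if denom == 0:
        raise ValueError("both sites must be polymorphic")

    # Linkage disequilibrium coefficient: D = p_AB - p_A * p_B.
    d = p_AB - p_A * p_B

    # Assumption: D_max is the largest magnitude D can take for these
    # allele frequencies, so that |D'| lies between 0 and 1.
    if d > 0:
        d_max = min(p_A * p_b, p_a * p_B)
    else:
        d_max = min(p_A * p_B, p_a * p_b)
    d_prime = abs(d) / d_max if d != 0 else 0.0

    r2 = d * d / denom
    return d, d_prime, r2


# Complete association between the two sites.
print(ld_measures(["AB", "AB", "ab", "ab"]))  # (0.25, 1.0, 1.0)
```

In a full analysis, such values would be computed for every pair of polymorphic sites and then correlated with the physical distance between the sites.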

Homoplasy Test
If more homoplasies (co-occurrences of a polymorphism on separate branches of a phylogenetic tree) occur in a most parsimonious tree than expected from recurrent mutation under a model of clonal inheritance, then recombination may be the most likely explanation (Smith and Smith 1998). This was tested using the Homoplasy Test (Smith and Smith 1998), run under the Linux operating system using a C translation of the original QBasic version, kindly provided by David Posada (University of Vigo). To simulate the process of synonymous site selection, a step recommended by the authors to control for the confounding effects of selection on recombination detection, a second file was generated for analysis using every third base pair of the 6854-bp simulated sequence.
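The subsampling step can be sketched as follows. This is only an illustration: the function name `every_third_site` is hypothetical, and the starting offset (here the third position, index 2) is an assumption, since the text does not state which of the three possible phases was used.

```python
def every_third_site(sequence, offset=2):
    """Keep every third base of `sequence`, starting at index `offset`.

    offset=2 corresponds to positions 3, 6, 9, ... (1-based); this
    phase is an assumption made for the example.
    """
    return sequence[offset::3]


def subsample_alignment(alignment, offset=2):
    """Apply the same subsampling to every sequence in an alignment,
    given as a dict mapping sequence name to sequence string."""
    return {name: every_third_site(seq, offset)
            for name, seq in alignment.items()}


print(every_third_site("ATGGCCTAA"))  # GCA
```

With this phase, a 6854-bp sequence yields 2284 sites.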

Max Chi Squared
The Max Chi Squared method compares the arrangement of segregating sites between two sequences on either side of a putative recombination break point with all other sequences in the alignment (Smith 1992).

Neighborhood Similarity Score
The NSS describes the extent of clustering of compatibilities (either compatible or incompatible) of adjacent informative sites in a sequence alignment (Jakobsen and Easteal 1996). Two sites are said to be compatible only if their history includes no recurrent or convergent mutation; otherwise they are incompatible. Higher NSS values than expected by chance can be explained by recurrent mutation, gene conversion or recombination. The significance of the observed NSS is assessed by randomly permuting the order of informative sites 1000 times and determining the fraction of permuted scores that are at least as high as the observed score.
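The NSS permutation procedure can be illustrated with a simplified Python sketch. It is not the program used here and makes several assumptions: sites are biallelic and coded as strings of '0' and '1' (one character per sequence), compatibility is decided with the four-gamete test, and the NSS is taken as the fraction of horizontally neighboring off-diagonal cells of the pairwise compatibility matrix that share the same status. All function names are hypothetical.

```python
import random


def compatible(site_i, site_j):
    """Four-gamete test: two biallelic sites are compatible if fewer
    than four distinct two-site haplotypes are observed."""
    return len(set(zip(site_i, site_j))) < 4


def nss(sites):
    """Fraction of neighboring off-diagonal cells of the compatibility
    matrix that have the same status (simplified definition)."""
    n = len(sites)
    comp = [[compatible(sites[a], sites[b]) for b in range(n)]
            for a in range(n)]
    same = total = 0
    for a in range(n):
        for b in range(n - 1):
            if a == b or a == b + 1:
                continue  # skip cell pairs that touch the diagonal
            total += 1
            same += comp[a][b] == comp[a][b + 1]
    return same / total


def nss_p_value(sites, n_perm=1000, seed=0):
    """Permute the site order and return the fraction of permuted NSS
    values that are at least as high as the observed value."""
    rng = random.Random(seed)
    observed = nss(sites)
    shuffled = list(sites)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if nss(shuffled) >= observed:
            hits += 1
    return hits / n_perm


# Four informative sites scored across four sequences.
sites = ["0011", "0101", "0001", "0111"]
print(nss(sites), nss_p_value(sites))
```

The sketch needs at least three informative sites; real data sets contain many more, and multi-state sites would require a more general compatibility criterion than the four-gamete test.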

Pairwise Homoplasy Index
The PHI measures the mean refined incompatibility score between sites within a window of sequence of preset length, and reflects the minimum number of homoplasies on any tree required to describe the genealogical history of a pair of sites (Bruen et al. 2006). Compatibility is negatively correlated with recombination. If recombination is responsible for homoplasies, PHI scores should be lower than if recurrent mutation is responsible, as recurrent mutation is not correlated with physical distance. The statistical significance of the PHI was estimated by randomly permuting site positions in the alignment (simulating no recombination) 1000 times and calculating the proportion of times the permuted PHI score was less than or equal to the observed score.
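A simplified Python sketch of the PHI permutation test follows. It is an illustration under stated assumptions rather than the implementation used here: sites are biallelic and coded as strings of '0' and '1', the incompatibility score of a pair is 1 if all four two-site haplotypes are present and 0 otherwise, and the window is counted in informative sites. The function names are hypothetical.

```python
import random


def incompatibility(site_i, site_j):
    """Score 1 if the pair of biallelic sites shows all four two-site
    haplotypes (at least one homoplasy needed), else 0."""
    return 1 if len(set(zip(site_i, site_j))) == 4 else 0


def phi(sites, window):
    """Mean incompatibility score over all pairs of sites that lie at
    most `window` informative sites apart."""
    n = len(sites)
    scores = [incompatibility(sites[i], sites[j])
              for i in range(n)
              for j in range(i + 1, min(i + window + 1, n))]
    return sum(scores) / len(scores)


def phi_p_value(sites, window, n_perm=1000, seed=0):
    """Permute site positions and return the proportion of permuted
    PHI values that are less than or equal to the observed value."""
    rng = random.Random(seed)
    observed = phi(sites, window)
    shuffled = list(sites)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if phi(shuffled, window) <= observed:
            hits += 1
    return hits / n_perm


# Three informative sites scored across four sequences.
sites = ["0011", "0101", "0001"]
print(phi(sites, window=1))  # 0.5
```

A small p-value indicates that neighboring sites are more compatible than expected when site order carries no information, which is the pattern expected under recombination.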
C versions of Max Chi Squared, NSS and PHI were run in Linux and can be downloaded from http://www.maths.otago.ac.nz/~dbryant/software.html.