Table 1 Questions, reasoning and results
No. | Questions | Reasoning | Results
1 | How much does the correct classification rate of an RCA increase with an increasing number of loci used? | It is hypothesized that an increased number of loci (or alleles) increases informativeness, and thus the larger the number of loci, the higher the correct classification rate of the assignment (Kalinowski 2002). | Increasing the number of SNP and STR loci (see heights of bars with correct color in Figure S1, Figure S2, Figure S3, Figure S7, and Figure 1) increased the RCA correct classification rate. However, the relationship between number of loci and correct classification rate seems to follow a sigmoid curve. This means that the correct classification rate increases more rapidly when loci are added to a small set of loci than when they are added to a large set. It also means that there was a limit to which relatedness categories could be estimated with >95% (80%) correct classification rate (see question 4).
2 | How do correct classification rates differ between categories with different degrees of relatedness? | With decreasing relatedness, average allele sharing is expected to decrease, while variance increases. On average, PO and FS share half, R = 0.25 one quarter, and R = 0.125 one eighth of their genome IBD. Thus, the differences in mean expected allele sharing between categories shrink as relatedness decreases, and more distantly related categories are more prone to being misdiagnosed. PO dyads may be prone to being misdiagnosed as FS dyads, or vice versa, because both categories share on average half their genome IBD. However, PO dyads should show little variance in relatedness, whereas all other categories have increasing variance with decreasing relatedness. | Categories of more closely related dyads were assigned with a higher correct classification rate, or reached >95% (80%) correct classification rate with fewer loci (Figure S1, Figure S2, Table S2, Figure 1, and Table 2), with the exception of FS in a scenario with a promiscuous mating system, which never reached a >80% correct classification rate (see yellow bars in the 2nd column of Figure 1).
3 | Does the mean MAF of SNPs influence the correct classification rate of the RCA? | Loci with greater MAF are considered more informative than loci with lower MAF (Anderson and Garza 2006). Therefore, a set of loci with a high mean MAF may lead to RCAs with a greater correct classification rate compared with a set of loci with a low mean MAF. However, loci with low MAF give better insurance against IBS being confused with IBD, and thus better diagnosis. Therefore, the effect of MAF is uncertain and needs to be investigated. | The lower the MAF, the more loci were required for an RCA to reach >95% (80%) correct classification rate. The effect seemed larger between MAF = 0.05 and 0.25 than between MAF = 0.25 and 0.5 (Table 2, Table S2).
4 | Which relatedness categories can be assigned with acceptable correct classification rate, defined as >80% or >95% of the dyads assigned to a category being true members of that category? | Natural variance in allele sharing is expected to increase with decreasing relatedness. Therefore, the correct classification rate is expected to decrease with decreasing relatedness (question 2). | PO, FS (except promiscuous) and R = 0.25 could be assigned with >95% correct classification rate when the informativeness of genetic markers was sufficient (i.e., enough loci or alleles and/or high MAF for SNPs). The minimum numbers of loci necessary for an RCA with >95% correct classification rate are depicted in Figure S1, Figure S2, Figure 1, and Table 2. In the promiscuous scenario, when the informativeness of genetic markers was high (e.g., 3200 STRs/MAF 0.05 or ≥400 STRs/MAF ≥ 0.25), the number of dyads assigned to FS was small (<10); however, when the informativeness was low (i.e., 50 SNPs/MAF 0.05), on average 4010.3 dyads (SD 1249) were assigned as FS.
In a single simulation using 50,000 SNPs and six relatedness categories, R = 0.125 could be assigned with 81.72% correct classification rate in a monogamous scenario with MAF 0.5 (8th blue bar in subplot (5,4) in Figure S3).
R = 0.125 was assigned with a >80% correct classification rate for some scenarios when the category R = 0.0625 was included in the analyses (Table S4; Figure S7; question 7).
Note that with the population size and parameters used, ≥95% of individuals are unrelated, so even if all dyads were assigned to the category “unrelated”, the correct classification rate might be >95% [average proportion of unrelated individuals in simulated population with/without R = 0.0625 considered as related: 0.95/0.98 (monogamy), 0.95/0.98 (polygyny), 0.96/0.98 (promiscuity)].
5 | Does a population’s mating system influence the correct classification rate of the RCA? | The kinship composition (proportion of dyads per relatedness category) differs between mating systems. Dyads of some categories are expected to occur less frequently in certain mating systems (e.g., FS in a promiscuous system). If it is true that the ability to distinguish between two categories of relatedness depends upon the pair of categories being considered (question 4), then the performance of the RCA might differ depending on the mating system, and so might the minimal number of loci required for an RCA with 80%/95% correct classification rate. | The minimum number of loci required for an RCA with >95% (80%) correct classification rate differed between mating systems (Table S4 and Table 2). Categories that were not expected to occur frequently had large proportions of false positives and therefore should be ignored in subsequent analyses (e.g., FS in a promiscuous system; Table 2, 2nd column of subplots in Figure 1). The ranges of correct classification rates between single simulations with the same input parameters are presented in Table S2.
6 | Does the proportion of the population sampled affect the correct classification rate of the RCA? | This requires investigation because two opposing processes can be envisaged. First, allele sharing between individuals does not change with increasing proportion of the population sampled. However, the assignment of relatedness categories is based on allele frequencies, and thus the correct classification rate of the assignment may depend on accurate allele frequency estimates. The accuracy of allele frequency estimates is expected to increase with an increasing proportion of the population sampled (Figure S6). Second, the number of dyads in a sample increases quadratically with increasing sample size, with the number of unrelated dyads increasing much faster than that of related dyads (Figure S5). Therefore, the proportion of falsely classified unrelated individuals may increase faster than that of correctly assigned related individuals in categories. | For RCAs with 3200 SNPs, it appears that, independent of mating system and MAF, the proportion of the population sampled did not influence the correct classification rate of the RCA for PO, FS, and R = 0.25 (data not shown). However, for R = 0.125 and the same number of SNPs, RCA correct classification rate seemed to increase with decreasing proportion of the population sampled (data not shown). A similar observation was made with 400 available SNP loci for categories R = 0.25 and R = 0.125 (third and fourth columns of subplots in Figure S4). Because R = 0.125 rarely reached a >80% correct classification rate (nor did R = 0.25 with 400 SNP loci, question 4), the apparent increase of correct classification rate with decreasing proportion of the population sampled did not influence the conclusions of questions 1 to 4.
7 | Does excluding certain relatedness categories from consideration, or adding them, alter the correct classification rate of the RCA? | Inevitably, some categories of very distant relatives will not be investigated in every study, so decisions must be made about what categories to assess. Studies that consider only two relatedness categories recommend using fewer genetic markers than this study does (Wang 2006). For the choice of categories included in the RCA calculations, researchers can consider whether certain categories should be excluded. (i) Particular categories might have low correct classification rates in many studies; for example, the proportion of false positives in R = 0.125 is high and thus the correct classification rate low, so it could be beneficial not to assess this category. (ii) Particular categories may not be expected to occur frequently in the study population. For example, in a population with a promiscuous mating system it might be reasonable not to consider the FS category because FS are expected to occur infrequently. | Excluding certain categories (i.e., R = 0.125) from consideration decreased the correct classification rate of the RCA (Table S3). For example, the correct classification rate of R = 0.25 decreased to <80% in all simulated mating systems. Adding a category to be considered (i.e., R = 0.0625) appeared to improve the correct classification rate of R = 0.125 marginally to >90% in some scenarios (compare Table S4 and Table 2, and the 4th columns of subplots in Figure 1 and Figure S7).
An alternative way to increase the correct classification rate may be to leave all categories in the assignment and even add more for the calculations but then not use the results of certain categories for inferences. Assessing additional categories may help exclude many false positives.
8 | Does a combination of SNP and STR markers improve the correct classification rate of an RCA? | Many research groups are in transition from STR to SNP markers. More markers (if unlinked) are likely more informative and thus provide higher correct classification rates in RCAs; this should also be true for a combination of SNP and STR markers. | Combining SNPs with 20 STR markers improved the results of an RCA when few markers were available, or decreased the number of SNPs required to achieve >95% (or >80%) correct classification rate (Table 2, Table S3, Table S4, Table S5, Table S6; compare 1st and 2nd, 3rd and 4th, 5th and 6th subplot rows in Figure 1, Figure S1, Figure S2, Figure S4, Figure S7, Figure S8, Figure S9, Figure S10).
9 | How large is the effect of typing error (due to mutations, allelic dropout, erroneous scoring) on the correct classification rate of an RCA? | Typing errors decrease the chance that a dyad is assigned to the correct category because the dyad’s expected and observed allele sharing for the correct category may differ (e.g., no shared allele for PO). | A 2% typing error decreased the correct classification rate of an RCA, thus increasing the number of loci required for 95% correct classification rate for most categories (Table S5) compared with typing-error-free data (Table 2). With typing error, some categories could no longer achieve a 95% correct classification rate under the tested conditions (PO monogamous/MAF 0.05, FS polygynous/MAF 0.05). The correct classification rate of categories PO and FS was affected more by typing error than that of more distantly related categories.
10 | In populations with non-overlapping generations, which relatedness categories can be assigned with >95% correct classification rate? | Trans-generational dyads do not coexist in populations with non-overlapping generations. This changes the expected proportions of observed pedigree dyads and may thus affect the correct classification rates of an RCA by changing the ratio of false positives to true positives. | If generations do not overlap, sampling during a single time period cannot include certain relatedness categories. This leads to fewer pedigree categories being assessed correctly, i.e., only two (unrelated and FS) or three (unrelated, FS and R = 0.25) (Table S6; Figure S8, Figure S9, Figure S10). The average proportions of unrelated individuals were: 0.98 (monogamy), 0.97 (polygyny), 0.99 (promiscuity). Interestingly, the assignment for the FS category had a 95% correct classification rate even in promiscuous scenarios (with high enough marker informativeness; Table S6, 2nd subplot column in Figure S10).
11 | What effect does incorporating additional data, such as individual sex, age or mitochondrial DNA (mtDNA) haplotype, have on the correct classification rate of an RCA? | Some false-positive results (e.g., FS not sharing mtDNA haplotype or PO differing in age less than the age at sexual maturity) are expected to be excluded when additional information is available, which should lead to an increase in RCA correct classification rates. | Age and mtDNA haplotype data increased RCA correct classification rates. For example, the mean correct classification rate in a monogamous system increased from 0.729 (genetic data only) to 0.899 (genetic, age, and mtDNA data) for PO and from 0.414 to 0.699 for FS based on 20 STRs (Figure 2). Based on 100 SNPs (MAF 0.5) and in the same scenario, the correct classification rate increased from 0.973 to 0.997 (PO) and from 0.830 to 0.907 (FS; Figure S11).
Age data had a more positive effect on RCA correct classification rates than mtDNA data for the category PO, and mtDNA had a more positive effect than age for the category FS (Figure 2, Figure S11). The correct classification rate of the category R = 0.25 did not seem to change with additional data.
  • RCA, relatedness category assignment; SNP, single-nucleotide polymorphism; STR, short tandem repeat; PO, parent−offspring; FS, full sibs; R = 0.25, half sibs, grandparent-grandchild, avuncular; R = 0.125, first cousins; R = 0.0625, half first cousins, first cousins once removed, double second cousins; IBD, identity by descent; IBS, identity by state; MAF, minor allele frequency.
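
As an illustration of two quantities used throughout the table, the sketch below computes the correct classification rate as defined in question 4 (the proportion of dyads assigned to a category that are true members of that category) and the number of dyads in a sample of n individuals, n(n − 1)/2. The function names and the toy data are hypothetical and are not taken from the study; the example only shows why a category dominated by unrelated dyads can reach a high rate trivially, and why a few misassigned unrelated dyads can strongly depress the rate of a rare category.

```python
from collections import Counter


def n_dyads(n_individuals):
    """Number of distinct pairs (dyads) among n sampled individuals."""
    return n_individuals * (n_individuals - 1) // 2


def correct_classification_rates(true_cats, assigned_cats):
    """For each assigned category, the share of its dyads that truly belong to it."""
    assigned_total = Counter(assigned_cats)
    assigned_correct = Counter(
        a for t, a in zip(true_cats, assigned_cats) if t == a
    )
    return {cat: assigned_correct[cat] / total
            for cat, total in assigned_total.items()}


# Toy data (invented for illustration): 100 dyads, 95 unrelated (UR), 3 PO, 2 FS.
true_cats = ["UR"] * 95 + ["PO"] * 3 + ["FS"] * 2

# Case 1: every dyad is assigned to "unrelated".
all_unrelated = ["UR"] * 100
rates_naive = correct_classification_rates(true_cats, all_unrelated)

# Case 2: all related dyads are found, but two unrelated dyads are called FS.
with_false_fs = ["UR"] * 93 + ["FS"] * 2 + ["PO"] * 3 + ["FS"] * 2
rates_fs = correct_classification_rates(true_cats, with_false_fs)

print(rates_naive)                  # {'UR': 0.95}
print(rates_fs["FS"])               # 0.5
print(n_dyads(100), n_dyads(200))   # 4950 19900
```

In the second case only 2% of the unrelated dyads are misassigned, yet the FS rate falls to 0.5 because the FS category is rare. Doubling the sample from 100 to 200 individuals roughly quadruples the number of dyads.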