Table 3 SNVs distinguishing donor from recipient genomes—Filtering and cross-validation
| Filter step | Rd reference | 86-028NP reference |
| --- | --- | --- |
| Total initial variants detected in 91 short-read datasets^a | 40,398 | 44,591 |
| Short indels^b | 818 | 1228 |
| Ambiguous lift-over position in reciprocal reference^c | 1712 | 5495 |
| Invariant/ambiguous control genotype^d | 2251 | 1074 |
| High-frequency mixed genotype^e | 16 | 162 |
| Invariant/ambiguous genotype at lifted-over position^f | 226 | 1257 |
| Conflict between genotypes in reciprocal alignments^g | 312 | 312 |
| Final set of “gold-standard” filtered SNVs^h | 35,063 | 35,063 |
| Transforming SNVs (≥1)^i | 10,449 | 10,449 |
  • SNV, single-nucleotide variation.

  • a-f Because the sequence reads were aligned to both the donor and recipient genome references, variant calls and filtering were initially carried out independently on the two sets of alignments, one for each reference. This allowed for cross-validation of variant calls and elimination of alignment artifacts.

  • a Total positions with ≥1 alternate allele out of 91 samples aligned to each parental reference (Rd or 86-028NP). Due to selectable markers introduced in the donor strain, genotypes were manually corrected for MAP7-specific variation prior to counting.

  • b The number of short indels in ≥1 clones (predominantly simple sequence repeat variants). These were excluded from further analysis as SNVs.

  • c-f Progressive filters against error-prone and ambiguous calls; each row reports the number of positions removed by that filter among those that passed the previous filter.

  • c Positions where whole-genome alignment gave two different lift-overs (conversions between recipient and donor coordinates), depending on which reference was used as the query during Mauve alignment.

  • d Positions where parental base calls were invariant or ambiguous (excluding c). The expected pattern is that donor reads would have the reference base against 86-028NP and an alternate base against Rd, while the recipient reads would have the reciprocal.

  • e Positions where >5% of samples gave an ambiguous/mixed call (excluding c-d), removing most error-prone positions but not mixed genotypes arising from transformation.

  • f Positions at which the lift-over position (the coordinate in the reciprocal alignment) was invariant or ambiguous (excluding c-e), reconciling the variant positions between the two alignments.

  • g Positions passing the above filters (c-f) but with ≥1 conflict in the base call depending on the reference used.

  • h Final set of filtered SNVs. All positions have a valid lift-over to the reciprocal reference, a low frequency of ambiguous/mixed genotypes, and consistent genotypes between both parental control reads and reciprocal alignments of the parental references.

  • i Count of SNVs for which ≥1 of the 72 selected recombinants had a donor allele.
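
The filter counts can be checked against the final SNV set with a short tally. The sketch below is only an arithmetic cross-check, not the authors' pipeline: it assumes each filter row (b through g) gives the number of positions removed at that step, and that the single conflict count (g) applies to both reference columns. The names `INITIAL`, `REMOVED`, `remaining_after_filters`, and `final_counts` are hypothetical.

```python
# Counts transcribed from Table 3. Assumption: each filter row is the number
# of positions removed at that step, applied in order b -> g.
INITIAL = {"Rd": 40_398, "86-028NP": 44_591}

REMOVED = [
    ("short indels (b)", {"Rd": 818, "86-028NP": 1_228}),
    ("ambiguous lift-over (c)", {"Rd": 1_712, "86-028NP": 5_495}),
    ("invariant/ambiguous control genotype (d)", {"Rd": 2_251, "86-028NP": 1_074}),
    ("high-frequency mixed genotype (e)", {"Rd": 16, "86-028NP": 162}),
    ("invariant/ambiguous at lifted-over position (f)", {"Rd": 226, "86-028NP": 1_257}),
    # Assumption: the single conflict count is shared by both columns.
    ("conflict between reciprocal alignments (g)", {"Rd": 312, "86-028NP": 312}),
]


def remaining_after_filters(initial, removed):
    """Return (label, surviving count) after each successive filter."""
    trail = []
    count = initial
    for label, n_removed in removed:
        count -= n_removed
        trail.append((label, count))
    return trail


def final_counts():
    """Surviving positions per reference once every filter has been applied."""
    result = {}
    for ref, start in INITIAL.items():
        steps = [(label, counts[ref]) for label, counts in REMOVED]
        result[ref] = remaining_after_filters(start, steps)[-1][1]
    return result


if __name__ == "__main__":
    for ref in INITIAL:
        print(ref, final_counts()[ref])
```

Under these assumptions both columns reduce to the same 35,063 positions reported as the final filtered set (h), which is consistent with the two independent alignments being reconciled by the lift-over filters.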