Table 5. Predictions of the effect of mutations between Pan NAT1 and NAT2 coding sequences according to PolyPhen, SIFT and PANTHER cSNP Scoring.
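| Haplotypes | cDNA | protein | PolyPhen score (a) | PolyPhen prediction (b) | SIFT score (c) | SIFT prediction (d) | PANTHER cSNP Scoring PSEP (e) | PANTHER cSNP Scoring prediction (f) |
|---|---|---|---|---|---|---|---|---|
| Pan NAT1 (reference NAT1*1) (g) | | | | | | | | |
| NAT1*3 | A789G | I263M | 0.279 (0.91-0.88) | B | 0.08 (3.08, 80) | T | 220 | POD |
| NAT1*4 | T597G | I199M | 0.369 (0.9-0.89) | B | 0.01 (3.07, 81) | A | 220 | POD |
| NAT1*5 | G76A | D26N | 0.377 (0.9-0.89) | B | 0.1 (3.08, 76) | T | 91 | B |
| NAT1*7 | G760C | E254Q | 0.892 (0.82-0.94) | POD | 0.07 (3.08, 80) | T | 455 | PRD |
| NAT1*8 | T341C | I114T | 0.099 (0.93-0.85) | B | 0.06 (3.07, 81) | T | 220 | POD |
| NAT1*11 | A518C | E173A | 0.013 (0.96-0.78) | B | 0.17 (3.07, 81) | T | 30 | B |
| Pan NAT2 (reference NAT2*4) (h) | | | | | | | | |
| NAT2*2 | C578T | T193M | 1 (0.00-1.00) | PRD | 0 (3.07, 51) | A | 456 | PRD |
| NAT2*6 | A514G | N172D | 0.001 (0.99-0.15) | B | 0.26 (3.07, 51) | T | 220 | POD |
| NAT2*7 | G145A | E49K | 0.002 (0.99-0.3) | B | 0.5 (3.07, 50) | T | 324 | POD |
| NAT2*8 (i) | G191A | R64Q | 1.00 (0.00-1) | PRD | 0 (3.07, 50) | A | 4200 | PRD |
| NAT2*9 (i) | A72C | L24F | 1 (0.00-1) | PRD | 0 (3.07, 50) | A | 4200 | PRD |
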
  • a PolyPhen score: probability that a substitution is damaging; sensitivity and specificity in brackets.

  • b PolyPhen prediction: “benign” (B), “possibly damaging” (POD), “probably damaging” (PRD).

  • c SIFT score: probability that a substitution is tolerated; median sequence information and number of sequences used for the prediction in brackets.

  • d SIFT prediction: “tolerated” (T), “affect protein function” (A).

  • e PANTHER cSNP Scoring PSEP (position-specific evolutionary preservation): length of time (in millions of years) of preservation of a position.

  • f PANTHER cSNP Scoring prediction: “probably damaging” (PRD), “possibly damaging” (POD), “probably benign” (B).

  • g The reference Pan NAT1 haplotype used is the basal haplotype in the network of NAT1 sequences (Supplementary Figure S2).

  • h The reference Pan NAT2 haplotype used is the basal haplotype in the network of NAT2 sequences (Supplementary Figure S3). Since NAT2*1 differs from NAT2*4 at a single position located 61 bp downstream of the stop codon of the coding exon (A934G, Table 3), the two haplotypes likely translate into a similar gene product, so that haplotypes deriving from NAT2*1 could be predicted using NAT2*4 as a reference. In contrast, haplotypes NAT2*8 and NAT2*9 both derive from NAT2*7, which differs from the basal haplotype at SNP G145A (E49K, Table 3). Thus, for the non-synonymous mutations defining NAT2*8 and NAT2*9, predictions were performed using NAT2*7 as a reference.

  • i Haplotypes NAT2*8 and NAT2*9 both bear the G145A mutation defining haplotype NAT2*7. Since the prediction tools do not allow the simultaneous specification of two substitutions, we ran the prediction tools for G191A and A72C against NAT2*7 as a reference, instead of NAT2*4.
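
To compare the three tools at a glance, the prediction columns of the table can be tallied per substitution. The following is a minimal sketch, not part of the original analysis: the `Row` class, the helper names, and the choice to count POD, PRD and A as "flagged" are assumptions made for illustration only. The prediction codes themselves are copied from the table.

```python
from dataclasses import dataclass

# Prediction codes as defined in footnotes b, d and f.
# Counting POD, PRD and A as "flagged" is an assumption of this sketch;
# B (benign / probably benign) and T (tolerated) are counted as not flagged.
FLAGGED = {"POD", "PRD", "A"}


@dataclass(frozen=True)
class Row:
    haplotype: str
    protein: str
    polyphen: str  # PolyPhen prediction
    sift: str      # SIFT prediction
    panther: str   # PANTHER cSNP Scoring prediction

    def flagged_by(self) -> int:
        """Number of tools (0 to 3) whose prediction is in FLAGGED."""
        return sum(p in FLAGGED for p in (self.polyphen, self.sift, self.panther))


# Prediction columns transcribed from Table 5.
ROWS = [
    Row("NAT1*3", "I263M", "B", "T", "POD"),
    Row("NAT1*4", "I199M", "B", "A", "POD"),
    Row("NAT1*5", "D26N", "B", "T", "B"),
    Row("NAT1*7", "E254Q", "POD", "T", "PRD"),
    Row("NAT1*8", "I114T", "B", "T", "POD"),
    Row("NAT1*11", "E173A", "B", "T", "B"),
    Row("NAT2*2", "T193M", "PRD", "A", "PRD"),
    Row("NAT2*6", "N172D", "B", "T", "POD"),
    Row("NAT2*7", "E49K", "B", "T", "POD"),
    Row("NAT2*8", "R64Q", "PRD", "A", "PRD"),
    Row("NAT2*9", "L24F", "PRD", "A", "PRD"),
]


def unanimous(rows):
    """Haplotypes whose defining substitution is flagged by all three tools."""
    return [r.haplotype for r in rows if r.flagged_by() == 3]


def none_flagged(rows):
    """Haplotypes whose defining substitution is flagged by no tool."""
    return [r.haplotype for r in rows if r.flagged_by() == 0]


if __name__ == "__main__":
    for r in ROWS:
        print(f"{r.haplotype:8s} {r.protein:6s} flagged by {r.flagged_by()}/3 tools")
```

Under this grouping, `unanimous(ROWS)` returns NAT2*2, NAT2*8 and NAT2*9, and `none_flagged(ROWS)` returns NAT1*5 and NAT1*11. A stricter grouping (for example, counting only PRD and A) would change the tallies, so the set in `FLAGGED` should be adjusted to the question being asked.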