Genome-Wide Association Study of Down Syndrome-Associated Atrioventricular Septal Defects

The goal of this study was to identify the contribution of common genetic variants to Down syndrome−associated atrioventricular septal defect, a severe heart abnormality. Compared with the euploid population, infants with Down syndrome, or trisomy 21, have a 2000-fold increased risk of presenting with atrioventricular septal defects. The cause of this increased risk remains elusive. Here we present data from the largest heart study conducted to date on a trisomic background by using a carefully characterized collection of individuals from extreme ends of the phenotypic spectrum. We performed a genome-wide association study using logistic regression analysis on 452 individuals with Down syndrome, consisting of 210 cases with complete atrioventricular septal defects and 242 controls with structurally normal hearts. No individual variant achieved genome-wide significance. We identified four disomic regions (1p36.3, 5p15.31, 8q22.3, and 17q22) and two trisomic regions on chromosome 21 (around PDXK and KCNJ6 genes) that merit further investigation in large replication studies. Our data show that a few common genetic variants of large effect size (odds ratio >2.0) do not account for the elevated risk of Down syndrome−associated atrioventricular septal defects. Instead, multiple variants of low-to-moderate effect sizes may contribute to this elevated risk, highlighting the complex genetic architecture of atrioventricular septal defects even in the highly susceptible Down syndrome population.

ABSTRACT The goal of this study was to identify the contribution of common genetic variants to Down syndrome2associated atrioventricular septal defect, a severe heart abnormality. Compared with the euploid population, infants with Down syndrome, or trisomy 21, have a 2000-fold increased risk of presenting with atrioventricular septal defects. The cause of this increased risk remains elusive. Here we present data from the largest heart study conducted to date on a trisomic background by using a carefully characterized collection of individuals from extreme ends of the phenotypic spectrum. We performed a genome-wide association study using logistic regression analysis on 452 individuals with Down syndrome, consisting of 210 cases with complete atrioventricular septal defects and 242 controls with structurally normal hearts. No individual variant achieved genome-wide significance. We identified four disomic regions (1p36.3, 5p15.31, 8q22.3, and 17q22) and two trisomic regions on chromosome 21 (around PDXK and KCNJ6 genes) that merit further investigation in large replication studies. Our data show that a few common genetic variants of large effect size (odds ratio .2.0) do not account for the elevated risk of Down syndrome2associated atrioventricular septal defects. Instead, multiple variants of low-to-moderate effect sizes may contribute to this elevated risk, highlighting the complex genetic architecture of atrioventricular septal defects even in the highly susceptible Down syndrome population. KEYWORDS congenital heart defect trisomy birth defect complex trait aneuploidy Congenital heart defects (CHDs) comprise the most common birth defect and the largest contributor to infant mortality and morbidity (Moller et al. 1993;Boneva et al. 2001;Hoffman and Kaplan 2002;Cleves et al. 2003;Reller et al. 2008;Hoffman 2013;Shuler et al. 2013). CHDs represent a diverse group of structural and functional abnormalities of the heart that occur during early embryogenesis. With an incidence of nearly 1%, CHDs pose a serious global health concern and cause significant financial and social burden (Waitzman et al. 1994;Van Rijen et al. 2005;Russo and Elixhauser 2007) which remains despite major advances made to improve diagnoses and treatment (Fahed et al. 2013).
Trisomy 21, the cause of Down syndrome (DS), has a birth prevalence of one in 700 and is the most common chromosomal aneuploidy that survives to term. Nearly 50% of newborns with DS have some form of CHD (Freeman et al. 1998(Freeman et al. , 2008. One of the common types of CHD associated with DS is atrioventricular septal defect (AVSD) or atrioventricular canal defect, a severe structural anomaly that requires surgery at a very young age. With a birth prevalence of 0.83 in 10,000 live births, AVSD is rare in the general population (Hartman et al. 2011); however, in the trisomy 21 population, the associated risk is increased by 2,000-fold, occurring in about 20% of individuals with DS having AVSD (Freeman et al. 2008). Among all AVSD cases, more than 65% occur in children with DS (Ferencz et al. 1989). Yet despite the dramatically increased risk of AVSD among children with DS, 80% of infants with DS do not have AVSD, and nearly half of them do not have any CHD.
Together, these epidemiologic observations suggest that although trisomy 21 predisposes the heart to form abnormally, genetic variants on chromosome 21 (ch21) or other chromosomes may act to modify the risk of developing an AVSD. Thus, individuals with DS or trisomy 21 can be considered a "sensitized" population with respect to CHD. The study of this cohort may help reveal CHD susceptibility factors, much as has been done in model organisms (Zwick et al. 1999;St Johnston 2002;Li et al. 2012).
Our original hypothesis was that the genetic architecture of this increased risk may be relatively simple. A few common modifying variants could have a large effect size on CHD in a trisomic background while having little or no effect in a euploid individual. This hypothesis was demonstrated in a recent study using a mouse model of DS, in which the authors showed that mutations in AVSD risk factor genes Creld1 and Hey2 were individually benign on a euploid background but substantially increased risk for septal defects when expressed on a trisomic background or when inherited together in euploid mice (Li et al. 2012). If our hypothesis is correct, these common variants would be expected to produce a stronger statistical signal in a genome-wide association study among individuals chosen from the extreme ends of the phenotypic distribution. Here we report the results of a genomewide association study comparing individuals with DS and complete AVSD (DS + AVSD, cases = 210) with individuals with DS and a structurally normal heart (DS + NH, controls = 242).

Study subjects
The study sample described is the same as that used in Ramachandran et al. (2014) to investigate the role of CNVs in DS-associated AVSD.
Details regarding the recruitment and enrollment methods have been documented previously (Freeman et al. 2008;Locke et al. 2010). Briefly, participants with a diagnosis of full trisomy 21 were enrolled through multiple centers across the United States. Protocols were approved by institutional review boards at each participating center. Written and oral consent were obtained from custodial parents for each participant because most of the subjects themselves were unable to give consent as the result of cognitive deficits. Cases were defined as individuals with DS who had a complete, balanced AVSD documented most often by echocardiogram or surgical reports (DS + AVSD). Control subjects were defined as those with a structurally normal heart (DS + NH), documented by echocardiogram in the vast majority. Individuals with patent foramen ovale or patent ductus arteriosus were included in the control population, because these defects affect structures with different ontology. Only participants whose mother reported being non-Hispanic European Americans were included in the current study.

Genotyping
Genomic DNA was isolated from low passage lymphoblastoid cell lines (between one and four passages) and genotyping was carried out using the Affymetrix Genome-Wide Human SNP 6.0 array at Emory University according to manufacturer's instructions. Genotype calling was performed using the Birdseed algorithm (version 2), as implemented in the Affymetrix Power Tools software (APT 1.12.0). To assess initial quality of arrays, we followed Affymetrix's recommended quality control thresholds: Individual arrays with ,86% call rate, ,0.04 contrast quality control (QC), and mismatched gender concordance were excluded from downstream analyses. These thresholds were selected because genotype calling of SNPs on the trisomic ch21 using standard methods (APT 1.12.0) is unreliable and lowers the overall call rate. Genotype calling for ch21 in trisomic individuals was performed at the University of Pittsburgh using methods similar to those described in Lin et al. (2008).

Quality control steps and statistical analyses
Before conducting association tests, we performed rigorous sample and SNP cleaning. The details of this process are documented in Ramachandran et al. (2014). In brief, QC was done on 471 samples. Nine samples were excluded because of poor data completeness or inconsistent family structure. An additional seven were excluded as outliers defined by principal component analysis performed using Eigenstrat software v4 (Price et al. 2006). The plot of principal component analysis for the trisomic sample set in the final analyses is provided in Supporting Information, Figure S1 ( Ramachandran et al. 2014). The final sample set consisted of a total of 210 DS + AVSD cases and 242 DS + NH controls. Data cleaning was performed in PLINK v1.07 for the non-ch21 autosomal SNPs (855,628 SNPs) (Purcell et al. 2007). SNPs with .5% missing data, minor allele frequency ,5%, as well as those that deviated from Hardy-Weinberg equilibrium (P , 1 Ã 10 25 ) were excluded from analysis. A total of 606,195 autosomal SNPs were retained for downstream genome-wide association analysis, whereas 249,433 SNPs were excluded.
We performed the case-control association test using logistic regression analysis in an additive model adjusting for the top five eigenvectors as covariates, implemented in PLINK v1.07 with a 95% confidence interval for odds ratio (OR). For SNPs on ch21, the association test was conducted in a similar manner (additive model, adjusting for top five covariates) using an in-house R script to account for the four genotype calls per SNP as expected with trisomy. P value of SNPs at the candidate regions was made using LocusZoom (Pruim et al. 2010). The regulatory activity of the candidate regions was visualized through multiple annotation tracks using the University of California, Santa Cruz genome browser (Rosenbloom et al. 2013).
We assessed the power of our study using an additive disease model with a disease prevalence of 20% in the DS population. We assumed an underlying quantitative liability trait, with disease (AVSD) being present when exceeding a threshold on the liability scale. Minor allele frequencies were allow to vary from~0.1 to 0.5, assuming 210 cases and 242 controls. Alpha was set at 0.05 / total number of markers (606,195, i.e., Bonferroni correction was for the total number of autosomal markers excluding those on ch21).

Data availability
Raw genotype data are available in Gene Expression Omnibus (GEO) data respository with accession number: GSE60607.

Association of autosomal (non-ch21) SNPs with AVSD
We first sought to identify common autosomal (non-ch21) SNP variants associated with an increased risk of DS-associated AVSD. We genotyped 452 trisomic individuals, consisting of 210 cases and 242 controls of European Americans ancestry. A complete list of SNPs with P-values , 1 Ã 10 24 are provided in Table S1. The quantile-quantile plot showed good agreement between the observed and expected P-values ( Figure  S2). We did not find any SNPs that exceeded genome-wide significance (P , 8 Ã 10 28 , Figure 1). We had 80% power to detect a common marker that explained 10-15% of variance or an OR greater than 2.0 after Bonferroni correction. We thus conclude there are no common alleles with large effect size (OR . 2.0) that explain the increased risk of DSassociated AVSD.
We identified four regions with suggestive evidence of association at 1p36.3, 5p15.31, 8q22.3, and 17q22 (Table 1). A closer examination of the regions tagged by these SNPs revealed they were in close proximity to genes with roles important for heart development and/or function. Furthermore, multiple annotation tracks from ENCODE indicated strong regulatory activity, providing further evidence of putative function. Within the 1p36.3 candidate region, the strongest signal (rs1698973) was located adjacent to NPHP4, a ciliome gene (Figure 2) (Habbig et al. 2011). Interestingly, recent studies on heart phenotypes in a trisomic background implicate a significant role for ciliome genes in the etiology of DSassociated AVSD (Ripoll et al. 2012;Ramachandran et al. 2014;Li et al. 2015). The second region of interest was at 5p15.31. The strongest signal at this region (rs1428986, P , 1.09 Ã 10 25 ) falls within FLJ33360, a long noncoding RNA gene (lncRNA). lncRNAs are associated with gene regulation, and recent studies point to an emerging role in the pathophysiology of complex human diseases (Wapinski and Chang 2011;Ma et al. 2012). Adjacent to FLJ33360 is the MED10 gene ( Figure 3). Mutations in MED10 have been associated with cardiac defects (Lin et al. 2007). In addition, multiple ENCODE annotation tracks suggest a weak enhancer activity at both 1p36.3 and 5p15.31 regions (Figure 2 and Figure 3). The third candidate region, 8q22.3 (rs3107646 and rs1522707, both SNPs with P , 2.4 Ã 10 25 ), is located next to FZD6, encoding Wnt receptor protein. Wnt signaling plays a key role in cardiovascular physiology (reviewed by Cohen et al. 2008). Moreover, annotations from ENCODE indicate a strong enhancer activity at this region ( Figure S3). The fourth region of interest, at 17q22 (rs7225274, P , 1.2 Ã 10 25 ), is located at an intergenic region. Evidence from multiple annotation tracks suggests a strong regulatory activity. This region includes several binding sites for transcription factors, including GATA proteins and NR2F2 ( Figure S4). Mutations in both of these genes have been associated with CHD, including AVSD (Garg et al. 2003;Al Turki et al. 2014). Nevertheless, the association and ENCODE findings at our top four regions are not genome-wide statistically significant and require replication in an independent cohort. Association of ch21 SNPs with AVSD A case-control association test was carried out separately for 12,584 SNPs on the trisomic ch21. None of the ch21 SNPs achieved genomewide significance. However, two regions, KCNJ6 at 21q22.13 and the PDXK gene at 21q22.3, were noteworthy. SNPs at these locations had the lowest P-values. The strongest signal at KCNJ6 was rs860795 (P , 3.3 Ã 10 24 ) ( Figure 4) and at PDXK, rs2838355, had a P , 1 Ã 10 24 ( Figure 5). Overexpression of genes at these regions has been shown to be associated with DS pathology, including heart defects. Furthermore, evidence from multiple ENCODE annotation tracks indicate strong regulatory activity, possibly an enhancer/promoter function ( Figure 4 and Figure 5). The list of SNPs on ch21 (P , 1 Ã 10 23 ) and the corresponding genomic location from the case-control association data are provided in the supplemental section (Table S2). Nevertheless, the association and ENCODE findings at the two ch21 are not genomewide statistically significant and require replication in an independent cohort.

DISCUSSION
Infants with trisomy 21, or DS, are at increased risk for developing congenital heart defects, especially AVSD. Compared with the general euploid population, individuals with trisomy 21 have a 2000-fold increased risk of developing AVSD. We conducted a genome-wide association study to identify the genetic variants contributing to this phenotype using a carefully phenotyped collection of 210 DS + AVSD cases and 242 DS + NH controls to complement our genome-wide CNV study (Ramachandran et al. 2014). Our study had 80% power to detect common variants with an odds ratio greater than 2.0; none of the SNPs we tested exceeded this threshold. We therefore conclude that the enormous increased risk of AVSD in infants with DS is not caused by a few common variants of large effect, as we originally hypothesized.
We did, however, identify four non-ch21 autosomal regions worthy of replication in an independent sample set. In the first region at 1p36.3, the most significant SNP, rs1698973 (P , 2.07 Ã 10 25 ), is located about 21 kb downstream of NPHP4. NPHP4 is involved in renal tubular development and function. NPHP4 is expressed in heart, kidney, skeletal muscle, and liver, and mutations in NPHP4 have been associated with cardiac laterality defects (French et al. 2012). Interestingly, cardiac laterality defects and renal dysfunction are hall marks of defects in the cilia genes (Hou et al. 2002;Fliegauf et al. 2007). In addition to its role in left-right patterning, compelling evidence from recent studies using mouse models implicate a complex and critical role for the cilium in CHD pathogenesis in a much broader context (Friedland-Little et al. 2011;Li et al. 2015). A gene expression study using a smaller trisomic sample set by Ripoll et al. (Ripoll et al. 2012) showed enrichment for ciliome genes in their DS+AVSD subjects. A recent genome-wide CNV analysis performed by our group using the same cohort as presented in this GWAS analyses showing a suggestive enrichment for rare deletions in the ciliome genes further supports a significant role for cilia genes in normal heart development in humans (Ramachandran et al. 2014).
The second candidate was rs1428986 (P , 1.09 Ã 10 25 ) at 5p15.31. This variant is located within an RNA gene, FLJ33360, and the adjacent protein coding gene, MED10, is located 50 kb downstream. FLJ33360 belongs to the lncRNA class. Functionally, lncRNAs are implicated in diverse aspects of gene expression and protein synthesis, including epigenetic regulation and direct transcriptional regulation (Wilusz et al. 2009;Ma et al. 2012). Recent studies have shown that the expression of lncRNA is tissue-specific and, furthermore, disruptions in lncRNA have been linked to the pathology of many human diseases, ranging from neurodegeneration to cancer (Wapinski and Chang 2011;Gutschner et al. 2013). FLJ33360 is expressed in the heart, but more evidence is required to establish its role in the etiology of CHD. MED10 plays a major role in the transcriptional regulation of RNA polymerase II2dependent genes (Sato et al. 2003). Interestingly, using zebrafish mutants, Lin et al. (2007) have shown that depletion of MED10 causes specific defects in cardiac cushion formation, possibly through Wnt and nodal signaling.
The third region that warrants attention was on 8q22.3. SNPs with the strongest association are located at an intergenic region flanked by BAALC, 45.3 kb upstream, and FZD6, 22.8 kb downstream. BAALC is believed to play a synaptic role and is expressed by neural and hematopoietic cells (Tanner et al. 2001), whereas FZD6 (frizzled class receptor6) protein is detected in multiple tissues, including adult heart, brain, and placenta. Frizzled proteins act as a receptor for Wnt proteins and play a role in signal transduction via the Wnt/beta-catenin pathway (Umbhauer et al. 2000). Wnt signaling has a critical role in the development of the heart (Hurlstone et al. 2003;Alfieri et al. 2010;Marinou et al. 2012).
The fourth region of interest was at 17q22, with the strongest association signal at rs7225247 (P , 1.2 Ã 10 25 ). This variant is flanked by the gene KIF2B, 1.03 Mb upstream, and TOM1L1, 44.7 kb downstream. KIF2B is moderately expressed in heart, and its activity is critical for spindle assembly and chromosome movement, whereas TOM1L1 is involved in signaling pathways (Puertollano 2005;Manning et al. 2007). A region of high regulatory activity is located around 5 kb downstream of this variant, including binding sites for transcription factors, including GATA1, GATA2, GATA3, and NR2F2. The association between GATA proteins and congenital heart defects is well documented (Garg et al. 2003;Raid et al. 2009;Wang et al. 2012;Zheng et al. 2012;Li et al. 2014;Shan et al. 2014). A recent study implicates an association between rare sequence variants in NR2F2 and AVSD in nonsyndromic individuals (Al Turki et al. 2014). All four of the candidate regions showed the presence of histone markers, DNase hypersensitivity clusters, transcription factor binding sites, and nuclease accessible sites indicating regulatory activity. Chromatin segmentation status from ENCODE associates these regions with weak enhancer/promoter activity. Although none of the variants on ch21 reached genome-wide significance, we identified two regions worthy of further investigation in replication studies. The first region on ch21 that showed a suggestive association was at 21q22.13, where the most significant SNP, rs860795, is located in an intron of the KCNJ6 gene. KCNJ6 encodes a G-protein activated inward rectifier potassium channel, expressed in fetal heart, and its overexpression causes altered heart rate (Lignon et al. 2008). Interestingly, functional analyses on trisomic (DS) mouse model (Dp(16)4Yey/+) have implicated a 3.7-Mb "critical region" flanked by the Ifnar1-Kcnj6 region on mouse chromosome 16 (Mmu16) in DS-related heart defects (Liu et al. 2011;Liu et al. 2014). A recent genome-wide association study on conotruncal and related heart defects in a disomic population uncovered suggestive evidence for an intronic SNP rs2267386 (22q13.1) within KCNJ4, a paralog of the KCNJ6 gene (Agopian et al. 2014). The second ch21 candidate region showing evidence of association was at 21q22.3. The most significant SNP in this region, rs2838355 (P , 1 Ã 10 24 ), was located at the intronic region of the PDXK gene. PDXK is involved with vitamin B6 phosphorylation and is expressed ubiquitously. Evidence of significant overexpression of PDXK and its neighboring gene, RRP1B (23 kb upstream), in trisomic subjects   argues for a role of these genes in DS pathology. However, as yet there is no specific association to any heart phenotype (Salemi et al. 2012). Both the ch21 candidate regions appear to have high regulatory activity, with potential enhancer/promoter functions (ENCODE). The Affymetrix 6.0 genotyping microarrays did not have a sufficient marker density to replicate the recent findings of association reported by Sailani (Sailani et al. 2013).
Most of the GWAS studies on heart conducted to date employ a phenotypically heterogeneous sample set, thereby diluting the power to detect specific associations (Larson et al. 2007;Cordell et al. 2013b;Hu et al. 2013;Agopian et al. 2014;Xu et al. 2014). Here, using individuals with DS as a "sensitized" population, we conducted a genome-wide association study on a carefully phenotyped collection of individuals from extreme ends of the spectrum. Although our study represents the largest DS-associated study conducted so far using a homogeneous heart phenotype, we are still underpowered to detect variants of modest to low effect sizes. Replication studies with larger sample sizes are required to confirm the suggestive regions we have identified here. Finally, despite the 2000-fold risk observed, this elevated risk cannot be explained easily, but instead represents a complex interplay with the increased dosage of ch21. Given the complex and multifactorial nature of the disorder, the next feasible approach would be to use whole genome-sequencing to characterize the genetic variants, along with epigenetic interactions and environmental factors in a larger patient cohort.