Genome-Wide Survey of Large Rare Copy Number Variants in Alzheimer’s Disease Among Caribbean Hispanics

Recently, genome-wide association studies have identified significant associations between Alzheimer’s disease (AD) and variations in CLU, PICALM, BIN1, CR1, MS4A4/MS4A6E, CD2AP, CD33, EPHA1, and ABCA7. However, the pathogenic variants in these loci have not yet been found. We conducted a genome-wide scan for large copy number variation (CNV) in a dataset of Caribbean Hispanic origin (554 controls and 559 AD cases that were previously investigated in a SNP-based genome-wide association study using the Illumina HumanHap 650Y platform). We ran four CNV calling algorithms to obtain high-confidence calls for large CNVs (>100 kb) that were detected by at least two algorithms. Global burden analyses did not reveal significant differences between cases and controls in CNV rate, distribution of deletions or duplications, total or average CNV size, or number of genes affected by CNVs. However, we observed a nominal association between AD and a ∼470 kb duplication on chromosome 15q11.2 (P = 0.037). This duplication, encompassing up to five genes (TUBGCP5, CYFIP1, NIPA2, NIPA1, and WHAMML1), was present in 10 cases (2.6%) and 3 controls (0.8%). The dosage increase of the CYFIP1 and NIPA1 genes was further confirmed by quantitative PCR. The current study did not detect CNVs that affect novel AD loci identified by recent genome-wide association studies. However, because the array technology used in our study has limitations in detecting small CNVs, future studies must carefully assess novel AD genes for the presence of disease-related CNVs.

Genome-wide association studies (GWAS) of large case-control datasets have identified significant associations between late-onset AD and SNPs in CLU, PICALM, BIN1, CR1, MS4A4/MS4A6E, CD2AP, CD33, EPHA1, and ABCA7 (Harold et al. 2009; Lambert et al. 2009; Carrasquillo et al. 2010; Hollingworth et al. 2011; Naj et al. 2011). Our recent GWAS of a Caribbean Hispanic cohort has supported the association with SNPs in the CLU, PICALM, and BIN1 genes (Lee JH et al. 2010). However, the pathogenic variants in these novel AD loci have not yet been found.
Copy number variants (CNVs) have been associated with several neuropsychiatric disorders, such as autism, schizophrenia, and bipolar disorder (Cook and Scherer 2008; Lee and Scherer 2010). Furthermore, rare duplications of the APP locus are associated with dominant early-onset AD, which supports the possibility that disease-related CNVs exist in other AD genes (McNaughton et al. 2010). In fact, Brouwers et al. recently proposed that the association between CR1 and AD might be explained by intragenic CNVs that translate into two major CR1 isoforms (Brouwers et al. 2011). To date, there are only two published genome-wide case-control studies that assess the contribution of CNVs to AD in North American populations (Heinzen et al. 2010; Swaminathan et al. 2011). However, in both studies, CNV calls were detected using a single method (PennCNV), and no overall case-control differences were observed in the CNV rate, size, presence of rare genic CNVs, or number of genes disrupted by CNVs. The only borderline association was reported by Heinzen et al. for a 500 kb duplication at 15q13.3 affecting the CHRNA7 gene, which encodes the neuronal nicotinic cholinergic receptor (P = 0.053, uncorrected for multiple testing) (Heinzen et al. 2010).
To evaluate the contribution of rare genomic variants to the risk of late-onset AD, we analyzed a Caribbean Hispanic dataset that was previously assessed in a SNP-based GWAS (Lee JH et al. 2010). We focused our investigation on large rare CNVs that might contribute significantly to disease risk, as was previously demonstrated in other neuropsychiatric disorders (Kirov et al. 2009; Zhang et al. 2009; Glessner et al. 2010). To maximize CNV discovery, we used multiple CNV detection methods.

Sample collection and genotyping
The study was approved by the Institutional Review Boards of Columbia University and the University of Toronto. The Caribbean Hispanic case-control dataset, consisting of participants predominantly originating from the Dominican Republic and Puerto Rico, was described previously (Lee JH et al. 2010). Briefly, the dataset included 559 unrelated cases with late-onset AD and 554 unrelated controls similar in age and sex distribution. The mean (SD) age at onset of AD was 80.0 (8.0) years, and the mean (SD) age at last examination of the controls was 78.9 (6.4) years. In both the control and AD groups, 70% of the participants were women. The diagnosis of AD was based on the National Institute of Neurological Disorders and Stroke-Alzheimer's Disease and Related Disorders Association criteria (McKhann et al. 1984).
All DNA samples were isolated from whole blood and were randomly distributed in genotyping plates. All samples were genotyped on Illumina HumanHap 650Y arrays at the same laboratory. The dataset consisted only of samples that previously passed SNP-based quality control procedures (e.g. gender miscalls and relatedness checks) (Lee JH et al. 2010). Our preliminary analysis was done as a blind study, and the affection status of the samples was only disclosed after the CNV detection procedures were completed.
Quality control and CNV detection
Raw intensity array data were normalized within and across samples using Illumina's BeadStudio software v.3.3.7. To maximize CNV discovery, we ran four different CNV calling algorithms: QuantiSNP (Colella et al. 2007), iPattern (Pinto et al. 2011), PennCNV (Wang et al. 2007), and CNVpartition (implemented in BeadStudio). To obtain high-confidence CNV calls, a stringent CNV dataset was generated by taking the CNV calls by iPattern that were also found by at least one additional algorithm (either PennCNV or QuantiSNP). Specifically, each CNV detected by two methods had to overlap between the methods over at least 50% of its length and was merged using the outside probe boundaries (i.e. the union of the calls), as described previously (Pinto et al. 2010). Previously (using Illumina 1M arrays) we showed that stringent CNVs >30 kb detected by both iPattern and QuantiSNP were confirmed by quantitative PCR (qPCR) to be true events at 95% confidence (Pinto et al. 2010). Here, given the lower resolution of the current 650K array, we assumed that a comparable sensitivity would be achieved for large CNVs (>100 kb). To minimize overestimation of reported boundaries, the third algorithm was only used for support. The fourth algorithm, CNVpartition, was used to visualize large CNVs.
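The two-algorithm merging rule can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the `CNV` record, the function names, and the coordinates are hypothetical, and the 50% requirement is assumed here to be measured against the length of the merged (union) interval, which the text does not state explicitly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CNV:
    chrom: str
    start: int  # position of the first probe (bp)
    end: int    # position of the last probe (bp), inclusive
    kind: str   # "del" or "dup"


def overlap_bp(a: CNV, b: CNV) -> int:
    """Shared length in bp; 0 if the calls differ in chromosome or type."""
    if a.chrom != b.chrom or a.kind != b.kind:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def merge_if_supported(primary: CNV, support: CNV,
                       min_frac: float = 0.5) -> Optional[CNV]:
    """Return the union of two calls if they overlap enough, else None.

    Assumption: the overlap must cover at least `min_frac` of the
    merged (outside-boundary) interval.
    """
    shared = overlap_bp(primary, support)
    if shared == 0:
        return None
    merged = CNV(primary.chrom,
                 min(primary.start, support.start),
                 max(primary.end, support.end),
                 primary.kind)
    merged_len = merged.end - merged.start + 1
    return merged if shared / merged_len >= min_frac else None


# Illustrative coordinates only (not taken from the study data).
call_a = CNV("chr1", 1_000_000, 1_400_000, "dup")
call_b = CNV("chr1", 1_050_000, 1_470_000, "dup")
print(merge_if_supported(call_a, call_b))
```

Taking the union keeps the outermost probes of either call, so a supported CNV is never reported as smaller than what one algorithm saw; the overlap threshold guards against merging calls that merely touch.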
Poor quality samples were excluded from the study if they met the following criteria: chip call rate < 97%; log R ratio standard deviation > 0.27; B allele frequency standard deviation > 0.17; and PennCNV wave factor > 0.04 or ≤ −0.04 (Diskin et al. 2008). We excluded CNV calls when they failed stringent quality control (QC) criteria: <5 probes, <100 kb size, or low confidence QuantiSNP score (log Bayes factor < 15), as these CNVs were likely to be unreliable at the current array resolution. We also excluded CNV calls within hypervariable centromere proximal bands and those overlapping immunoglobulin regions, as both are known to be prone to artifactual CNV calling and thus false discoveries.
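As an illustration, the sample-level and call-level thresholds could be expressed as two predicates. This is a sketch under stated assumptions: the record types and field names are hypothetical, a sample is assumed to be excluded when any single criterion fails, the wave-factor criterion is assumed to define a band around zero (excluded above 0.04 or at or below −0.04), and the log Bayes factor is assumed to be available only for calls scored by QuantiSNP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleMetrics:
    call_rate: float    # fraction of SNPs called on the chip, 0..1
    lrr_sd: float       # standard deviation of log R ratio
    baf_sd: float       # standard deviation of B allele frequency
    wave_factor: float  # PennCNV genomic wave factor


@dataclass(frozen=True)
class CnvCall:
    n_probes: int
    size_bp: int
    log_bayes_factor: float  # QuantiSNP confidence score


def sample_passes_qc(m: SampleMetrics) -> bool:
    """Keep a sample only if no exclusion criterion is met (assumed 'any')."""
    return (m.call_rate >= 0.97
            and m.lrr_sd <= 0.27
            and m.baf_sd <= 0.17
            and -0.04 < m.wave_factor <= 0.04)


def call_passes_qc(c: CnvCall) -> bool:
    """Keep a call with >= 5 probes, >= 100 kb, and log Bayes factor >= 15."""
    return (c.n_probes >= 5
            and c.size_bp >= 100_000
            and c.log_bayes_factor >= 15)


# Hypothetical values for illustration.
print(sample_passes_qc(SampleMetrics(0.995, 0.20, 0.05, 0.01)))
print(call_passes_qc(CnvCall(n_probes=4, size_bp=150_000, log_bayes_factor=30.0)))
```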
Finally, we removed samples that had an excessive number of CNVs detected by each algorithm (i.e. samples with a number of CNV calls exceeding the third quartile plus three times the interquartile range). The resulting cutoff for the number of CNVs per sample was 67 CNV calls for PennCNV, 35 calls for QuantiSNP, and 35 calls for iPattern. Chromosome X and all CNVs > 1 Mb detected by any algorithm were inspected manually. Samples with excessive aggregate length of CNVs, as well as samples with CNVs > 7.5 Mb (likely karyotypic abnormalities), were visually inspected by plotting their intensities and allelic ratios and were removed from burden analyses (supporting information, Table S1).
For the purpose of burden analysis, CNVs with more than 50% of their length overlapping segmental duplications were discarded; CNVs found in >1% of cases and controls were not considered further. A total of 392 cases (106 males, 286 females) and 357 controls (104 males, 253 females) passed all QC steps and were used in subsequent analyses. The female/male ratio and age at onset in the dataset that passed all QC steps remained similar to those of the original dataset: 70% were females, the mean (SD) age at onset of AD cases was 77.1 (8.5) years, and the mean (SD) age at last examination of the controls was 79.5 (6.1) years.
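The per-algorithm cutoff for excessive CNV calls (third quartile plus three times the interquartile range) can be computed as in the sketch below. The function names and the example counts are hypothetical, and the quartile convention is an assumption: quartile definitions differ between tools, and the one used in the study is not stated.

```python
import statistics
from typing import Dict, List, Set


def excess_call_cutoff(counts: List[int]) -> float:
    """Q3 + 3 * IQR, using Python's 'inclusive' quartile method (assumed)."""
    q1, _median, q3 = statistics.quantiles(counts, n=4, method="inclusive")
    return q3 + 3 * (q3 - q1)


def flag_excess_samples(counts_by_sample: Dict[str, int]) -> Set[str]:
    """Return the IDs of samples whose call count exceeds the cutoff."""
    cutoff = excess_call_cutoff(list(counts_by_sample.values()))
    return {sid for sid, n in counts_by_sample.items() if n > cutoff}


# Made-up call counts for one algorithm; sample "S9" is an obvious outlier.
counts = {"S1": 10, "S2": 12, "S3": 14, "S4": 16, "S5": 18,
          "S6": 20, "S7": 22, "S8": 24, "S9": 200}
print(excess_call_cutoff(list(counts.values())))  # 46.0
print(flag_excess_samples(counts))                # {'S9'}
```

In the study the rule was applied separately to each algorithm, which is why the reported cutoffs differ (67 for PennCNV, 35 for QuantiSNP and iPattern).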

CNV burden analyses
To determine whether cases show a greater genome-wide burden of rare CNVs compared with controls, CNV burden analyses were conducted using PLINK v1.07 and a permutation procedure (one-sided, 100,000 permutations) (Purcell et al. 2007). P values were estimated for the number of CNVs per individual (CNV rate), for CNV sample proportion (fraction of samples with one or more CNVs), and for the total or average size ranges of CNV calls. Genome-wide P values were further corrected (Pcorr) for potential global case-control differences in CNV rate and size. CNVs found to be enriched in AD cases compared with controls or found only in AD cases were further evaluated by comparison with the Database of Genomic Variants (DGV), a catalog of CNVs found in control subjects of diverse populations, and by comparison with 5000 Caucasian controls previously used in an autism CNV study (Pinto et al. 2010). However, the controls in the autism study and DGV database were not specifically screened for AD symptoms.
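The idea behind a one-sided permutation test of CNV rate can be sketched generically as below. This is not PLINK's implementation: the function name, the test statistic (difference in mean CNV count per individual), the add-one P-value estimator, and the toy data are all assumptions made for illustration.

```python
import random
from typing import Sequence


def permutation_p_one_sided(case_counts: Sequence[int],
                            control_counts: Sequence[int],
                            n_perm: int = 10_000,
                            seed: int = 0) -> float:
    """Empirical P value for 'cases carry more CNVs per individual'.

    Case/control labels are shuffled n_perm times; the P value is the
    fraction of shuffles whose rate difference is at least as large as
    the observed one (with the usual +1 correction).
    """
    rng = random.Random(seed)
    n_cases = len(case_counts)
    pooled = list(case_counts) + list(control_counts)

    def rate_difference(values: Sequence[int]) -> float:
        cases, controls = values[:n_cases], values[n_cases:]
        return sum(cases) / len(cases) - sum(controls) / len(controls)

    observed = rate_difference(pooled)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign labels at random
        if rate_difference(pooled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data: per-individual CNV counts (hypothetical).
cases = [1, 0, 2, 1, 3, 0, 1, 2]
controls = [0, 1, 1, 0, 2, 0, 1, 1]
print(permutation_p_one_sided(cases, controls, n_perm=2_000))
```

Because the test is one-sided, a dataset in which controls carry more CNVs than cases yields a P value close to 1 rather than a small one.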

CNV characteristics
Overall, we detected 1774 stringent CNVs with sizes ≥100 kb in the 392 cases and 357 controls that passed the QC steps (mean size = 252,651 bp; median size = 176,893 bp). This stringent CNV dataset was composed of 932 CNV calls in cases (52.5%) and 842 calls in controls (47.5%). We did not observe significant differences in the number of deletions between cases (n = 397; 22.4%) and controls (n = 367; 20.7%) or in the number of duplications between cases (n = 535; 30.2%) and controls (n = 475; 26.8%). Hence, there was no significant global enrichment between cases and controls for the total number of CNV calls or for deletions or duplications. However, we observed a nominal association between AD and a 470 kb (20.3-20.7 Mb, NCBI36/hg18) duplication on chr15q11.2 (χ² = 3.206; uncorrected one-tailed P = 0.037). This duplication, encompassing up to five genes (TUBGCP5, CYFIP1, NIPA2, NIPA1, and WHAMML1) and flanked by two low-copy repeats (BP1-BP2), was present in 10 cases (2.6%) and in 3 controls (0.8%). The dosage increase of the CYFIP1 and NIPA1 genes in AD patients was further confirmed by qPCR (Figure S1).
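The reported statistic for the 15q11.2 duplication can be reproduced from the carrier counts alone. The sketch below assumes an uncorrected Pearson chi-square test on the 2 × 2 table of carriers and non-carriers (10 of 392 cases, 3 of 357 controls) and a one-tailed P value obtained by halving the two-sided tail; the test actually run by the analysis software is not spelled out here, so this is a consistency check rather than a description of the method. The function names are hypothetical.

```python
import math


def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def one_tailed_p(chi2: float) -> float:
    """Upper-tail normal probability of z = sqrt(chi2) for 1 degree of freedom."""
    z = math.sqrt(chi2)
    return 0.5 * math.erfc(z / math.sqrt(2))


carriers_cases, n_cases = 10, 392
carriers_controls, n_controls = 3, 357

chi2 = pearson_chi2_2x2(carriers_cases, n_cases - carriers_cases,
                        carriers_controls, n_controls - carriers_controls)
p = one_tailed_p(chi2)
print(round(chi2, 3), round(p, 3))  # 3.206 0.037
```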

Analyses of large rare CNVs
A total of 734 stringent rare CNVs ≥ 100 kb with a frequency ≤ 1% in the total sample set were observed in our dataset (mean size = 292,240 bp; median size = 200,981 bp), including 277 deletions and 457 duplications (Table 1, Table S2). Of these rare large CNVs, 390 were detected in 255 cases (65.0%) and 344 were found in 224 controls (62.7%) (case/control ratio = 1.03; P = 0.35) (Table S2). We did not detect significant differences in the distribution of large rare deletions or duplications between cases and controls (Table 1). Furthermore, no significant association with AD was found in the total size of rare CNVs (case/control ratio = 0.94; P = 0.77). Similarly, the average size of rare CNVs was not different between cases and controls (case/control ratio = 0.88; P = 0.94).
Global burden analyses were further extended by stratifying rare CNVs according to size (e.g. >500 kb or >1 Mb) and by restricting to CNVs with genic content. None of these strategies revealed significant differences between cases and controls (Tables 1 and 2). For instance, the case/control ratio for all genic CNVs was 0.94 (1.1 for deletions and 0.9 for duplications), and no considerable enrichment was found for CNV size in any range or for gene-disrupting CNVs (Tables 1 and 2).
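The case/control ratio of 1.03 for rare large CNVs is consistent with a ratio of per-individual CNV rates, as the short calculation below shows. That definition is an assumption inferred from the counts in the text, and the variable names are illustrative.

```python
# Counts taken from the text: rare large CNVs and samples passing QC.
cnvs_in_cases, n_cases = 390, 392
cnvs_in_controls, n_controls = 344, 357

rate_cases = cnvs_in_cases / n_cases           # CNVs per case
rate_controls = cnvs_in_controls / n_controls  # CNVs per control
rate_ratio = rate_cases / rate_controls

print(round(rate_ratio, 2))  # 1.03
```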

Candidate novel CNVs
We observed 12 CNVs >1 Mb that were detected in eight AD cases and four controls. Six CNVs were found only in AD cases and were observed neither in Hispanic controls nor in the DGV (Table S3), suggesting that they might be novel structural abnormalities with potential functional significance for AD. For example, in case NY1811 (age at onset 73), we observed a 1.9 Mb duplication on chr 2p16.3 (Figure S2A) encompassing the entire neurexin 1 gene (NRXN1), which encodes a neuronal cell surface protein involved in cell recognition and cell adhesion. Genome-wide CNV studies previously implicated NRXN1 deletions in autism and schizophrenia (Ching et al. 2010; Magri et al. 2010). Our study is the first report describing a duplication of NRXN1 in an AD case.
In AD case NY1261 (age at onset 89), we detected a 1.4 Mb deletion on chromosome 17p13.1-2 and a 563 kb deletion on 3p21.31 (Figure S2B). Together, both deletions affect 90 genes, including several genes implicated in synaptic function (DLG4, NLGN2, CHRNB1, GABARAP, and PITPNM3) (Table S2). In AD case RX1107 (age at onset 88), we detected a 2.9 Mb deletion on chromosome 7q35-q36.1 (Figure S2C) that disrupts the CNTNAP2 gene encoding the contactin-associated protein-like 2 protein, a member of the neurexin family that mediates interactions between neurons and glial cells. SNPs in CNTNAP2 were reported to be significantly associated with schizophrenia and bipolar disorder in GWAS (O'Dushlaine et al. 2011). Intriguingly, variants in CNTNAP2 were also implicated in pseudoexfoliation syndrome (Krumbiegel et al. 2011) among patients who show a selective downregulation of clusterin (CLU) expression in their eyes (Zenkel et al. 2006). Notably, the association between CLU SNPs and AD was confirmed in several studies at a genome-wide significance level (Harold et al. 2009; Lambert et al. 2009; Carrasquillo et al. 2010).
In addition, we generated a list of 29 genes that were affected by CNVs in two or more AD patients, that were not seen in our Caribbean Hispanic controls, and that were absent or rare in the DGV and 5000 Caucasian controls (Table 3). Some of these genes have potential functional connections to neurological disorders. For instance, in two AD patients, we detected deletions affecting the protein-tyrosine phosphatase receptor-type delta gene (PTPRD), which has been associated with restless legs syndrome (Morris et al. 2010; Yang et al. 2011). One of the deletions (128 kb) removes exon 9 of PTPRD, and the other one (135 kb) removes exon 4. Also, two patients (RM4073 and RM4285) had a 622 kb duplication on chr 5q12.1 affecting five genes, including the NDUFAF2 gene, which encodes a chaperone for mitochondrial complex I assembly and has been implicated in attention-deficit/hyperactivity disorder (Lesch et al. 2011). Two other duplications were detected at 3p26.3, disrupting the contactin 6 gene (CNTN6). Structural and sequence variations in several members of the contactin gene family were associated with neuropsychiatric disorders (e.g. schizophrenia and autism) (Fernandez et al. 2008; Burbach and van der Zwaag 2009; Cottrell et al. 2011).

DISCUSSION
We conducted a genome-wide scan for large CNVs (≥100 kb) in a case-control dataset of Caribbean Hispanic origin that was previously investigated in a SNP-based GWAS (Lee JH et al. 2010). To generate results with high confidence, we focused on CNVs that were identified by at least two algorithms. We detected 1774 stringent CNVs (Table S4). First, we tested the hypothesis that rare CNVs (≤1%) with a potentially strong impact on AD risk in individual patients might contribute to the overall disease risk, as was previously observed in other common neuropsychiatric disorders (Kirov et al. 2009; Zhang et al. 2009; Glessner et al. 2010). However, the burden analyses of rare CNVs did not find significant differences between cases and controls in CNV rate, total or average CNV size, or the number of genes affected by CNVs.
In addition, we conducted a case-control analysis of large genic CNVs, including common variants, using PLINK regional analysis. The only nominally significant result that survived qPCR confirmation was detected for a duplication on chromosome 15q11.2 affecting up to five genes, including NIPA1 and CYFIP1 (P = 0.037). Duplications affecting NIPA1 and CYFIP1 in control populations are cataloged in the DGV based on four studies (Pinto et al. 2007; Zogopoulos et al. 2007; Itsara et al. 2009; Shaikh et al. 2009) at a frequency (0.5%; 24 of 5056 individuals) similar to that in our controls.
NIPA1 encodes a magnesium transporter associated with early endosomes in neuronal and epithelial cells (Rainier et al. 2003; van der Zwaag et al. 2010). CYFIP1 forms a complex at synapses with the fragile X mental retardation protein (FMRP) and eIF4E (FMRP-CYFIP1-eIF4E complex). FMRP acts as an APP translation repressor (Lee EK et al. 2010), releasing CYFIP1 from the FMRP-CYFIP1-eIF4E complex in response to synaptic stimulation (Napoli et al. 2008). Therefore, unbalanced dosage of CYFIP1 might result in altered APP turnover in AD patients. Of note, this region belongs to a larger region at chromosome 15q11-q13 that has been described as one of the most reliable "cytogenetic regions of interest" for genomic aberrations in autism spectrum disorders (Vorstman et al. 2006). It is important that the association between AD and the 15q11.2 duplication be validated in follow-up studies using large case-control datasets.
Our study does not support the previously reported marginal association between AD and the 500 kb duplication on chromosome 15q13.3 affecting the CHRNA7 locus in a genome-wide scan of a North American dataset (Heinzen et al. 2010), which is 9.5 Mb away from the duplication on 15q11.2 discussed above. We observed an equal number of cases (n = 2; 0.5%) and controls (n = 2; 0.6%) with duplications affecting CHRNA7, whereas Heinzen et al. detected this CNV in six cases (2%) and one control (0.3%) (Heinzen et al. 2010).
In addition, a higher copy number of a complex multiallelic segment (DGV variation_0316) containing the olfactory receptor genes on chromosome 14q11.2 was reported to be associated with a decrease in age at onset of AD using genotypes obtained from Affymetrix SNP 6.0 arrays (controls were not evaluated) (Shaw et al. 2011). Although this region is 200 kb in size, it is poorly covered with SNPs in the 650Y array used for our study (three SNPs). Therefore, we were unable to assess the contribution of this region to AD in our dataset.
We did not detect CNVs (including common variants) that affect the well-confirmed AD loci reported by large GWAS (CLU, PICALM, BIN1, CR1, MS4A4/MS4A6E, CD2AP, CD33, EPHA1, and ABCA7) (Harold et al. 2009; Lambert et al. 2009; Carrasquillo et al. 2010; Hollingworth et al. 2011; Naj et al. 2011). However, as the array technology used in the current study has limitations in detecting small CNVs, future studies must carefully assess the new AD loci using a qPCR approach to detect small CNVs. For instance, by using multiplex amplicon quantification, a recent study reported that an 18 kb CNV in the CR1 gene is associated with AD risk and could explain the strong association between AD and SNPs at the CR1 locus detected by GWAS (Brouwers et al. 2011).
The limitations of our study are the modest dataset size and the fact that the study was not designed for the comprehensive detection of common CNVs. Several analytical challenges in the detection of common CNVs from SNP-intensity data could lead to high false-negative and false-positive rates. In general, a case-control setting can only test clusterable common CNVs that are well-tagged by common SNPs and are thus effectively screened by SNP-based GWAS (e.g. the CR1 study discussed above (Brouwers et al. 2011)). On the other hand, the unclusterable CNVs could be of a multiallelic or complex nature (e.g. a small deletion within a large CNV duplication) and can only be accurately genotyped using a combination of custom arrays and deep sequencing. Nevertheless, we observed several reliably detected common CNVs that were included in a case-control analysis of genic CNVs (e.g. the CNV on 15q11.2). Notably, none of the most significant variations previously detected in our SNP-based Hispanic GWAS (P < 10⁻⁵) (Lee JH et al. 2010) tag any of the common CNVs identified in the current study.
In summary, in a stringent genome-wide investigation of the global burden of large rare CNVs, we did not find any significant difference between AD cases and controls. However, this finding may indicate that larger datasets are required to identify enrichment of any of the above-mentioned CNVs. Similarly, confirmation of the biological significance of several large CNVs found only in AD patients requires further assessment in large cohorts, as well as functional studies. Nevertheless, modest datasets, such as the one reported here, can be useful for identifying rare variants for further validation in follow-up studies.