Novel Genes Affecting Blood Pressure Detected Via Gene-Based Association Analysis

Hypertension is a common disorder and one of the most important risk factors for cardiovascular diseases. The aim of this study was to identify additional genes associated with blood pressure. Using the publicly available SNP-based P values from a meta-analysis of genome-wide association studies, we performed an initial gene-based association study in a total of 69,395 individuals. To find supplementary evidence supporting the importance of the identified genes, we performed GRAIL (gene relationships among implicated loci) analysis, protein–protein interaction analysis, functional annotation clustering analysis, coronary artery disease association analysis, and other bioinformatics analyses. A total of 22,129 genes in the human genome were analyzed for blood pressure in the gene-based association analysis. Forty-three genes were statistically significant after Bonferroni correction (P < 2.3×10⁻⁶). The evidence obtained from the analyses in this study suggested the importance of ID1 (P = 2.0×10⁻⁶), CYP17A1 (P = 4.58×10⁻⁹), ATXN2 (P = 1.07×10⁻¹³), CLCN6 (P = 4.79×10⁻⁹), FURIN (P = 1.38×10⁻⁶), HECTD4 (P = 3.95×10⁻¹¹), NPPA (P = 1.60×10⁻⁶), and PTPN11 (P = 8.89×10⁻¹⁰) in the genetic basis of blood pressure. The present study identified several important genes associated with blood pressure, which might provide insights into the genetic architecture of hypertension.


INTRODUCTION
Hypertension is a common disorder and one of the most important risk factors for cardiovascular diseases (CVD), the leading cause of death worldwide (KEARNEY et al. 2005). Blood pressure (BP) is influenced by both lifestyle and genetic factors (WHELTON et al. 2002). Identification of genes predisposing to hypertension will increase our understanding of the genetic mechanisms and provide the framework to identify potential novel drug targets for the treatment of hypertension and prevention of CVD.
Genetic factors contribute to the variation of BP, with heritability estimates of around 40-60% (KUPPER et al. 2005). A previous large-scale meta-analysis of genome-wide association studies (GWAS) identified 29 BP-associated loci. However, the 29 discovered signals together explained only 0.94% of the variance in diastolic blood pressure (DBP) and 0.92% of the variance in systolic blood pressure (SBP) (EHRET et al. 2011), which suggests that many more genetic factors remain to be identified.
Because a gene-based association analysis combines the genetic information from all the single nucleotide polymorphisms (SNPs) in a gene, it can increase the power to find novel genes and yield more informative results. The gene-based approach has several attractive features. For example, it substantially reduces the burden of multiple-testing correction, and its findings extend more directly to further functional analyses. It has been used as a complement to SNP-based GWAS to identify disease susceptibility genes (LI et al. 2011).
Based on publicly available datasets, this study presents a statistically robust gene-based association analysis aimed at finding more relevant genes for BP. We further performed gene relationships among implicated loci (GRAIL) analysis, protein-protein interaction (PPI) analysis, functional annotation clustering analysis, coronary artery disease (CAD) association analysis and other bioinformatics analyses to find supplementary information for the identified genes.

MATERIALS AND METHODS
Gene-based association analysis
The present gene-based association study used data from the International Consortium for Blood Pressure Genome-Wide Association Studies (ICBP GWAS) (EHRET et al. 2011). The raw data were the association P values of about 2.5 million SNPs, downloaded from the initial SNP-based GWAS for SBP and DBP. Study design, subject characteristics, genotyping, data-quality filters and SNP-based association analyses are detailed in the original GWAS meta-analysis publication (EHRET et al. 2011).
Briefly, it was a meta-analysis of GWAS that evaluated associations between 2.5 million genotyped or imputed SNPs and BP in 69,395 individuals of European ancestry from 29 studies.
Gene-based association analysis was performed using the GATES (Gene-based Association Test using Extended Simes procedure) method, as implemented in the KGG software, a systematic biological Knowledge-based mining system for Genome-wide Genetic studies (LI et al. 2011). The extended Simes test integrates functional information and association evidence to combine the P values of the SNPs within a gene into an overall P value for the association of the entire gene. The test is powerful and does not require raw genotype or phenotype data as inputs. It offers effective control of the type 1 error rate regardless of gene size and the linkage disequilibrium (LD) pattern among markers, and it does not need permutation or simulation to evaluate empirical significance. In the present analysis, the data files for the SBP and DBP association analyses, each containing four input variables (the rs number, chromosome, position and SNP-based association P value), were prepared for KGG using the R program. The extended gene region was defined as spanning from 2 kb upstream to 2 kb downstream of each gene. LD was accounted for using CEU genotype data from HapMap release 2. Bonferroni correction (TARONE 1990), the simplest and most conservative approach, was used to adjust for multiple testing.
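As a rough illustration of how an extended Simes combination works, the following Python sketch combines the SNP-level P values of one gene into a single gene-level P value. It is not the KGG implementation. The function names are hypothetical, and the eigenvalue-based estimate of the effective number of independent tests, computed directly from a user-supplied SNP correlation matrix, is an assumption made here for illustration; the mapping from LD coefficients to P-value correlations and the functional weighting are omitted.

```python
import numpy as np


def effective_tests(corr):
    """Eigenvalue-based estimate of the effective number of independent tests.

    Assumption for this sketch: m_e = M - sum over eigenvalues > 1 of (eigenvalue - 1).
    """
    corr = np.atleast_2d(np.asarray(corr, dtype=float))
    eigvals = np.linalg.eigvalsh(corr)  # symmetric matrix, real eigenvalues
    excess = np.where(eigvals > 1.0, eigvals - 1.0, 0.0)
    return corr.shape[0] - float(np.sum(excess))


def extended_simes(pvals, corr):
    """Combine SNP P values for one gene with an extended Simes procedure.

    pvals : SNP-level P values for the SNPs assigned to the gene.
    corr  : correlation matrix among those SNPs, in the same order as pvals.
    """
    pvals = np.asarray(pvals, dtype=float)
    corr = np.asarray(corr, dtype=float)

    order = np.argsort(pvals)              # ascending P values
    p_sorted = pvals[order]
    corr_sorted = corr[np.ix_(order, order)]

    m_e = effective_tests(corr_sorted)     # all SNPs in the gene
    candidates = []
    for j in range(1, len(p_sorted) + 1):
        m_e_j = effective_tests(corr_sorted[:j, :j])  # top j SNPs only
        candidates.append(m_e * p_sorted[j - 1] / m_e_j)
    return float(min(1.0, min(candidates)))


# Hypothetical P values for a three-SNP gene.
snp_p = [0.04, 0.01, 0.5]
print(extended_simes(snp_p, np.eye(3)))        # independent SNPs
print(extended_simes(snp_p, np.ones((3, 3))))  # SNPs in complete LD
```

Under these assumptions the two limiting cases behave as expected: with independent SNPs the result reduces to the ordinary Simes test, and with SNPs in complete LD the gene-level P value is simply the smallest SNP P value.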

Text-mining-based data analysis
To examine the relationships among the genes in BP-associated genomic regions, we performed a GRAIL analysis (http://www.broadinstitute.org/mpg/grail/) (RAYCHAUDHURI et al. 2009). GRAIL is a text-mining tool that identifies non-random, evidence-based links between genes using PubMed abstracts. GRAIL assigns each region a statistical significance score that reflects the degree of relatedness among genes at different loci. The inputs were the most associated SNPs in each gene revealed by the gene-based analyses and other BP-associated SNPs reported in previous GWAS (EHRET et al. 2011; LU et al. 2014 regions, and the list of 75 genes was uploaded as seed genes.

Functional annotation clustering analysis
To gain insights into the functions of the identified genes, we tested the probability of the identified genes clustering into a specific Gene Ontology (GO) term or a particular biological pathway. The functional annotation clustering analysis was performed using the Database for Annotation, Visualization and Integrated Discovery (DAVID). Bonferroni correction (TARONE 1990) was used to adjust for multiple testing in the analysis.
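The idea of testing whether a gene list clusters into one annotation term can be sketched with a plain hypergeometric upper-tail probability. This is an illustration only and not DAVID's own statistic, which is a modified Fisher's exact test (the EASE score). The function name and the term size and overlap used in the example are hypothetical; only the list size of 43 genes and the background of 22,129 genes come from this study.

```python
from math import comb


def enrichment_p(overlap, list_size, term_size, background_size):
    """Upper-tail hypergeometric probability P(X >= overlap).

    overlap         : genes in the list that carry the annotation term
    list_size       : number of genes in the tested list
    term_size       : background genes that carry the annotation term
    background_size : total number of background genes
    """
    upper = min(list_size, term_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(term_size, i) * comb(background_size - term_size, list_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical term: 200 annotated genes in the background, 5 of them in the list.
p_term = enrichment_p(overlap=5, list_size=43, term_size=200, background_size=22129)
print(f"{p_term:.3g}")
```

A P value obtained this way for each term would then be adjusted for the number of terms tested, for example with the Bonferroni correction mentioned above.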

Association with coronary artery disease
As hypertension is one of the most important risk factors for CAD, we evaluated the associations between CAD and the BP-associated genes identified in the gene-based association analysis. The raw data were the P values downloaded from a large-scale GWAS meta-analysis for CAD carried out by the CARDIoGRAM and C4D consortium (DELOUKAS et al. 2013).

RESULTS
Gene-based association analysis
In the gene-based association analysis, 1,248,073 (48.7%) SNPs were mapped onto 22,129 genes in the human genome for SBP and DBP. The QQ plots of genes and SNPs are shown in Supplementary Figures S1 and S2. According to the Bonferroni correction method, the significance level for the gene-based tests was 2.3×10⁻⁶ for each BP measure. Accordingly, 43 significant genes located in 14 loci were found for BP. Among them, 30 were associated with SBP (Table 1) and 31 were associated with DBP (Table 2).
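The reported significance level can be reproduced with a one-line calculation. The sketch below assumes a family-wise error rate of 0.05, which the text does not state explicitly but which matches the reported threshold; the function name is hypothetical.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance level under Bonferroni correction.

    alpha = 0.05 is an assumption; it is not stated in the text.
    """
    if n_tests <= 0:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


# 22,129 genes tested for each BP measure.
print(f"{bonferroni_threshold(22129):.1e}")  # 2.3e-06
```

The same calculation with the 10,303 genes tested in the CAD analysis gives 4.85×10⁻⁶.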

GRAIL analysis
We applied the GRAIL text-mining algorithm to investigate connections between genes tagged in the 30 BP loci. GRAIL identified 20 keywords that were commonly associated with the BP candidate genes in the literature, for example: 'natriuretic',

Protein-protein interactions
We detected PPIs between 75 BP genes in the STRING database. Most of these genes

Functional annotation clustering analysis
The BP genes were enriched in the regulation of blood circulation, circulatory system process, toxin metabolic process and trans-Golgi network GO terms, and in the heme and transmembrane protein SP_PIR_KEYWORDS categories (Table 3). Twelve BP-associated genes were involved in these categories. We again focused on finding previously unreported genes in these categories, and two were found: SCAMP2 in the trans-Golgi network GO term (GO:0005802) and COX5A in the heme SP_PIR_KEYWORDS category.

Association with coronary artery disease
To examine whether the 43 BP-associated genes were associated with CAD, we performed a gene-based analysis for CAD association using a published GWAS dataset.
In this analysis, 45,010 (55.4%) SNPs were mapped onto 10,303 genes in the human genome. The significance level for this gene-based test was 4.85×10⁻⁶. We found that 15 of the 43 BP-associated genes seemed to be associated with CAD (Table 1 and

Gene-trait association score
The genetic association information collected from the HuGE Navigator for the 43 detected genes is summarized in Table S1. Several genes obtained evidence from at least 4 of the following 5 items: GRAIL analysis, PPI analysis (STRING and Dapple), functional annotation clustering analysis (DAVID), evidence from databases (OMIM, MGI, HuGE), and CAD association analysis.

DISCUSSION
The present gene-based association study identified 43 BP-associated genes.
Bioinformatics analyses and CAD association analysis provided supportive evidence and functional information on the association of several of these genes with BP, e.g.,

Taken together, the evidence from the present gene-based association analyses and bioinformatics analyses also supported the importance of these genes in the genetic basis of BP.
ID1, located at 20q11.21, was not reported in the ICBP GWAS, but the best SNP in the gene, rs6058197 (P = 2.0×10⁻⁶), was close to the GWAS significance threshold.
ID1 is an effector of the p53-dependent DNA damage response pathway (QIAN and CHEN 2008).
In conclusion, the present study took advantage of the gene-based association method to perform a supplementary analysis of the GWAS dataset and found some important BP-associated genes. A series of bioinformatics analyses gave supportive evidence for the gene-based association discoveries. Our findings may provide insights into the genetic basis of hypertension.

Table 1 Association results for SBP-associated genes with gene-based P value <

Figure 1 Venn diagram of pleiotropic associations of the BP-associated genes. Forty-three genes that achieved the gene-based test significance level of 2.3×10⁻⁶ were included. Among them, 30 were associated with SBP and 31 were associated with DBP, with 18 genes overlapping. Red gene symbols indicate genes identified in the present gene-based association study (9 genes), while black indicates previously reported genes.

Figure 4 Protein-protein interactions between BP-associated genes in Dapple. The 3 groups of direct interactions and the common interactors between indirect connections are shown. Gray circles are common interactors.