A Genome-Wide Association Study for Nutritional Indices in Drosophila

Individuals are genetically variable for the way in which they process nutrients and in the effects of dietary content on reproductive success, immunity, and development. Here, we surveyed genetic variation for nutrient stores (glucose, glycogen, glycerol, protein, triglycerides, and wet weight) in the Drosophila Genetic Reference Panel (DGRP) after rearing the flies on either a low-glucose or high-glucose diet. We found significant genetic variation for these nutritional phenotypes and identified candidate genes that underlie that variation using genome-wide associations. In addition, we found several significant correlations between the nutritional phenotypes measured in this study and other previously published phenotypes, such as starvation stress resistance, oxidative stress sensitivity, and endoplasmic reticulum stress, which reinforce the notion that these lines can be used to robustly measure related phenotypes across distinct laboratories.

Drosophila DGRP glycogen glucose triglyceride protein weight The quality of dietary nutrition and the assimilation of dietary nutrients have significant influence on many traits, including lifespan (Piper et al. 2005;Piper and Partridge 2007;Skorupa et al. 2008), development (Layalle et al. 2005), reproduction (Fricke et al. 2008), and immunity (Ayres and Schneider 2009;Fellous and Lazzaro 2010;Vass and Nappi 1998). Resources such as the Drosophila Genetic Reference Panel (DGRP) provide a practical means of using natural genetic variation to both untangle the genetic basis of complex traits and understand the intersection of selection and genetics in the maintenance of that variation . The DGRP is a set of approximately 200 D. melanogaster genetic lines that have been genome-sequenced and are available to the community for the mapping of complex genetic traits. Here, we present the results of a genomewide scan for SNPs associated with several nutritional indices measured after rearing on either a low -glucose (1 glucose: 2 yeast) diet or a high-glucose (2 glucose: 1 yeast) diet. We found significant genetic variation for all traits (total soluble protein, glucose, glycogen, free glycerol, triglycerides, and wet weight) and were able to map underlying genes. We additionally note correlations between our nutritional indices and several previously published DGRP phenotypes Jordan et al. 2012;Ayroles et al. 2009;Chow et al. 2013b).

Drosophila stocks and husbandry
We assayed nutritional indices in the DGRP ), a collection of approximately 200 inbred lines of Drosophila melanogaster derived from wild-caught females (2003, Raleigh, NC). Our study utilized 172 of these lines, although not every line was available for every day of the experiment.
Before measuring any phenotypes, each line was reared for at least three generations on two diets that varied in glucose content. The lowglucose diet consisted of 5% weight by volume brewer's yeast (MP Biomedicals, Santa Ana, CA), 2.5% glucose (Sigma-Aldrich, St. Louis, MO), and 1% Drosophila agar (Genesee Scientific, San Diego, CA) supplemented with 800 mg/L methyl paraben (Sigma-Aldrich), and 6 mg/L carbendazim (Sigma-Aldrich). The high-glucose diet was exactly the same but consisted of 10% glucose.
Columbus, OH) and then homogenized in 200 mL buffer (10 mM Tris, 1 mM EDTA, pH 8.0 with 0.1% v/v Triton-X-100) using lysing matrix D (MP Biomedicals, Santa Ana, CA) on a FastPrep-24 homogenizer (MP Biomedicals). We immediately froze 50 mL of the homogenate to be used for the total protein assay and incubated the remaining 150 mL at 72°for 20 min to denature enzymes naturally present in the homogenate. Each nutritional index was assayed using modifications of commercially available kits (see Unckless et al. unpublished data;Ridley et al. 2012): glucose with the oxidase kit (GAGO-20; Sigma-Aldrich); glycogen using the glucose kit and amyloglucosidase from Aspergillus niger (A7420; Sigma-Aldrich) in 10 mM acetate buffer at pH 4.6; free glycerol and triglycerides using reagent kits F6428 and T2449, respectively (Sigma-Aldrich); and soluble protein with the DC Protein Assay (BIO-RAD, Hercules, CA).

Data analysis
Before genome-wide association mapping, we estimated line means for each nutritional index using abundance of metabolite per mg of fly. The model used was: where Y is the estimated mass (mg) per fly of each nutrient divided by the mass of the flies measured in mg (except, obviously, in the case where wet mass is itself the response variable). Wolb i (i = 1,2) indicates endosymbiotic bacterium Wolbachia pipientis infection status, Diet j (j = 1,2) indicates rearing diet, Block n (Diet j ) (n = 1,3) differentiates among the three replicate blocks within each diet, Line k (Wolb i ) (k = 1,2,. . .,172) tests the influence of inbred line on nutritional index nested within Wolbachia infection status (52.2% of lines were infected), and the Diet j · Line k (Wolb i ) interaction term tests whether inbred lines differ in their responsiveness to the two diets. All factors were considered fixed. All models were run in SAS 9.3 (Cary, NC) using the "GLM" procedure and least squares means were extracted. For modeling on each diet individually, the model used was: We also obtained a more holistic view of fly metabolic status by performing a principal component analysis on the collective set of nutritional measures, excluding wet weight, because that is implicitly contained in mass-scaled measures of individual nutrients. This allowed us to distill the higher-order interactions of our nutritional phenotypes into several one-dimensional components. Line estimates for each nutritional principal component were determined using the prcomp function in R (R Core Team 2012) with tol = 0.1 and unit variance scaling turned on. This analysis was completed with flies reared on the two diets considered separately.

Genome-wide association mapping
The set of SNPs for genome-wide association mapping was described in Huang et al. (2014) and consists of only SNPs with minor alleles present in at least four of the lines (MAF .2%; 2,415,518 total SNPs). For genome-wide associations, we formatted this SNP set for PLINKassessed (Purcell et al. 2007) associations between SNP and line estimates from the above models using the "-assoc" flag to perform associations and the "-qt-means" flag for estimates of effect size. PLINK uses an ordinary least squares model for each SNP. These analyses were performed for the high-glucose diet, low-glucose diet, and when data from both diets were pooled. We used a nominal P value threshold of P , 10 26 for declaring SNPs to be significantly associated with trait variation but relaxed this to P , 10 24 for gene ontology enrichment analysis (see below).

GO term analysis
We performed Gene Ontology (GO) analysis corrected for gene size using GOWINDA (Kofler and Schlötterer 2012) to test for the enrichment of particular functional groups in genes bearing SNPs associated with variation in phenotypic traits. Significantly associated SNPs (P , 10 24 ) for each treatment (low glucose, high glucose, main effect) were used as the query set with a background SNP set consisting of all remaining SNPs used in the genome-wide mapping. We used this relaxed P value threshold to increase the number of significant SNPs in this analysis. GO slim (Adams et al. 2000) terms were used to reduce redundancy in GO categories. GOWINDA was run using gene mode, including all SNPs within 1000 bp of a gene, a minimum gene number of 5, and with 100,000 simulations. We report all GO terms with a nominal P , 0.1.

Phenotypic correlations with other traits
We examined correlations among our measured traits, and between our nutritional phenotypes and independent traits that have been measured in the DGRP lines by other research groups. Correlation analyses were performed in R (R Core Team 2012) using our line mean estimates, and we report both correlation coefficient and P value. For significantly correlated traits, we queried whether a single gene or a few genes might drive the correlation by determining whether the same SNPs were significantly associated with variation in both traits with a relaxed P value threshold of 10 25 .

RESULTS
Genetic and environmental variation for nutritional status across the DGRP ANOVA for each nutritional index (both pooled across diets and on each diet individually) is presented in Supporting Information, Table S1. When the data from each diet are analyzed separately, all nutritional indexes showed a significant (or nearly significant) line effect except soluble protein after rearing on the low-glucose diet and triglycerides after rearing on the high-glucose diet (Table S1b), indicating that most traits are genetically variable. When the data from both diets were pooled, all nutritional indices except free triglycerides and glycogen showed a significant effect of rearing diet, with glucose, glycerol, and triglycerides occurring at higher levels in flies reared on the high-glucose diet, whereas glycogen, soluble protein, and total wet mass were lower in flies reared on the high-glucose diet. All nutritional indices showed a significant effect of line. Only wet weight showed a significant interaction between line and diet (Table S1a). In addition, total soluble protein showed a significant effect of Wolbachia infection status (P = 0.047). All phenotypic values are presented in Table S2.

Principal components of nutritional indices
We considered that our nutritional indices might give more information about the metabolic status of the fly when considered in aggregate, so we used a principle component (PC) analysis to extract the top five PCs from the full nutritional data set. The top five principal components summarizing the NIs on each diet each explain 12-31% of the total in nutritional state, with loadings of each NI given in Table S3. Principal component loadings show variation in both sign and magnitude of contribution from each NI, suggesting they capture complex integrations of the nutritional indices to reflect overall metabolic state.

Phenotypic correlations with other traits
We measured correlations between our nutritional phenotypes and several other traits that have been measured in the DGRP and for which the data are publically available (starvation stress resistance, chill coma recovery, startle response, oxidative stress response, endoplasmic reticulum stress) Jordan et al. 2012;Chow et al. 2013b). Table 1 contains the correlation coefficient and P value for each trait combination. Note that for all nutritional indices, we present correlations between other phenotypes and line means estimated when data from both diets were pooled. We did not perform principal components analysis on this pooled data; however, diet-specific principal components were used for the analysis. Several interesting correlations are evident. In particular, starvation stress resistance as measured by Mackay et al. (2012) is correlated with several metabolic principal components and is positively correlated with wet weight (P = 0.005) and with levels of glucose (P = 0.004) and glycogen (P , 0.001). Chill coma recovery, also measured by Mackay et al. (2012), is correlated with two metabolic principal components as well as with wet weight (P = 0.005), levels of glucose (P = 0.004), glycogen (P = 0.048), and protein (P = 0.038). Startle response ) is correlated with two metabolic principal components and with glucose (P , 0.001) and triglyceride (P = 0.003) levels. Sensitivity to oxidative stress, induced by either paraquat or menadione sodium bisulfate (MSB) (Jordan et al. 2012), was positively correlated with glycogen stores (P = 0.029 and P = 0.021, respectively) and wet weight (P = 0.035 and P = 0.025, respectively). Sensitivity to paraquat was also negatively correlated with soluble protein (P = 0.032). Interestingly, several nutritional indices were significantly correlated with time to 50% mortality after endoplasmic reticulum stress (ER T 50 ) (Chow et al. 2013b), including glycogen stores (P = 0.005), glycerol level (P = 0.004), total triglycerides (P = 0.020), as well as PC1 and PC3 on the low-glucose diet and PC1 on the high-glucose diet.
Phenotypic values for male reproductive fitness, male aggression, lifespan, and ethanol tolerance were also reported for a smaller set of 40 DGRP lines (Ayroles et al. 2009). With only 40 lines, we have less power to find correlations with these data, although we do still detect some significant correlations. Male reproductive fitness (proportion of offspring sired during competition for matings with males from a marked stock) is negatively correlated with our measure of soluble protein (P = 0.015) and positively correlated with lowglucose diet PC5. Lifespan is positively correlated with lowglucose diet PC4. Surprisingly, male aggression as determined by Ayroles et al. was negatively correlated with our measure of wet weight (P = 0.044), where we might have naively expected larger flies to be more aggressive. Finally, ethanol tolerance is significantly positively correlated with high-glucose PC4.

Genome-wide association results
SNPs that are significantly associated with variation in each nutritional phenotype (P , 10 26 ) are presented in Table 2 and  Table 3. Overall, SNPs significantly associated with variation in our nutritional phenotypes are disproportionately found as nonsynonymous substitutions or in introns and UTRs, as opposed to synonymous substitutions or positions more than 1000 bp from known genes, relative to the distribution of all variants across the genome. For the nutritional indices, 33 out of 48 (69%) total significantly associated SNPs across phenotypes and diets are found in introns, UTRs, less than 1000 bp from an annotated gene, or as nonsynonymous SNPs. For principal components, this fraction is 17 n  "DNA binding TF activity" is "sequence-specific DNA binding transcription factor activity." b "Nucleo" is "nucleobase, nucleoside, nucleotide, and nucleic acid metabolic process.
of 24 (71%). In contrast, less than half of all SNPs meeting criteria for inclusion in this study are found in introns or UTRs, are less than 1000 bp from an annotated gene, or are nonsynonymous. This enrichment for putatively functional SNPs is significant (x 2 = 6.75, df = 1, P = 0.009 for nutritional indices; x 2 = 4.17, df = 1, P = 0.041 for principal components). For example, across the three mapping strategies (low glucose, high glucose and data pooled across diets), there were seven unique SNPs meeting our threshold for association with glucose levels. Of these, one was synonymous and one was not associated with any known gene. The remaining five mapped SNPs were intronic. For triglyceride levels, all four significantly associated SNPs were intronic. Each SNP that associates significantly with variation in a measured phenotype is given in Table 2, including significance level, estimated effect size, minor allele frequency, type of SNP, and gene functional categorization. No SNPs were significantly associated with more than one distinct nutritional phenotype, even when the significance threshold was relaxed to 10 25 .

Gene ontology analysis for enrichment
To determine whether the SNPs significantly associated with variation in each phenotype were clustered in genes with particular biological functions, we performed gene ontology (GO) enrichment analysis. Across all NIs and all diets, few categories were even nominally significant for enrichment and none was significant after correcting for multiple testing (Table 4). This may not be surprising because GO analysis of mapping results implicitly assumes the "infinitesimal model" of quantitative genetics, where many genes each contribute small but meaningful effects on the overall phenotype. We have no evidence that this is an appropriate model for our nutritional phenotypes, and we expect that, given the sample size of the DGRP, our experiment lacks power to identify SNPs of small effects.

DISCUSSION
We found significant genetic variation for wet weight as well as five nutritional indices (levels of glycogen, free glucose, soluble protein, triglycerides, and free glycerol) in the DGRP after rearing on two different diets that varied in glucose content. Several of these nutritional indices and the principal components describing them jointly are correlated with phenotypes that have been measured by other researchers. Because the complete genomes have been sequenced for all of the lines in the DGRP, we could conduct genome-wide association mapping to identify candidate genes that may influence Drosophila metabolic status in response to diet. We were able to identify genetic correlations among the traits we measured and between our traits and phenotypes measured by independent groups in other studies. Many of these correlations make good biological sense. For example, starvation stress resistance is positively correlated with wet weight and with stores of glucose and glycogen, consistent with a simple interpretation that genotypes that store more nutrients are more resistant to starvation. The correlations among other phenotypes were less intuitive but may motivate followup examination. For example, we found correlations between endoplasmic reticulum stress and several nutritional indices (glycogen, glycerol, triglycerides), suggesting that nutrients play a role in modulating the ER stress response. One concern could be that spurious correlations arise due to variable inbreeding depression among the lines. However, we do not believe this would be a sufficient explanation because at least some of the correlations appear to be negatively correlated with respect to fitness. For example, wet weight was negatively correlated with male aggression (P = 0.044), where we would presume that both greater wet weight and more aggressive males would be more "fit." However, guessing at the fitness value of nutritional indices is obviously difficult. For example, we simply do not know a priori whether flies with more glycogen stores are inherently more or less fit than flies storing less glycogen, and the answer probably depends on the environmental conditions.
Our genome-wide association mapping implicated many genes as explaining natural variation for nutritional phenotypes, and these can be targeted for more thorough follow-up study. One striking pattern is the over-representation of genes involved in nervous system development and behavior. This may be an artifact of the observation that neurological genes tend to be large and therefore provide a larger target for association studies Chow et al. 2013a). Neurological terms were generally not enriched in our GO analysis that controlled for gene size. A majority of significantly associated SNPs were intronic, suggesting that gene expression variation may play a major role in determining variability in nutritional phenotypes. Generally speaking, the mapping results presented here can provide a starting point for further research on these important traits.