Genome-Wide Analyses Reveal Footprints of Divergent Selection and Drought Adaptive Traits in Synthetic-Derived Wheats

Crop-wild introgressions have long been exploited without knowing the favorable recombination points. Synthetic hexaploid wheats are one of the most exploited genetic resources for bread wheat improvement. However, despite some QTL with major effects, much less is known about genome-wide patterns of introgressions and their effects on phenotypes. We used two genome-wide association approaches: SNP-GWAS and haplotype-GWAS to identify SNPs and haplotypes associated with productivity under water-limited conditions in a synthetic-derived wheat (SYN-DER) population. Haplotype-GWAS further enriched and identified 20 more genomic regions associated with drought adaptability that did not overlap with SNP-GWAS. Since GWAS is biased to the phenotypes in the study and may fail to detect important genetic diversity during breeding, we used five complementary analytical approaches (t-test, Tajima’s D, nucleotide diversity (π), Fst, and EigenGWAS) to identify divergent selections in SYN-DER compared to modern bread wheat. These approaches consistently pinpointed 89 ‘selective sweeps’, out of which 30 selection loci were identified on D-genome. These key selections co-localized with important functional genes of adaptive traits such as TaElf3-D1 (1D) for earliness per se (Eps), TaCKX-D1 (3D), TaGS1a (6D) and TaGS-D1 (7D) for grain size, weight and morphology, TaCwi-D1 (5D) influencing drought tolerance, and Vrn-D3 (7D) for vernalization. Furthermore, 55 SNPs and 23 haplotypes of agronomic and physiological importance such as grain yield, relative water content and thousand grain weight in SYN-DER, were among the top 5% of divergent selections contributed by synthetic hexaploid wheats. These divergent selections associated with improved agronomic performance carry new alleles that have been introduced to wheat. Our results demonstrated that GWAS and selection sweep analyses are powerful approaches for investigating favorable introgressions under strong selection pressure and the use of crop-wild hybridization to assist the improvement of wheat yield and productivity under moisture limiting environments.

stability in diverse environments . In conventional wheat breeding, a limited proportion of the genetic variation available in the bread wheat genepool for abiotic and biotic stress tolerance and quality attributes has been explored to maximize genetic gain. Some estimate that 69% of genetic diversity has been lost through domestication and modern breeding (Reif et al., 2005). Re-introducing lost genetic diversity from wheat wild relatives by integration of molecular genetics and conventional breeding technologies is one way to bridge this gap (Rasheed et al., 2018).
Exploiting crop wild relatives for quantitative traits is challenging because the germplasm may be too diverse to be used directly and a prebreeding process will be required. Therefore, genetic approaches such as large-scale and systematic identification and characterization of quantitative trait loci (QTL) through GWAS could help to identify alleles or haplotypes that may not exist in crop cultivars but could be incorporated into the existing elite gene pool (Bevan et al., 2017). Synthetic hexaploid wheat (SHW) developed by hybridizing Aegilops tauschii, D-genome donor to bread wheat (Triticum aestivum L.), is a source of genetic diversity and favorable alleles for tolerance to biotic and abiotic stresses (Börner et al., 2015;Mujeeb-Kazi et al., 1996;Ogbonnaya et al., 2013). Synthetic hexaploid wheat and their advanced derivatives (SYN-DERs) have potential to improve drought-and heat-adaptive mechanisms in bread wheat (Afzal et al., 2017;Reynolds et al., 2005). SYN-DERs have outperformed their recurrent parents under drought stress and have proved to be high yielding compared to non-synthetic wheat in various studies (Jafarzadeh et al., 2016;Lopes and Reynolds 2011;Tang et al., 2017). Despite several GWAS and QTL mapping studies available in synthetic hexaploid wheats (Ogbonnaya et al., 2013;Rasheed et al., 2018), underpinning the recombinations favoring a yield-advantage in SYN-DER is lacking. Unlocking the genetic potential of genetic resources requires whole-genome sequence coverage through nextgeneration sequencing; however large wheat genome-size (16Gb) and incomplete reference genome makes this unfeasible. Alternatively, single nucleotide polymorphism (SNPs) arrays can be used to define genomic regions associated with quantitative traits due to their dense genome coverage (Akhunov et al., 2009;Rasheed et al., 2017;Wang et al., 2014).
SNPs have been extensively used in mapping experiments to identify the genetic loci associated with important agronomic phenotypes. In human genetic studies, haplotype blocks combining two or more SNPs in strong LD are more informative than single bi-allelic SNPs (Stephens et al., 2001). Lorenz et al. (2010) defined haplotypes by several analytical procedures in barley and confirmed that the distribution of QTL alleles in nature was unlike the distribution of marker variants, and hence utilizing haplotype information could capture associations that would elude single SNPs. Haplotype based GWAS analyses are still rare in wheat except for few studies Jordan et al., 2015), and have shown promise in Brassica (Wu et al., 2016), barley (Lorenz et al., 2010) and other crop species (Qian et al., 2017). Bevan et al. (2017) argued that haplotypes in progenitors and wild relatives of crops contain a broad range of genetic variation and identification of haplotypes with fewer deleterious alleles and improved phenotypes could accelerate genetic gain in crop improvement. Such GWAS approaches either with bi-allelic SNPs or haplotypes could be combined with genome-wide selective sweeps which identifies selection signatures that are beneficial for crop adaptation. Crop breeding selects favorable alleles and retains them in new cultivars. These signatures of selection can be detected by a cross-population comparison approach (Chen et al. 2010). Several lines of evidence have shown that that genomic regions that exhibit selection signatures are also enriched for genes associated with biologically important traits (Xie et al. 2015). Therefore, detection of selection signatures is emerging as an additional approach to identify and validate novel gene-trait associations (Cadzow et al. 2014).
This study aimed to identify the chromosomal regions under divergent selection in SYN-DER wheat by comparison to modern bread wheat cultivars widely grown in the same environments. We used two GWAS strategies (SNP-GWAS and haplotype-GWAS) to identify marker-trait associations under divergent selection in SYN-DERs.

MATERIALS AND METHODS
Germplasm and phenotyping A total of 240 hexaploid wheat accessions including 171 SYN-DER wheats and 69 modern bread wheat cultivars and advanced lines were assessed (Table S1). Among the modern bread wheat cultivars, 32 were recurrent bread wheat parents used in the development of SYN-DER. These SYN-DERs were developed by crossing primary SHWs with advanced lines and elite cultivars of Pakistan and CIMMYT (see pedigrees in Table S1). More than 800 SYN-DERs were developed initially at the National Agriculture Research Center (NARC), Pakistan. This was reduced to 171 following re-selection for better agronomic characteristics. Field trials were conducted at two locations during the 2014-15 and 2015-16 cropping seasons at the Barani Agriculture Research Institute (BARI), Chakwal (33°40 / 38 // N 72°51 / 21 // E, 498m asl), Pakistan and the National Agriculture Research Center (NARC), Islamabad (33°43 / N 73°04 / E, 579 asl), Pakistan. At each location/year, field trials were conducted in two water regimes i.e., well-watered (WW) and water-limited (WL) conditions. Chakwal is a rain-fed, semi-arid area lies at the beginning of the Potohar plateau. The deep, well-drained soil comprises moderately fine textured particles. It is slightly calcareous, non-saline, with pH 7.6 and an electrical conductivity of 0.32 dS/m (Islam et al. 2012). Average rainfall during cropping season in 2014 was 20 mm and was 22.14 mm in 2015. A field experiment with an alpha lattice design was set up in both WW and WL conditions. The first environmental condition was established as wellwatered (WW) in which all genotypes with two replications were planted in the field. Three irrigations were given to well-watered (WW) plants, and soil moisture was maintained at field capacity (100%) until harvest. The second environmental condition was waterlimited (WL) in which all genotypes were planted with two replications in a polyethylene tunnel supported by an iron frame to provide shelter from precipitation. A 1-m deep ditch surrounded the tunnel to prevent any seepage of rainwater. Irrigation was stopped at the end of tillering through the completion of the flowering stage. The crop growth stages were determined using the Zadoks scale (Zadoks et al. 1974). Two 2-m long rows were sown with each genotype with two replications, maintaining an inter-row spacing of 30 cm for both treatments. The sowing date was November 20 in both years. To establish uniform stands, we sowed 30 viable seeds of variable kernel mass with the help of small plot grain drill for each row. To ensure sufficient nutrition, standard agronomic practices were followed.
A chlorophyll meter of model CCM200 plus was used to measure chlorophyll content index. Similarly, canopy temperature was measured using an infrared thermometer IRT206 at booting stage (Z45) (Zadoks et al. 1974). Plant height was measured from the base of the plant to the spike excluding awns at Z96 stage. Similarly, number of tillers per plant (TP) and spike length (SL) excluding awns were recorded for ten spikes from each replicate. Number of grains per spike were calculated after harvesting on the same 10 random spikes and thousand grain weight was recorded.
For biochemical analysis, fresh leaf samples were collected at the onset of flowering stage (Z60) (Zadoks et al. 1974). Sugar contents or soluble sugar in leaves were estimated (Dobois et al. 1956). For this purpose, 0.5 g of fresh leaf tissues were taken with 10 mL of ethanol and were heated at 80°in for one hour with continuous shaking. One mL phenol (18%) was added to the 0.5 ml of pre-heated extract and incubated for one hour. After this, 2.5 mL of concentrated H 2 S0 4 was added and vortexed. The extract was used to measure absorbance at 420 nm by a UV spectrophotometer (Biochem-2100) and the concentration of soluble sugar was measured using a standard glucose curve.
Total chlorophyll was measured using dimethyl sulphoxide (DMSO) (Hiscox and Israelstam 1979). Briefly, 0.5 gram of leaf tissues were taken in a preheated tube filled with 6 mL DMSO. Material was heated in a hot water bath at 65°for 35-40 min. Then extract was transferred to a new tube and DMSO was added until total volume reaches 10 mL. Finally, total chlorophyll contents were measured by recording absorbance of the extract at 660 nm using RS-1100 equipment and fitting into the equation of Arnon (1949).
Proline was measured following protocol of Bates et al. (1973). For this, 0.3 g of fresh leaf sample was homogenized with 5 mL of sulfosalyclic acid (3%) in a test tube. Then two mL of mixture was added to 2 mL of glacial acetic acid and 2 mL of ninhydrin and incubated for one hour at 100°. The reaction was stopped with ice and then four mL toluene was added to reaction mixture. The mixture absorbance value recorded at 420 nm using UV spectrophotometer. A standard curve was made using authentic proline and proline was calculated as mmol/g.
The concentration of superoxide dismutase (SOD) in leaf was measured with following protocol. First 0.5 g of fresh wheat leaf samples were ground in a chilled mortar and pestle with 10 mL of chilled phosphate buffer (50 mM) in an ice bath. The mixture was centrifuged at 13,000 RPM at 4°for 25 min and was filtered in four layers of cheesecloth. The supernatant obtained was used to assay SOD activity (Beauchamp and Fridovich 1971;Giannopolistis and Ries 1977). The extract obtained was illuminated under 40 W fluorescent lamps at 25°w ith 100 mL of 1.3 mM riboflavin and 3 mL of 0.1 M SOD buffer for 8 min until a dark color appeared. For the samples from well-water conditions, the same protocol was followed but in darkness. A blank check without any leaf sample was prepared. The absorbance of all three mixtures was measured at 560 nm in the UV spectrophotometer.
The concentration of SOD is the quantity of enzyme preventing 50% of photoreduction by nitroblue tetrazolium (NBT) in contrast to the sample mixture that lacks plant material.
Relative water content (RWC) was measured as per method devised by Bar and Weatherly (1962) with slight modifications. Three fully expanded flag leaves from randomly chosen plants in each plot were collected from both environmental conditions (WW and WL) at Z45 of booting stage (Zadoks et al. 1974). Top and bottom of the leaves with any dead tissue was removed off to leave a 5 cm mid-section. All samples were then immediately placed in pre-weighed air tight falcon tubes to stop any moisture loss/gain from the system. Tubes were then transferred to a cooled and insulated container (at around 10°-15°; but not frozen). All tubes with sample were then weighed and recorded under TW + FW. Tubes were then filled with 1 mL of distilled water and placed in refrigerator (4°) for 24h. This will make leave turgid. After carefully blotting with dry paper to remove moisture, hydrated leaves were weighed and recorded under TW. All leaf samples were then transferred to labeled envelopes to oven (70°) for 24h. Samples were then reweighed and recorded under DW. RWC was measured by using formulae; Leaf RWCð%Þ ¼ ½ðFW-DWÞ=ðTW-DWÞ · 100 where FW = leaf fresh weight, TW = leaf turgid weight, and DW = leaf dry weight.
Genotyping and analysis of genotyping datasets Five viable seeds of each genotype were planted in 5 cm diameter pots. DNA was extracted per the CIMMYT Molecular Genetics Manual (Dreisigacker et al., 2013) from fresh leaf samples of 25-day-old seedlings. DNA samples (50-100 ng/ml per sample) were sent to Department of Primary Industries, Victoria, Australia for genotyping with high-density wheat 90K infinium SNP array . Genotypic clusters for every SNP were determined following the manual for Genome Studio version 1.9.4 with the polyploid clustering version 1.0.0 (Illumina; http://www.illumina.com), based on the data from all the genotypes.
The physical position of SNPs on the wheat genome were determined based on IWGSC RefSeq version 1.0 (IWGSC 2018). Polymorphism information content (PIC) was used to determine genetic diversity at each locus. Markers with ambiguous SNP calls, that were monomorphic or with missing values of more than 20% and less than 5% MAF (minor allele frequency) were removed from the dataset. Genetic similarities between wheat genotypes were estimated using PowerMarker v.3.0 with a Dice coefficient based on the proportion of shared alleles (Liu and Muse 2005).
Population structure was assessed with 1000 unlinked SNP markers using STRUCTURE software 2.3.3, which implements a model-based Bayesian cluster analysis. Structure matrix (Q-matrix) was formed by organization of all population entries into clusters. An assumed number of subpopulation (k) ranging from 1 to 10 was evaluated using 100,000 burn-in iterations followed by 500,000 recorded Markov-Chain iterations. Robustness (sampling variance) of inferred population structure was estimated by carrying out 10 independent runs for each k. The optimum number of subpopulations was determined utilizing ADHOC statistics Dk based on the rate of change in log probability of data between successive k (Evanno et al. 2005).
Genome-wide linkage disequilibrium (LD) was evaluated across A, B and D genomes. Using TASSEL v.5.0, the LD parameter r 2 was calculated for all the pairwise markers which could be aligned to the consensus map for both entire panel and model-based subgroups. To examine LD due to the physical linkage in particular, the critical r 2 value (Breseghello and Sorrells 2006) was investigated. This was calculated by taking the 95 th percentile of the square root transformed r 2 data of unlinked markers (Breseghello and Sorrells 2006). An r 2 beyond the critical r 2 value was declared to be caused by genetic linkage. Only the LD pairs significant at (P , 0.001) were included in LD decay plots and stacked bar plots.

Statistical analysis of phenotyping datasets
The phenotypic data were collected from the two locations over two consecutive years under well-watered and water-limited conditions. The data were averaged for both years x locations leading to two datasets i.e., well-watered and water-limited for each trait. ANOVA was used to test the statistical significance of different sources of variation for each of the nine traits. In the ANOVA model, phenotypic effect was partitioned into overall mean, treatment effect, replication (i.e., block) within environment (year and location combination) effect, genotypic effect, environment effect, genotype by environment effect, genotype by treatment effect, and random error effect. Let y lijk be the observed value of a trait of interest for the i th accession in the k th replication under the j th environment (equivalent to location and year in this study) and the l th treatment. The linear model used in ANOVA is therefore, [1] where l = 1, 2, ..., L (L = 2 for well-watered and water-limited treatments), i = 1, 2, ..., n (n = 203), j = 1, 2, ..., e (e = 4 with two locations and two years), k = 1, 2, ..., r (r = 2), is overall mean of the whole population, R k/j is the k th replication effect in the j th environment, G i is genotypic effect of the i th accession, E j is environmental effect of the j th environment, GE ij is interaction effect between the i th accession and the j th environment, GD il is interaction effect between the i th accession and the l th treatment, and e lijk is random error effect which was assumed to be normally distributed with a mean of zero, and variance s 2 e . The ANOVA described above was implemented with the GLM procedure in SAS software (SAS Institute, Cary, NC, 2007).
The BLUP of genotypic value for each accession under each water treatment was used as the phenotype for all subsequent comparisons. BLUPs were calculated as follows: the observed value of trait was defined as y ijk for the i th accession in the kth replication in the j th environment (equivalent to location and year in this study). The mixed model used for BLUP was therefore, where i = 1, 2, ..., n (n = 203), j = 1, 2, ..., e (e = 4 with two locations and two years), k = 1, 2, ..., r (r = 2), m, R k/j , G i , E j , and GE ij were the same as the description above. Except for, all the effects were viewed as random effects following the normal distributions R k=j Nð0; s 2 R Þ, E , and s 2 GE were the variances explained by replication, genotype, environment, and genotype by environment interaction, respectively. The BLUPs were calculated with the MIXED procedure in SAS software (SAS Institute, Cary, NC, 2007).

Genome-wide association studies and selective sweeps
Genome-wide association analysis using SNP markers: For markertrait associations (MTAs), a mixed linear regression (MLM) model controlling both population structure (Q matrix) and kinship matrix (K matrix) was applied in TASSEL Standalone v.5.0 (Yu and Buckler 2006). Quantile-quantile plots of estimated vs. observed P values from MTAs were also produced and deviations from the expectation demonstrated that statistical analysis might cause spurious associations. Bonferroni corrections were applied, and a P-value of 10 25 was defined as the threshold for significant MTAs. SNPs with P-values in the range of 10 23 -10 24 for one trait and P-values ,10 25 for another trait were also reported. The LD decay distance at r 2 = 0.1 was used as the support interval to avoid multiple significance within one LD block.
Genome-wide haplotype blocks and haplotype-GWAS: Genomewide haplotype blocks were constructed using PLINK with the default parameters as used by Haploview 4.2 software package (http:// www.broadinstitute.org/haploview/haploview). The package defined haplotype blocks based on 4 gametes, and provided the number of haplo-groups and their genetic length (bp) for each block, as well as the number of tag SNPs based on solid spine of linkage disequilibrium (LD) (Extend spine if D9.0.8). In brief, this meant that the first and the last marker in a block were in strong LD with the intermediate markers that were not necessarily in LD with each other. All the haplotype blocks consisting of two haplo-groups were removed to rule out perfect LD and similarity to SNP polymorphism. Haplotype frequency was calculated using a custom Perl script. If a haplotype block is only observed in one group (SYN-DER or BW) but not in the other group, it is considered to be a group-specific block. If a haplotype block was simultaneously detected in both groups, the haplotype block was considered to be a common block. Haplotype-GWAS was performed using the linear regression procedure implemented in PLINK (Purcell et al., 2007), where phenotype BLUPs were regressed on the number of haplo-groups of a particular haplotype using PLINK linear option, including population structure as covariate. The P-value threshold of 10 25 was defined to declare haplo-groups associated with phenotypes.
Identification of loci Under selection for selective sweeps: Five statistical methods/parameters were used to detect the loci under selection. (1) Difference in allele frequencies for the m th marker locus between the bread wheat (BW) set and the SYN-DER set were tested by Student's t-test: , f 1 and f 2 were the allele frequencies of a specific marker locus in the BW and SYN-DER sets, respectively, and n 1 and n 2 were the sample sizes in the BW and SYN-DER sets, respectively. The population-specific alleles were determined for each subpopulation based on zero allele frequency in one subpopulation and non-zero in another subpopulation; and different allele frequencies between two subpopulations at significance level P , 0.001.
(2) Fst was calculated for individual SNPs by where f and s 2 f were the mean and variance of allele frequencies, respectively, w 1 ¼ n1 n1 þ n2 , and w 2 ¼ n2 n1 þ n2 (Weir and Cockerham 1984). VCFtools (https://vcftools.github.io/index.html) were used to calculate Fst with a sliding window of 100 kb and a step size of 10 kb (Schmutz et al., 2014) over the whole genome, and the regions with the top 5% of Fst values were regarded as highly diverged across the two groups.
(3) A search for loci under selection using genome-wide association of eigenvectors was implemented by EigenGWAS  requiring three steps. First, a genetic relationship matrix was generated for the 235 accessions, 69 from BW set and 171 from SYN-DER panel. Assuming that X i ¼ ðx i1 ; x i2 ; :::; x iM Þ T was a vector of genotype for the i th individual, where x was the number of the alleles and M was the number of marker loci, the genetic relationship matrix A for each pair of 240 accessions was calculated by where f m is the allele frequency for the m th marker locus. Second, the principle component analysis (PCA) was conducted on the A matrix (Price et al. 2006). Then matrix E with dimension N·C was retained, where E c was the eigenvector corresponding to the c th largest eigenvector. In this study, c was set as 10 to calculate the top 10 eigenvalues and eigenvectors. Finally, single marker regression on E c was conducted to estimate the effect of each marker. Further, (4) Tajima's D, and (5) nucleotide diversity (p) relative to the founder population (p SYN-DER /p BW ) were calculated using VARISCAN version 2.0 with a sliding window of 100 kb and a step size of 10 kb.

Data availability
The authors affirm that all data necessary for confirming the conclusion of this article are represented fully within the article and its tables and figures. Supplemental material available at FigShare: https://doi.org/ 10.25387/g3.7356923.

Ethics statement
The field trials were permitted by National Agriculture Research Center (NARC), Islamabad, Pakistan and Barani Agriculture Research Institute (BARI), Chakwal, Pakistan. The remaining experimentation does not require any ethical statement.

RESULTS
A total of 19,676 SNPs was retained after quality control (Table 1). SNP marker density on each wheat chromosome is visualized in Figure 1a.
PIC value was highest on chromosomes 5A and 6A (0.35) and least on chromosomes 3D and 5D (0.19). Overall, PIC value was highest on the A genome (0.34) followed by the B genome (0.33) and D genome n  n  (245), while the least were identified on the D genome (328), particularly chromosome 4D (9). Haplotypes in BW and highlighted SYN-DER specific haplotypes are shown in Figure 2a and 2b. There was a significant difference between SYN-DER and BW for number of haplotypes and haplotype block size (kb); 23% of haplotypes were divergent in SYN-DER at chromosomal positions where BW did not generate any haplotype (Supplementary Table S2). PCA with high quality SNP markers clearly separated bread wheat cultivars from SYN-DER in various sub-clusters (Figure 1b). The first five principal components explained 39.7% of the total variability. Population structure based on PCA is presented in Figure 1c, where the first three PCs are presented as a 3D biplot. All the BW cultivars were in different clusters from SYN-DERs except those that were used in developing SYN-DERs.
LD was estimated by r 2 at P # 0.001 from all pairs of SNPs along each chromosome. Pairs of loci in significant LD on the sub-genome level was 45.55%, 47.47% and 35.85% on A, B and D genomes, respectively ( Table 2). The average r 2 of genome-wide LD was 0.22 in the A genome, 0.25 in the B genome and 0.36 in the D genome. The proportion of pairs of markers in complete LD was 2.97%, 4.43% and 4.75% on the A, B and D genomes, respectively. The extent and distribution of LD were graphically displayed in decay plots and bar plots between the fraction of SNP pairs in SYN-DER by plotting intrachromosomal r 2 values for loci in significant LD at P # 0.001 against the genetic distance in kb and a second-degree LOESS curve was fitted (Figure 3). The critical value for significance of r 2 was estimated as 0.22 according to Breseghello and Sorrells (2006), and thus all values of r 2 . 0.22 were estimated to be due to genetic linkage.

Phenotypic variations in diversity panel
Estimates of variance components showed significant differences among locations, treatments and genotypes for all traits. All the traits showed significant variations in WW and WL conditions in both SYN-DER and BW datasets ( Figure S1). All the agronomic traits including GY, GS, PH, TP and TGW were significantly reduced by 19.4% (TGW) to 62.8% (GY) under WL conditions. However, CT, SOD, Sugar, Chl and Proline were significantly increased by 10.3% (CT) to 22.3% (SOD). In the WW environment, very high heritability (0.79 -0.92) was observed for almost all physiological traits. In diversity panel, SYN-DER showed high performance on an average for all traits, for example GY in WL conditions was 16.7% higher in SYN-DER as compared to BW and can be visualized in violin distribution plots ( Figure S1). In the WW environment, GY was positively correlated with SL (r = 0.29), TP (r = 0.61), GS (r = 0.40), and TGW (r = 0.74). Among biochemical traits, Chl was positively correlated with Sugar (r = 0.37), while proline was positively correlated with GS (r = 0.43) and SL (r = 0.28) ( Figure S2). Other biochemical components had moderate to low correlation with grain yield and yield components. In the WL environment, GY was positively correlated with TGW (r = 0.66), GS (r = 0.72) and SL (r = 0.53), while SDW was negatively correlated with PH (r = -0.14) and Chl was negatively correlated with CTspike (r = -0.16) ( Figure S2).

GWAS for phenotypic traits based on SNPs
A total of 181 loci were associated with 15 phenotypes in SNP-GWAS, of which 66 were in WW conditions, 67 in WL and 62 were associated with the difference between WW and WL (DIFF) conditions (Table 3; Table  S3). Of these 181 loci, 55 SNPs were located on the D genome, 22 of which were also identified as selective sweeps. An important chromosomal region on 1A between 512.2 to 574.3 Mb was associated with multiple physiological traits including RWC, CCI, SFW and SOD (Table S3). Nine SNPs were associated with GY WW , three with GY WL , and one with GY DIFF , of these four were located within selective sweep regions. Six SNPs were associated with GS WW , two with GS WL , and one with GS DIFF . Of these, two were distributed on chromosomes 1A and 6D that were under selection (Table S3). For TGW, an important yield component trait, three SNPs were associated with TGW WW , six with TGW WL , and 10 with TGW DIFF . Among these, six SNPs were under selective sweeps, two of which were located on the D genome (i.e., chromosomes 3D and 7D). Three important loci were associated with phenotypic traits under both WW and WL conditions including SNP IWB20120 on chromosome 1A which was associated with sugar contents, IWB2303 on chromosome 1D linked to RWC and IWB8919 on chromosome 5B associated with TGW (Table 3).
b MAF: Minor allele frequency. c Position of SNP based on IWGSC Ref sequence version 1.0 (https://urgi.versailles.inra.fr/blast_iwgsc/blast.php) d P-value significance for marker-trait associations in well-watered (WW), water-limited (WL) and difference between well-watered and water-limited (DIFF) conditions. e Co-localization of marker-trait associations with divergent selective sweeps (in case of p-value) and/or functional genes.
Hap-7B-TGW3, and Hap-7D-TGW1. Due to the greater diversity of the D-genome in SYN-DER, 21 haplotypes were identified on D genome, of which seven haplotypes were under divergent selection. Thirteen haplotypes were associated with GY WW , four with GY WL , and one with GY DIFF , one of which was in the region of selective sweeps. Similarly, two haplotypes were associated with GS in WL. Nineteen haplotypes were identified for TGW WW and one for TGW WL ; four of these were under selective sweeps and two were located on the D genome (i.e., chromosome 7D). Three of the TGW associated haplotypes were under divergent selection, while two haplotypes were closely linked to vernalization (Vrn-A1 and Vrn-B1 genes) response on chromosomes 5A and 5B.

Allelic effects of SNPs and haplotypes on phenotypes
SNPs and haplotypes associated with two important phenotypic traits; GY and RWC were used to show the allelic effects in each of the WW, WL, and 'DIFF' environments ( Figure 4 -Figure 7). Three SNPs ( Figure  4e and Figure 5c, f) and four haplotype blocks (Figure 4b,c,d) associated with GY WL were assessed for their allelic effects on phenotypes in all water regimes. Among these SNPs, only IWB1876 on chromosome 4A was associated with GY WL , GY WW and GY DIFF , while another two SNPs; IWB72112 (2A) and IWB61578 (5D) were only associated with GY WL . These SNPs gave a yield advantage of 46.2, 26.5 and 28.2 gm -2 in water-limited conditions. Each of the three haplotype blocks consisted of three haplo-groups and favored haplo-groups significantly increased n a Chl: chlorophyll contents (mg/g); CCI: chlorophyll content index; CT: Canopy temperature (°C); GS: grains per spike; GY: grain yield (g m -2 ); PH: plant height (cm); Proline (mmol/g); RWC: relative water contents (%); SDW: shoot dry weight (g); SFW: shoot fresh weight (g); Sugar (mg/g); TGW: thousand grain weight (g). b Position of SNP based on IWGSC Ref sequence version 1.0 (https://urgi.versailles.inra.fr/blast_iwgsc/blast.php) c P-value significance for marker-trait associations in well-watered (WW), water-limited (WL) and difference between well-watered and water-limited (DIFF) conditions. d Co-localization of marker-trait associations with divergent selective sweeps (the numeric values are the p-value of selection sweep) and functional genes.

Divergent selections and their co-localization With drought QTL in SYN-DER wheats
Genome-wide divergent selection was investigated in SYN-DER by comparing allele frequencies in the two panels. Five different approaches were used to provide complementary information (Figure 8; Table S5). The Student's t-test was used to identify significantly different allele frequencies (Table S5; Figure 5). Fst was calculated with a sliding window approach and only the top 5% of Fst values were regarded as selective loci. This was further validated by a new EigenGWAS approach where eigenvectors were used as phenotypes for GWAS. Tajima's D and p were calculated in each of the subpopulations. All the results were analyzed comparatively to identify genomic regions consistently under divergent selection in SYN-DER. In total, 291 SNPs representing selective sweep were narrowed down to 89 loci based on the LD (r 2 . 0.7). These 89 loci or divergent selections were present on all 21 chromosomes (Table S5). The number of loci under divergent selection differed among the three genomes, and the most loci were located on the D-genome (32), followed by the B-(29) and A-genomes (28). Similarly, EigenGWAS was plotted against SNP-GWAS and haplotype-GWAS to highlight GY and RWC associated loci under divergent selection (Figure 5b and Figure 7b, respectively). Among these loci, 31 were regarded as 'hard sweeps' where contrasting alleles were fixed (MAF . 0.9) in both subsets or where the -log(P) value for EigenGWAS was among the top 5% (Table S5). Based on the extent of LD in the SYN-DER panel, 20 loci were within the close proximity of functional genes such as Elf3-D1 and Vrn-D3 for flowering time, TaCwi-A1, TaCKX-D1, TaSus1-7A, and TaGS-D1 for grain size and weight, and 1fehw3, TaMoc1 and SST-A1 for sugar metabolism and transport (see Table S5 for complete list of genes).

DISCUSSION
Genetic diversity in SYN-DER Crop wild relatives provide enhanced diversity and new favorable genes and gene recombinations that potentially increase additive variation for traits of breeding interest. The goat grass (Ae. tauschii) reference genome sequence has recently been published (Luo et al., 2017;Luo et al., 2013;Zhao et al., 2017). This sequence is an important tool for better understanding of wheat biology, adaptation and productivity. In parallel, Ae. tauschii diversity has been globally exploited in breeding through the use of synthetic hexaploid wheats (Börner et al., 2015;Ogbonnaya et al., 2013). The current study showed that SYN-DER wheats have different patterns of genetic diversity and population structure compared to modern bread wheat as revealed by PCA, phylogenetic tree and haplotype blocks (Figure 1 and 3). Since these SYN-DER have been selected in the field for agronomic superiority, it is possible that major proportions of the bread wheat genome have been retained because empirical selection can lead to a differential loss of genetic diversity in targeted genomic regions of agronomic importance. This in return enhances the diversity of neutral genes through promotion of gene flow, recombination and exchanges between diverse populations (Olsen and Wendel 2013). There is potential ascertainment bias in the 90K SNP array because the SNP array was developed based on transcriptome sequencing of few bread wheat cultivars  and variation in promoter regions and introns where much of the variation is present were not surveyed (Zheng et al., 2014). Despite these limitations, we were able to identify differential patterns of diversity and 89 loci were under divergent selection in SYN-DERs.  et al. (2015) previously used a meta-analysis approach and developed consensus QTL positions from various mapping experiments. As the reference genome sequences of bread wheat and some progenitors are available (Avni et al., 2017;Luo et al., 2017;Zhao et al., 2017), there is urgent need to translate QTL location information to a position in the wheat genome. We found very few common MTAs between WW and WL (3), WW and DIFF (9) and WL and DIFF (5). Such a phenomenon has been reported in most of the abiotic stress experiments where QTL under c ontrol conditions do not significantly overlap with QTL under stress conditions probably due to different evolutionary trajectories (Makumburage and Stapleton 2011;Ogbonnaya et al., 2017).
A yield MTA for GY WL on chromosome 5D (393.05 Mb) is a novel MTA because it is under strong divergent selection (-log(p)=81.9) and the favorable allele gave a yield advantage of 76.2 gm -2 . This MTA was also associated with SOD WL . Two other MTAs for GY WL on chromosomes 2A and 4A are likely to be MQTL13 and MQTL32 (Acuña-Galindo et al., 2015) based on genetic positions of associated SNPs on the wheat consensus genetic map (Quraishi et al., 2017). A TGW WL MTA on chromosome 2A (553.2 Mb) is within the vicinity of grain weight gene TaCwi-A1 (Ma et al., 2012). Based on the association with multiple phenotypes, we identified three genomic important regions such as chromosome 1D at 438.3 Mb that was associated with physiological traits RWC, SFW, and SDW. RWC in leaves is a sensitive parameter for estimating plant water balance (Clavel et al., 2005), and also shows a positive relationship with yield in cereals (Merah 2001). The second important region on chromosome 3D (5.1-48.3 Mb) is associated with Sugar, RWC, and TGW which are important parameters in drought adaptability. This genomic region was not under divergent selection, so it is likely that it was retained from bread wheat parents of SYN-DER. A third important genomic region on chromosome 6A (542.7-615.3 Mb) was associated with RWC, Sugar and SL. Based on genetic positions, these two regions are likely to be the MQTL29 and MQTL49 based on the meta-QTL map (Acuña-Galindo et al., 2015). Two drought tolerance related genes TaCwi-4A and TaMoc1 (7A) were co-localized with GY WL and CCI WL MTAs. Zhang et al. (2015) reported that TaMoc1 is an important gene associated with yield-related traits and is involved in regulating axillary meristem initiation and growth, while TaCwi-4A is an important drought related gene only expressed in anthers and is involved in pollen sterility under drought stress. The association of these genes directly with GY WL in SYN-DER indicated the potential usefulness of this MTA because the minor allele frequency is higher (0.25), while the TaCwi-4A favorable allele is highly fixed during modern wheat breeding (Jiang et al., 2015).
Haplotype-GWAS, a powerful complement With SNP-GWAS Consistent with reports from other studies, haplotype-GWAS proved to be an effective strategy to increase the resolution of GWAS experiments. However, this strategy is rarely implemented in wheat, even though it has shown promise in Brassica (Wu et al., 2016), barley (Lorenz et al., 2010) and other crop species (Qian et al., 2017). Our results were in agreement with Hao et al. (2017) that haplotype-GWAS was able to identify MTAs where individual SNPs were ineffective. This was because haplotypes containing a group of closely linked SNP markers can increase the level of polymorphisms and overcome the drawback of using single SNP markers by creating more combinations (haplotypes) (Supplementary Table S3). This can increase the power and the effectiveness of haplotype-based association (Hamblin and Jannink 2011;Lorenz et al., 2010). Several haplotypes associated with phenotypes in our study were not identified by SNP-GWAS and it is likely due to the fact that patterns of LD in the population and the marker density along with the genetic architecture of the trait affect the detection of haplotypes associated with phenotypes (Lorenz et al. 2010). Moreover, the information content of haplotypes is dependent on the particular mutational and recombinational history of the QTL and nearby markers which could be unlike the distribution of single SNPs. Therefore, it was recommend to routinely use both single SNP and haplotype markers for GWAS to take advantage of the full information content of the genotype data.
We identified three haplotypes associated with GY WL and each haplotype had three haplo-groups (Figure 4). The favored haplotypes significantly increased GY WL by 39.7 to 87.2 gm -2 . This is important because none of the single SNPs were associated with GY WL in these genomic regions. Similarly, Hap-7B-GY-1 was associated with grain yield across three water treatments i.e., WW, WL and DIFF, while only two haplo-groups (AC and GC) of this haplotypes block showed significant differences for GY and a third haplo-group 'AT' remained nonsignificant. In haplotype block Hap-1B-GY-1, there was no significant difference between GY WL of two haplo-groups 'ATG' and 'GTA', while the 'ACG' variant had a significantly different effect on GY (Figure 4). Haplotype-GWAS identified several genomic regions closely linked to functional genes such as a haplotype on chromosome 2B (643.6-690.1 Mb) that was associated with multiple physiological traits RWC, SFW and PH and was linked to glutamine synthase2 gene, TaGS2-B1 (Li et al., 2011). Previously, haplotypes at TaGS2-B1 were associated with nitrogen use efficiency and several agronomic traits including shoot and root dry weight (Li et al., 2011), which corroborates results from this study. Similarly, haplotypes on chromosome 5A (504.6 Mb) associated with TGW and GY are within the same region of the Vrn-A1 gene. On chromosome 5B (603.8 Mb), the haplotype associated with TGW is within the proximity of Vrn-B1. These two genes are homeologous to VERNELIZATION1 (Vrn1) gene, a key regulator of flowering time in cereals that have pleiotropic effects on agronomic traits like plant height and root traits (Chen et al., 2015;Voss-Fels et al., 2017). Two closely linked haplotypes on chromosome 2A (31.9-41.2 Mb) were linked to functional genes Ppd-A1 (Beales et al., 2007) and Sus2-2A (Hou et al., 2014), which modulate the flowering time and yield related traits, respectively. Most of the synthetic hexaploid wheats have the GS-105 type 1,117 bp deletion in Ppd-A1 gene (M. Khalid et al., unpublished data), which is the Ppd-A1a photoperiod insensitive allele in durum wheat. This allele is different from Ppd-A1a allele in bread wheat and has higher transcript levels Figure 6 Histogram for relative water contents (RWC %) and boxplot for SNPs and haplotypes associated with RWC in different water regimes. a) Histogram for RWC under well-watered (WW) and water-limited (WL) conditions. Allelic effects of Hap3B-RWC1 b), Hap7A-RWC1 c), SNP (IWB50770) d), SNP (IWB53305) e) on RWC across three water regimes (all information is annotated). (Wilhelm et al., 2009). In conclusion, the simultaneous use of SNP-GWAS and haplotype-GWAS dissected the complex quantitative traits at higher resolution and several genomic regions harboring functional and candidate genes of agronomic importance were identified.
Divergent selections associated with phenotypes and future breeding strategies Although traditional linkage mapping and association mapping are effective for identifying large-effect trait loci, they are limited to phenotypes used in analyses and may fail to detect a large portion of the  genetic changes associated with plant domestication and improvement (Morrell et al., 2011). During the development of adapted lines, selection imposed by humans favored alleles of traits valuable for agriculture. When selection increases the frequency of beneficial alleles in a population, it impacts the standing variation of surrounding genomic regions resulting in reduced diversity, extended linkage disequilibrium, or strong inter-population allele frequency differentiation. To detect these local patterns of variation, also referred here to as 'divergent selective sweeps', we first investigated the genome-wide divergent selections in SYN-DER panel using various complementary analytical approaches (Supplementary Table S4). However, only some of these selection loci could be associated with phenotypes. In a second step, we compared the SNPs in selection loci to those identified in SNP-GWAS and haplotype-GWAS to narrow down the selection loci associated with phenotypes of significant agronomic value. This approach resulted in the identification of 53 SNPs (Table 3) and 23 haplotypes (Table 4) of agronomic importance that appeared to be loci under selection. We have shown selection loci as EigenGWAS as -log10(p) associated with GY and RWC (Figure 5b and 7b). These loci could provide novel adaptability and agronomic superiority in drought stressed environments.
As the narrow genetic base of the D-genome in bread wheat is widely acknowledged, divergent haplotypes in the D-genome in SYN-DER could be a mean to introduce new genetic diversity for traits of breeding interest. The co-localization of these haplotypes with functional genes strongly supported the idea that conventional haplotypes strongly fixed during domestication or modern breeding have been replaced by new haplotypes from Ae. tauschii through synthetic wheats. For example, the close proximity of these haplotypes with functional genes like TaElf3-D1 (1D at 468.7 Mb) and Vrn-D3 (7D at 70.8 Mb) for earliness per se (Eps) and flowering time (Wang et al., 2009;Zikhali et al., 2016), TaCKX6-D1 (3D at 137.4 Mb), TaCwi-D1 (5D at 557.3 Mb) and TaGS-D1 (7D at 6.7 Mb) (Jiang et al., 2015;Zhang et al., 2012;Zhang et al., 2014) for grain size and weight indicated that novel genetic introgressions from synthetic hexaploid wheats may have been selected.
The benefit of using synthetic wheats or other crop-wild introgressions is that new useful haplotypes could be generated due to the differential recombination rates and patterns (Rasheed et al., 2018;Wingen et al., 2017). Such new favorable haplotypes contribute to additive genetic variation for complex quantitative traits, which can replace the conventional haplotypes exhaustively utilized during modern wheat breeding; thus improving the rate of genetic gains. The breeding strategies such as genomic selection can be applied following the identification of molecular markers linked to those haplotypes. For example, Spindel et al. (2016) identified SNPs associated with agronomic traits and fitted these markers as fixed effects in a rrBLUP model (referred to as a GS + de novo GWAS model). They showed that this new model outperformed six other prediction models for various traits. Furthermore, the prediction accuracies of extended GS models outperformed classical models using phenotype data from dry seasons, implying that this approach is particularly suitable for the improvement of drought-resilience in future crop cultivars. Because of their increased information content compared to bi-allelic SNP markers, fitting haplotypes with statistically significant trait associations to phenotypes as fixed effects in GS models could further improve prediction accuracies (Voss-Fels et al., 2017). The use of haplotype-assisted GS should more accurately depict the complex relationships between genotypic information and phenotypes than single SNPs alone; hence, this approach could ultimately help to further increase selection gain per unit of time.
Our analytical approaches successfully identified the divergent selections in SYN-DER, dissected the complex architecture of quantitative traits under drought stress and identified the favorable SNP and haplotypes underpinning traits of breeding interest through crop-wild introgressions.