RNA-seq of Rice Yellow Stem Borer Scirpophaga incertulas Reveals Molecular Insights During Four Larval Developmental Stages

The yellow stem borer (YSB), Scirpophaga incertulas, is a prominent pest in rice cultivation causing serious yield losses. The larval stage is an important stage in YSB, responsible for maximum infestation. However, limited knowledge exists on the biology and mechanisms underlying the growth and differentiation of YSB. To understand and identify the genes involved in YSB development and infestation, so as to design pest control strategies, we performed de novo transcriptome analysis at the first, third, fifth, and seventh larval developmental stages employing Illumina Hi-seq. High-quality reads (HQR) of ∼229 Mb were assembled into 24,775 transcripts with an average size of 1485 bp. Genes associated with various metabolic processes, i.e., detoxification mechanism [CYP450, GSTs, and carboxylesterases (CarEs)], RNA interference (RNAi) machinery (Dcr-1, Dcr-2, Ago-1, Ago-2, Sid-1, Sid-2, Sid-3, and Sid-1-related gene), chemoreception (CSPs, GRs, OBPs, and ORs), and regulators [transcription factors (TFs) and hormones] were differentially regulated during the developmental stages. Identification of stage-specific transcripts made it possible to determine the essential processes of larval development. Comparative transcriptome analysis revealed that YSB has not evolved much with respect to the detoxification mechanism, but showed the presence of distinct RNAi machinery. The presence of strong specific visual recognition coupled with chemosensory mechanisms supports the monophagous nature of YSB. Designed expressed sequenced tags-simple-sequence repeats (EST-SSRs) will facilitate accurate estimation of the genetic diversity of YSB. This is the first report on characterization of the YSB transcriptome and the identification of genes involved in key processes, which will help researchers and industry to devise novel pest control strategies. This study also opens up a new avenue to develop next-generation resistant rice using RNAi or genome editing approaches.

strategies to combat this deadly pest. To control YSB, it is essential to understand the biochemical and molecular mechanisms associated with its life cycle. The process of exploring pest genomic information for the design of control strategies has been progressive, but limited to only some pests. Genome-wide expression analysis of genes during the different developmental stages of the insect will provide molecular insights into its life cycle. This will eventually aid the identification of key insect transcripts involved in infestation and growth that can be targeted for the development of resistance against this pest (Kola et al. 2015).
With the advent of next-generation sequencing platforms, obtaining biological perspectives of cellular processes has become a reality. Transcriptome analysis through RNA-seq can be utilized efficiently to identify the temporal and unique gene expression patterns in an organism (Ozsolak and Milos 2011). Specifically, it provides novel opportunities for expression studies in organisms lacking genome or transcriptome sequence information (Haas and Zody 2010;Wolf 2013). RNA-seq has been used extensively in insect pests to reveal biological phenomena (Ou et al. 2014), gene expression profiles, and gene discovery (Vogel et al. 2014). It has also been used to identify RNAi targets to develop pest control strategies . The developmental transcriptome studies in insect pests have provided better understanding of the transition phases of the insect life cycle. RNAseq data also offers a great opportunity to develop genomic resources, namely the EST-SSRs leveraged by the function of the transcripts. Such functional markers act as a repository of genomic resources for the insect species (Stolle et al. 2013). The larval stage is the important developmental stage in YSB, being responsible for economic damage. In the present study, with the aim of understanding and identifying the temporal and unique gene expression patterns during the larval stages in YSB, we performed RNA-seq at the first, third, fifth, and seventh instars. A total of 236 million reads were generated and de novo assembled into 24,775 transcripts of YSB. The differentially expressed transcripts were analyzed by comparing gene expression profiles and further validated using qRT-PCR. We also performed comparative transcriptome analysis with related insect pest species for better understanding of behavioral and evolutionary aspects of YSB. Functional EST-SSR markers were also developed for YSB, which will serve as a rich genetic resource for diversity studies. These findings will facilitate future molecular studies aim at understanding the biology of YSB and related insects. Such YSB-specific functionally important transcripts may be used as target genes for RNAi to develop YSB resistance in rice.

Insect rearing and sample collection
The YSB adults were collected from Indian Council of Agricultural Research-Indian Institute of Rice Research paddy fields in the wet season 2013. The insects were released on rice plants (TN1 variety), and maintained under controlled conditions at 25 6 2°temperature and 80% relative humidity for oviposition. Egg masses laid on the leaf blades were collected and placed in glass vials for hatching. Neonate larvae were reared on cut stems of rice plants (TN1 variety). Different larval instars, namely the first, third, fifth, and seventh (L 1 , L 3 , L 5 , and L 7 , respectively) were identified on the basis of mandibular width and visual observation of exuvae (Padmakumari et al. 2013). Each larval instar was collected 2-4 hr after moulting in three replicates and stored in RNA later (Invitrogen) at 280°for total RNA isolation.

RNA sequencing
Total RNA was isolated using the SV total RNA isolation system (Promega) as per the manufacturer's protocol. The RNA was pooled from three replicates of each instar. The yield and purity of RNA were assessed by measuring the absorbance at 260 and 280 nm, and quality was checked on a formaldehyde gel using MOPS buffer. The RNA integrity number (RIN) was checked by running the RNA Nano chip on a Bioanalyzer (Agilent Technologies) and samples with a RIN value . 7.5 were processed further. As required for Illumina sequencing (San Diego, CA), mRNA was isolated from 5 mg of total RNA and paired-end (2 · 100 bp) cDNA sequencing libraries were prepared using an Illumina TruSeq RNA Library Preparation Kit as per the Figure 1 Schematic representation of S. incertulas (yellow stem borer) transcriptome sequencing. EST-SSR, expressed sequenced tags-simplesequence repeats; HQ, high-quality; MISA, MIcroSAtellite identification tool.
n Table 1 Summary of raw and high-quality reads of S. incertulas transcriptome data
manufacturer's protocol (Illumina) and sequenced on a HiSequation 2000 sequencing system at Nucleome Informatics Pvt.Ltd., Hyderabad, India. The schematic representation of YSB de novo transcriptome analysis is given in Figure 1.

Preprocessing of raw data and read mapping
The YSB raw reads were filtered to obtain HQR by removing low-quality reads and adaptors using Trimmomatic v0.32 with default parameters. The resulting HQR from all the four samples were assembled using Trinity v2.0.0 with default parameters (http://trinityrnaseq.sourceforge. net/, Haas et al. 2013). The redundancy of the assembly was reduced through processing by Cd-hit (Li and Godzik 2006), TGICL (Pertea et al. 2003), and Evigene (http://arthropods.eugenes.org/EvidentialGene/ about/EvidentialGene_trassembly_pipe.html). The de novo assembled transcripts were named as YSB_LS as prefixed by the number emanated from Trinity assembler.

Functional annotation of transcripts
The de novo assembled transcripts were subjected to BLASTx against the protein nonredundant database, Swiss-Prot, and the Kyoto Encyclopedia of Genes and Genomes (KEGG) database with an E-value cutoff of 10 25 . Gene Ontology (GO) functional annotation of transcripts was performed using the Blast2GO server (Conesa et al. 2005). Pathway mapping was executed for transcripts using the KEGG (www.genome. jp/kegg) database with an E-value cut-off of 10 25 .

Differential expression of transcripts
The differentially expressed transcripts between the four larval stages were analyzed using EdgeR, available as a plug-in in Trinity based on the FPKM (Fragments Per Kilobase of transcript per Million mapped reads) value (Robinson et al. 2010). Transcripts expressed at very low levels (read counts , 10 across all four libraries) were not considered and P-value # 0.05 was used to identify differentially expressed transcripts. The significance of differences in gene expression was judged using a threshold FDR # 0.001 and an absolute value of log 2 ratio . 1. Heat maps showing differential expression of transcripts belonging to TF families, RNAi machinery, and chemoreception were generated by using MeV 4.0 (https://sourceforge.net/projects/mev-tm4/) software. Heat maps were prepared based on relative FPKM values using hierarchical clustering based on average Pearson's distance following the complete linkage method (Saeed et al. 2003).

Expression validation of transcripts by qRT-PCR
The differentially expressed transcripts selected based on their function were validated through qRT-PCR. An aliquot of 1 mg total RNA was reverse transcribed into single-stranded cDNA using the Prime Script RT reagent kit (TaKaRa, Japan). Primers for qRT-PCR were designed using online software Primer 3 (http://primer3.ut.ee/) and synthesized from IDT (Supplemental Material, Table S1). The qRT-PCR primers were checked for the presence of hairpin structures or dimer formation using the online Oligo Analyzer 3.1 tool (http://eu.idtdna.com/calc/ analyzer; IDT). The reaction mixture for qRT-PCR comprised the SYBR premix Ex-Taq kit (TaKaRa, Japan) with 2 ml normalized cDNA template and 10 pg forward and reverse primers each. Reactions were performed in PCR LC-96-well plates (Roche LightCycler 96; Roche). The YSB b-actin gene reported in our previous study was used as an internal control (Kola et al. 2016). The relative gene expression was analyzed by the DDCT method and fold change was calculated by 2 2DDCT (Schmittgen and Livak 2008).
Helicoverpa armigera (American boll worm) (two polyphagous lepidopteran pests); and Bombyx mori (silkworm) (a monophagous pest) were retrieved. Comparison of protein sequences was done using the BLASTx against the above-mentioned pest genomes with an E-value cut off of 10 25 and 80% similarity.

Phylogenetic analysis
Phylogenetic analysis was carried out for the transcripts belonging to some of the important functional groups, namely insecticide detoxification, chemoreception, and RNAi machinery. For this analysis, transcripts of lepidopteran, hemipteran, and coleopteran sequences were retrieved from NCBI (February 10, 2016). The phylogenetic tree was constructed from the multiple alignments using MEGA 5.0 generated with 1000 bootstrap trials by the Neighbor-Joining method (Tamura et al. 2007).

Data availability
The YSB raw reads and de novo assembled YSB transcripts are available in the Short Read Archive with the accession number SRX733621, and the Transcriptome Shotgun Assembly at the NCBI.

RESULTS AND DISCUSSION
Sequencing and de novo assembly Sequencing of four larval stages yielded a total of 236 million reads, ranging from 32 to 83 million per library. The raw reads were filtered to remove adaptors and sequencing artifacts prior to assembly. A total of 229 million (97.3%) cleaned reads (after preprocessing and quality checking) were obtained from raw reads and the corresponding statistics are presented in Table 1. Since, a reference genome is not available for YSB, de novo transcriptome assembly was done using a Trinity assembler. Assembly of HQR resulted into 24,775 transcripts with a total length of 36.77 Mb. The average length of YSB transcripts was 1485 bp with N50 of 2348 bp ( Table 2). The Illumina platform has been widely used for RNA-seq on account of its large data size, precision, and ease (Porreca et al. 2007). The huge amount of raw data generated in RNA-seq needs preprocessing and quality checking to remove sequence artifacts, low complexity reads, and adaptor contamination for better de novo assembly (Patel and Jain 2012). The preprocessing and assembly of data were carried out by Cd-hit, TGICL, and Evigene, which efficiently handles long reads or transcripts and gives nonredundant data as evidenced by earlier studies (Zhao et al. 2016;Zhang et al. 2014). Although de novo transcriptome assembly without a reference genome represents a computational challenge (Kulkarni et al. 2016), Trinity is an efficient tool for robust rebuilding of transcriptomes from large volumes of RNA-seq data, especially those of ecological and evolutionary importance (Grabherr et al. 2011). Trinity has been successfully employed in de novo assembly of other insect transcriptomes, namely C. suppressalis (Yin et al. 2014) and Vespa mandarinia (Patnaik et al. 2016).
Twenty-four thousand, seven hundred and seventy-five transcripts obtained in this study were compared with transcriptome data of other related rice insects, namely C. medinalis (44,941 unigenes) ) and C. suppressalis (37,040 contigs) . The transcripts emanating from de novo assemblies depend on distinct quality parameters and thus differ according to the parameters employed in each study (Schliesky et al. 2012). The N50 value of the YSB transcriptome was higher compared to previously reported insect transcriptome assemblies, which indicated the maximum utilization of reads. The 44% GC content of the YSB transcriptome corroborated earlier reports showing GC content between 40 and 48% (Camargo et al. 2015;Li et al. 2013). Out of the 24,775 assembled transcripts, 13,506 transcripts were commonly expressed in all developmental stages, while 708, 786, 366, and 631 transcripts were specific to L 1 , L 3 , L 5 , and L 7, respectively ( Figure 2). From our study, the transcripts required for basal metabolic activities like cuticular formation, energy releasing mechanisms, and digestion were commonly expressed. It is evident that larval development involves a set of essential metabolic activities for its growth and development, while the stage-specific transcripts might be important for specific development at a particular larval stage (Ou et al. 2014).
Based on GO, the YSB transcripts were divided into three major categories: cellular component, molecular function, and biological processes. In the cellular component category, the highest number of transcripts were attributed to cell (1998, GO: 0005623) and cellular part (1998; GO: 0044464). In the molecular function category, the maximum number of transcripts were attributed to binding (2569; GO: 0005488) followed by catalytic function (2392; GO: 0003824). In the biological process category, the maximum number of transcripts were associated with cellular process (2449; GO: 0009987) followed by metabolic process (2395; GO: 0008152) ( Figure 3). The metabolic pathway enrichment through KEGG confronted a total of 1078 transcripts, which were mapped to 88 metabolic pathways, and the top ranking 10 metabolic pathways are shown in Figure 4. The highest number of transcripts was mapped to purine metabolism (179, 16.6%). It is the most critical pathway for the generation of energy sources like ATPs and GTPs, necessary for cellular functions in all organisms. Purine metabolism is also involved in transcriptional regulation during sex-specific nutrition allocation in Drosophila melanogaster which influences reproduction, repair, and aging processes (Bauer et al. 2006).
From the total transcriptome, the transcripts associated with five important classes were selected for further discussion, namely insecticide detoxification and target enzymes, chemoreception for host specificity, RNAi machinery, TFs, hormonal biosynthesis, and visual perception.

Insecticide detoxification and target enzymes
Insecticides such as pyrethroids and neonicotinoids show high toxicity to insect pests, but relative hypotoxicity to mammals and natural enemies. One of the main reasons for this phenomenon is variation attributed to the presence of insecticide target genes and their metabolism . In YSB, genetic information on insecticide resistance has yet not been reported. We identified transcripts related to insecticide detoxification like cytochrome P450 (CYP450), glutathione S-transferases (GSTs), and CarEs. In addition, insecticide targets like acetylcholinesterase (AChE), g-aminobutyric acid receptor (GABA), nicotinic acetylcholine receptor (nAChR), sodium channels, and ligandgated chloride channels were also found (Table S4).
CYP450 is one of the largest protein super families in insect species, functioning in xenobiotic metabolism and detoxification (Scott 2008). Among the 80 CYP450 transcripts identified in this study, 17 transcripts showing . 75% similarity in the nr database were used for phylogenetic analysis. The majority of the CYP450 family transcripts were unique to YSB, except a few that clustered with other insects such as C. suppressalis (Figure 5a). CYP450 has three important families, namely CYP4, CYP6, and CYP9. The CYP4 genes in insects are involved in both pesticide metabolism and chemical communication.
Previous studies have demonstrated that the CYP4 family genes CYP4C27 in Anopheles gambiae (David et al. 2005) and CYP4G19 in Blattella germanica (Pridgeon et al. 2003) are associated with insecticide resistance. Similarly, CYP6B48 and CYP6B58 in Sp. litura , CYP6AE14 and CYP6AE12 in H. armigera (Zhou et al. 2010), and CYP9A2 in Manduca sexta (Stevens et al. 2000) were reported for their expression in response to plant allelo chemicals and xenobiotics. In the current study, we have reported the presence of common and unique CYP genes in YSB for the first time, which can be further characterized and exploited.
The GTSs belong to a family of multifunctional detoxification enzymes that catalyze the conjugation of reduced glutathione with exogenous and endogenous toxic compounds or their metabolites. In insects, GSTs mediate resistance to organophosphates (OPs), organochlorines, and pyrethroids (Yamamoto et al. 2009). The main classes of GSTs in insects are delta, epsilon, omega, sigma, theta, zeta and microsomal. The delta and epsilon classes comprise the largest insect-specific group of GST classes and play an important role in xenobiotic detoxification (Ranson et al. 2002). They are well-studied in insect species like Lucilia cuprina, N. lugens, P. xylostella, M. sexta, Musca domestica (Enayati et al. 2005), and B. mori (Yamamoto et al. 2009). An epsilon GST was reported to detoxify DDT resulting in DDT-resistance in A. gambiae (Ortelli et al. 2003), while a delta class GST showed induced resistance to pyrethroids by detoxifying lipid peroxidation products (Vontas et al. 2001). We identified a total of 23 transcripts encoding GSTs. Twelve transcripts of GST (representing delta, epsilon, zeta, sigma and omega classes) with . 75% similarity in the nr database and an average length of 1518 bp were considered for phylogenetic analysis. The GST transcripts of YSB shared considerable similarity with B. mori followed by O. furnacalis (Figure 5b). Here also, common and unique transcripts related to GSTs in YSB were identified, which can be later characterized for future development of effective insecticides.
Carboxylesterases are enzymes of the carboxyl/cholinesterase gene family that catalyze the hydrolysis of carboxylic esters to their component alcohols and acids. They have several physiological functions, such as the degradation of neurotransmitters and metabolism of specific hormones and pheromones (Cui et al. 2011). Carboxylesterases offer insect resistance to OPs, carabamates, and pyrethroids by gene amplification or transcriptional upregulation (Yan et al. 2009). Carboxylesterases have been extensively studied in insects like L. cuprina and D. melanogaster (Heidari et al. 2005). Here, seven CarE transcripts with an average length of 2377 bp were used for phylogenetic tree construction, which showed considerable similarity with CarE of Cn. medinalis (Figure 5c). Although the molecular mechanism of insecticide action is still unknown in YSB, the present transcriptome study provides baseline information on insecticide target enzymes and genes associated with insecticide detoxification. These results indicate that YSB has transcripts that are involved in detoxification mechanisms, so these may be the potential targets for suppression to bring down insecticide resistance in YSB (Kola et al. 2015).
Chemoreception for host specificity In insects, both larvae and adults use their olfactory system to detect food resources, sexual partners, or adequate oviposition sites. Chemoreception occurs by recognition of plant volatiles and uptake of chemical components from the environment and their interaction with chemoreceptors (Smith and Getz 1994). Generally, the chemoreception system in insects has important components, namely odorant binding proteins (OBPs), chemosensory proteins (CSPs), odorant receptors (ORs), sensory neuron membrane proteins, ionotropic receptors (IRs), and gustatory receptors (GRs) (Leal 2013;Jia et al. 2016). The OBPs and CSPs are signal binding proteins that recognize chemical cues from the ambient environment, whereas ORs, IRs, and GRs convert chemical signals into neuronal activity and thus play key roles in local adaptation and reproductive isolation in insects. The diversity of ORs between and within the insect species imparts specificity to insects and allows these receptors to bind to a variety of ligands (Sato et al. 2008). Genome sequencing and molecular studies have characterized the complete OR repertoires and other olfactory genes in several insect species such as B. mori (Gong et al. 2009) and Conogethes punctiferalis (Jia et al. 2016), providing an understanding of the olfactory signal pathways in these insects. In the YSB transcriptome, a total of 62 transcripts related to chemoreception were identified. Among these transcripts, 21 transcripts including four CSPs, three GRs, nine OBPs, and five ORs were selected for phylogenetic analysis to determine their relationship with other insect species (Table S4).
YSB transcripts coding for OBP (si c76982g1i1 OBP1) and CSP (si c8801g1i1CSP) shared high homology with the rice insect N. lugens and showed high bootstrap values ( Figure 6). Similar observations were made with protein sequence comparison (99% homology). These two transcripts matched those of another monophagous rice insect, indicating that they may be crucial for rice recognition by insects. The remaining chemoreception associated transcripts of YSB did not show any homology with other insects, suggesting that YSB has independently evolved its chemoreception machinery leading to stenophagy. The finding that the two transcripts are common in two specialist feeders of rice; namely YSB and BPH is worth probing further, to ascertain their role in host finding. Although the function of OBP, GRs, ORs, and the host specificity mechanism in YSB are not clear, earlier studies in other insects have suggested their tight association with plant volatile reception (Xue et al. 2014). Through RNA-seq, chemosensory genes have been identified and their significance in host recognition was reported in some lepidopteran species like H. armigera , Cydia pomonella (Bengtsson et al. 2012), and M. sexta (Grosse-Wilde et al. 2011). The presence of an olfactory system in another lepidopteran pest, cotton leaf worm (S. littoralis), also gives clues regarding the chemosensory behaviors of insects. The present study confirms the existence of a robust chemoreception mechanism in YSB via the presence of all the other important components required for chemoreception, which might be responsible for host detection and the monophagous nature of this insect.

RNAi machinery
The existence of RNAi machinery in animals, plants, and fungi has enabled it to be applied to gene knockdown studies more efficiently. In insects, RNAi is triggered by the presence of dsRNA or siRNAs (Huvenne and Smagghe 2012). Although, RNAi mechanisms exist in some insects, the differences regarding signal amplification, systemic effect, and inheritance vary among the species (Posnien et al. 2009). Dicer1 and Dicer2 are known to cleave the long dsRNA into siRNAs (21-25 nt). The presence of dicer genes was reported in T. castaneum (Tc-Dcr-1 and Tc-Dcr-2) and S. litura (Dcr1 and Dcr2) (Tomoyasu et al. 2008;Gong et al. 2015). Argonaute (Ago) proteins bind to singlestranded mRNA through siRNAs to degrade it in a sequence-specific manner (Hammond et al. 2001). The presence of two Ago genes in N. lugens, Ago1 and Ago2 in S. litura, and five types of Ago in T. castaneum (Tc-ago-1, Tc-ago-2a, Tc-ago-2b, Tc-ago-3, and Tc-Piwi) have been reported (Tomoyasu et al. 2008;Xu et al. 2013;Gong et al. 2015). Sid proteins are well-known for the uptake and spread of gene silencing signals. The presence of these proteins has been reported in Apis mellifera (Aronstein et al. 2006), B. mori (Kobayashi et al. 2012), and S. litura (Gong et al. 2015).
In YSB, core genes of RNAi machinery, namely Dicer1 (Dcr-1), Dicer2 (Dcr-2), Argonaute1 (Ago-1), Argonaute2 (Ago-2), Sid-1, Sid-2, Sid-3, and Sid-1-related gene were found that indicates that YSB has complete RNAi machinery (Table S4). Until now, evidence to support the existence of RNAi machinery in YSB has not been reported. Thus, these findings are important in that they open up the possibility of exploring RNAi strategies for the management of YSB. In our previous study, we have shown the silencing of genes, i.e., Aminopeptidase (APN) and CYP450 (CYP6) by feeding the YSB larvae with dsRNAs (Kola et al. 2016). Through transcriptome analysis, the presence of RNAi genes and a silencing response have been reported in Cylas puncticollis (Prentice et al. 2015) and Tuta absoluta (Camargo et al. 2015). It is interesting to note that the YSB transcripts encoding Sid-1, Sid-2, Ago-1, and Ago-2 revealed higher homology with another lepidopteran pest, S. litura, in which RNAi has been exploited very well for its control (Figure 7).

Transcription factors
Gene regulation can be controlled by DNA-and protein-binding TFs at both the transcriptional and translational level. A total of 415 TFs were identified in the YSB transcriptome, of which the majority of the TFs belongs to zinc fingers (ZFs) (400) followed by Myb proteins (8). The Cys 2 His 2 ZFs are the most common DNA-binding motifs in all eukaryotes (Wolfe et al. 2000). It has been well-established that ZF TF's are involved in DNA recognition, RNA packaging, transcriptional activation, and lipid binding, which is required for growth and development in both animals and plants (Laity et al. 2001). In red beetle (Tribolium), Sp-like ZF TF (T-Sp8) was reported to be involved in formation of body appendages and the regulation of growth (Beermann et al. 2004). The other TFs, namely leucine-zipper TFs, basic helix-loop-helix (bHLH) TFs, and b-NAC-like protein were less in number. The bHLH TFs are also known to play important roles in controlling cell proliferation and tissue differentiation (Kageyama and Nakanishi 1997), and the developmental processes of D. melanogaster (Moore et al. 2000) and B. mori (Zhao et al. 2014).

Hormone biosynthesis
In insects, development, moulting, and other related processes are regulated by juvenile hormone (JH). Degradation of JH occurs mainly by the action of juvenile hormone esterase (JHE) and/or juvenile hormone epoxide hydrolase (JHEH) (Belles et al. 2005). In the YSB transcriptome, we also found the presence of JHE, JHEH, and JH-binding protein. Another important insect hormone, prothoracicotropic hormone (PTTH), stimulates the secretion of moulting hormone (ecdysone) from prothoracic glands and regulates the developmental timings in insects (McBrayer et al. 2007). The presence of this hormone (YSB_LS_c64818_g1_i1) was also evidenced in our work. Insect ecdysis behavior at each developmental stage will be determined by ecdysis-triggering hormone (Park et al. 2003). In YSB, we also found the presence of ecdysone receptors (YSB_LS_c42438_g1_i1 and YSB_LS_CL10106Contig1), which can be used as selective insecticides. Earlier studies have also reported that ecdysone receptor (EcR) has been used as a potential target for RNAi based pest control in H. armigera  and N. lugens .

Visual perception
The r-opsin family of proteins is instrumental in visual perception and host recognition in arthropods (Feuda et al. 2016). In YSB, we also found transcripts encoding long-wavelength opsin (YSB_LS_CL7470Contig1), UV opsin (YSB_LS_c32301_g1_i1), and blue opsin (YSB_LS_c33970_g1_i1) in all the four larval developmental stages. In addition, rhodopsins like Rh 1 and Rh 3 were also found at different developmental stages, which may indicate that they have an important role in host recognition apart from chemoreception.

Differential gene expression
To determine the level of gene expression during YSB larval developmental stages, transcripts were analyzed on the basis of FPKM values. Overall, 9775 transcripts were differentially expressed across the four stages of larval development (Figure 8). In L 3 compared to L 1 , 1340 and 951 transcripts were up-and downregulated. Transcripts exclusively expressed at L 1 and L 3 were 388 and 206, respectively; 1697 transcripts were common to both stages (Table S5). In the top 10 differentially upregulated transcripts, most were associated with cuticular proteins, suggesting that cuticle formation may be the key process at this stage of larval development. Cuticular proteins and chitin are essentially required to maintain physical structure and localized mechanical activities (Andersen et al. 1995). Thus, these cuticular proteins might also be involved in maintaining physical structure in YSB.
A list of the top 10 up-and downregulated transcripts is given in Table S6. Interestingly, the transcript encoding long-wave opsin (YSB_LS_c24494_g1) was highly expressed only at the L 1 stage compared to other stages. It is required for visual perception according to environmental cues and has a prominent role in insect behavior, as reported in H. armigera (Yan et al. 2014). This could be of significance in YSB, as it may aid in dispersal of the just-hatched larvae and their ability to find the host plant. In addition to the opsin, one of the neuropeptide precursors (YSB_LS_c24589_g1) was also highly expressed at the L 1 stage compared to other larval stages. The roles of neuropeptides and their precursors in development, physiology, and Figure 8 Differentially expressed transcripts between four larval developmental stages of S. incertulas. Up-(red) and downregulated (green) YSB transcripts between the four developmental stages. L 1 , first instar larvae; L 3 , third instar larvae; L 5 , fifth instar larvae; L 7 , seventh instar larvae; YSB, yellow stem borer.
behavior were previously reported in several insects like N. lugens (Tanaka et al. 2014) and T. castaneum (Li et al. 2008).
Similarly, between L 3 and L 5 , 3172 differentially expressed transcripts were observed, of which 1896 were upregulated and 1276 were downregulated at L5. Three hundred and sixty-eight transcripts at L 3 stage and 314 transcripts at L 5 stage were expressed exclusively, whereas 2490 transcripts were expressed commonly in both stages. Among the top 10 differentially upregulated transcripts, two transcripts; discs large1 and Aminopeptidase N3 (APN3) are noteworthy. Hexamerine, reverse transcriptase, and b-fructofuranosidase 2 transcripts were downregulated. The APN has an important role in dietary protein digestion in the midgut (Wang et al. 2005). It can be postulated that the APN gene in YSB may also have a prominent role in the larval midgut and can be used as a candidate for gene silencing. Previous studies also demonstrated that inhibition of its activity in the larval midgut can result in a detrimental effect on larval growth and development, finally leading to larval death (Ningshen et al. 2013). At the L 3 stage, the maximum number of transcripts involved in insecticide detoxification was expressed. EcR, nAChR, and GABA were more highly expressed at the L 3 stage than at all other stages, which indicates that molting and neurotransmission are key processes at this stage of development. The larval developmental transitions (molting and metamorphosis) are largely regulated by changes in the titer of the ecdysteroid hormone 20-hydroxyecdysone (20E) binding to its EcR. The function of EcR isoform-A (BgEcR-A) has been reported through RNAi in Bl. germanica (Cruz et al. 2006). Insect nAChR and GABA receptors mediate postsynaptic cholinergic transmission (Raymond-Delpech et al. 2005). The effect of nAChR on the insect central nervous system has been previously reported in A. mellifera (Jones et al. 2006) and B. mori (Shao et al. 2007).
Between L 5 and L 7 , 4292 transcripts were differentially expressed, of which 2668 were upregulated and 1624 were downregulated at L 7 . Of these, 3405 transcripts were expressed in both the stages, while 463 and 424 transcripts were exclusively expressed at L 3 and L 5 , respectively. Among the top 10 differentially upregulated transcripts, ZF protein 600 (ZFP600) and CarE were highly upregulated while the maximum transcripts encoding cuticular proteins were downregulated. The sperm flagellar protein (YSB_LS_c17944_g1) and testis-specific tektin (YSB_LS_c7891_g1) were exclusively highly expressed at the L 5 stage compared to all other stages, which indicates that these transcripts could be involved in sperm formation and maturation, particularly during this instar of YSB. The role of testis-specific protein (BmTst) Figure 9 Significantly up-and downregulated and exclusively expressed transcripts at each stage of S. incertulas larval development. for spermatogenesis has been reported in B. mori (Ota et al. 2002), but in YSB, no information is available so far on spermatogenesis. The role of tetkin in YSB need to be probed further to understand the mechanism. Among exclusively expressed transcripts at the L 7 stage, the majority were TFs. Components of the RNAi gene silencing machinery like Sid1 (YSB_LS_c38003_g4) and Ago-1 (YSB_LS_c27000_g2) were also highly expressed only at L 7. The transcripts for ZF proteins, CarEs, cuticular proteins, and voltage-gated sodium channels were also highly expressed at the L 7 stage, and these may be used as potential candidate genes for gene silencing through RNAi. The number of differentially expressed transcripts increased during the larval transitions from L 1 to L 7 . This indicates that L 7 might be the most evolved stage of YSB development proceeding to the pupal stage. These transcripts might be involved in metamorphosis and modulated by the expression of tissuespecific transcripts. The significantly up-and downregulated and exclusively expressed transcripts at each stage of larval development are represented in Figure 9. JH hormone regulation is usually controlled by JHE, JHEH, and JH-binding protein. The expression of JHE (YSB_LS_CL6858Contig1) and JHEH (YSB_LS_CL2874Contig1) decreased gradually from L 1 to L 7 , while that of JH-binding protein was significantly decreased at the L 7 stage, indicating that JH regulation is crucial for YSB development. During unfavorable conditions, YSB remains as larva and may undergo up to 10 larval moults with low levels of JH-binding protein. In general, a lack of PTTH results in delayed larval development and eclosion (McBrayer et al. 2007). Inhibition of PTTH has been shown to regulate the larval diapauses (Chippendale 1977). Interestingly, low expression of PTTH (YSB_LS_c64818_g1_i1) was observed at the L 7 stage, suggesting that PTTH may have a role in YSB development.
Comparing all the four YSB larval developmental stages, the maximum number of commonly expressed transcripts was observed for cuticle proteins, chymotrypsins, CYP450s, GSTs, APN, CarE, ZFPs, CSPs, and OBPs. The consistent expression of these transcripts provides evidence for their role in insect growth and development. The functionally important transcripts belonging to the RNAi machinery, chemoreception, and TFs were analyzed based on their normalized gene expression values (FPKM) and represented as heat maps based on the hierarchical clustering of transcripts ( Figure 10). In the case of TFs, the transcript for leucine-zipper protein was expressed continuously in all larval stages, indicating that it has an important functional role in the regulation of gene expression during larval stages. However, one of the transcripts encoding b-NAC was expressed only at the L 1 stage. In the case of RNAi machinery, Ago-2 was highly expressed in all larval stages but Ago-1 showed reduced expression, which indicated that Ago-2 might be more involved in the RNAi pathway in YSB. In contrast, at the L 7 stage, dcr-2, sid-1, and sid-2 were highly expressed, which further shows that the RNAi mechanism might be functional at the L 7 stage. The expression of chemoreception-associated transcripts in YSB was the highest at the L 1 and L 3 larval stages; however, transcripts for ORs were constitutively expressed in all larval stages, indicating that they might play a crucial role in host specificity. Pairwise comparisons between stages on the basis of fold change and false discovery rate between the samples have been depicted as MA and Volcano plots ( Figure S3).
YSB EST-SSR mining and characterization SSR mining identified a total of 1327 (5%) transcripts harboring SSR motifs. The newly developed SSRs with regard to the transcripts derived from RNA-seq were referred to as YSB EST-SSR. We categorized the SSR Figure 11 Validation of the RNA-sequencing (RNA-seq) data using quantitative reverse transcription-polymerase chain reaction (qRT-PCR). All data were normalized to the reference gene, yellow stem borer b-actin. The transcripts validated were ecdysteroid-regulated protein (ERP), cytochrome p450 (CYP6AB51), neuropeptide receptor A10 (NPR) aminopeptidase N (APN), zinc finger DNA-binding protein (ZFDB), nicotinic acetylcholine receptor (nAChR), carboxylesterase (CarE), chemosensory protein (CSP), seminal fluid protein (SFP), and helix-loophelix protein (HLH). Error bars indicates statistical significance of data.
n motifs into perfect SSRs (repeat motifs that are simple tandem sequences, without any interruptions) and compound SSRs (sequences containing two adjacent distinct SSRs, separated by none to any number of base pairs) (Parekh et al. 2016). In our study, we kept this distance as 100 bp and excluded mononucleotide repeats to reduce the sequencing errors in the data (Kalia et al. 2011). A total of 563 SSR motifs were found in the YSB transcriptome with a frequency of one SSR/10.98 kb (Table 3). Comparatively, the frequency of YSB EST-SSRs was higher than in earlier reports in insect species (Hamarsheh and Amro 2011). This further strengthens the good transcriptome coverage achieved by RNA-seq. Out of the 563 SSR motifs, 511 SSR motifs belonged to class І ($ 20 bp) and 52 to class II (, 20 bp) (Table 4). Considering the reverse and complementary sequences and/or different reading frames, (AC)n, (CA)n, (TG)n, and (GT)n were counted as the same class ). On the basis of identified SSRs, trinucleotide repeats were the most represented (59.5%), with motif repeats CCG/CGG, followed by dinucleotide repeats with 35.1% motifs of AT/AT (Table S7). It is a fact that frameshift mutations or additions/deletions in trinucleotide repeats do not disturb the open reading frame (Metzgar et al. 2000). This further supports the highest number of trinucleotide repeats obtained in the present study. The repeat type AT/AT was also found to be the highest in the Asian giant hornet V. mandarinia (Patnaik et al. 2016). The EST-SSRs are worthy genetic resources for understanding genetic diversity, and are comparatively more efficient than genomic SSRs in population genetic studies (Arias et al. 2011). The YSB EST-SSRs provide a valuable resource for future genetic analysis of S. incertulas and can be employed for constructing high-density linkage maps of YSB. Moreover, YSB EST-SSRs can be used for cross-species/ taxa identification, which can also yield preliminary information regarding other related but unexplored insect taxa (Stolle et al. 2013).

Comparative transcriptome analysis
To understand the similarity or variability among different insect species on the basis of sequence conservation, comparative transcriptome analysis of YSB was done with six related insect species. It showed significant homology with C. suppressalis (4186) followed by N. lugens (3787), B. mori (3027), S. frugiperda (452), H. armigera (293), and Cn. medinalis (27) (Figure 12). We found that most of the transcript (63) sequences are unique to YSB on the basis of less percentage identity with related species. Transcripts related to detoxification mechanisms, namely nAChR (YSB_LS_CL9841Contig1), AChE (YSB_LS_c43640_g1_i3), GSTs (YSB_LS_c41379_g1_i2), as well as the voltage-gated sodium channel (YSB_LS_CL609Contig1) and GABA receptor (YSB_LS_c20927_g2_i1), showed 95-100% similarity with C. suppressalis and Cn. medinalis, which are predominant rice insects in South Asian countries. Based on this observation, it appears that the detoxification mechanism is conserved across the lepidopteran rice pests, So these genes could be explored for silencing through RNAi to achieve broad-spectrum resistance in rice (He et al. 2012).
In contrast, two of the transcripts encoding the voltage-gated ion channel and CYP450 showed # 20% similarity with all other pests, suggesting that they are specific to YSB. In the RNAi machinery, the transcript for Sid-2 (YSB_LS_CL1284Contig1) and Ago-1 (YSB_LS_c27000_g2_i1) showed 95% sequence homology with C. suppressalis and N. lugens. However, Ago-2, Sid-1, Sid-3, and sid-1related genes showed less homology (40-50%), suggesting that the evolution of the RNAi mechanism may be driven by the host. Interestingly, the transcript responsible for visual recognition, i.e., long wave opsin (YSB_LS_c24494_g1_i1), and transcripts involved in the chemoreception mechanism (CSP and OBP) showed 98-100% homology with BPH.
Notably, BPH and YSB have only rice as a host and both are monophagous insects. From the homology of host-specific recognition transcripts, it can be postulated that these transcripts might be playing an important role in the monophagy of YSB and BPH in rice. Similarly, the transcripts common to both C. suppressalis and YSB suggested an association with feeding behavior, as both are borers on rice crops. Generally, the morphology of mouthparts plays a major role in the boring, chewing, or sucking capability of insects (Rogers et al. 2002), but tremendous diversification has been reported in insect mouthparts corresponding to their food niches. In YSB, transcripts like sex combs reduced (scr) homolog (YSB_LS_c35525_g1_i2), Hox protein pb (YSB_LS_c642_g2_i1), and distal-less-like (dl) protein (YSB_LS_c27814_g1_i2) were identified, which showed higher similarity (# 80%) with B. mori and C. suppressalis (chewing type of mouth parts) and have less similarity ($ 52%) with N. lugens (sucking pest). This implies that these three transcripts may be required for the feeding and boring ability of YSB. Nevertheless, further in-depth research is needed for better understanding of the feeding behavior of YSB.

Figure 12
Comparative analysis of yellow stem borer transcriptome with lepidopteran and hemipteran insect protein sequences at a cutoff E-value 10 26 and # 80% similarity.
Among 63 YSB-specific transcripts, most coded for unpredicted (7), putative (3), and hypothetical proteins (13). This could be due to uniqueness of transcripts for YSB, which has not been very wellstudied at the molecular level. The remaining transcripts, whose functions are known, includes adipokinetic hormone receptor (YSB_LS_c40297_g7_i2), CarE (YSB_LS_c41108_g1_i1), muscle M-line assembly protein (YSB_LS_c39945_g2_i1), WD repeat domain-containing protein (YSB_LS_CL8220Contig1), ZF protein (YSB_LS_c40816_g4_i5), allantoinase (YSB_LS_c37251_g1_i6), neuropeptide receptor (YSB_LS_c32700_g1_i1), leucine-zipper TF (YSB_LS_CL2667Contig), CYP450 (YSB_LS_c43088_g1_i5), titin-like protein (YSB_LS_c42866_g3_i1), U5 small nuclear ribonucleoprotein (YSB_LS_c10666_g1_i1), mitochondrial carrier protein (YSB_LS_c39431_g4_i1), and reverse transcriptase (YSB_LS_CL1205Contig1). The insect adipokinetic hormones known to be associated with energy-requiring activities like flight and locomotion in D. melanogaster and B. mori (Staubli et al. 2002) might be operating in YSB also. The YSB-specific CarE and CYP450 might be involved in key mechanisms of resistance to insecticides (particularly OPs and pyrethroids), as reported in S. litura . It can be postulated that the specific ZF protein and leucine-zipper TFs might play role in gene regulation. The presence of unique transcripts encoding titin-like protein, muscle M-line assembly protein, U5 small nuclear ribonucleoprotein, and mitochondrial carrier protein suggests that they have a crucial role in basic regulatory activities like RNA splicing, flight, and insect muscle formation.

Conclusions
Comprehensive transcriptome data of four larval developmental stages of YSB was generated, which greatly enriches the currently available genomic resource for this organism. Differentially expressed transcripts revealed the phenology and behavioral characteristics of different larval stages of YSB. Although there are many transcripts that are present in all the stages, stage-specific transcripts might have a greater role in specific larval development. It appears that YSB has evolved independently from other lepidopteron pests by having specific mechanisms for visual recognition as well as chemoreception, which might play a role in host recognition and have a molecular basis for host specificity. The role of hormones and their regulation in moulting and development was clearly observed. The novel transcripts of YSB identified in the present study could be efficiently used as candidate genes for knocking-down or editing to use in pest management strategies. The presence of Dcr-1, Dcr-2, Ago-1, Ago-2, Sid-1, and Sid1-related genes provided strong evidence of RNAi machinery in YSB. Further, YSB EST-SSR markers are new additions to the genomic resources of YSB that can be readily used for genetic diversity studies. Overall, the data provides information on the larval development of an agriculturally important pest. The wealth of data generated on the genes involved in important molecular mechanisms will facilitate further research into the developmental biology of YSB as well as the evolution of novel pest management strategies.