The Chlamydomonas genome has been sequenced, assembled and annotated, producing a rich resource for genetics and molecular biology in this well-studied model organism. However, the current reference genome contains ~1000 blocks of unknown sequence ('N-islands'), which frequently fall within introns of annotated gene models. We developed a strategy to search for previously unknown exons hidden within such blocks, and to determine the sequence and exon/intron boundaries of these exons. Our methods are based on assembly and alignment of short cDNA and genomic DNA reads, and are completely independent of any prior reference assembly or annotation. Our evidence indicates that a substantial proportion of the annotated intronic N-islands contain hidden exons. For most of these, our algorithm recovers the full exonic sequence together with its splice junctions and the adjacent intronic sequence. These new exons represent de novo sequence that is generally absent from the assembled genome, and the added sequence improves the evolutionary conservation of the predicted encoded peptides.
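To make the notion of an "intronic N-island" concrete, the sketch below scans a reference sequence for runs of unknown bases (N) and reports those lying entirely within an intron of a single gene model. This is a hypothetical illustration only, not the authors' pipeline, which works from assembly and alignment of short cDNA and genomic reads. The function names, the 0-based half-open coordinate convention, and the rule that an island must be fully contained in an intron are all assumptions made for the example.

```python
import re
from typing import List, Tuple

# Hypothetical coordinate convention: 0-based, half-open [start, end).
Interval = Tuple[int, int]


def find_n_islands(seq: str, min_len: int = 1) -> List[Interval]:
    """Return coordinates of runs of 'N'/'n' that are at least min_len long."""
    return [
        (m.start(), m.end())
        for m in re.finditer(r"[Nn]+", seq)
        if m.end() - m.start() >= min_len
    ]


def introns_from_exons(exons: List[Interval]) -> List[Interval]:
    """Treat the gaps between consecutive exons of one gene model as introns."""
    ordered = sorted(exons)
    return [
        (a_end, b_start)
        for (_, a_end), (b_start, _) in zip(ordered, ordered[1:])
        if b_start > a_end
    ]


def intronic_n_islands(seq: str, exons: List[Interval], min_len: int = 1) -> List[Interval]:
    """N-islands fully contained in an intron (assumed criterion for 'intronic')."""
    introns = introns_from_exons(exons)
    return [
        (start, end)
        for start, end in find_n_islands(seq, min_len)
        if any(i_start <= start and end <= i_end for i_start, i_end in introns)
    ]


if __name__ == "__main__":
    # Toy sequence: one N-run inside the intron, one inside the second exon.
    seq = "ACGTNNNNACGTNNACGT"
    exons = [(0, 3), (10, 18)]
    print(intronic_n_islands(seq, exons))  # [(4, 8)]
```

In the toy example, the island at positions 4 to 8 lies in the intron (3, 10) and is reported, while the run at 12 to 14 sits inside an exon and is skipped. Islands flagged this way would be the candidates in which hidden exons are sought.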
- Received March 15, 2016.
- Accepted April 23, 2016.
- Copyright © 2016 Author et al.
This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.