Annotation of the oak genome sequence and associated bioinformatic resources

Open archive

Amselem, Joelle, J. | Aury, Jean Marc | Francillonne, Nicolas | Alaeitabar, Tina | da Silva, Corinne | Duplessis, Sébastien | Ehrenmann, François | Faye, Sébastien | Klopp, Christophe | Gaspin, Christine | Ruë, Olivier | Labadie, Karine | Leroy, Thibault | Lesur, Isabelle, Lesur Kupin | Faivre-Rampant, Patricia, P. | Leplé, Jean-Charles | Kremer, Antoine | Martin, Francis | Salse, Jerome, J. | Quesneville, Hadi | Plomion, Christophe

Published by CCSD

National audience. The large, complex and highly heterozygous genome of pedunculate oak (Quercus robur) was sequenced using a whole-genome shotgun approach [1]. Roche 454 GS-FLX sequence reads were assembled into contigs and combined with Illumina reads from paired-end and mate-pair libraries and with true synthetic long reads to build a total of 8,827 scaffolds (1.46 Gb total size; N50 = 821 kb). Both haplotypes were merged into a haploid version, and 12 pseudomolecules were established using a high-density linkage map [2] combined with a syntenome approach based on the peach genome sequence.

The structural annotation (transposable elements (TEs), genes, ncRNAs) and the functional annotation of automatically predicted genes rely on robust pipelines: (i) the REPET package [3] [4] was first used to detect, classify and annotate TEs de novo; TEs represent about 50% of the genome; (ii) EuGene was trained and run to integrate ab initio and similarity-based gene-finding software, predicting 43,240 genes, including 29,665 high-confidence gene models; (iii) ncRNAs were predicted using FEELnc (lncRNA), similarity searches against databases and small RNA-seq data analysis (miRNA), RNAmmer (rRNA), tRNAscan-SE (tRNA) and the Infernal package (other non-coding RNAs); and (iv) a functional annotation pipeline, based mainly on InterProScan to search for patterns/motifs and on BLAST-based comparative genomics, was run on the 43,240 predicted proteins. Each predicted protein was assigned a provisional definition according to the results of the most reliable tools and their occurrence in the oak annotation (D. Goodstein method, personal communication). We will present these pipelines and the results of this annotation.

We also set up an integrated genome annotation system dedicated to oak, based on GMOD web interfaces such as WebApollo/JBrowse and InterMine, to make these data available in a user-friendly environment. This system allowed experts to analyze their protein families of interest and to curate and validate gene structures. We will also present the interoperability between these genomic data and the genetic data produced in Quercus (SNPs, linkage maps, QTLs) available in GnpIS [5], an information system for plants. Together, these resources provide a framework for studying the two key evolutionary processes that explain the remarkable diversity found within the genus Quercus: local adaptation and speciation.
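The scaffold statistic quoted in the abstract (N50 = 821 kb) can be illustrated with a minimal sketch. The function name `n50` and the toy scaffold lengths below are assumptions made for illustration only; they are not the authors' code or the oak assembly data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembly size.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths (arbitrary units), not oak data.
toy_scaffolds = [100, 200, 300, 400, 500]
print(n50(toy_scaffolds))
```

A higher N50 for the same total size means that more of the assembly sits in long scaffolds, which is why the figure is commonly reported alongside the scaffold count and total size.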


Suggestions

By the same author

Oak genome sequencing and evolution

Open archive | Salse, Jerome, J. | CCSD

National audience

Oak genome reveals facets of long lifespan

Open archive | Plomion, Christophe | CCSD

Oaks are an important part of our natural and cultural heritage. Not only are they ubiquitous in our most common landscapes but they have also supplied human societies with invaluable services, including food and shelter, since p...

An integrated information system dedicated to oak genomics and genetics

Open archive | Amselem, Joëlle | CCSD

GnpIS is an information system designed to integrate and link genomic, genetic and environmental data in a single environment dedicated to plant (crops and forest trees) and fungal data. GnpIS is regularly improved with new funct...
