TAGADA: a scalable pipeline to improve genome annotations with RNA-seq data

Open archive

Kurylo, Cyril | Guyomar, Cervin | Foissac, Sylvain | Djebali, Sarah

Published by CCSD; Oxford University Press

International audience. Abstract: Genome annotation plays a crucial role in providing a comprehensive catalog of genes and transcripts for a particular species. As research projects generate new transcriptome data worldwide, integrating this information into existing annotations becomes essential. However, most bioinformatics pipelines are limited in their ability to effectively and consistently update annotations using new RNA-seq data. Here we introduce TAGADA, an RNA-seq pipeline for Transcripts And Genes Assembly, Deconvolution, and Analysis. Given a genomic sequence, a reference annotation and RNA-seq reads, TAGADA enhances existing gene models by generating an improved annotation. It also computes expression values for both the reference and the novel annotation, identifies long non-coding transcripts (lncRNAs), and provides a comprehensive quality control report. Developed using Nextflow DSL2, TAGADA offers user-friendly functionalities and ensures reproducibility across different computing platforms through its containerized environment. In this study, we demonstrate the efficacy of TAGADA using RNA-seq data from the GENE-SWiTCH project alongside chicken and pig genome annotations as references. Results indicate that TAGADA can increase the number of annotated transcripts by approximately 300% in these species. Furthermore, we illustrate how TAGADA can integrate Illumina NovaSeq short reads with PacBio Iso-Seq long reads, showcasing its versatility. TAGADA is available at github.com/FAANG/analysis-TAGADA.

Suggestions

By the same author

Empowering bioinformatics communities with Nextflow and nf-core

Open archive | Langer, Björn, E | CCSD

Standardised analysis pipelines are an important part of FAIR bioinformatics research. Over the last decade, there has been a notable shift from point-and-click pipeline solutions such as Galaxy towards command-line solutions such...

Gene networks controlling functional cell interactions in the pig embryo revealed by omics studies

Open archive | Dufour, Adrien | CCSD

International audience. Pig embryonic development differs from that of humans and mice from the blastocyst stage and is characterised by a much later implantation. This particular period is associated with a lengthe...

Knowledge graph based integration of transcriptome sequencing data to explore miRNA mediated regulation

Open archive | Carpentier, Océane | CCSD

National audience. MicroRNAs (miRNAs) are small non-coding RNAs essentially known to repress the expression of protein-coding genes, either by degrading mRNAs or by preventing their translation into proteins by bind...
