BrumiR: A toolkit for de novo discovery of microRNAs from sRNA-seq data

Open archive

Moraga, Carol | Sanchez, Evelyn | Ferrarini, Mariana Galvão | Gutierrez, Rodrigo A. | Vidal, Elena A. | Sagot, Marie-France

Published by CCSD; Oxford Univ Press

International audience. Abstract: MicroRNAs (miRNAs) are small noncoding RNAs that play a key role in regulating gene expression. Over the past decade, as high-throughput sequencing technologies have become more accessible, various methods have been developed to identify miRNAs, most of which rely on preexisting reference genomes. When a reference genome is absent or of low quality, however, such identification becomes more difficult. In this context, we developed BrumiR, an algorithm that discovers miRNAs directly and exclusively from small RNA (sRNA) sequencing (sRNA-seq) data. We benchmarked BrumiR on datasets encompassing animal and plant species, using both real and simulated sRNA-seq experiments. The results show that BrumiR reaches the highest recall for miRNA discovery while being much faster and more efficient than the state-of-the-art tools evaluated. This speed and efficiency allow BrumiR to analyze large numbers of sRNA-seq experiments from plant or animal species. Moreover, BrumiR reports additional information on other expressed sequences (sRNAs, isomiRs, etc.), thus maximizing the biological insight gained from sRNA-seq experiments. When a reference genome is available, BrumiR also provides a new mapping tool (BrumiR2reference) that performs an a posteriori exhaustive search to identify the precursor sequences. Finally, we provide a machine learning classifier, based on a random forest model, that evaluates sequence-derived features to further refine the predictions obtained from the BrumiR-core. The code of BrumiR and of all the algorithms that compose the BrumiR toolkit is freely available at https://github.com/camoragaq/BrumiR.
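
To make the phrase "sequence-derived features" concrete, the following is a minimal sketch of how such features could be computed for one candidate sequence before being passed to a classifier. The feature set shown here (length, GC content, dinucleotide frequencies) and the function name `sequence_features` are illustrative assumptions; the abstract does not list the features that the BrumiR classifier actually uses.

```python
from collections import Counter
from itertools import product

# All 16 RNA dinucleotides, in a fixed order so feature vectors are comparable.
DINUCLEOTIDES = ["".join(p) for p in product("ACGU", repeat=2)]


def sequence_features(seq: str) -> dict:
    """Return simple sequence-derived features for one candidate sRNA.

    Hypothetical feature set chosen for illustration only; it is not
    taken from the BrumiR toolkit.
    """
    # Sequencing reads are usually written as DNA, so normalize T to U.
    rna = seq.upper().replace("T", "U")
    if not rna or set(rna) - set("ACGU"):
        raise ValueError(f"not a valid nucleotide sequence: {seq!r}")

    length = len(rna)
    gc_content = (rna.count("G") + rna.count("C")) / length

    # Count overlapping dinucleotides and convert the counts to frequencies.
    pairs = Counter(rna[i:i + 2] for i in range(length - 1))
    total_pairs = max(length - 1, 1)

    features = {"length": length, "gc_content": gc_content}
    for dinuc in DINUCLEOTIDES:
        features[f"freq_{dinuc}"] = pairs[dinuc] / total_pairs
    return features
```

A table of such feature dictionaries, one row per candidate, is the kind of input a random forest model could be trained on to separate likely miRNAs from other sRNAs.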

