Should we really use graph neural networks for transcriptomic prediction?

Open archive

Brouard, Céline | Mourad, Raphaël | Vialaneix, Nathalie

Published by CCSD; Oxford University Press (OUP)

International audience. The recent development of deep learning methods has undoubtedly led to great improvements in various machine learning tasks, especially prediction tasks. These methods have also been adapted to address various problems in bioinformatics, including automatic genome annotation, artificial genome generation, and phenotype prediction. In particular, a specific type of deep learning method, the graph neural network (GNN), has repeatedly been reported as a good candidate for predicting phenotypes from gene expression because of its ability to embed information on gene regulation or co-expression through the use of a gene network. However, to date, no complete and reproducible benchmark has been performed to analyze the trade-off between the cost and benefit of this approach compared to more standard (and simpler) machine learning methods. In this article, we provide such a benchmark, based on clear and comparable policies to evaluate the different methods on several datasets. Our conclusion is that GNNs rarely provide a real improvement in prediction performance, especially when weighed against the computational effort they require. Our findings on a limited but controlled simulated dataset show that this could be explained by the limited quality or predictive power of the input biological gene network itself.


Suggestions

By the same author

Semi-supervised learning with pseudo-labeling compares favorably with large language models for regulatory sequence prediction

Open archive | Phan, Han | CCSD

International audience. Predicting molecular processes using deep learning is a promising approach to provide biological insights for non-coding single nucleotide polymorphisms identified in genome-wide association ...

NMFProfiler: a multi-omics integration method for samples stratified in groups

Open archive | Mercadié, Aurélie | CCSD

International audience. Motivation: The development of high-throughput sequencing enabled the massive production of “omics” data for various applications in biology. By analyzing simultaneously paired datasets collec...

ProA and ProB repeat sequences shape genome organization, and enhancers open domains

Open archive | Bonnet, Konstantinn Acen | CCSD

SUMMARY There is a growing awareness that repeat sequences (RepSeq) - the main constituents of the human genome - are also prime players in its organization. Here we propose that the genome should be envisioned as a supersystem wi...
