CALDERA: Finding all significant de Bruijn subgraphs for bacterial GWAS

Open archive

Roux de Bézieux, Hector | Lima, Leandro | Perraudeau, Fanny | Mary, Arnaud | Dudoit, Sandrine | Jacob, Laurent

Published by CCSD; Oxford University Press (OUP)

International audience. Genome-wide association studies (GWAS), which aim to find genetic variants associated with a trait, have been widely used on bacteria to identify genetic determinants of drug resistance or hypervirulence. Recent bacterial GWAS methods usually rely on k-mers, whose presence in a genome can denote variants ranging from single nucleotide polymorphisms to mobile genetic elements. Since many bacterial species include genes that are not shared among all strains, this approach avoids reliance on a common reference genome. However, the same gene can exist in slightly different versions across strains, which dilutes its effect when its association with a phenotype is tested through k-mer-based GWAS. Here we propose to overcome this by testing covariates built from closed connected subgraphs of the de Bruijn graph (DBG) defined over genomic k-mers. These covariates capture polymorphic genes as a single entity, improving k-mer-based GWAS in terms of power and interpretability. As the number of subgraphs is exponential in the number of nodes in the DBG, a method naively testing all possible subgraphs would have very low statistical power due to multiple testing corrections, and the mere exploration of these subgraphs would quickly become computationally intractable. The concept of testable hypotheses has successfully been used to address both problems in similar contexts. We leverage this concept to test all closed connected subgraphs by proposing a novel enumeration scheme for these objects that fully exploits the pruning opportunity offered by testability, resulting in drastic improvements in computational efficiency. We illustrate this on both real and simulated datasets and also demonstrate how considering subgraphs leads to a more powerful and interpretable method. Our method integrates with existing visual tools to facilitate interpretation.
We also provide an implementation of our method, as well as code to reproduce all results, at https://github.com/HectorRDB/Caldera_Recomb.
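
The two ideas in the abstract, subgraph covariates and testability, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It rests on assumptions that the abstract does not state: a subgraph's covariate is taken to be 1 when a genome contains at least one of its k-mers, and testability is judged from the smallest p-value a one-sided Fisher exact test could attain. The function names, toy genomes, and phenotypes are hypothetical.

```python
from math import comb


def kmers(seq, k):
    """Return the set of k-mers occurring in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def de_bruijn_edges(nodes):
    """Edges u -> v where the (k-1)-suffix of u equals the (k-1)-prefix of v."""
    by_prefix = {}
    for v in nodes:
        by_prefix.setdefault(v[:-1], []).append(v)
    return {(u, v) for u in nodes for v in by_prefix.get(u[1:], [])}


def subgraph_covariate(subgraph_nodes, genome_kmers):
    """Assumed encoding: 1 if a genome holds at least one k-mer of the subgraph."""
    return [int(bool(subgraph_nodes & g)) for g in genome_kmers]


def min_attainable_pvalue(n, n_cases, x):
    """Smallest one-sided Fisher exact p-value for a covariate present in x of n samples.

    The most extreme table puts as many of the x carriers as possible among the
    cases, so the p-value depends only on the margins, not on the phenotype.
    """
    a = min(x, n_cases)
    return comb(n_cases, a) * comb(n - n_cases, x - a) / comb(n, x)


# Toy data: three cases carry one of two versions of a gene, three controls do not.
genomes = ["ACGTAC", "ACGAAC", "ACGTAC", "TTTTTT", "TTGTTT", "TTTTTT"]
phenotype = [1, 1, 1, 0, 0, 0]
genome_kmers = [kmers(g, 3) for g in genomes]

subgraph = {"ACG", "CGT", "CGA"}          # branching node plus both variants
edges = de_bruijn_edges(subgraph)         # ACG -> CGT and ACG -> CGA

single = subgraph_covariate({"CGT"}, genome_kmers)   # one version only
merged = subgraph_covariate(subgraph, genome_kmers)  # both versions together

n, n_cases = len(phenotype), sum(phenotype)
p_single = min_attainable_pvalue(n, n_cases, sum(single))
p_merged = min_attainable_pvalue(n, n_cases, sum(merged))
```

In this toy setting the k-mer specific to one version is present in only two of the three cases, whereas the subgraph covariate is present in all three. A covariate whose minimal attainable p-value exceeds the significance threshold can never be significant, so it can be skipped without counting toward the multiple testing correction; this is the kind of pruning that testability offers.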
