Generating functional protein variants with variational autoencoders

Open archive

Hawkins-Hooker, Alex | Depardieu, Florence | Baur, Sebastien | Couairon, Guillaume | Chen, Arthur | Bikard, David

Published by CCSD; PLOS

International audience. The vast expansion of protein sequence databases provides an opportunity for new protein design approaches that seek to learn the sequence-function relationship directly from natural sequence variation. Deep generative models trained on protein sequence data have been shown to learn biologically meaningful representations useful for a variety of downstream tasks, but their potential for direct use in the design of novel proteins remains largely unexplored. Here we show that variational autoencoders (VAEs) trained on a dataset of almost 70,000 luciferase-like oxidoreductases can be used to generate novel, functional variants of the luxA bacterial luciferase. We propose separate VAE models to work with aligned sequence input (MSA VAE) and raw sequence input (AR-VAE), and offer evidence that, while both are able to reproduce patterns of amino acid usage characteristic of the family, the MSA VAE is better able to capture long-distance dependencies reflecting the influence of 3D structure. To confirm the practical utility of the models, we used them to generate variants of luxA whose luminescence activity was validated experimentally. We further showed that conditional variants of both models could be used to increase the solubility of luxA without disrupting function. Altogether, 6 of 12 variants generated using the unconditional AR-VAE and 9 of 11 generated using the unconditional MSA VAE retained measurable luminescence, together with all 23 of the less distant variants generated by conditional versions of the models; the most distant functional variant contained 35 differences relative to the nearest training set sequence. These results demonstrate the feasibility of using deep generative models to explore the space of possible protein sequences and generate useful variants, providing a method complementary to rational design and directed evolution approaches.
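
The abstract reports variant novelty as the number of differences relative to the nearest training set sequence. The following is a minimal sketch of how such a count could be computed. It assumes that sequences are already aligned to a common length and that a gap character counts as a difference; the record does not state how the authors computed the figure. The function names and the toy sequences are hypothetical and are not real luxA data.

```python
def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def nearest_training_distance(variant: str, training: list[str]) -> int:
    """Smallest number of differences between a variant and any training sequence."""
    return min(hamming(variant, t) for t in training)


# Toy, made-up aligned sequences (assumption: '-' marks an alignment gap).
training_set = ["MKFGNF-LLT", "MKFGLF-LLS", "MRFGNFALLT"]
variant = "MKFGNFALLS"

print(nearest_training_distance(variant, training_set))
```

For unaligned (raw) sequences, an edit distance or a pairwise alignment would be needed in place of the position-wise comparison used here.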
