Relative gradient optimization of the Jacobian term in unsupervised deep learning

Open archive

Gresele, Luigi | Fissore, Giancarlo | Javaloy, Adrián | Schölkopf, Bernhard | Hyvärinen, Aapo

Published by CCSD

International audience. Learning expressive probabilistic models that correctly describe the data is a ubiquitous problem in machine learning. A popular approach is to map the observations into a representation space with a simple joint distribution, which can typically be written as a product of its marginals, thus drawing a connection with the field of nonlinear independent component analysis. Deep density models have been widely used for this task, but their maximum-likelihood training requires estimating the log-determinant of the Jacobian and is computationally expensive, imposing a trade-off between computation and expressive power. In this work, we propose a new approach for exact training of such neural networks. Based on relative gradients, we exploit the matrix structure of neural network parameters to compute updates efficiently even in high-dimensional spaces; the computational cost of training is quadratic in the input size, in contrast with the cubic scaling of naive approaches. This allows fast training with objective functions involving the log-determinant of the Jacobian, without imposing constraints on its structure, in stark contrast to autoregressive normalizing flows.
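As a minimal NumPy sketch (not the authors' code) of the idea behind the relative gradient, consider a single linear layer with square weight matrix W. The ordinary gradient of the log-determinant term log|det W| is the inverse transpose W^{-T}, whose computation scales cubically; the relative gradient right-multiplies the ordinary gradient by W^T W, and for this term the product collapses to W itself, so no matrix inversion is needed:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 5
W = rng.normal(size=(D, D))

# Ordinary gradient of log|det W| with respect to W: the inverse
# transpose, which requires an explicit (cubic-cost) matrix inversion.
naive_grad = np.linalg.inv(W).T

# Relative gradient: right-multiply the ordinary gradient by W^T W.
# For the log-determinant term, W^{-T} @ W^T @ W simplifies to W,
# so the update can be formed without ever inverting W.
relative_grad = naive_grad @ W.T @ W

assert np.allclose(relative_grad, W)
```

The sketch only checks the algebraic identity for one layer; in the paper's full setting the same trick is applied to every layer of the network, avoiding the inversion in each update.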

Suggestions

By the same author

Modeling Shared Responses in Neuroimaging Studies through MultiView ICA

Open archive | Richard, Hugo | CCSD

International audience. Group studies involving large cohorts of subjects are important to draw general conclusions about brain functional organization. However, the aggregation of data coming from multiple subjects...

Kernel methods in medical imaging

Open archive | Charpiat, Guillaume | CCSD

Generative modeling: statistical physics of Restricted Boltzmann Machines, learning with missing information and scalable training of Linear Flows

Open archive | Fissore, Giancarlo | CCSD

Neural network models able to approximate and sample high-dimensional probability distributions are known as generative models. In recent years this class of models has received tremendous attention due to their potential in autom...
