Generative models are powerful tools for learning an underlying representation of complex data. While early undirected models such as Deep Boltzmann Machines (DBMs) showed great promise, in practice they did not scale well to complicated high-dimensional settings beyond MNIST, possibly because of optimization and mixing difficulties. More recent work on Helmholtz machines and variational autoencoders borrows tools from deep learning, achieves impressive results, and has been adopted across a large array of domains.
The paper's abstract:
Directed latent variable models, which factor the joint distribution as p(x, z) = p(z)p(x|z), have the advantage of fast and exact sampling. However, they require specifying p(z), often as a simple fixed prior, which limits the expressiveness of the model. Undirected latent variable models discard the requirement of a simple fixed prior on p(z), but sampling from them generally requires an iterative procedure, such as blocked Gibbs sampling, that may need many steps to draw samples from the joint distribution p(x, z).
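To make the blocked Gibbs sampling idea concrete, here is a minimal toy sketch (a hypothetical example, not from the paper): the target joint p(x, z) is a bivariate Gaussian with correlation rho, so both conditionals are known in closed form, and the sampler simply alternates between resampling the x block and the z block.

```python
import numpy as np

# Toy target joint p(x, z): bivariate standard Gaussian with correlation rho.
# For this joint both conditionals are Gaussian, e.g. p(x|z) = N(rho*z, 1 - rho^2).
rho = 0.8

def sample_x_given_z(z, rng):
    # Resample the x block from p(x | z) = N(rho * z, 1 - rho^2).
    return rho * z + np.sqrt(1 - rho**2) * rng.standard_normal()

def sample_z_given_x(x, rng):
    # Resample the z block from p(z | x) = N(rho * x, 1 - rho^2) (by symmetry).
    return rho * x + np.sqrt(1 - rho**2) * rng.standard_normal()

def blocked_gibbs(n_samples, burn_in=1000, seed=0):
    rng = np.random.default_rng(seed)
    x, z = 0.0, 0.0
    samples = []
    for t in range(n_samples + burn_in):
        x = sample_x_given_z(z, rng)
        z = sample_z_given_x(x, rng)
        if t >= burn_in:  # discard early, non-stationary iterates
            samples.append((x, z))
    return np.array(samples)

samples = blocked_gibbs(20000)
print(samples.mean(axis=0))          # approx [0, 0]
print(np.corrcoef(samples.T)[0, 1])  # approx rho = 0.8
```

Even in this two-variable toy case, the chain needs many iterations to forget its starting point and mix, which is exactly the cost of undirected sampling that the paper contrasts with the single-pass sampling of directed models.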
The paper therefore proposes a novel approach to learning the joint distribution between the data and a latent code, which uses an adversarially learned iterative procedure to gradually refine the joint distribution p(x, z) so that it better matches the data distribution at each step.
GibbsNet achieves the best of both worlds, in theory as well as in practice. It attains the speed and simplicity of a directed latent variable model, and is guaranteed to produce samples from p(x, z) after only a few sampling iterations. At the same time it retains the flexibility and expressiveness of an undirected latent variable model: GibbsNet does away with the need to specify p(z), and can perform class-conditional generation, attribute prediction, and joint image-attribute modeling in a single model that was not trained for any of these specific tasks beforehand.
It is shown empirically that GibbsNet learns a more complex p(z), and that this leads to improved inpainting and iterative refinement of p(x, z) over a number of steps, as well as stable generation without collapse for thousands of steps, despite being trained on only a few steps.
The ultimate goal of GibbsNet is to train a graphical model whose transition operators are learned and defined directly, by matching the joint distribution of the model's free-running expectation to the distribution obtained when the observations are clamped to data. This takes its inspiration from undirected graphical models, except that there the transition operators are defined to move along a fixed energy manifold; this connection is drawn throughout the paper's formulation.
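The two chains being matched can be sketched structurally as follows. This is a minimal illustrative skeleton, not the paper's implementation: simple linear maps stand in for the encoder q(z|x), the decoder p(x|z), and the joint discriminator, and the actual adversarial parameter updates are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
x_dim, z_dim, n_steps = 4, 2, 3

# Illustrative stand-ins for the learned networks (not the paper's
# architectures): decoder p(x|z), encoder q(z|x), and a discriminator
# that scores (x, z) pairs jointly, as in ALI.
W_dec = rng.standard_normal((z_dim, x_dim)) * 0.1
W_enc = rng.standard_normal((x_dim, z_dim)) * 0.1
W_dis = rng.standard_normal(x_dim + z_dim) * 0.1

def decode(z):
    # Stochastic decoder: x ~ p(x | z).
    return z @ W_dec + 0.01 * rng.standard_normal((len(z), x_dim))

def encode(x):
    # Stochastic encoder: z ~ q(z | x).
    return x @ W_enc + 0.01 * rng.standard_normal((len(x), z_dim))

def discriminator(x, z):
    # Sigmoid score for a joint (x, z) pair.
    logits = np.concatenate([x, z], axis=1) @ W_dis
    return 1.0 / (1.0 + np.exp(-logits))

def unclamped_chain(batch_size):
    # Free-running chain: start from a simple prior, then alternate
    # decoding and encoding for a few steps to refine the joint sample.
    z = rng.standard_normal((batch_size, z_dim))
    for _ in range(n_steps):
        x = decode(z)
        z = encode(x)
    return x, z

def clamped_chain(x_data):
    # Clamped chain: x is held fixed at real data; a single inference
    # step draws z from the stochastic encoder.
    return x_data, encode(x_data)

x_data = rng.standard_normal((8, x_dim))
x_fake, z_fake = unclamped_chain(batch_size=8)
x_real, z_real = clamped_chain(x_data)
# The adversarial objective would push the discriminator's scores on
# (x_fake, z_fake) toward its scores on (x_real, z_real); the gradient
# updates that train the three networks are omitted here.
d_fake = discriminator(x_fake, z_fake)
d_real = discriminator(x_real, z_real)
```

The key design point this sketch highlights is that both chains produce (x, z) pairs from the *same* encoder and decoder, so matching the two joint distributions adversarially forces the conditionals to be consistent with a single stationary joint, with no fixed prior imposed on p(z).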
The related work discussed in the paper includes:
Energy Models and Deep Boltzmann Machines
Generative Stochastic Networks
Generative Adversarial Learning of Markov Chains
Adversarially Learned Inference (ALI)
The paper's conclusion:
We have introduced GibbsNet, a powerful new model for performing iterative inference and generation in deep graphical models. Although models like the RBM and the GSN have become less investigated in recent years, their theoretical properties are still worth pursuing, and we follow those theoretical motivations here using a GAN-like objective. With a training and sampling procedure closely related to undirected graphical models, GibbsNet is able to learn a joint distribution which converges in a very small number of steps of its Markov chain, with no requirement that the marginal p(z) match a simple prior. We prove that at the convergence of training, despite unrolling only a few steps of the chain during training, we obtain a transition operator whose stationary distribution matches the data and makes the conditionals p(x | z) and q(z | x) consistent with that unique joint stationary distribution.
The above image demonstrates learning the joint distribution between images and a list of 40 binary attributes. The attributes on the right are generated from a multinomial distribution as part of the joint with the images on the left.
The paper thus shows that this allows the prior p(z) to be shaped into a complicated distribution in which different classes have easily separable representations in the latent space. This leads to improved classification when the inferred latent variables q(z|x) are used directly. Finally, it shows that GibbsNet's flexible prior produces a flexible model that can simultaneously perform inpainting, conditional image generation, and prediction with a single model not explicitly trained for any of these specific tasks, outperforming a competitive ALI baseline with the same setup.
For a deeper look at the paper, follow the link below to download the PDF.
Download Free PDF: GibbsNet