Wasserstein GAN: An Alternative To The Traditional GAN Training

Jan. 28, 2018, 3:26 a.m. By: Kirti Bakshi

Wasserstein GAN

The paper proposes training Generative Adversarial Networks (GANs) with a slightly different objective function. The new objective is much more stable to train than that of a standard GAN because it avoids vanishing gradients during training.

The Goal:

The paper is concerned with unsupervised learning, and its main goal is a better objective function that makes GAN training more stable.

The contributions of this paper are:

The paper is organized into several sections. The main ones are summarized below.

• Section 2: This section provides a comprehensive theoretical analysis of how the Earth Mover (EM) distance behaves in comparison to popular probability distances and divergences that are used in the context of learning distributions.

• Section 3: In this section, the paper defines a form of GAN called Wasserstein-GAN (WGAN) that minimizes a reasonable and efficient approximation of the EM distance, and shows theoretically that the corresponding optimization problem is sound.

• Section 4: This section empirically shows that WGANs cure the main training problems of GANs. In particular, training WGANs does not require maintaining a careful balance in training of the discriminator and the generator and does not require a careful design of the network architecture either. The mode dropping phenomenon that is typical in GANs is also drastically reduced.
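A small sketch of why the EM distance can be a more useful training signal than a divergence such as Jensen-Shannon (JS). For two point masses that do not overlap, the JS divergence stays at log 2 however far apart they are, while the EM distance grows with the gap and so still indicates which way to move. The helper names and the six-bin grid are illustrative assumptions, not taken from the paper.

```python
import math

def em_distance_1d(p, q, spacing=1.0):
    """EM (Wasserstein-1) distance between two histograms on the same
    evenly spaced 1-D grid: the summed absolute gap between their CDFs."""
    cdf_gap, total = 0.0, 0.0
    for pi, qi in zip(p, q):
        cdf_gap += pi - qi
        total += abs(cdf_gap) * spacing
    return total

def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two histograms."""
    def kl(a, b):
        # Terms with a_i == 0 contribute nothing to the sum.
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def point_mass(index, size):
    """Histogram with all probability on one bin (hypothetical helper)."""
    return [1.0 if i == index else 0.0 for i in range(size)]

size = 6
target = point_mass(0, size)
for shift in range(size):
    model = point_mass(shift, size)
    print(shift, em_distance_1d(target, model),
          round(js_divergence(target, model), 4))
```

For every non-zero shift the JS column is the same constant, so it carries no gradient information about the shift, whereas the EM column equals the shift itself.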

One of the most compelling practical benefits of WGAN:

One of the most compelling practical benefits of WGANs is the ability to continuously estimate the EM distance by training the discriminator to optimality. The resulting learning curves are not only useful for debugging and hyperparameter searches, but also correlate remarkably well with the observed sample quality.
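As a rough illustration of the quantity being plotted, the sketch below computes the critic-based estimate: the mean critic score on real data minus the mean critic score on generated samples. It assumes a critic that is already close to optimal and Lipschitz-constrained; the identity critic and the small 1-D batches are hypothetical stand-ins chosen so the arithmetic is easy to follow.

```python
def em_estimate(critic, real_batch, fake_batch):
    """Mean critic score on real data minus mean score on generated data."""
    real_score = sum(critic(x) for x in real_batch) / len(real_batch)
    fake_score = sum(critic(x) for x in fake_batch) / len(fake_batch)
    return real_score - fake_score

def identity_critic(x):
    # Hypothetical 1-Lipschitz critic for 1-D data; a real critic is a
    # trained network.
    return x

real = [2.0, 2.5, 3.0, 2.5]
early_samples = [0.0, 0.5, 1.0, 0.5]  # generator output far from the data
late_samples = [1.5, 2.0, 2.5, 2.0]   # generator output closer to the data

print(em_estimate(identity_critic, real, early_samples))  # 2.0
print(em_estimate(identity_critic, real, late_samples))   # 0.5
```

Logging this value once per generator update gives a curve that falls as the samples move toward the data, which is the behaviour the paper reports for its learning curves.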

Empirical Results based on Experiments:

Experiments on image generation with the Wasserstein-GAN algorithm show significant practical benefits over the formulation used in standard GANs. The two main benefits claimed in the paper are:

  • A meaningful loss metric that correlates with the generator’s convergence and sample quality

  • Improved stability of the optimization process

Wasserstein-GAN algorithm
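The training procedure can be sketched on a toy 1-D problem. The sketch keeps two ingredients of the paper's algorithm: several critic updates per generator update (five by default) and clipping the critic weights to a small range (0.01) so the critic stays Lipschitz. Everything else is an illustrative assumption: the linear critic, the shift-only generator, the learning rates, and plain gradient steps in place of the RMSProp optimizer the paper reports.

```python
import random

def train_toy_wgan(steps=800, n_critic=5, clip=0.01, batch=64,
                   lr_critic=0.1, lr_gen=1.0, real_mean=4.0, seed=0):
    """Toy WGAN: real data ~ N(real_mean, 1), generator g(z) = z + theta
    with z ~ N(0, 1), critic f(x) = w * x with w clipped to [-clip, clip]."""
    rng = random.Random(seed)
    w = 0.0      # critic parameter
    theta = 0.0  # generator parameter

    def batch_mean(center):
        # Mean of a fresh batch drawn from N(center, 1).
        return sum(rng.gauss(center, 1.0) for _ in range(batch)) / batch

    for _ in range(steps):
        for _ in range(n_critic):
            # Critic ascends mean f(real) - mean f(fake); for a linear
            # critic the gradient in w is the gap between batch means.
            grad_w = batch_mean(real_mean) - batch_mean(theta)
            w += lr_critic * grad_w
            # Weight clipping keeps the critic Lipschitz.
            w = max(-clip, min(clip, w))
        # Generator ascends mean f(g(z)) = w * mean(z + theta);
        # its gradient in theta is exactly w.
        theta += lr_gen * w
    return theta, w

theta, w = train_toy_wgan()
print(round(theta, 2), w)
```

With these settings the generator shift should end up near the real mean of 4, since the critic's sign always points the generator toward the data.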

Improved stability:

One of the benefits of WGAN is that it allows us to train the critic until optimality. Once the critic is trained, it simply provides a loss to the generator, which we can train like any other neural network, so there is no longer any need to carefully balance the capacity of the generator and the discriminator. The better the critic, the higher the quality of the gradients used to train the generator. WGANs are also observed to be much more robust than GANs when the architectural choices for the generator are varied. This is illustrated by experiments on three generator architectures:

  • A convolutional DCGAN generator,

  • A convolutional DCGAN generator without batch normalization and with a constant number of filters,

  • A 4-layer ReLU-MLP with 512 hidden units.

The last two are known to perform very poorly with traditional GANs. The convolutional DCGAN architecture is kept for the WGAN critic or the GAN discriminator.


The paper introduces an algorithm called WGAN, an alternative to traditional GAN training. With this new model, the paper shows that the stability of learning can be improved, that problems like mode collapse can be avoided, and that the resulting learning curves are meaningful and useful for debugging and hyperparameter searches.

Furthermore, the paper shows that the corresponding optimization problem is sound and provides extensive theoretical work highlighting the deep connections to other distances between distributions.

For more information and insight into the paper, follow the link below.

Link To The PDF: Click Here

For More Information: GitHub