Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks

April 7, 2018, 8:30 a.m. By: Kirti Bakshi


Image-to-image translation is a class of vision and graphics problems in which the goal is to learn the mapping between an input image and an output image using a training set of aligned image pairs. For many tasks, however, paired training data will not be available. The paper presents an approach for learning to translate an image from a source domain X to a target domain Y in the absence of paired examples. Using an adversarial loss, the goal is to learn a mapping G: X → Y such that the distribution of images from G(X) is indistinguishable from the distribution Y.
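The adversarial part of the objective can be sketched numerically. The toy example below (hypothetical values, not from the paper) evaluates the standard GAN loss for a discriminator D_Y, which scores how likely an image is to be a real sample from domain Y; the generator G tries to drive its translated images toward high D_Y scores:

```python
import numpy as np

def adversarial_loss(d_real, d_fake, eps=1e-8):
    """Standard GAN objective: E_y[log D_Y(y)] + E_x[log(1 - D_Y(G(x)))].
    D_Y tries to maximise this value; G tries to minimise it."""
    return np.mean(np.log(d_real + eps)) + np.mean(np.log(1.0 - d_fake + eps))

# Hypothetical discriminator scores, for illustration only:
d_real = np.array([0.9, 0.8, 0.95])  # D_Y's scores on real images from Y
d_fake = np.array([0.1, 0.2, 0.05])  # D_Y's scores on translated images G(x)
print(adversarial_loss(d_real, d_fake))
```

With a perfect discriminator (scores of 1 on real images and 0 on fakes), the loss approaches its maximum of 0; as G improves, d_fake rises and the loss falls.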

Since this mapping is highly under-constrained, the authors couple it with an inverse mapping F: Y → X and introduce a cycle consistency loss to enforce F(G(X)) ≈ X (and vice versa). Qualitative results are presented on several tasks where paired training data does not exist, including collection style transfer, object transfiguration, season transfer, and photo enhancement. Quantitative comparisons against several prior methods demonstrate the superiority of this approach.
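The cycle consistency idea can be illustrated with a minimal sketch. Here G and F are hypothetical, perfectly invertible stand-ins for the learned networks (not the paper's models): G doubles pixel values and F halves them, so the L1 cycle loss evaluates to exactly zero:

```python
import numpy as np

# Toy stand-ins for the learned mappings G: X -> Y and F: Y -> X.
# Because F inverts G exactly here, the cycle loss is zero.
def G(x):
    return 2.0 * x

def F(y):
    return y / 2.0

def cycle_consistency_loss(x, y):
    """L1 cycle loss: E[|F(G(x)) - x|] + E[|G(F(y)) - y|]."""
    forward = np.mean(np.abs(F(G(x)) - x))   # x -> G(x) -> F(G(x)) ≈ x
    backward = np.mean(np.abs(G(F(y)) - y))  # y -> F(y) -> G(F(y)) ≈ y
    return forward + backward

x = np.random.rand(4, 8, 8, 3)  # a small batch of "photos"
y = np.random.rand(4, 8, 8, 3)  # a small batch of "paintings"
print(cycle_consistency_loss(x, y))  # → 0.0 for this perfectly invertible pair
```

For real networks the reconstruction is only approximate, and this term penalises translations that lose the information needed to recover the original image.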

A deeper look at the paper:

What did Claude Monet see as he placed his easel by the bank of the Seine near Argenteuil on a lovely spring day in 1873? A colour photograph, had it been invented, might have documented a crisp blue sky and a glassy river reflecting it. Monet conveyed his impression of this same scene through a bright palette and wispy brush strokes. A brief stroll through a gallery of Monet paintings makes it possible to imagine how he would have rendered the scene: perhaps in pastel shades, with abrupt dabs of paint, and a somewhat flattened dynamic range.

We can imagine all this despite never having seen a side-by-side example of a Monet painting next to a photo of the scene he painted. Instead, we have knowledge of the set of Monet paintings and of the set of landscape photographs. We can reason about the stylistic differences between these two sets and thereby imagine what a scene might look like if we were to “translate” it from one set into the other. The paper presents a method that can learn to do the same: capturing the special characteristics of one image collection and figuring out how these characteristics could be translated into the other image collection, all in the absence of any paired training examples.

More broadly, this problem can be described as image-to-image translation, converting an image from one representation of a given scene, x, to another, y. Years of research in computer vision, image processing, computational photography, and graphics have produced powerful translation systems in the supervised setting, where example image pairs {x, y} are available.

However, obtaining paired training data can be difficult and expensive. For example, only a couple of datasets exist for tasks like semantic segmentation, and they are relatively small. Obtaining input-output pairs for graphics tasks like artistic stylization can be even more difficult since the desired output is highly complex, typically requiring artistic authoring. For many tasks, like object transfiguration, the desired output is not even well-defined.

The authors therefore seek an algorithm that can learn to translate between domains without paired input-output examples.

Related work:

  • Generative Adversarial Networks (GANs)

  • Image-to-Image Translation

  • Unpaired Image-to-Image Translation

  • Cycle Consistency

  • Neural Style Transfer


Their method has been demonstrated in several applications where paired training data does not exist. You can refer to the appendix (Section 7) of the paper for more details about the datasets. They observe that translations on training data are often more appealing than those on test data, and full results of all applications on both training and test data can be viewed on their project website.

  • Collection style transfer

  • Object transfiguration

  • Photo enhancement

  • Comparison with Gatys et al.

Limitations and Discussion:

  • Although this method can achieve compelling results in many cases, the results are far from uniformly positive. Several typical failure cases are shown.

  • On translation tasks that involve changes in colour and texture, like many of those reported above, the method often succeeds. They have also explored tasks that require geometric changes, with little success.

  • Some failure cases are caused by the distribution characteristics of the training datasets.

  • It is also observed that there is a lingering gap between the results that are achieved by this unpaired method and those that are achievable with paired training data. In some cases, this gap may be very hard – or even impossible – to close. To resolve this ambiguity may require some form of weak semantic supervision.

  • Integrating weak or semi-supervised data may lead to substantially more powerful translators, still at a fraction of the annotation cost of fully-supervised systems. In many cases, completely unpaired data is plentifully available and should be made use of. This paper thus pushes the boundaries of what is possible in this “unsupervised” setting.

For a deeper insight into the paper, one can go through the link mentioned below.

For More Information: GitHub

Link To The PDF: Click Here