Simple PyTorch implementation of GANimation (ECCV 2018 Oral)

July 30, 2018, 11:16 p.m. By: Kirti Bakshi


Recent advances in Generative Adversarial Networks (GANs) have shown impressive results for the task of facial expression synthesis. The most successful architecture is StarGAN, which conditions the GAN's generation process on images of a specific domain. Although effective, this approach can only generate a discrete number of expressions, determined by the content of the dataset.

To address this limitation, the paper introduces a novel GAN conditioning scheme based on Action Unit (AU) annotations, which describe in a continuous manifold the anatomical facial movements that define a human expression. This approach allows controlling the magnitude of activation of each AU and combining several of them.
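To make the conditioning idea more concrete, here is a minimal sketch of how an expression could be represented as a vector of AU activations, with the magnitude scaled and several AUs combined. The AU subset, the [0, 1] intensity range, and the names `AU_NAMES`, `make_au_vector`, and `scale_expression` are illustrative assumptions, not taken from the paper or its code.

```python
# Hypothetical subset of FACS Action Units, chosen only for illustration.
AU_NAMES = ["AU1", "AU4", "AU6", "AU12"]


def make_au_vector(activations):
    """Build a fixed-order AU vector from a {name: intensity} mapping.

    Intensities are assumed to be normalized to [0, 1]; AUs that are
    not mentioned are treated as inactive (0.0).
    """
    unknown = set(activations) - set(AU_NAMES)
    if unknown:
        raise KeyError(f"unknown AUs: {sorted(unknown)}")
    vector = []
    for name in AU_NAMES:
        value = float(activations.get(name, 0.0))
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} intensity {value} is outside [0, 1]")
        vector.append(value)
    return vector


def scale_expression(vector, magnitude):
    """Scale every AU by the same factor, clamping the result to [0, 1]."""
    return [min(1.0, max(0.0, v * magnitude)) for v in vector]


# Combine two AUs (cheek raiser and lip corner puller) into one target.
smile = make_au_vector({"AU6": 0.5, "AU12": 1.0})
# Halve the intensity of the whole expression.
faint_smile = scale_expression(smile, 0.5)
```

In a setup like this, the generator would receive the input image together with the target vector, so changing a single entry changes only the corresponding facial movement.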

To train the model, the authors also propose a fully unsupervised strategy that only requires images annotated with their activated AUs, and they exploit attention mechanisms that make the network robust to changing backgrounds and lighting conditions.
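One way to picture how an attention mechanism can protect the background is a per-pixel blend between the original image and the generator's output. The sketch below assumes a blending rule of the form out = (1 - A) * C + A * I, where I is the input image, C a generated color image, and A an attention mask in [0, 1]. The exact convention and the function name `blend_with_attention` are assumptions made for illustration and should be checked against the paper.

```python
def blend_with_attention(original, color, attention):
    """Blend per pixel: out = (1 - a) * color + a * original.

    All three inputs are H x W lists of floats (a single-channel image,
    for simplicity). With this assumed convention, a = 1 keeps the
    original pixel and a = 0 takes the generated one.
    """
    if not (len(original) == len(color) == len(attention)):
        raise ValueError("inputs must have the same height")
    output = []
    for orig_row, color_row, att_row in zip(original, color, attention):
        if not (len(orig_row) == len(color_row) == len(att_row)):
            raise ValueError("inputs must have the same width")
        output.append([
            (1.0 - a) * c + a * o
            for o, c, a in zip(orig_row, color_row, att_row)
        ])
    return output


# Toy 2 x 2 example: the left column is "background" (attention 1.0),
# the right column is a region the generator is allowed to repaint.
original = [[0.2, 0.4], [0.6, 0.8]]
color = [[1.0, 1.0], [1.0, 1.0]]
attention = [[1.0, 0.0], [1.0, 0.5]]
blended = blend_with_attention(original, color, attention)
```

Because background pixels are copied from the input rather than regenerated, changes in background or lighting have less opportunity to corrupt the result.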


The ability to automatically animate the facial expression from a single image would open the door to many new applications in different areas. As GANs have become more prevalent, this task has seen significant advances, with architectures such as StarGAN able not only to synthesize novel expressions but also to change other attributes of the face.

Despite its generality, however, StarGAN can only generate a discrete number of expressions, determined by the content of the dataset.

Facial expressions are the result of the combined and coordinated action of facial muscles, and they cannot be categorized into a small, discrete number of classes. The Facial Action Coding System (FACS) was developed by Ekman and Friesen for describing facial expressions in terms of Action Units (AUs), which are anatomically related to the contractions of specific facial muscles.

What is the aim of the paper?

The main aim of the paper is to build a model for synthetic facial animation with the level of expressiveness of the Facial Action Coding System (FACS), able to generate anatomically aware expressions in a continuous domain without the need to obtain any facial landmarks.

As a result, the authors build an anatomically coherent facial expression synthesis method that can render images in a continuous domain and can handle images in the wild with complex backgrounds and illumination conditions.

Related Work:

  • Generative Adversarial Networks (GANs).

  • Conditional GANs.

  • Unpaired Image-to-Image Translation.

  • Face Image Manipulation.

For more information on these topics, see the link to the PDF at the end.

The conclusion of the paper:

The paper presents a novel GAN model for face animation in the wild that can be trained in a fully unsupervised manner. It advances current work, which had only addressed the problem for discrete emotion category editing and portrait images.

The model encodes anatomically consistent face deformations parameterized by means of AUs. Conditioning the GAN model on these AUs allows the generator to render a wide range of expressions by simple interpolation. Additionally, the authors embed an attention model within the network that allows it to focus only on those regions of the image relevant to each specific expression.
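The "simple interpolation" mentioned above can be sketched as a linear blend between two AU vectors. This is a minimal illustration under the assumption that AU intensities are plain floats in a fixed order. The function names `interpolate_aus` and `expression_sequence` are hypothetical and do not come from the paper's code.

```python
def interpolate_aus(source, target, alpha):
    """Linearly interpolate between two AU vectors.

    alpha = 0 returns the source expression, alpha = 1 the target.
    """
    if len(source) != len(target):
        raise ValueError("AU vectors must have the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return [(1.0 - alpha) * s + alpha * t for s, t in zip(source, target)]


def expression_sequence(source, target, steps):
    """Return `steps` AU vectors moving evenly from source to target."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [
        interpolate_aus(source, target, i / (steps - 1))
        for i in range(steps)
    ]


# Three illustrative AU intensities: a neutral face and a smile.
neutral = [0.0, 0.0, 0.0]
smile = [0.0, 0.5, 1.0]
frames = expression_sequence(neutral, smile, 5)
```

Each vector in `frames` would be passed to the generator with the same input image, giving one rendered frame per step of the transition.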

By doing this, images in the wild can be processed more easily, even with distracting backgrounds and illumination artifacts. The results are very promising and show smooth transitions between different expressions. This opens the possibility of applying the approach to video sequences, which the authors plan to do in the future.

More Information: GitHub

Link to the PDF: Click Here