DeepMimic: Example-Guided Deep RL of Physics-Based Character Skills

April 14, 2018, 2:25 a.m. By: Kirti Bakshi

DeepMimic

A longstanding goal in character animation is to combine a data-driven specification of behaviour with a system that can execute similar behaviour in a physical simulation, enabling realistic responses to perturbations and environmental variation. The paper shows that well-known reinforcement learning (RL) methods can be adapted to learn robust control policies capable of imitating a broad range of example motion clips, while also learning complex recoveries, adapting to changes in morphology, and accomplishing user-specified goals.

The method handles keyframed motions, highly dynamic actions such as motion-captured flips and spins, and retargeted motions. By combining a motion-imitation objective with a task objective, the authors train characters that react intelligently in interactive settings. The approach thus combines the convenience and motion quality of using motion clips to define the desired style and appearance with the flexibility and generality afforded by RL methods and physics-based animation.

To develop multi-skilled agents capable of performing a rich repertoire of diverse skills, the authors further explore a number of methods for integrating multiple clips into the learning process. The results are demonstrated using multiple characters and a large variety of skills, including martial arts, acrobatics and locomotion.

RELATED WORK:

In recent years, as machine learning algorithms for control have matured, the machine learning community has taken a growing interest in these problems. The paper focuses on the most closely related work in animation and RL, grouped under the following topics:

  • Kinematic Models

  • Physics-based Models

  • Reinforcement Learning

  • Motion Imitation

OVERVIEW:

The system receives as input a character model, a corresponding set of kinematic reference motions, and a task defined by a reward function. It then synthesizes a controller that enables the character to imitate the reference motions while also satisfying the task objectives.
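One way to picture how an imitation objective and a task objective can be combined is a weighted sum of two reward terms. The sketch below is illustrative only: the exponential form of each term, the target-speed task, the scale factors, and the weights `w_imitate` and `w_task` are all assumptions made for this example, not values taken from this article.

```python
import math

def imitation_reward(pose, ref_pose, scale=2.0):
    """Hypothetical imitation term: 1.0 when the pose matches the reference,
    decaying towards 0 as the squared joint-angle error grows."""
    err = sum((p - r) ** 2 for p, r in zip(pose, ref_pose))
    return math.exp(-scale * err)

def task_reward(speed, target_speed, scale=2.5):
    """Hypothetical task term: 1.0 when the character moves at the target speed."""
    return math.exp(-scale * (target_speed - speed) ** 2)

def total_reward(pose, ref_pose, speed, target_speed,
                 w_imitate=0.7, w_task=0.3):
    """Weighted sum of the two terms; the weights are illustrative assumptions."""
    return (w_imitate * imitation_reward(pose, ref_pose)
            + w_task * task_reward(speed, target_speed))

# A pose that matches the reference while hitting the target speed scores highest.
perfect = total_reward([0.1, -0.4], [0.1, -0.4], speed=1.0, target_speed=1.0)
sloppy = total_reward([0.5, 0.2], [0.1, -0.4], speed=0.2, target_speed=1.0)
```

Shifting weight towards `w_task` would favour completing the task over matching the clip, which is why the balance between the two terms matters.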

The final result is a policy that allows a simulated character to imitate the behaviours from the reference motions while also fulfilling the specified task objectives. The policies are modelled with neural networks and trained using the proximal policy optimization (PPO) algorithm.
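For readers unfamiliar with proximal policy optimization, the core of its commonly used clipped variant is a surrogate objective that limits how far a single update can move the policy. The following is a minimal sketch of that objective in plain Python, not the authors' training code; the function name and the `clip_eps=0.2` default are assumptions chosen for illustration.

```python
import math

def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch of sampled actions.

    new_logp / old_logp: log-probabilities of each action under the
    current policy and under the policy that collected the data.
    advantages: advantage estimates for the same actions.
    """
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)  # probability ratio pi_new / pi_old
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Taking the minimum removes any incentive to push the ratio
        # outside the [1 - eps, 1 + eps] band.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)

# With an unchanged policy the ratio is 1 and the objective is the mean advantage.
unchanged = ppo_clip_objective([0.0, 0.0], [0.0, 0.0], [2.0, 4.0])
# A doubled action probability with positive advantage is capped at 1 + eps.
capped = ppo_clip_objective([math.log(2.0)], [0.0], [1.0])
```

In practice this quantity is maximized by gradient ascent on the parameters of the policy network.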

DISCUSSION AND LIMITATIONS:

The paper presents a data-driven deep reinforcement learning framework for training control policies for simulated characters. The authors show that the method can produce a broad range of challenging skills. The resulting policies are highly robust and produce natural motions that are nearly indistinguishable from the original motion capture data in the absence of perturbations. The framework is able to retarget skills to a variety of characters, environments, and tasks, and multiple policies can be combined into composite policies capable of executing multiple skills.

The method has several limitations. First, the policies require a phase variable, which advances linearly with time, to be synchronized with the reference motion. This limits the ability of the policy to adjust the timing of the motion, and lifting this limitation could produce more natural and flexible perturbation recoveries. The multi-clip integration approach works well for small numbers of clips but has not yet been demonstrated on large motion libraries.
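The timing limitation is easier to see with a small sketch of a phase variable that depends only on elapsed time. The function name, the normalization to the range 0 to 1, and the cyclic/non-cyclic handling below are assumptions made for illustration.

```python
def phase_at(time_s, clip_duration_s, cyclic=True):
    """Phase of the reference motion: a function of elapsed time only.

    Nothing about the character's state enters this computation, so the
    policy cannot slow down or speed up the clip, for example to spend
    longer recovering after a push.
    """
    raw = time_s / clip_duration_s
    if cyclic:
        return raw % 1.0      # wrap around for looping motions such as walking
    return min(raw, 1.0)      # hold at the end for one-shot motions such as a flip

# A quarter of the way through a 2-second clip, on the first and second loop.
first_loop = phase_at(0.5, 2.0)
second_loop = phase_at(2.5, 2.0)
# A one-shot clip stays at its final phase once it has finished.
finished = phase_at(3.0, 2.0, cyclic=False)
```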

The PD controllers used as low-level servos for the characters still require some insight to set properly for each character morphology. The learning process is also time-consuming, often requiring several days per skill, and is performed independently for each policy. Although the same imitation reward is used across all motions, it is still based on a manually defined state-similarity metric. The relative weighting of the imitation reward and the task reward also needs to be chosen with some care.
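To illustrate why PD gains need per-character tuning, here is a minimal sketch of a proportional-derivative servo driving a single joint towards a target angle. It assumes the policy supplies the target angle; the gains, the unit inertia, and the time step are hypothetical values picked so that this toy joint settles quickly, and a heavier or lighter limb would need different gains.

```python
def pd_torque(target_angle, angle, velocity, kp, kd):
    """PD control law: pull towards the target, damp the current velocity."""
    return kp * (target_angle - angle) - kd * velocity

def simulate_joint(target_angle, kp, kd, inertia=1.0, dt=0.001, steps=5000):
    """Integrate one joint with semi-implicit Euler and return its final angle."""
    angle, velocity = 0.0, 0.0
    for _ in range(steps):
        torque = pd_torque(target_angle, angle, velocity, kp, kd)
        velocity += (torque / inertia) * dt   # update velocity first
        angle += velocity * dt                # then position, using the new velocity
    return angle

# Hypothetical gains: kd = 2 * sqrt(kp * inertia) gives critical damping here.
final_angle = simulate_joint(target_angle=0.5, kp=100.0, kd=20.0)
```

With the same gains but a much larger `inertia`, the joint would respond sluggishly and oscillate, which is the kind of mismatch that manual tuning has to correct.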

In future work, the authors hope to understand how the policies might be deployed on robotic systems, for locomotion, dexterous manipulation, and other tasks. They also hope to integrate diverse skills so that a character can perform more challenging tasks and more complex interactions with its environment. Incorporating hierarchical structure is likely to be beneficial towards this goal.

Link To The PDF: Click Here

SIGGRAPH 2018: DeepMimic paper (supplementary video)

Video Source: Jason Peng