Gym: A Toolkit For Developing And Comparing Reinforcement Learning Algorithms.

July 8, 2018, 1:57 p.m. By: Kirti Bakshi


The purpose of this technical report is two-fold. First, it introduces a suite of challenging continuous control tasks, integrated with OpenAI Gym and based on currently existing robotics hardware. The tasks include:

  • Push (with a Fetch robotic arm)

  • Slide (with a Fetch robotic arm)

  • Pick & Place (with a Fetch robotic arm)

  • In-hand object manipulation (with a Shadow Dexterous Hand)

All tasks have sparse binary rewards and follow a Multi-Goal Reinforcement Learning (RL) framework in which an agent is told what to do via an additional input. Second, the report presents a set of concrete research ideas for improving RL algorithms, most of which relate to Multi-Goal RL and Hindsight Experience Replay (HER).

Environments:

All environments use the MuJoCo physics engine for fast and accurate simulation and have been released as part of OpenAI Gym. A link to the video presenting the new environments is provided at the end.

  • Fetch environments:

The Fetch environments are based on the 7-DoF Fetch robotics arm, which has a two-fingered parallel gripper. The tasks are very similar to those used in Andrychowicz et al. (2017), with an additional reaching task and a slightly different pick & place task. In all Fetch tasks the goal is 3-dimensional and describes the desired position of the object. Rewards are sparse and binary: the agent obtains a reward of 0 if the object is at the target location (within a tolerance of 5 cm) and −1 otherwise. Actions are 4-dimensional: 3 dimensions specify the desired gripper movement in Cartesian coordinates, and the last dimension controls the opening and closing of the gripper.
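As a rough illustration, the sparse reward described above amounts to a thresholded distance check. The helper below is only a sketch; the function and argument names are illustrative and not taken from the report:

    import numpy as np

    def sparse_fetch_reward(achieved_position, target_position, tolerance=0.05):
        """Illustrative sparse reward for the Fetch tasks: 0 if the object is
        within 5 cm of the target position, -1 otherwise."""
        distance = np.linalg.norm(achieved_position - target_position)
        return 0.0 if distance < tolerance else -1.0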

  • Hand Environments:

These environments are based on an anthropomorphic robotic hand with 24 degrees of freedom: the Shadow Dexterous Hand. Of these 24 joints, 20 can be controlled independently, whereas the remaining ones are coupled joints. In all hand tasks, rewards are sparse and binary: the agent obtains a reward of 0 if the desired goal has been achieved (within some task-specific tolerance) and −1 otherwise.

Actions are 20-dimensional and use absolute position control for all non-coupled joints of the hand. In the reaching task, the observations also include the Cartesian position of all 5 fingertips.
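For orientation, a random-action rollout in one of the hand environments might look roughly as follows. This is a minimal sketch assuming the classic Gym API with the robotics environments and MuJoCo installed; environment IDs can differ between Gym versions:

    import gym

    env = gym.make('HandManipulateBlock-v0')  # ID is illustrative; may vary by version
    obs = env.reset()

    for _ in range(100):
        action = env.action_space.sample()  # 20-dimensional absolute position control
        obs, reward, done, info = env.step(action)  # reward is 0 or -1 (sparse, binary)
        if done:
            obs = env.reset()
    env.close()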

  • Multi-goal environment interface:

All environments use goals that describe the desired outcome of a task. For example, in the FetchReach task the desired target position is described by a 3-dimensional goal. While the environments are fully compatible with the OpenAI Gym API, the API is also slightly extended to support this new type of environment. All environments extend the newly introduced gym.GoalEnv.
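Concretely, a goal-based environment returns a dictionary observation that contains the regular observation together with the achieved and the desired goal, and it exposes a compute_reward method so rewards can be recomputed for substituted goals. A minimal sketch, assuming the classic Gym API and the FetchReach-v1 environment ID:

    import gym

    env = gym.make('FetchReach-v1')  # extends gym.GoalEnv
    obs = env.reset()

    # Observations are dictionaries with three keys.
    print(obs['observation'].shape)  # regular state observation
    print(obs['achieved_goal'])      # goal currently achieved (here: gripper position)
    print(obs['desired_goal'])       # 3-dimensional target position

    # The reward can be recomputed for arbitrary goals, which is what HER relies on.
    reward = env.compute_reward(obs['achieved_goal'], obs['desired_goal'], {})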

  • Benchmark results:

The performance of DDPG, with and without Hindsight Experience Replay (HER), is evaluated on all environments and their variants. The following four configurations are compared (a simplified sketch of HER's goal relabeling follows this list):

  • DDPG+HER with sparse rewards

  • DDPG+HER with dense rewards

  • DDPG with sparse rewards

  • DDPG with dense rewards
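For context, the HER variants above replay stored transitions with goals that were actually achieved later in the episode. The sketch below shows a highly simplified "final" relabeling strategy; all names are illustrative and the real implementation differs:

    def her_relabel_final(episode, compute_reward):
        """Illustrative 'final' HER strategy: replay every transition as if the
        goal had been the goal achieved at the end of the episode."""
        final_goal = episode[-1]['achieved_goal']
        relabeled = []
        for transition in episode:
            relabeled.append({
                'observation': transition['observation'],
                'action': transition['action'],
                'desired_goal': final_goal,
                # Recompute the sparse reward with respect to the substituted goal.
                'reward': compute_reward(transition['achieved_goal'], final_goal, {}),
            })
        return relabeled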

Request for Research:

Probably the hardest part of doing research is deciding which problem is worth working on. In the paper, they next present a set of research problems which they strongly believe can lead to widely-applicable RL improvements. At least one potential solution is proposed for each problem, but solving many of them will require inventing new ideas. To make it easier to track progress on these ideas, they ask authors to cite this report when publishing related research.

For further details on the subtopics below, kindly go through the link mentioned at the end.

  • Automatic hindsight goals generation

  • Unbiased HER

  • HER+HRL

  • Richer value functions

  • Faster information propagation

  • HER + multi-step returns

  • On-policy HER

  • Combine HER with recent improvements in RL

  • RL with very frequent actions

More Information: GitHub

Link To PDF: Click Here

Ingredients for Robotic Research:

Video Source: OpenAI