Deep Reinforcement Learning (DeepRL) has improved considerably and achieved success across a range of tasks, from continuous control problems in robotics to games like Go and Atari. Reinforcement Learning is the area of ML, inspired by behaviourist psychology, that deals with how software agents choose actions in an environment so as to maximize cumulative reward. Progress in DeepRL, however, has largely been restricted to individual tasks, with a separate agent tuned and trained for each one.
Multi-task reinforcement learning instead aims to solve a collection of tasks with a single reinforcement learning agent and a single set of parameters. The major challenge is managing the increased amount of data and the extended training time. To confront this challenge, DMLab-30 provides a set of new tasks that span a variety of challenges in a visually unified environment with a common action space. A well-trained agent is expected to achieve massive throughput and efficient use of data points. To serve this purpose we have come up with a distributed agent called IMPALA (Importance Weighted Actor-Learner Architecture) that combines decoupled acting and learning with a new off-policy correction procedure called V-trace. IMPALA can scale to thousands of machines and attain a throughput of 250,000 frames per second. V-trace keeps learning balanced and stable even at this high throughput, despite the decoupled actors' trajectories being off-policy. The results of all our experiments show that IMPALA obtains improved performance in comparison to prior agents.
DMLab-30 is a collection of new levels for executing a number of tasks individually or in a multi-task manner. It is built on DeepMind Lab, an open-source RL environment. DeepMind Lab is a 3D learning environment, based on id Software's Quake III Arena via ioquake3 and other open-source software, that poses challenging puzzle-solving and 3D-navigation tasks for learning agents. Its main purpose is to act as a testbed for research in artificial intelligence, especially deep reinforcement learning. The tasks vary in the goals they pursue, from learning and memory to navigation. They differ visually, from brightly coloured, modern-styled textures to the weathered browns and greens of a desert at different times of the day. They also differ physically, and some environments include 'bots' with their own internal, goal-oriented behaviours.
IMPALA is a new distributed agent that emphasises maximising data throughput, using a distributed architecture built with TensorFlow. It is inspired by the A3C architecture, which uses multiple distributed actors to learn the agent's parameters. IMPALA can scale to thousands of machines without compromising data efficiency or training stability. Unlike A3C-based agents, IMPALA actors communicate trajectories of experience to a centralised learner. This decoupled architecture achieves very high throughput and remains data-efficient even when using deeper neural networks. It can be implemented with either a single learner machine or multiple learners performing synchronous updates between themselves.
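The actor-learner split described above can be sketched in a single process: actors roll out fixed-length trajectory unrolls using a snapshot of the learner's parameters and push them onto a queue, while the learner dequeues batches and applies updates. This is a minimal illustrative sketch, not the paper's TensorFlow implementation; all names (`Trajectory`, `run_actor`, `run_learner`, `demo`) are hypothetical, and the policy, environment, and gradient step are stubbed out.

```python
import queue
import threading
from dataclasses import dataclass

# Minimal single-process sketch of IMPALA's decoupled actor-learner loop.
# The real agent runs actors on many machines and a learner on accelerators.

@dataclass
class Trajectory:
    observations: list
    actions: list
    rewards: list
    behaviour_logits: list  # actor's (possibly stale) policy outputs
    policy_version: int     # which learner parameters generated this unroll

def run_actor(actor_id, get_params, traj_queue, n_trajectories, unroll_length):
    """Roll out fixed-length unrolls with a snapshot of the learner's
    parameters, then send each whole trajectory to the learner."""
    for _ in range(n_trajectories):
        version, _params = get_params()       # snapshot once per unroll
        traj = Trajectory([], [], [], [], version)
        for t in range(unroll_length):
            traj.observations.append((actor_id, t))  # dummy observation
            traj.actions.append(0)                   # dummy policy(params, obs)
            traj.rewards.append(0.0)
            traj.behaviour_logits.append(0.0)
        traj_queue.put(traj)

def run_learner(traj_queue, n_updates, batch_size, state, lock):
    """Dequeue batches of trajectories and apply (stub) gradient updates.
    Trajectories may carry an older policy_version; that lag is exactly
    what the V-trace correction accounts for."""
    for _ in range(n_updates):
        batch = [traj_queue.get() for _ in range(batch_size)]
        with lock:
            # Count trajectories generated by parameters older than current.
            state["stale"] += sum(
                t.policy_version < state["version"] for t in batch)
            state["version"] += 1             # stands in for a gradient step

def demo(n_actors=2, trajs_per_actor=4, batch_size=2, unroll_length=5):
    state = {"version": 0, "stale": 0}
    lock = threading.Lock()
    traj_queue = queue.Queue()

    def get_params():
        with lock:
            return state["version"], None     # real parameters omitted

    actors = [
        threading.Thread(
            target=run_actor,
            args=(i, get_params, traj_queue, trajs_per_actor, unroll_length))
        for i in range(n_actors)
    ]
    for a in actors:
        a.start()
    n_updates = n_actors * trajs_per_actor // batch_size
    run_learner(traj_queue, n_updates, batch_size, state, lock)
    for a in actors:
        a.join()
    return state
```

The key design point the sketch shows is that actors never block on the learner: they only read a parameter snapshot at the start of each unroll, so by the time a trajectory is consumed its `policy_version` may already be behind the learner's.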
Separating learning from acting has a clear advantage: it increases the throughput of the whole system, since the actors do not have to wait for the learning step. This gives us the flexibility to train IMPALA on interesting environments without wasting time. However, the decoupling of acting and learning causes the policy on the actor to lag behind the learner. To compensate for this difference we have come up with a principled off-policy correction method called V-trace, which accounts for the trajectories obtained by actors being off-policy. IMPALA provides a simple yet scalable and robust framework for building better DeepRL agents and has the potential to enable research on new challenges.
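In the paper, the V-trace target is built from truncated importance weights $\rho_t = \min(\bar\rho,\ \pi(a_t|x_t)/\mu(a_t|x_t))$ and $c_t = \min(\bar c,\ \pi(a_t|x_t)/\mu(a_t|x_t))$ between the learner (target) policy $\pi$ and the actor (behaviour) policy $\mu$, and can be computed with a backward recursion over the trajectory. Below is a minimal NumPy sketch of that recursion for a single trajectory; the function name `vtrace_targets` and the array-based interface are illustrative, not the paper's TensorFlow code.

```python
import numpy as np

def vtrace_targets(rewards, values, bootstrap_value, rhos,
                   gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """Compute V-trace value targets v_s for one length-T trajectory.

    rewards, values, rhos: length-T arrays; rhos are the importance
    ratios pi(a_t|x_t) / mu(a_t|x_t). rho_bar and c_bar are the
    truncation levels from the paper.
    """
    T = len(rewards)
    clipped_rhos = np.minimum(rho_bar, rhos)
    clipped_cs = np.minimum(c_bar, rhos)
    values_tp1 = np.append(values[1:], bootstrap_value)
    # Temporal-difference terms: delta_t = rho_t (r_t + gamma V(x_{t+1}) - V(x_t))
    deltas = clipped_rhos * (rewards + gamma * values_tp1 - values)

    # Backward recursion: v_s - V(x_s) = delta_s + gamma c_s (v_{s+1} - V(x_{s+1}))
    vs_minus_v = np.zeros(T)
    acc = 0.0
    for t in reversed(range(T)):
        acc = deltas[t] + gamma * clipped_cs[t] * acc
        vs_minus_v[t] = acc
    return values + vs_minus_v
```

A quick sanity check on the recursion: with all importance ratios equal to 1 (on-policy data), the targets reduce to ordinary n-step Bellman targets, e.g. undiscounted unit rewards with zero value estimates yield the returns-to-go.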
Read the full IMPALA paper here.