This post announces the launch of a transfer learning contest that measures a reinforcement learning algorithm's ability to generalize from previous experience. In typical RL research, algorithms are tested in the same environment where they were trained, which favours algorithms that are good at memorization and have many hyperparameters. Instead, this contest tests an algorithm on previously unseen video game levels. The contest uses Gym Retro, a new platform that integrates classic games into Gym, starting with 30 SEGA Genesis games.
The contest is held using the Sonic The Hedgehog™ series of games for SEGA Genesis. Participants try to create the best agent for playing custom levels of the Sonic games, without having access to those levels during development.
How the contest works:
Train or script your agent to play Sonic The Hedgehog™
Submit your agent to us as a Docker container
We evaluate your agent on a set of secret test levels
Your agent's score appears on the leaderboard
Leveraging past experience to quickly learn new environments is believed to be the next step for reinforcement learning. Current algorithms are very prone to memorization and do not adapt well to new situations. While this contest focuses on video game levels, the hope is that the winning techniques will be applicable to a wide variety of domains.
More about the contest:
The OpenAI Retro Contest gives you a training set of levels from the Sonic The Hedgehog™ series of games, and then evaluates your algorithm on a test set of custom levels created for this contest. The contest runs for two months, from April 5th to June 5th. To help people get started, OpenAI is also releasing retro-baselines, which shows how to run several RL algorithms on the contest tasks.
Participants can use any environments or datasets they need at training time, but at test time the agent gets only about 18 hours on each never-before-seen level. Although 18 hours may sound like a long time to play a single game level, existing RL algorithms perform far worse than humans given this training budget.
Only one account per team.
Short code snippets and tutorial code can be shared with other teams, but full or partial solutions cannot.
Each person is only allowed to be on one team at a time.
There are two award categories, "Best Score" and "Best Writeup". To be eligible to win, you must release your submission as open source at the end of the contest. The 1st, 2nd, and 3rd place winners in each category will receive a trophy. In addition, there will be a single award for "Best Supporting Materials".
All winners will be invited to co-author a tech report with OpenAI about the contest.
A technical report, Gotta Learn Fast: A New Benchmark for Generalization in RL, is also being released. It describes the benchmark in detail and provides baseline results from running Rainbow DQN, PPO, and a simple random guessing algorithm called JERK. JERK samples random action sequences in a way that is optimized for Sonic and, as training progresses, replays the top-scoring sequence of actions more frequently.
Leveraging experience from the training levels was found to significantly boost PPO's performance on the test levels. When the network was pre-trained on the training levels and fine-tuned on the test levels, its performance nearly doubled, making it better than the strongest alternative baselines. This is exciting because it shows that transfer learning can have a large and reliable effect.
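The idea behind JERK, sampling random action sequences and replaying the best one more often as training goes on, can be illustrated with a minimal sketch. This is not the algorithm from the report: the `ToyLevel` environment, the two-action set, the linear replay schedule, and all parameter values are assumptions made up for illustration.

```python
import random


class ToyLevel:
    """Hypothetical deterministic 1-D level; walls can only be passed by jumping."""

    def __init__(self, length=30, walls=(5, 12, 20)):
        self.length = length
        self.walls = set(walls)
        self.x = 0

    def reset(self):
        self.x = 0
        return self.x

    def step(self, action):
        # "right" advances on open ground; "jump" advances only over a wall.
        target = self.x + 1
        is_wall = target in self.walls
        moved = (action == "jump") == is_wall
        if moved:
            self.x = target
        done = self.x >= self.length
        return self.x, (1.0 if moved else 0.0), done


def run_episode(env, rng, actions=None, max_steps=40, jump_prob=0.2):
    """Play one episode, replaying `actions` if given, else sampling randomly."""
    env.reset()
    taken, total = [], 0.0
    for t in range(max_steps):
        if actions is not None and t < len(actions):
            action = actions[t]
        else:
            action = "jump" if rng.random() < jump_prob else "right"
        _, reward, done = env.step(action)
        taken.append(action)
        total += reward
        if done:
            break
    return total, taken


def jerk_like(env, episodes=200, seed=0):
    """Random search that replays the best sequence more often over time."""
    rng = random.Random(seed)
    best_score, best_actions, history = float("-inf"), None, []
    for episode in range(episodes):
        # Assumed schedule: replay probability grows linearly with progress.
        replay_prob = episode / episodes
        replay = best_actions is not None and rng.random() < replay_prob
        score, actions = run_episode(env, rng, best_actions if replay else None)
        if score > best_score:
            best_score, best_actions = score, actions
        history.append(score)
    return best_score, best_actions, history
```

Because the toy level is deterministic, replaying the stored action sequence reproduces its score exactly, so the average score rises as replays become more frequent even though nothing is learned about the level itself.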
Gym Retro Beta:
Gym Retro is OpenAI's second-generation attempt to build a large dataset of reinforcement learning environments. It builds on some of the same ideas as Universe from late 2016, but that implementation did not achieve good results because Universe environments ran asynchronously, could only run in real time, and were often unreliable due to screen-based detection of game state. Gym Retro extends the model of the Arcade Learning Environment to a much larger set of potential games.
Gym Retro was inspired by the Retro Learning Environment (RLE) but was written to be more flexible; for instance, in Gym Retro the environment definition can be specified through JSON files rather than C++ code, which makes integrating new games much easier.
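To illustrate why a data-driven definition makes integration easier, here is a minimal sketch of a JSON file that describes where game variables live in memory and how reward and episode end are derived from them. The field names, addresses, and helper functions are hypothetical and are not Gym Retro's actual file format; the point is only that adding a game means writing data rather than compiling code.

```python
import json

# Hypothetical definition: field names and addresses are illustrative only
# and do not reproduce Gym Retro's real schema.
DEFINITION = """
{
  "variables": {
    "score": {"address": 16, "size": 2},
    "lives": {"address": 18, "size": 1}
  },
  "reward": {"variable": "score"},
  "done": {"variable": "lives", "equals": 0}
}
"""


def read_variables(memory, spec):
    """Decode big-endian unsigned integers from a bytes-like memory dump."""
    return {
        name: int.from_bytes(
            memory[v["address"]:v["address"] + v["size"]], "big"
        )
        for name, v in spec["variables"].items()
    }


def reward_and_done(prev_memory, memory, spec):
    """Reward is the change in the reward variable; done follows the rule."""
    prev = read_variables(prev_memory, spec)
    cur = read_variables(memory, spec)
    key = spec["reward"]["variable"]
    rule = spec["done"]
    return cur[key] - prev[key], cur[rule["variable"]] == rule["equals"]


spec = json.loads(DEFINITION)
before, after = bytearray(32), bytearray(32)
before[16:18] = (100).to_bytes(2, "big")
before[18] = 3
after[16:18] = (250).to_bytes(2, "big")
after[18] = 3
print(reward_and_done(before, after, spec))  # (150, False)
```

Supporting a different game in this sketch would only require a new JSON string with different addresses; the Python code stays unchanged.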
Hence the release of Gym Retro, a system for wrapping classic video games as RL environments. This preliminary release includes:
30 SEGA Genesis games from the SEGA Mega Drive and Genesis Classics Steam Bundle
62 of the Atari 2600 games from the Arcade Learning Environment
The Gym Retro Beta uses a more modern console than Atari, the SEGA Genesis, which expands the quantity and complexity of games available for RL research.
To see rules for the contest or to get started with it, look at the link below: