Requests for Research 2.0: A Release by OpenAI
OpenAI, a nonprofit AI research company, is releasing a new batch of seven unsolved problems that have come up in the course of its research. Like the original Requests for Research, which resulted in several papers, the company expects these problems to be a fun and meaningful way for new people to enter the field, as well as for practitioners to hone their skills. It is also a great way to get a job at OpenAI, which aims to discover and enact the path to safe artificial general intelligence.
If you are not sure where to begin, OpenAI also provides some solved starter problems.
Requests for Research:
1. Slitherin’:
Implement and solve a multiplayer clone of the classic Snake game as a Gym environment.

Environment: have a reasonably large field with multiple snakes; snakes grow when eating randomly appearing fruit; a snake dies when colliding with another snake, itself, or the wall; and the game ends when all snakes die. Start with two snakes, and scale from there.

Agent: solve the environment using self-play with an RL algorithm of your choice.

Inspect the learned behavior: does the agent learn to competently pursue food and avoid other snakes? Does the agent learn to attack, trap, or gang up against the competing snakes?
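
The environment rules above can be sketched as a small class with a Gym-style `reset`/`step` interface. This is only a minimal illustration, not an official environment: the class name `MultiSnakeEnv`, the reward values (+1 for fruit, -1 for dying), the observation dictionary, and the starting positions are all assumptions, and it deliberately avoids depending on the Gym package itself.

```python
import random

# Unit moves for the four actions: 0=up, 1=right, 2=down, 3=left.
MOVES = [(-1, 0), (0, 1), (1, 0), (0, -1)]


class MultiSnakeEnv:
    """Hypothetical Gym-style multiplayer Snake (illustrative only)."""

    def __init__(self, size=10, n_snakes=2, seed=0):
        self.size = size
        self.n_snakes = n_snakes
        self.rng = random.Random(seed)

    def reset(self):
        # Each snake starts as a single cell on its own row; index 0 is the head.
        gap = self.size // (self.n_snakes + 1)
        self.snakes = [[(gap * (i + 1), self.size // 2)] for i in range(self.n_snakes)]
        self.alive = [True] * self.n_snakes
        self._spawn_fruit()
        return self._obs()

    def _occupied(self):
        # Cells covered by living snakes (dead snakes no longer block anyone).
        return {c for s, a in zip(self.snakes, self.alive) if a for c in s}

    def _spawn_fruit(self):
        taken = self._occupied()
        free = [(r, c) for r in range(self.size) for c in range(self.size)
                if (r, c) not in taken]
        self.fruit = self.rng.choice(free) if free else None

    def _obs(self):
        return {"snakes": [list(s) for s in self.snakes],
                "alive": list(self.alive), "fruit": self.fruit}

    def step(self, actions):
        """actions: one integer in 0..3 per snake. Returns (obs, rewards, done, info)."""
        rewards = [0.0] * self.n_snakes
        # Compute every living snake's new head before resolving collisions.
        heads = {}
        for i, snake in enumerate(self.snakes):
            if self.alive[i]:
                dr, dc = MOVES[actions[i]]
                heads[i] = (snake[0][0] + dr, snake[0][1] + dc)
        # Simplification: bodies are checked at their pre-move positions,
        # so moving into a cell a tail is about to leave still counts as a hit.
        occupied = self._occupied()
        dead = set()
        for i, head in heads.items():
            hit_wall = not (0 <= head[0] < self.size and 0 <= head[1] < self.size)
            hit_body = head in occupied
            head_on = any(j != i and h == head for j, h in heads.items())
            if hit_wall or hit_body or head_on:
                dead.add(i)
        ate = False
        for i, head in heads.items():
            if i in dead:
                self.alive[i] = False
                rewards[i] = -1.0
                continue
            self.snakes[i].insert(0, head)
            if head == self.fruit:
                rewards[i] = 1.0   # grow: keep the tail this step
                ate = True
            else:
                self.snakes[i].pop()
        if ate:
            self._spawn_fruit()
        done = not any(self.alive)
        return self._obs(), rewards, done, {}
```

A self-play agent would call `step` with one action per snake, all chosen by copies of the same policy; the learned-behavior questions above can then be inspected by replaying the returned observations.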
2. Parameter Averaging in Distributed RL:
Explore the effect of parameter averaging schemes on sample complexity and amount of communication in RL algorithms. The simplest solution is to average the gradients from every worker on every update. Another possibility is to use algorithms like EASGD, which bring the workers' parameters only partly together on each update.
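
The two schemes mentioned above can be contrasted on a toy problem. The sketch below is an assumption-laden illustration, not an RL experiment: it minimizes a one-dimensional quadratic with noisy gradients, and the function names, learning rate, and elastic coefficient `rho` are all hypothetical choices. The elastic variant follows the EASGD-style update in which each worker is pulled toward a shared center variable and the center is pulled toward the workers.

```python
import random


def noisy_grad(x, rng, target=3.0, noise=0.1):
    """Gradient of 0.5 * (x - target)**2 plus Gaussian noise (toy objective)."""
    return (x - target) + rng.gauss(0.0, noise)


def gradient_averaging(n_workers=4, steps=200, lr=0.1, seed=0):
    """All workers share one parameter; gradients are averaged on every update."""
    rng = random.Random(seed)
    x = 0.0
    for _ in range(steps):
        grads = [noisy_grad(x, rng) for _ in range(n_workers)]
        x -= lr * sum(grads) / n_workers
    return x


def elastic_averaging(n_workers=4, steps=200, lr=0.1, rho=0.5, seed=0):
    """Each worker keeps its own parameter, pulled partly toward a center variable."""
    rng = random.Random(seed)
    workers = [0.0] * n_workers
    center = 0.0
    for _ in range(steps):
        diffs = [w - center for w in workers]
        # Worker step: local gradient plus an elastic pull toward the center.
        workers = [w - lr * (noisy_grad(w, rng) + rho * d)
                   for w, d in zip(workers, diffs)]
        # Center step: move toward the workers by the same elastic force.
        center += lr * rho * sum(diffs)
    return center, workers
```

In a real study the quantities of interest would be how many environment samples and how much worker-to-server traffic each scheme needs to reach a given return; this toy only shows the shape of the two update rules.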
3. Transfer Learning Between Different Games via Generative Models:

Train 11 good policies for 11 Atari games. Generate 10,000 trajectories of 1,000 steps each from the policy for each game.

Fit a generative model to the trajectories produced by 10 of the games.

Then fine-tune that model on the 11th game.

The main goal here is to quantify the benefit from pretraining on the 10 games.
4. Transformers with Linear Attention:
The Transformer model uses soft attention with softmax.
Your goal:

Train a transformer;

Then find a way to get the same bits per character/word using a linear-attention transformer with different hyperparameters, without increasing the total number of parameters by much.

One caveat: this may turn out to be impossible.

But one potentially helpful hint: it is likely that transformers with linear attention require much higher-dimensional key/value vectors compared to attention that uses the softmax, which can be done without significantly increasing the number of parameters.
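
To make the contrast concrete, here is a minimal sketch of the two attention forms on plain Python lists. It assumes that "linear attention" means dropping the softmax so the output is (Q K^T) V, which by associativity equals Q (K^T V); the helper names are hypothetical, and the usual scaling by the square root of the key dimension, masking, and multiple heads are all omitted for brevity.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax_attention(q, k, v):
    """Soft attention: softmax(Q K^T) V, which needs the full n x n score matrix."""
    scores = matmul(q, transpose(k))
    weights = []
    for row in scores:
        m = max(row)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in row]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return matmul(weights, v)


def linear_attention(q, k, v):
    """Attention without the softmax, computed as Q (K^T V).

    The summary K^T V has shape (key_dim x value_dim) regardless of
    sequence length, so its size grows with the key/value dimensions.
    """
    summary = matmul(transpose(k), v)
    return matmul(q, summary)


def linear_attention_quadratic(q, k, v):
    """Same result computed the slow way, (Q K^T) V, for comparison."""
    return matmul(matmul(q, transpose(k)), v)
```

Because everything the model remembers about the sequence has to fit in the `summary` matrix, this view is consistent with the hint that linear attention may need much higher-dimensional key/value vectors.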
5. Learned Data Augmentation:

To perform learned data augmentation, one could use a learned VAE of the data:

First, train a VAE on the input data.

Then transform each training point by encoding it to a latent space,

applying a simple perturbation in latent space,

and finally decoding it back to observed space.
One potential benefit of such data augmentation is that it could include many nonlinear transformations, such as viewpoint changes and changes in scene lighting.
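
The encode, perturb, decode pipeline can be sketched as follows. The `encode` and `decode` functions here are hand-written linear placeholders standing in for a trained VAE, not a real model, and the Gaussian perturbation with scale `sigma` is one assumed choice of "simple perturbation".

```python
import random


def encode(x):
    """Placeholder encoder: stands in for a trained VAE's encoder."""
    return [0.5 * (x[0] + x[1]), 0.5 * (x[0] - x[1])]


def decode(z):
    """Placeholder decoder: the exact inverse of the placeholder encoder."""
    return [z[0] + z[1], z[0] - z[1]]


def augment(x, rng, sigma=0.1):
    """Encode x, add Gaussian noise in latent space, and decode the result."""
    z = encode(x)
    z_perturbed = [zi + rng.gauss(0.0, sigma) for zi in z]
    return decode(z_perturbed)


rng = random.Random(0)
original = [1.0, 3.0]
augmented = [augment(original, rng) for _ in range(5)]  # five perturbed copies
```

With a real VAE, the decoder is nonlinear, so a small step in latent space can correspond to a large but still plausible change in the observed data.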
6. Regularization in Reinforcement Learning:
Experimentally investigate, and qualitatively explain, the effect of different regularization methods on an RL algorithm of choice.
In supervised deep learning, regularization is extremely important for improving optimization and for preventing overfitting, with very successful methods like dropout, batch normalization, and L2 regularization. Incidentally, people generally use much smaller models in RL than in supervised learning, as large models perform worse, perhaps because they overfit to recent experience.
7. Automated Solutions of Olympiad Inequality Problems:
Olympiad inequality problems are simple to express, but solving them often requires clever manipulations.

Build a dataset of olympiad inequality problems and write a program that can solve a large fraction of them.

It’s not clear whether machine learning will be useful here, but one could potentially use a learned policy to reduce the branching factor.
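
One cheap ingredient such a solver might use is a numerical falsifier that discards candidate intermediate inequalities before any symbolic work is spent on them. The sketch below is an assumption about how that could look, not a method from the original request: the function `find_counterexample`, the sampling range, and the tolerance are all hypothetical, and passing the check is evidence rather than proof.

```python
import random


def find_counterexample(lhs, rhs, n_vars, trials=10000, seed=0, tol=1e-9):
    """Sample random positive reals, looking for a point where lhs >= rhs fails."""
    rng = random.Random(seed)
    for _ in range(trials):
        point = [rng.uniform(0.01, 10.0) for _ in range(n_vars)]
        if lhs(*point) < rhs(*point) - tol:
            return point  # the claimed inequality is false at this point
    return None  # no violation found (not a proof)


def arithmetic_mean(a, b, c):
    return (a + b + c) / 3


def geometric_mean(a, b, c):
    return (a * b * c) ** (1 / 3)


# AM-GM for three positive reals: arithmetic mean >= geometric mean.
am_gm_violation = find_counterexample(arithmetic_mean, geometric_mean, 3)
# The reversed claim is false, so a counterexample should turn up quickly.
reversed_violation = find_counterexample(geometric_mean, arithmetic_mean, 3)
```

In a search over manipulation steps, any branch whose target inequality is refuted this way can be pruned, which complements the idea of a learned policy for reducing the branching factor.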