A Scalable Meta-Learning Algorithm Released by OpenAI: "Reptile" (Includes an Interactive Tool to Test It On-Site)
This paper considers problems related to meta-learning: there is a distribution of tasks, and we wish to obtain an agent that learns quickly when presented with a previously unseen task sampled from this distribution. To this end, the paper presents Reptile, a remarkably simple meta-learning algorithm that learns a parameter initialization which can be fine-tuned quickly on a new task.
Reptile works by repeatedly sampling a task, performing stochastic gradient descent on it, and updating the initial parameters towards the final parameters learned on that task. This method performs as well as MAML, a broadly applicable meta-learning algorithm, while being more computationally efficient and simpler to implement.
But, What Is Meta-Learning?
Meta-learning is the process of learning how to learn. A meta-learning algorithm takes in a distribution of tasks, where each task is a learning problem, and produces a quick learner: one that can generalize well from even a small number of examples.
The paper shows that Reptile performs well on some well-established benchmarks for few-shot classification. It also provides a theoretical analysis aimed at understanding why Reptile works.
How Does Reptile Work?
Reptile works by repeatedly sampling a task, training on it, and then moving the initialization towards the weights trained on that task. Unlike MAML, which also learns an initialization, Reptile does not differentiate through the optimization process: it simply performs stochastic gradient descent (SGD) on each task in the standard way, without unrolling a computation graph or calculating any second derivatives. This makes Reptile take less computation and memory than MAML, and makes it more suitable for optimization problems that require many update steps.
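The procedure above can be sketched in a few lines. The following is a minimal toy illustration in plain NumPy (my own construction, not the authors' TensorFlow code): each hypothetical task is a 1-D quadratic loss whose minimum is drawn from a task distribution, the inner loop is plain SGD, and the outer loop nudges the initialization towards the adapted weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_on_task(w_init, task_center, inner_steps=5, inner_lr=0.1):
    """Inner loop: plain SGD on a toy quadratic loss
    L(w) = 0.5 * (w - task_center)^2, whose gradient is (w - task_center).
    No computation graph is unrolled; no second derivatives are taken."""
    w = w_init.copy()
    for _ in range(inner_steps):
        w -= inner_lr * (w - task_center)
    return w

def reptile(meta_iters=1000, outer_lr=0.1):
    """Outer loop: sample a task, adapt to it with SGD, then move the
    initialization phi towards the task-adapted weights."""
    phi = np.array([5.0])                      # initialization being meta-learned
    for _ in range(meta_iters):
        task_center = rng.normal(0.0, 1.0, 1)  # sample a task (here, its optimum)
        w = sgd_on_task(phi, task_center)      # train on that task
        phi += outer_lr * (w - phi)            # Reptile step: phi <- phi + eps*(w - phi)
    return phi

phi = reptile()
```

On this toy distribution the learned initialization drifts towards the mean of the task optima (0 here), i.e. a point from which a few SGD steps can reach any sampled task quickly.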
To further analyze why Reptile works, the update is approximated using a Taylor series. The analysis shows that the Reptile update maximizes the inner product between gradients of different minibatches from the same task, which corresponds to improved generalization. This finding may have implications outside the meta-learning setting, for explaining the generalization properties of SGD. The analysis also suggests that Reptile and MAML perform a very similar update, containing the same two terms with different weights.
Experiments conducted:
In the experiments conducted here, Reptile and MAML yield similar performance on the Omniglot and Mini-ImageNet benchmarks for few-shot classification. Reptile also converges to the solution faster, since its update has lower variance.
Implementations:
Their implementation of Reptile is available on GitHub; the link is given below. It uses TensorFlow for the computations involved and includes code for replicating the experiments on Omniglot and Mini-ImageNet. A smaller JavaScript implementation, which fine-tunes a model pre-trained with TensorFlow, will also be released soon.
Discussion:
In meta-learning problems, it is assumed that we have access to a training set of tasks, which is used to train a fast learner. The authors describe a surprisingly simple approach to meta-learning that works by repeatedly optimizing on a single task and moving the parameter vector towards the parameters learned on that task. This algorithm performs similarly to MAML while being significantly simpler to implement.
The paper presents two theoretical explanations for why Reptile works:

First, by approximating the update with a Taylor series, it is shown that the key leading-order term matches the gradient from MAML [FAL17]. This term adjusts the initial weights to maximize the dot product between the gradients of different minibatches on the same task; i.e., it encourages the gradients to generalize between minibatches of the same task.
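As a sketch of this expansion, following (to the best of my reading) the paper's notation: with two inner SGD steps at inner learning rate \(\alpha\), minibatch gradients \(\bar{g}_1, \bar{g}_2\) and Hessians \(\bar{H}_1, \bar{H}_2\) evaluated at the initialization \(\phi\), the expected meta-gradients expand to leading order as:

```latex
% Two expected quantities from the analysis:
%   AvgGrad      = E[ \bar{g}_1 ]                  (the joint-training gradient)
%   AvgGradInner = E[ \bar{H}_2 \bar{g}_1 ]
%                = (1/2) E[ \partial (\bar{g}_1 \cdot \bar{g}_2) / \partial \phi ]
\begin{align}
\mathbb{E}[g_{\mathrm{MAML}}]    &= \mathrm{AvgGrad} - 2\alpha\,\mathrm{AvgGradInner} + O(\alpha^2) \\
\mathbb{E}[g_{\mathrm{FOMAML}}]  &= \mathrm{AvgGrad} - \alpha\,\mathrm{AvgGradInner} + O(\alpha^2) \\
\mathbb{E}[g_{\mathrm{Reptile}}] &= 2\,\mathrm{AvgGrad} - \alpha\,\mathrm{AvgGradInner} + O(\alpha^2)
\end{align}
```

All three updates are built from the same two terms: AvgGrad moves towards the joint-training optimum, while AvgGradInner maximizes the inner product between gradients of different minibatches of the same task; the algorithms differ only in the weights on these terms.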

Second, there is an informal argument that Reptile finds a point that is close (in Euclidean distance) to all of the optimal solution manifolds of the training tasks.
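A toy illustration of this argument (my own construction, not from the paper): take two tasks over 2-D weights whose optimal solution manifolds are the lines w[0] = 1 and w[1] = 1. Running Reptile drives the initialization towards (1, 1), the point closest in Euclidean distance to both manifolds.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two toy tasks over 2-D weights. Task i only constrains coordinate i:
# its loss is 0.5 * (w[i] - 1)^2, so its optimal solution manifold is the
# entire line w[i] = 1.
def task_grad(w, i):
    g = np.zeros_like(w)
    g[i] = w[i] - 1.0
    return g

phi = np.array([3.0, -2.0])   # initialization being meta-learned
inner_lr, outer_lr, inner_steps = 0.1, 0.1, 5

for _ in range(2000):
    i = rng.integers(2)              # sample one of the two tasks
    w = phi.copy()
    for _ in range(inner_steps):     # inner loop: plain SGD on that task
        w -= inner_lr * task_grad(w, i)
    phi += outer_lr * (w - phi)      # Reptile step towards the adapted weights

# phi ends up close to [1., 1.], the point minimizing the average squared
# Euclidean distance to both solution manifolds.
```

Each task on its own is indifferent to one coordinate, yet the initialization settles at the single point from which every task's manifold is reachable in a few SGD steps.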
For More Information (Implementations): GitHub
Link to the PDF: Click Here