Data Distillation: Towards Omni-Supervised Learning

Dec. 17, 2017, 4:01 a.m. By: Kirti Bakshi


Data Distillation: Towards Omni-Supervised Learning is a paper that investigates omni-supervised learning, a paradigm in which the learner exploits as much well-annotated data as possible while also drawing on potentially unlimited unlabeled data, for example from internet-scale sources. In short, it is a special regime of semi-supervised learning.

However, most research on semi-supervised learning has simulated labeled/unlabeled data by splitting a fully annotated dataset, and is therefore likely to be upper-bounded by fully supervised learning with all annotations. In contrast, omni-supervised learning is lower-bounded by the accuracy of training on all annotated data, and its success can be evaluated by how much it surpasses that fully supervised baseline.

To tackle omni-supervised learning, the authors propose to perform knowledge distillation from data, inspired by knowledge distillation from models. The main idea is to generate annotations on unlabeled data using a model trained on large amounts of labeled data, and then retrain the model using the extra generated annotations.
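The generate-then-retrain idea can be illustrated with a minimal, self-contained toy sketch. The 1-D threshold "model" and the `train`/`predict` helpers below are hypothetical stand-ins for a real detector, not code from the paper:

```python
# Toy illustration of the data-distillation loop: train on labeled data,
# annotate the unlabeled pool with the trained model, then retrain on the
# union of real and generated labels.  The 1-D threshold classifier is a
# hypothetical stand-in for a real recognition model.

def train(points):
    # "Model" = midpoint between the two class means (a threshold).
    pos = [x for x, y in points if y == 1]
    neg = [x for x, y in points if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def predict(threshold, x):
    # Label 1 if the point lies at or above the learned threshold.
    return 1 if x >= threshold else 0

# Step 1: train a model on the small manually labeled set.
labeled = [(0.0, 0), (1.0, 0), (9.0, 1), (10.0, 1)]
teacher = train(labeled)

# Step 2: use it to generate annotations for the unlabeled pool.
unlabeled = [0.5, 2.0, 8.0, 9.5]
generated = [(x, predict(teacher, x)) for x in unlabeled]

# Step 3: retrain on the union of true and generated annotations.
student = train(labeled + generated)
print(predict(student, 7.0))  # → 1
```

In the paper the same loop is run with large-scale detectors and internet-scale unlabeled images rather than this toy classifier, but the three-step structure is the same.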

Data distillation itself is a simple and natural approach based mainly on "self-training": making predictions on unlabeled data and then using them to update the model. Efforts in this direction date back to the 1960s, if not earlier.

The abstract of the paper can be summed up as follows:

The paper investigates omni-supervised learning, a special regime of semi-supervised learning in which the learner exploits all available labeled data together with internet-scale sources of unlabeled data. Because omni-supervised learning is lower-bounded by performance on existing labeled datasets, it offers the potential to surpass fully supervised methods. To exploit this setting, the authors propose data distillation, a method that ensembles predictions from multiple transformations of unlabeled data, using a single model, in order to automatically generate new training annotations. They argue that visual recognition models have recently become accurate enough that classic ideas about self-training can now be applied to challenging real-world data. Experimental results show that, for both human keypoint detection and general object detection, state-of-the-art models trained with data distillation surpass the performance of models trained on labeled data from the COCO dataset alone.
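The core trick in the abstract, ensembling predictions from multiple transformations of the same input using a single model, can be sketched as follows. The noisy `model` function, the `WIDTH` constant, and the two-transform ensemble are hypothetical simplifications for illustration, not the paper's implementation:

```python
# Minimal sketch of multi-transform inference: run one model on several
# geometric transforms of an input (here, identity and horizontal flip),
# map each prediction back to the original frame, and average.  The noisy
# `model` is a hypothetical stand-in for a real keypoint detector.

WIDTH = 100.0  # assumed image width in pixels

def model(x_true, noise):
    # Pretend detector: returns a keypoint x-coordinate plus some error.
    return x_true + noise

def ensemble_prediction(x_true):
    # Identity transform: predict directly.
    p_id = model(x_true, noise=2.0)
    # Horizontal flip: transform the input, predict, then invert the flip
    # so the prediction lives in the original image frame.
    p_flip = WIDTH - model(WIDTH - x_true, noise=2.0)
    # Aggregate the per-transform predictions into a single annotation.
    return (p_id + p_flip) / 2

print(ensemble_prediction(40.0))  # → 40.0
```

Because the per-transform errors partly cancel after un-transforming, the aggregated prediction (40.0 here, versus 42.0 from a single pass) is closer to the true coordinate; this is what makes the automatically generated annotations reliable enough to train on.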

Moving towards the end, the paper can be concluded as follows:

The paper shows that it is possible to surpass large-scale supervised learning with omni-supervised learning, that is, by using all available supervised data together with large amounts of unlabeled data. This is achieved by applying data distillation to the challenging COCO object detection and keypoint detection tasks.

It is, therefore, hoped that the work will attract more attention to this practical, large-scale setting.

Link To The PDF: Click Here