Presenting Multitask Learning as Question Answering: The Natural Language Decathlon

July 1, 2018, 8:28 a.m. By: Kirti Bakshi


Deep learning has improved performance on many Natural Language Processing (NLP) tasks individually. However, general NLP models cannot emerge within a paradigm that focuses on the particularities of a single metric, dataset, and task. The paper presents a challenge that spans ten tasks:

  • Question answering,

  • Machine translation,

  • Summarization,

  • Natural Language Inference,

  • Sentiment analysis,

  • Semantic role labeling,

  • Relation extraction,

  • Goal-oriented dialogue,

  • Semantic parsing,

  • Commonsense pronoun resolution

A Challenge: The Natural Language Decathlon (decaNLP)

Here, all ten tasks are cast as question answering over a context. The authors also present the Multitask Question Answering Network (MQAN), a new model that jointly learns all tasks in decaNLP without any task-specific modules or parameters.

A Deeper Insight:

To explore models that generalize to many different kinds of NLP tasks, the authors introduce the Natural Language Decathlon (decaNLP), which encourages a single model to optimize for all ten tasks listed above simultaneously. Every task is framed as question answering by letting the task specification take the form of a natural language question: each example consists of a context, a question, and an answer.
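To make the framing concrete, here is a minimal Python sketch of how two of the tasks could be expressed as (context, question, answer) triples. The class name `QAExample`, the helper functions, and the exact question wording are illustrative assumptions, not the format used by the released code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QAExample:
    """One training example: every task shares this three-field shape."""
    context: str
    question: str
    answer: str


def frame_sentiment(review: str, label: str) -> QAExample:
    # The question text is a hypothetical wording of the task description.
    return QAExample(
        context=review,
        question="Is this review negative or positive?",
        answer=label,
    )


def frame_translation(sentence: str, translation: str,
                      target_language: str = "German") -> QAExample:
    # The task is specified entirely by the natural language question.
    return QAExample(
        context=sentence,
        question=f"What is the translation from English to {target_language}?",
        answer=translation,
    )


if __name__ == "__main__":
    print(frame_sentiment("A warm, funny film.", "positive"))
    print(frame_translation("Good morning.", "Guten Morgen."))
```

Because both helpers return the same type, one model can consume examples from either task without knowing which task produced them; only the question changes.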

The authors also provide a set of baselines for decaNLP that combine the basics of sequence-to-sequence learning with pointer networks, advanced attention mechanisms, attention networks, question answering, and curriculum learning.

They also design the Multitask Question Answering Network (MQAN) for decaNLP, which uses a novel dual coattention and multi-pointer-generator decoder to multitask across all tasks in decaNLP. The results demonstrate that, with the right anti-curriculum strategy, training the MQAN jointly on all tasks can achieve performance comparable to that of ten separate MQANs, each trained separately.
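The article does not spell out the anti-curriculum strategy, so the following Python sketch is only one plausible reading: it assumes training starts with a warm-up phase restricted to a harder subset of tasks and then switches to round-robin sampling over all tasks. The function name, the task names, and the `warmup_steps` parameter are hypothetical.

```python
from itertools import cycle, islice
from typing import Iterator, Sequence


def anti_curriculum_schedule(all_tasks: Sequence[str],
                             hard_tasks: Sequence[str],
                             warmup_steps: int) -> Iterator[str]:
    """Yield the task to draw the next training batch from.

    Assumption: "anti-curriculum" means the harder tasks come first,
    followed by joint round-robin training over every task.
    """
    hard_cycle = cycle(hard_tasks)
    for _ in range(warmup_steps):
        yield next(hard_cycle)
    # After the warm-up, rotate through all tasks indefinitely.
    yield from cycle(all_tasks)


if __name__ == "__main__":
    schedule = anti_curriculum_schedule(
        all_tasks=["question_answering", "translation", "summarization"],
        hard_tasks=["question_answering"],
        warmup_steps=2,
    )
    for step, task in enumerate(islice(schedule, 6)):
        print(step, task)
```

The generator never ends on its own, so a training loop would bound it with `islice` or its own step counter.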

An MQAN pretrained on decaNLP shows improvements in transfer learning for machine translation and named entity recognition, domain adaptation for sentiment analysis and natural language inference, and zero-shot capabilities for text classification. Although it is not explicitly designed for any one task, MQAN also proves to be a strong model in the single-task setting, achieving state-of-the-art results on the semantic parsing component of decaNLP.

They have released code for obtaining and preprocessing datasets, training and evaluating models, and tracking progress through a leaderboard based on decathlon scores (decaScore). They hope that the combination of these resources will facilitate research in a number of areas.
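As a rough illustration of how a single leaderboard number can summarize ten tasks, the sketch below assumes that a decaScore is the plain sum of one metric per task, each reported on a 0 to 100 scale. The function name and the sample values are made up for the example and are not results from the paper.

```python
from typing import Dict


def deca_score(task_metrics: Dict[str, float]) -> float:
    """Sum per-task metrics into one score.

    Assumption: each metric lies in [0, 100], so ten tasks give a
    total between 0 and 1000.
    """
    for task, value in task_metrics.items():
        if not 0.0 <= value <= 100.0:
            raise ValueError(f"metric for {task!r} is outside [0, 100]: {value}")
    return sum(task_metrics.values())


if __name__ == "__main__":
    # Hypothetical numbers, used only to show the calculation.
    example = {"question_answering": 70.0, "translation": 20.5}
    print(deca_score(example))
```

An additive score like this treats every task equally, so a model cannot rank well by excelling at one task while failing the others.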

Related Work

  • Transfer Learning in NLP.

  • Multitask Learning in NLP.

  • Optimization and Catastrophic Forgetting.

  • Meta-Learning.


The paper introduces the Natural Language Decathlon (decaNLP), a new benchmark for measuring the performance of NLP models across ten tasks that appear disparate until unified as question answering. It also presents MQAN, a model for general question answering that uses a multi-pointer-generator decoder to capitalize on questions as natural language descriptions of tasks.

Although MQAN has no task-specific modules, the authors trained it on all decaNLP tasks jointly and showed that anti-curriculum learning gave further improvements. After training on decaNLP, MQAN exhibits transfer learning and zero-shot capabilities. When used as pretrained weights, MQAN improved performance on new tasks. It also demonstrated zero-shot domain adaptation capabilities on text classification from new domains.

The authors hope that the decaNLP benchmark, the experimental results, and the publicly available code will encourage further research into general models for NLP.

More Information: GitHub
