Enriching Word Vectors With Subword Information

July 7, 2018, 8:45 p.m. By: Kirti Bakshi


Continuous word representations, trained on large unlabeled corpora, are known to be very useful for many Natural Language Processing (NLP) tasks. Popular models that learn such representations ignore the morphology of words by assigning a distinct vector to each word.

This paper proposes a new approach based on the skipgram model, in which each word is represented as a bag of character n-grams. A vector representation is associated with each character n-gram, and words are represented as the sum of these representations.
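To make the bag-of-n-grams idea concrete, here is a minimal Python sketch of the n-gram extraction step. Following the paper, each word is wrapped in the boundary symbols < and > and all n-grams with 3 <= n <= 6 are extracted; the function name `char_ngrams` is our own choice for illustration, not part of any library.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return all character n-grams of `word` with min_n <= n <= max_n.

    The word is first wrapped in the boundary symbols '<' and '>', so that
    prefixes and suffixes differ from n-grams taken from inside a word.
    """
    padded = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of width n across the padded word.
        for i in range(len(padded) - n + 1):
            grams.append(padded[i:i + n])
    return grams


print(char_ngrams("where", 3, 3))
# ['<wh', 'whe', 'her', 'ere', 're>']
print(len(char_ngrams("where")))
# 14
```

With n = 3 only, "where" yields <wh, whe, her, ere, re>, the example used in the paper. The trigram her taken from "where" is different from the sequence <her> for the word "her", thanks to the boundary symbols. The paper also adds a special sequence for the whole word (here <where>) to its set of n-grams, which this sketch leaves out.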

This method is fast, allowing models to be trained on large corpora quickly, and it also makes it possible to compute representations for words that did not appear in the training data. The word representations are evaluated on nine different languages, on both word similarity and analogy tasks.

By comparing them with recently proposed morphological word representations, the authors show that their vectors achieve state-of-the-art performance on these tasks.
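Because a word vector is just a sum of n-gram vectors, a representation can be built for any string, including words absent from the training data. The sketch below assumes the hashing scheme the paper uses to bound memory (n-grams mapped to K = 2·10^6 buckets with the FNV-1a hash). It uses the standard 32-bit FNV-1a, which may differ in small details from the official implementation, and `bucket_vector` is a hypothetical stand-in for the trained n-gram table.

```python
import random

K = 2_000_000  # number of hash buckets (the paper sets K = 2 * 10**6)
DIM = 8        # toy embedding size, for illustration only


def fnv1a_32(text):
    """Standard 32-bit FNV-1a hash of the UTF-8 bytes of `text`."""
    h = 0x811C9DC5                         # FNV offset basis
    for byte in text.encode("utf-8"):
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF  # multiply by FNV prime, keep 32 bits
    return h


def char_ngrams(word, min_n=3, max_n=6):
    """Character n-grams of `word`, padded with '<' and '>'."""
    padded = f"<{word}>"
    return [padded[i:i + n]
            for n in range(min_n, max_n + 1)
            for i in range(len(padded) - n + 1)]


def bucket_vector(bucket):
    """Hypothetical stand-in for row `bucket` of a trained n-gram table.

    A real model learns these vectors; here each bucket just gets a fixed
    pseudo-random vector so the example runs without training.
    """
    rng = random.Random(bucket)
    return [rng.uniform(-0.5, 0.5) for _ in range(DIM)]


def word_vector(word):
    """Sum the vectors of the word's hashed character n-grams.

    No vocabulary lookup is needed, so this works for unseen words too.
    """
    total = [0.0] * DIM
    for gram in char_ngrams(word):
        vec = bucket_vector(fnv1a_32(gram) % K)
        total = [t + v for t, v in zip(total, vec)]
    return total


print(len(word_vector("unfriendliness")))  # 8
```

In this sketch, two words that share n-grams (for example a word and its inflected forms) share summands, which is how rare or unseen words can borrow information from frequent ones. Hashing keeps the n-gram table at a fixed size, at the cost of occasional collisions between unrelated n-grams.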


In [Natural Language Processing (NLP)](https://en.wikipedia.org/wiki/Natural_language_processing), learning continuous representations of words has a long history. These representations are typically derived from large unlabeled corpora using co-occurrence statistics. A large body of work, known as distributional semantics, has studied the properties of these methods.

In this paper, the authors propose to learn representations for character n-grams and to represent words as the sum of the n-gram vectors. The main contribution of the paper is an extension of the continuous skipgram model that takes subword information into account. They evaluate this model on nine languages with different morphologies, showing the benefit of their approach.
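The difference between the plain skipgram model and its subword extension can be written compactly with the scoring functions used in the paper. In the skipgram model, a word w_t and a context word w_c are scored with one input vector u per word and one output vector v per context word. In the subword model, the word's own vector is replaced by the sum of the vectors z_g of its n-grams, where G_w is the set of n-grams appearing in w.

```latex
% Skipgram: one vector per word
s(w_t, w_c) = \mathbf{u}_{w_t}^{\top} \mathbf{v}_{w_c}

% Subword model: sum over the n-grams g of the word w
s(w, c) = \sum_{g \in \mathcal{G}_w} \mathbf{z}_g^{\top} \mathbf{v}_c
```

Everything else about training stays the same; only the way a word's vector is formed changes.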

Related work:

  • Morphological word representations.

  • Character-level features for NLP.


In this section of the paper, the authors propose their model for learning word representations while taking morphology into account. They model morphology by considering subword units and representing each word by the sum of its character n-grams. They first present the general framework used to train word vectors, then present their subword model, and finally describe how they handle the dictionary of character n-grams. The two models below are described briefly in the paper (link at the end).

  • General model

  • Subword model
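As a rough illustration of how the two models fit together, the sketch below computes the negative-sampling loss that the general skipgram model minimizes for one (word, context) pair, ℓ(s(w, c)) + Σ_n ℓ(−s(w, n)) with ℓ(x) = log(1 + e^(−x)), but with the subword score in place of the plain word-vector score. It is a loss computation only, not a trainer: the vector tables are hypothetical toy dictionaries, the whole-word vector is omitted, and the negative words are passed in by hand rather than sampled from the corpus.

```python
import math
from collections import defaultdict

DIM = 4  # toy embedding size


def char_ngrams(word, min_n=3, max_n=6):
    """Character n-grams of `word`, padded with '<' and '>'."""
    padded = f"<{word}>"
    return [padded[i:i + n]
            for n in range(min_n, max_n + 1)
            for i in range(len(padded) - n + 1)]


def logistic_loss(x):
    """l(x) = log(1 + exp(-x)), written to avoid overflow for large |x|."""
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def subword_score(word, context, ngram_vecs, context_vecs):
    """s(w, c): sum over the n-grams g of w of z_g . v_c."""
    v_c = context_vecs[context]
    return sum(dot(ngram_vecs[g], v_c) for g in char_ngrams(word))


def pair_loss(word, context, negatives, ngram_vecs, context_vecs):
    """Negative-sampling loss for one (word, context) pair.

    The true context should get a high score, each negative a low one.
    """
    loss = logistic_loss(subword_score(word, context, ngram_vecs, context_vecs))
    for neg in negatives:
        loss += logistic_loss(-subword_score(word, neg, ngram_vecs, context_vecs))
    return loss


# Usage with all-zero toy vectors: every score is 0, so each term is log 2.
ngram_vecs = defaultdict(lambda: [0.0] * DIM)
context_vecs = {w: [0.0] * DIM for w in ("is", "cat", "dog")}
print(pair_loss("where", "is", ["cat", "dog"], ngram_vecs, context_vecs))
# about 2.079, i.e. 3 * log(2)
```

During training, gradients of this loss would update every n-gram vector of the word at once, which is why n-grams shared across words receive updates from all of them.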


The model is evaluated in five experiments:

  • An evaluation of word similarity,

  • An evaluation of word analogies,

  • A comparison to state-of-the-art methods,

  • An analysis of the effect of the size of the training data,

  • An analysis of the effect of the size of the character n-grams considered.

Each of these experiments is described in detail in the Results sections of the paper.
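For the word similarity experiment, the paper compares the cosine similarity between word vectors with human similarity judgements using Spearman's rank correlation coefficient. A minimal pure-Python sketch of that scoring step follows; the function names and the tiny example vectors and scores are made up for illustration, and tied values are given average ranks.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def average_ranks(values):
    """1-based ranks of `values`; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def similarity_correlation(pairs, human_scores, vectors):
    """Correlate model cosine similarities with human similarity scores."""
    model_scores = [cosine(vectors[w1], vectors[w2]) for w1, w2 in pairs]
    return spearman(model_scores, human_scores)


# Toy data (invented): the model orders the three pairs the same way
# as the human scores do, so the correlation is (close to) 1.0.
vectors = {"car": [1.0, 0.0], "auto": [1.0, 0.1], "tree": [0.0, 1.0]}
pairs = [("car", "auto"), ("car", "tree"), ("auto", "tree")]
human = [9.0, 1.0, 2.0]
print(similarity_correlation(pairs, human, vectors))
```

Because Spearman's coefficient only looks at ranks, it rewards a model for ordering word pairs the way humans do, regardless of the absolute scale of the cosine values.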


In this paper, the authors investigate a simple method for learning word representations that takes subword information into account. Their approach, which incorporates character n-grams into the skipgram model, is related to an idea introduced by Schütze (1993). Because of its simplicity, the model trains fast and does not require any preprocessing or supervision.

The paper further shows that the model outperforms baselines that do not take subword information into account, as well as methods that rely on morphological analysis. To facilitate comparison with future work on learning subword representations, the authors will also open-source the implementation of their model.

Cover Picture Reference and Source: GitHub

Link To PDF: Click Here