Google introduced a new open-source pre-training technique for NLP called Bidirectional Encoder Representations from Transformers, or simply BERT.
This release makes it possible for anyone across the globe to train their own state-of-the-art question answering system in about 30 minutes on a single Cloud TPU, or in a few hours on a single GPU.
BERT's source code is built on top of TensorFlow and ships with several pretrained natural language representation models. To demonstrate the new model's capability, Google evaluated it on 11 NLP tasks, including the very challenging Stanford Question Answering Dataset (SQuAD v1.1).
How is BERT different from other NLP models?
One of the most significant challenges in training NLP models is the scarcity of labeled training data. To overcome this, Google pre-trained BERT on large amounts of unannotated text that is plentiful and rich in information, such as Wikipedia content. To everyone's surprise, these pretrained models produced more accurate results than existing natural language processing models.
This is how BERT evolved and stands out: it builds on earlier work in pre-trained contextual representations, including Semi-supervised Sequence Learning, Generative Pre-Training, ELMo, and ULMFiT.
BERT is deeply bidirectional, which means it generates a representation of each word using both the previous and the next context, starting from the very bottom of a deep neural network. This is the speciality of BERT.
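The contrast with a unidirectional model can be sketched with a toy example. This is not BERT itself, and the function names are hypothetical; it only illustrates which neighboring words each kind of model is allowed to condition on when representing a given word.

```python
# Toy illustration (not BERT): contrast the context available to a
# left-to-right language model with the context a deeply bidirectional
# model can use. Function names are hypothetical, for demonstration only.

def left_to_right_context(tokens, i):
    """A unidirectional LM may only condition on words before position i."""
    return tokens[:i]

def bidirectional_context(tokens, i):
    """A bidirectional model conditions on words on both sides of position i."""
    return tokens[:i] + tokens[i + 1:]

tokens = ["the", "bank", "raised", "interest", "rates"]

# For the ambiguous word "bank" (index 1):
print(left_to_right_context(tokens, 1))   # ['the']
print(bidirectional_context(tokens, 1))   # ['the', 'raised', 'interest', 'rates']
```

Seeing "raised interest rates" to the right is what lets a bidirectional model disambiguate "bank" as a financial institution rather than a riverbank.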
This bidirectional representation is achieved with a simple method: some of the words in the input are masked out, and the model is trained to predict each masked word by conditioning bidirectionally on the surrounding words. Although the masking idea itself has a long history, BERT is the first to deploy it successfully for pre-training a deep neural network.
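The masking step can be sketched in a few lines of plain Python. This is a simplified sketch of the masked-language-modeling data preparation, not Google's implementation: it replaces roughly 15% of tokens with a [MASK] symbol and records the original words as prediction targets (BERT's actual recipe additionally keeps or randomly replaces some selected tokens instead of always masking them).

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Randomly replace ~mask_prob of the tokens with [MASK].
    Returns the masked sequence plus a {position: original_word} map,
    which serves as the prediction targets for the model (simplified:
    BERT's 80/10/10 keep/replace rule is omitted here)."""
    rng = random.Random(seed)
    masked = list(tokens)
    labels = {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = tok      # the original word is the target
            masked[i] = MASK
    return masked, labels

tokens = "the man went to the store to buy a gallon of milk".split()
masked, labels = mask_tokens(tokens)
print(masked)   # some tokens replaced by [MASK]
print(labels)   # positions the model must predict
```

During pre-training, the model reads the masked sequence and, for each masked position, predicts the original word using the unmasked words on both sides.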