spaCy 2.0 Released: Natural Language Processing With Python

Nov. 11, 2017, 7:44 a.m. By: Kirti Bakshi


spaCy is a library for advanced Natural Language Processing in Python and Cython. It is built on the very latest research and was designed from the start to be used in real products. It comes with the fastest syntactic parser in the world, convolutional neural network models for tagging, parsing and named entity recognition, and easy deep learning integration.

A few of its features include:

  • Fastest syntactic parser in the world

  • Named entity recognition

  • Non-destructive tokenization

  • Support for 20+ languages

  • Easy deep learning integration

  • Part-of-speech tagging

  • Labelled dependency parsing

  • Syntax-driven sentence segmentation

  • Built-in visualizers for syntax and NER

  • State-of-the-art speed

  • Robust, rigorously evaluated accuracy
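To make one of these features concrete: "non-destructive tokenization" means the original text can always be rebuilt from the tokens, because no characters are thrown away. The following is a minimal sketch of that idea in plain Python. It does not use spaCy, and the `Token`, `tokenize` and `detokenize` names are hypothetical, chosen only for this illustration. It also assumes the text does not start with whitespace.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    text: str        # the token's own characters
    whitespace: str  # whitespace that followed the token in the input
    start: int       # character offset of the token in the original text


def tokenize(text: str) -> List[Token]:
    """Split text into word and punctuation tokens without discarding anything."""
    tokens = []
    # Group 1 matches a word (\w+) or a single punctuation mark ([^\w\s]);
    # group 2 captures whatever whitespace follows it.
    for match in re.finditer(r"(\w+|[^\w\s])(\s*)", text):
        tokens.append(Token(match.group(1), match.group(2), match.start(1)))
    return tokens


def detokenize(tokens: List[Token]) -> str:
    """Rebuild the input from the tokens and their trailing whitespace."""
    return "".join(tok.text + tok.whitespace for tok in tokens)


text = "Hello,  world! It's  spaCy."
tokens = tokenize(text)

# Irregular spacing survives the round trip.
assert detokenize(tokens) == text
# Each token still knows where it came from in the original string.
assert text[tokens[2].start:tokens[2].start + len(tokens[2].text)] == "world"
```

Because every token keeps its offset and trailing whitespace, annotations made on tokens can always be mapped back onto the raw input.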

And now comes spaCy v2.0, with new features and improvements!

This release features entirely new deep learning-powered models for spaCy's tagger, parser and entity recognizer. The new models are 10× smaller, 20% more accurate and even cheaper to run than the previous generation.

According to the team, there are also several usability improvements that are particularly helpful for production deployments. spaCy v2 now fully supports the Pickle protocol, which makes it easier to use spaCy with Apache Spark. The string-to-integer mapping is no longer stateful, so it is now easy to reconcile annotations made in different processes. Models are smaller and use less memory, and the APIs for serialization are now much more consistent. Custom pipeline components let you modify the Doc at any stage in the pipeline. You can now also add your own custom attributes, properties and methods to the Doc, Token and Span.
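The idea behind custom pipeline components can be sketched in a few lines of plain Python: a pipeline is an ordered list of functions, each of which receives the document, modifies it and passes it on. The sketch below is not spaCy's API. `ToyDoc`, `ToyPipeline` and the component functions are hypothetical stand-ins, and custom attributes are modelled as a plain dictionary.

```python
from typing import Callable, List, Optional, Tuple


class ToyDoc:
    """A toy document: a list of words plus a dict of user-defined attributes."""

    def __init__(self, words: List[str]) -> None:
        self.words = list(words)
        self.user_data = {}  # custom attributes set by components


Component = Callable[[ToyDoc], ToyDoc]


class ToyPipeline:
    """Runs a sequence of named components over a document, in order."""

    def __init__(self) -> None:
        self.components: List[Tuple[str, Component]] = []

    @property
    def pipe_names(self) -> List[str]:
        return [name for name, _ in self.components]

    def add_pipe(self, name: str, component: Component,
                 before: Optional[str] = None) -> None:
        """Append a component, or insert it before an existing one."""
        if before is None:
            self.components.append((name, component))
        else:
            self.components.insert(self.pipe_names.index(before), (name, component))

    def __call__(self, text: str) -> ToyDoc:
        doc = ToyDoc(text.split())  # naive whitespace tokenization
        for _, component in self.components:
            doc = component(doc)
        return doc


def lowercase(doc: ToyDoc) -> ToyDoc:
    doc.words = [word.lower() for word in doc.words]
    return doc


def flag_greeting(doc: ToyDoc) -> ToyDoc:
    # Store a custom attribute on the document.
    doc.user_data["is_greeting"] = bool(doc.words) and doc.words[0] in {"hello", "hi"}
    return doc


nlp = ToyPipeline()
nlp.add_pipe("flag_greeting", flag_greeting)
# Insert a component at an earlier stage of the pipeline.
nlp.add_pipe("lowercase", lowercase, before="flag_greeting")

doc = nlp("Hello world")
assert nlp.pipe_names == ["lowercase", "flag_greeting"]
assert doc.user_data["is_greeting"] is True
```

Because each component only takes a document and returns a document, components can be added, removed or reordered without touching the others.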

The new and improved features of the latest version include:

  • Convolutional neural network models

  • Improved processing pipelines

  • Text classification

  • Hash values instead of integer IDs

  • Improved word vectors support

  • Saving, loading, and serialization

  • displaCy visualizer with Jupyter support

  • Improved language data and lazy loading

  • Revised matcher API and phrase matcher
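The "hash values instead of integer IDs" item is what makes the string-to-integer mapping stateless. If an ID is computed from the string itself, rather than handed out by a counter, two processes always agree on it no matter what they have seen before. A minimal sketch of the principle follows. It is not spaCy's implementation: `string_id` and `ToyStringStore` are hypothetical names, and `hashlib.blake2b` is only a stand-in for whatever hash function spaCy uses.

```python
import hashlib
from typing import Dict


def string_id(text: str) -> int:
    """Derive a 64-bit ID from the string alone (stand-in hash function)."""
    digest = hashlib.blake2b(text.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


class ToyStringStore:
    """Maps hash IDs back to strings; the IDs do not depend on insertion order."""

    def __init__(self) -> None:
        self._by_id: Dict[int, str] = {}

    def add(self, text: str) -> int:
        key = string_id(text)
        self._by_id[key] = text
        return key

    def __getitem__(self, key: int) -> str:
        return self._by_id[key]


# Two "processes" see the same words in a different order.
store_a = ToyStringStore()
store_b = ToyStringStore()
ids_a = [store_a.add(word) for word in ["coffee", "tea", "milk"]]
ids_b = [store_b.add(word) for word in ["milk", "coffee"]]

# With counter-based IDs, "coffee" would be 0 in one store and 1 in the other.
# With hash-based IDs, both stores agree.
assert ids_a[0] == ids_b[1]
assert store_a[ids_a[0]] == store_b[ids_b[1]] == "coffee"
```

The trade-off is that an ID can only be turned back into text by a store that has already seen the string.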

The main usability improvements in spaCy v2.0 revolve around defining, training and loading your own models and components. The new neural network models make it much easier to train a model from scratch or to update an existing model with a few examples.

spaCy v2.0 now comes with 13 new convolutional neural network models for more than 7 languages, designed and implemented from scratch specifically for spaCy.

Thanks to some clever use of hashing, the statistical models never change size as they learn new vocabulary items. The whole pipeline is also now completely differentiable, so even if you don't have explicitly annotated data, you can update spaCy using all the latest deep learning tricks, such as adversarial training, noise contrastive estimation or reinforcement learning.
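One general way hashing can keep a model at a fixed size is to store word vectors in a table with a fixed number of rows and to pick a word's rows by hashing it, instead of giving every new word its own row. The sketch below illustrates that general technique only. The article does not describe spaCy's exact scheme, so the table size, the number of hash keys per word and all of the names here are assumptions made for the example.

```python
import hashlib
import random
from typing import List

N_ROWS = 1000  # fixed number of rows, regardless of vocabulary size (assumed)
WIDTH = 4      # length of each vector (assumed)
N_KEYS = 3     # rows combined per word, to soften collisions (assumed)

# In a real model these rows would be trained; here they are just random.
rng = random.Random(0)
table = [[rng.uniform(-1.0, 1.0) for _ in range(WIDTH)] for _ in range(N_ROWS)]


def row_indices(word: str) -> List[int]:
    """Hash the word N_KEYS times, each time with a different prefix."""
    indices = []
    for seed in range(N_KEYS):
        data = f"{seed}:{word}".encode("utf8")
        digest = hashlib.blake2b(data, digest_size=8).digest()
        indices.append(int.from_bytes(digest, "little") % N_ROWS)
    return indices


def embed(word: str) -> List[float]:
    """Sum the word's rows to get its vector; no new row is ever allocated."""
    vector = [0.0] * WIDTH
    for index in row_indices(word):
        for i, value in enumerate(table[index]):
            vector[i] += value
    return vector


# Words the table has never "seen" still get a vector, and the table stays put.
for word in ["spaCy", "differentiable", "a-word-nobody-has-seen-before"]:
    assert len(embed(word)) == WIDTH
assert len(table) == N_ROWS
```

Two different words may share one row, but they are unlikely to share all of their rows, which is why several hash keys per word are used.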

For More Information: GitHub