With the tremendous increase in software innovation over the past few years, the world has become more reachable and accessible. The available machine learning libraries now range from highly specialized code to general-purpose frameworks with only marginal support for sparse models. Netflix is an American entertainment company that provides streaming media, video on demand, and DVD by mail to its customers. At Netflix, scientists work on a large set of machine learning problems, with algorithms that tailor TV and movie recommendations to each viewer's interests.
Some of these problems involve tens of millions of features with only a small number of non-zero entries. For such cases there was a need for a minimalist library for sparse data on a single machine, tailored to training shallow feedforward neural nets. This led to Vectorflow, which is easy to work with and is one of the many machine learning tools used by scientists at Netflix. The library works on sparse data and is distributed as a dub package (dub is the package manager and registry for D). To use it, add vectorflow to the dependencies section of dub.json. The library has no dependencies and requires only a D compiler. LDC is the recommended compiler for the fastest runtime speed.
A number of factors shaped the design of the library. The design considerations include:
Because Vectorflow is written in D, the library is easy to run, and scientists can iterate on their models with complete autonomy. Deployment is also easy because the library has no third-party dependencies.
D's fast compilers and functional programming features give users a pleasant experience, typically with a performance gain of multiple orders of magnitude.
It gives developers access to D's templating engine, compile-time functionality, and lower-level features.
It exposes a callback-based API that lets users plug in custom loss functions for training.
Vectorflow imposes relatively loose requirements on the data schema, so one can write efficient data adapters and avoid pre-processing. This makes it practical to move the code to the data.
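The idea of a callback-based loss can be sketched in a few lines. This is a minimal Python illustration, not Vectorflow's actual D API: the names `train` and `squared_loss`, and the callback signature (prediction and label in, loss and gradient out), are assumptions made for this example.

```python
from typing import Callable, List, Tuple

# Hypothetical callback signature: (prediction, label) -> (loss, dloss/dprediction)
LossFn = Callable[[float, float], Tuple[float, float]]
SparseRow = List[Tuple[int, float]]  # (feature index, value) pairs


def squared_loss(pred: float, label: float) -> Tuple[float, float]:
    """Return the squared-error loss and its gradient w.r.t. the prediction."""
    diff = pred - label
    return 0.5 * diff * diff, diff


def train(data: List[Tuple[SparseRow, float]], loss_fn: LossFn, dim: int,
          lr: float = 0.1, epochs: int = 200) -> List[float]:
    """Plain SGD on a linear model; the caller supplies the loss as a callback."""
    w = [0.0] * dim
    for _ in range(epochs):
        for x, y in data:
            pred = sum(w[i] * v for i, v in x)
            _, grad = loss_fn(pred, y)  # the only place the loss is referenced
            for i, v in x:              # touch only the non-zero features
                w[i] -= lr * grad * v
    return w


# Three sparse rows consistent with the weights [2.0, -1.0]
data = [([(0, 1.0)], 2.0), ([(1, 1.0)], -1.0), ([(0, 1.0), (1, 1.0)], 1.0)]
weights = train(data, squared_loss, dim=2)
```

Swapping in a different loss only requires passing another function with the same signature; the training loop itself does not change.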
Distributed systems are difficult to debug and add fixed costs. These shortcomings led to an efficient solution in a single-machine setting.
The scientists adopted generic asynchronous SGD solvers, with Hogwild as the lock-free strategy. Because all the work is non-distributed from the user's perspective, users do not have to deal with the distributed aspects of the algorithm.
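The Hogwild update pattern can be sketched as follows: several workers run SGD on a shared weight vector and write to it without taking any lock. This is an illustrative Python sketch and not Vectorflow's solver; the function name `hogwild_sgd`, the least-squares objective, and the synthetic data are assumptions made for the example. In standard CPython the global interpreter lock prevents true parallelism, so the sketch shows the lock-free structure rather than a speedup.

```python
import random
import threading


def hogwild_sgd(data, dim, n_threads=4, lr=0.1, epochs=10, seed=0):
    """Lock-free asynchronous SGD for least squares on sparse rows.

    Each row is (x, y) where x is a list of (index, value) pairs.
    All threads update the shared list `w` with no synchronization.
    """
    w = [0.0] * dim  # shared state, intentionally unprotected

    def worker(shard, rng):
        for _ in range(epochs):
            rng.shuffle(shard)
            for x, y in shard:
                pred = sum(w[i] * v for i, v in x)
                g = pred - y
                for i, v in x:
                    w[i] -= lr * g * v  # racy read-modify-write, by design

    # Give each thread its own slice of the data and its own RNG
    shards = [data[k::n_threads] for k in range(n_threads)]
    threads = [
        threading.Thread(target=worker, args=(shard, random.Random(seed + k)))
        for k, shard in enumerate(shards)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return w


# Synthetic sparse problem: 50 features, 3 non-zero entries per row, no noise
rng = random.Random(42)
dim = 50
true_w = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
data = []
for _ in range(2000):
    x = [(i, 1.0) for i in rng.sample(range(dim), 3)]
    data.append((x, sum(true_w[i] * v for i, v in x)))

w = hogwild_sgd(data, dim)
max_err = max(abs(a - b) for a, b in zip(w, true_w))
```

The approach relies on sparsity: when each row touches only a few weights, two workers rarely write to the same coordinate at the same time, so occasional lost updates do little harm.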
Vectorflow avoids copying or allocating memory during both the forward and backward passes.
Matrix-vector operations have both sparse and dense implementations, the latter being SIMD-vectorized.
It also supports sparse backpropagation when the output gradients are sparse.
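The sparse matrix-vector product and sparse backpropagation described above can be illustrated with a single linear layer. The sketch below is in Python and is not Vectorflow's implementation: the function names, the list-of-lists weight layout, and the (index, value) encoding are assumptions, and SIMD vectorization is not shown. Both functions work in place on existing buffers, with no new weight matrix allocated in the backward pass.

```python
def sparse_forward(W, x_sparse, out):
    """Write W @ x into `out`, reading only the columns where x is non-zero.

    W is a dense list of rows; x_sparse is a list of (index, value) pairs.
    """
    for j, row in enumerate(W):
        out[j] = sum(row[i] * v for i, v in x_sparse)
    return out


def sparse_backward(W, x_sparse, grad_out_sparse, lr):
    """SGD update of W in place for a sparse output gradient.

    Only rows with a non-zero output gradient and columns with a
    non-zero input are visited; every other weight is left untouched.
    """
    for j, g in grad_out_sparse:
        row = W[j]
        for i, v in x_sparse:
            row[i] -= lr * g * v


W = [[1.0, 2.0, 3.0, 4.0],
     [0.0, 1.0, 0.0, 1.0],
     [2.0, 0.0, 2.0, 0.0]]
x = [(1, 2.0), (3, -1.0)]        # non-zero inputs at columns 1 and 3
out = sparse_forward(W, x, [0.0] * len(W))

grad_out = [(1, 0.5)]            # only output unit 1 has a gradient
sparse_backward(W, x, grad_out, lr=1.0)
```

With two non-zero inputs and one non-zero output gradient, the backward pass updates two weights instead of all twelve.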
Since its inception, the project has been taken up for a wide variety of tasks, including causal inference, density estimation, ranking algorithms, and survival regression for recommendation. It is also being tested to power part of the Netflix homepage experience. It is included in the default toolbox installed on the basic instances used by Netflix machine learning practitioners. Future work aims to improve its functionality, with a focus on user experience. Research is also under way on more specialized layers and more specific parallelism strategies.
More Information: GitHub