Databricks recently announced a serverless platform for Apache Spark at Spark Summit 2017. The company also introduced Deep Learning Pipelines, a library that makes it easier to combine deep learning frameworks with Spark and aims to simplify the developer experience.
Deep learning has often been described as unapproachable because it depends on separate, low-level frameworks that require specialized skills. These frameworks have another drawback: they do not scale well because they run on a single node. Deep Learning Pipelines is an open source package that adds high-level, user-friendly deep learning APIs for technologies such as TensorFlow to Apache Spark. It also makes it possible for enterprises to scale deep learning across multiple nodes.
Databricks' aim is to accelerate innovation for its customers by unifying data science, engineering and business. The company was founded by the team that created Apache Spark. It provides a Unified Analytics Platform that lets data science teams collaborate with data engineering and lines of business to build data products.
The new Deep Learning Pipelines package lets users call deep learning libraries from within existing Spark ML workflows, which spares them from learning a separate tool. It combines deep learning with Spark's data processing and ML capabilities to train deep learning models. By integrating Spark's distributed computation engine with TensorFlow and Keras, it helps produce quality models, and the resulting model can be used directly in SQL analysis. It also makes it easier to work with complex data such as images through a set of Spark-native utilities.
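The idea of calling a deep learning model from inside an existing ML workflow can be illustrated with a minimal sketch in plain Python. This is not the Deep Learning Pipelines API: the `Stage`, `Tokenizer`, `ModelStage` and `Pipeline` classes and the `toy_model` function are hypothetical names invented for illustration, and the "model" is a trivial word counter standing in for a trained network. The sketch only shows the design principle, namely that a model wrapped behind the same `transform` interface as any other stage can be dropped into a pipeline without a separate tool.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


class Stage:
    """Common interface: every stage maps a list of rows to a new list of rows."""

    def transform(self, rows: List[Row]) -> List[Row]:
        raise NotImplementedError


class Tokenizer(Stage):
    """An ordinary data-processing stage: lowercases and splits the 'text' column."""

    def transform(self, rows: List[Row]) -> List[Row]:
        return [{**r, "tokens": str(r["text"]).lower().split()} for r in rows]


class ModelStage(Stage):
    """Wraps any predict function (here a stand-in for a deep learning model) as a stage."""

    def __init__(self, predict: Callable, input_col: str, output_col: str):
        self.predict = predict
        self.input_col = input_col
        self.output_col = output_col

    def transform(self, rows: List[Row]) -> List[Row]:
        return [{**r, self.output_col: self.predict(r[self.input_col])} for r in rows]


class Pipeline(Stage):
    """Runs its stages in order, feeding each stage's output to the next."""

    def __init__(self, stages: List[Stage]):
        self.stages = stages

    def transform(self, rows: List[Row]) -> List[Row]:
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


def toy_model(tokens: List[str]) -> int:
    # Hypothetical stand-in for a trained network: counts "positive" words.
    positive = {"good", "great"}
    return sum(1 for t in tokens if t in positive)


pipeline = Pipeline([Tokenizer(), ModelStage(toy_model, "tokens", "score")])
result = pipeline.transform(
    [{"text": "Great product good price"}, {"text": "Slow delivery"}]
)
scores = [r["score"] for r in result]
```

Because the model stage exposes the same interface as the tokenizer, swapping in a different model only means passing a different `predict` function; the rest of the workflow is unchanged.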
Lastly, Databricks' new open source library lets developers convert deep learning models into SQL functions, and it democratizes access to artificial intelligence by minimizing latency.
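To make the "model as a SQL function" idea concrete, here is a small sketch that uses Python's standard `sqlite3` module in place of Spark SQL. It is an analogy under stated assumptions, not the library's own mechanism: the `score_customer` function is a hypothetical stand-in for a trained model, and the `customers` table and its columns are made up for the example. Once the function is registered, anyone who can write SQL can apply it to a table without touching the model code.

```python
import sqlite3


def score_customer(monthly_spend: int, support_tickets: int) -> int:
    """Hypothetical stand-in for a trained model: a fixed integer scoring rule."""
    low_spend_penalty = 1 if monthly_spend < 150 else 0
    return 2 * support_tickets + low_spend_penalty


conn = sqlite3.connect(":memory:")

# Register the Python function under a SQL name; 2 is the number of arguments.
conn.create_function("score_customer", 2, score_customer)

# Made-up example data.
conn.execute(
    "CREATE TABLE customers (name TEXT, monthly_spend INTEGER, support_tickets INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("alice", 100, 5), ("bob", 300, 1)],
)

# The "model" is now callable from plain SQL.
rows = conn.execute(
    "SELECT name, score_customer(monthly_spend, support_tickets) AS score "
    "FROM customers ORDER BY name"
).fetchall()
conn.close()
```

The query author needs no knowledge of how the score is computed, which is the sense in which exposing a model as a SQL function widens access to it.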