Cloudera to Accelerate Machine Learning through Cloudera Data Science Workbench

July 13, 2017, 1:57 p.m. By: Vishakha Jha

Data Science Workbench

Machine learning is all about data: it lets software applications become more accurate at predicting outcomes without being explicitly programmed. At times, however, the data is out of reach. Cloudera Data Science Workbench offers fast, easy, and secure self-service data science, letting teams work on new big data and machine learning projects. Cloudera Inc. is a US-based software company that provides Apache Hadoop-based software and services and also offers training to business customers. Cloudera previously acquired the startup Sense.io, which worked on improving the experience of data scientists using Cloudera's enterprise platform for machine learning and advanced analytics. The result of this acquisition and subsequent development is Cloudera Data Science Workbench.

Cloudera Data Science Workbench is a web application that lets data scientists use open source languages and libraries in secure environments, accelerating analytics projects from exploration to production. It supports R, Python, and Scala. It can also be combined with BigDL, Intel's deep learning library for Apache Spark, which helps data scientists run deep learning on big data without additional hardware investment.

The workbench offers several other benefits. It brings data science to Hadoop, giving easy access to HDFS data and to Hadoop processing engines. It provides a self-service collaborative platform where Python, R, and Scala can be used from a web browser, and where analytic project environments can be customized and reused. It is also enterprise-ready technology: it supports self-service analytics for enterprise business teams, can be deployed on-premises or in the cloud, and ensures security and compliance by default.

It is built on container technology, which gives data science teams isolation and reproducibility as well as easier collaboration. It supports full authentication and access controls for data in the cluster. Any library or framework can be installed in isolated project environments, and cluster data can be accessed securely through Spark and Impala. Data pipelines can also be automated and monitored using built-in job scheduling.

Cloudera Data Science Workbench was released in beta at the 2017 Strata + Hadoop World in San Jose, CA. Many organisations have since adopted it for statistical research, because it provides a collaborative, shareable project environment in which diverse data science teams can work together on standardized, reproducible research.

More Information: Cloudera Data Science Workbench