Insight into Apache PredictionIO

Sept. 2, 2017, 9:21 p.m. By: Vishakha Jha

PredictionIO

Apache PredictionIO (incubating) is a machine learning server built for developers and data scientists. It is built on an open source stack and aims to let you create predictive engines for any machine learning task. It comprises three main components.

  • The PredictionIO platform, which is responsible for building, evaluating and deploying engines that use machine learning algorithms.

  • The Template Gallery, which lets you download engine templates suited to your machine learning application.

  • The Event Server, which unifies events from multiple platforms by continuously collecting data from your application in real time. It then supplies the collected data to the engines for prediction and evaluation. It is deployed as a web service and provides a unified view of the data for assessment and analysis.
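To make the Event Server's role more concrete, here is a minimal sketch of how an application might prepare an event for it over HTTP. It assumes a local Event Server on port 7070 that accepts JSON at an `/events.json` endpoint with an `accessKey` query parameter. The access key, the "rate" event name, the entity IDs and the helper function names are all placeholders chosen for illustration, so adapt them to your own app and template.

```python
import json
import urllib.parse
import urllib.request

# Assumed values: a local Event Server and a placeholder access key.
EVENT_SERVER_URL = "http://localhost:7070"
ACCESS_KEY = "YOUR_ACCESS_KEY"  # hypothetical; use the key of your own app


def build_event(user_id, item_id, rating):
    """Build the JSON body for a 'rate' event (field values are illustrative)."""
    return {
        "event": "rate",
        "entityType": "user",
        "entityId": str(user_id),
        "targetEntityType": "item",
        "targetEntityId": str(item_id),
        "properties": {"rating": rating},
    }


def build_event_request(event, base_url=EVENT_SERVER_URL, access_key=ACCESS_KEY):
    """Prepare (but do not send) a POST request to the events endpoint."""
    query = urllib.parse.urlencode({"accessKey": access_key})
    return urllib.request.Request(
        url=f"{base_url}/events.json?{query}",
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


event_request = build_event_request(build_event("u1", "i42", 4.0))

# To send it to a running Event Server, uncomment:
# with urllib.request.urlopen(event_request) as response:
#     print(response.read().decode("utf-8"))
```

The request is built separately from the network call so that the payload can be inspected or logged before anything is sent.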

Apache PredictionIO requires at least the following configuration:

  • Apache Hadoop 2.6.5 (needed for YARN and HDFS)

  • Apache Spark 1.6.3 for Hadoop 2.6

  • Java SE Development Kit 8

  • One of the following storage options: PostgreSQL 9.1, MySQL 5.1, or Apache HBase 0.98.5 together with Elasticsearch 1.7.6
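If you want to check a machine against these minimums before installing, a small script along the following lines could help. This is only a sketch: the `installed` dictionary is made-up sample data, the function names are hypothetical, and the code expects plain dotted version numbers (so a Java version reported as "1.8" would need to be normalized to "8" first).

```python
# Minimum versions taken from the list above.
MINIMUMS = {
    "hadoop": "2.6.5",
    "spark": "1.6.3",
    "java": "8",
}


def parse_version(text):
    """Turn '2.6.5' into (2, 6, 5) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed_version, minimum_version):
    """Return True when the installed version is at least the minimum."""
    have = parse_version(installed_version)
    need = parse_version(minimum_version)
    # Pad the shorter tuple with zeros so that '8' compares like '8.0.0'.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


def check_requirements(installed_versions):
    """Map each required component to True/False for the given versions."""
    return {
        name: meets_minimum(installed_versions[name], minimum)
        for name, minimum in MINIMUMS.items()
    }


# Hypothetical sample data; replace with the versions on your machine.
installed = {"hadoop": "2.7.3", "spark": "1.6.2", "java": "8"}
report = check_requirements(installed)
for name, ok in report.items():
    print(f"{name}: {'ok' if ok else 'too old'}")
```

Comparing integer tuples rather than raw strings avoids the classic mistake of treating "1.10" as older than "1.9".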

Apache PredictionIO helps you develop an engine from a template and deploy it as a web service. It is built on open source services such as Elasticsearch, HBase (and other databases), Spark and Hadoop, and it implements the Lambda Architecture, a data processing architecture designed to handle large amounts of data. Some of its main features are:

  • Simplifies data infrastructure management and lets you implement your own ML model and integrate it into your engine.

  • Lets you systematically evaluate and compare multiple engine variants.

  • Responds to dynamic queries in real time once an engine is deployed. It also aims to speed up machine learning modeling through systematic processes and pre-built evaluation measures.
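As an illustration of the real-time querying mentioned in the last point, the sketch below prepares a query for a deployed engine. It assumes the engine is running locally on port 8000 and accepts JSON at a `/queries.json` endpoint. The query fields (`user`, `num`) are only an example in the style of a recommendation template, and the helper names are hypothetical; the fields your engine expects depend on the template you chose.

```python
import json
import urllib.request

ENGINE_URL = "http://localhost:8000"  # assumed address of a deployed engine


def build_query_request(query, base_url=ENGINE_URL):
    """Prepare (but do not send) a POST request to the engine's query endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/queries.json",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_query(query, base_url=ENGINE_URL, timeout=5.0):
    """Send the query to a running engine and decode its JSON response."""
    request = build_query_request(query, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example query shape for a recommendation-style engine (illustrative only).
query_request = build_query_request({"user": "u1", "num": 4})

# With an engine actually running, you could call:
# print(send_query({"user": "u1", "num": 4}))
```

Because the engine is exposed as a plain web service, any language with an HTTP client can query it in the same way.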

PredictionIO supports libraries such as Spark MLlib and OpenNLP for machine learning and data processing. Spark MLlib includes logistic regression, support vector machines, Gaussian mixture models, least squares techniques, naive Bayes, regression and decision tree models, K-means clustering, and more.
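To give a feel for what one of these algorithms does, here is a small pure-Python sketch of K-means clustering. It does not use Spark MLlib's API and is not how MLlib implements it; it only illustrates the assign-and-update loop on a handful of made-up 2D points, with the first k points used as starting centroids to keep the run deterministic.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    count = len(points)
    return tuple(sum(coords) / count for coords in zip(*points))


def k_means(points, k, max_iterations=100):
    """Cluster points into k groups; returns (centroids, assignments)."""
    # Deterministic start: take the first k points as initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    assignments = []
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # nothing moved, so the clustering has converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = mean_point(members)
    return centroids, assignments


# Made-up sample data: two loose groups near (1, 1) and (9, 9).
sample_points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0),
                 (8.0, 9.0), (2.0, 1.0), (9.0, 8.0)]
centroids, assignments = k_means(sample_points, k=2)
print(assignments)
```

In a real engine you would rely on the library's distributed implementation; the value of the sketch is only in showing the two alternating steps.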

Apache PredictionIO is a capable open source platform that has been widely adopted. It has proven to be very practical, and its users benefit from ongoing support. You can learn more about customizing and implementing an engine, along with other important information, here.

GitHub link: PredictionIO