Discover Extreme Gradient Boosting (XGBoost) for Applied Machine Learning

Oct. 31, 2017, 9:33 p.m. By: Vishakha Jha


XGBoost is an algorithm that has made its name by dominating applied machine learning and Kaggle competitions for structured or tabular data. It is a highly optimized, distributed gradient boosting library designed to be portable, efficient and flexible. Its main purpose is to improve speed and model performance through parallel tree boosting.

The XGBoost library implements the gradient boosted decision tree algorithm. Boosting is an approach in which new models are added to correct the errors made by existing models. Models are added sequentially until no further improvement can be made. Gradient boosting is a technique in which each new model is trained to predict the residuals, or errors, of the existing models; the outputs of all models are then added together to produce the final prediction.
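The residual-fitting loop described above can be sketched in a few lines of plain Python. This is a minimal illustration for squared-error loss using one-split decision stumps, not XGBoost's actual implementation; the helper names (`fit_stump`, `boost`, `mse`), the toy data and the `learning_rate` value are assumptions made for the example.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Fit a one-split regression stump to the residuals (least squares).

    Assumes xs contains at least two distinct values.
    """
    best = None
    # Try every distinct value except the largest as a split threshold,
    # so that both sides of the split are non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value, right_value = mean(left), mean(right)
        sse = sum((r - left_value) ** 2 for r in left) + sum(
            (r - right_value) ** 2 for r in right
        )
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    _, threshold, left_value, right_value = best
    return lambda x: left_value if x <= threshold else right_value


def boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Gradient boosting for squared error: each stump fits the current residuals."""
    base = mean(ys)  # the initial model predicts the mean target
    stumps = []
    predictions = [base] * len(xs)
    for _ in range(n_rounds):
        # For squared error, the residual is the negative gradient of the loss.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump(x) for p, x in zip(predictions, xs)]

    def predict(x):
        # The final prediction is the sum of all models' contributions.
        return base + learning_rate * sum(s(x) for s in stumps)

    return predict


def mse(model, xs, ys):
    """Mean squared error of a model on the given data."""
    return mean((y - model(x)) ** 2 for x, y in zip(xs, ys))


# Toy data, invented for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.2, 6.8, 8.1]

baseline = boost(xs, ys, n_rounds=0)  # predicts the mean only
model = boost(xs, ys, n_rounds=20)
print("baseline MSE:", mse(baseline, xs, ys))
print("boosted MSE: ", mse(model, xs, ys))
```

With zero rounds the model predicts only the mean of the targets; every additional stump fits what is still unexplained, so the training error shrinks as rounds are added.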

One of its significant features is that the same code runs on the major distributed environments and can solve problems beyond billions of examples. XGBoost is also fast in execution compared with other implementations of gradient boosting, and it performs strongly on structured or tabular datasets for both classification and regression predictive modeling problems.

XGBoost works with the following main interfaces:

  • C++, Java and JVM languages.

  • Julia.

  • Command Line Interface.

  • Python interface, including a model integrated with scikit-learn.

  • R interface, including a model in the caret package.

Some of the key features that XGBoost provides:

  • Portability: Compatible with Linux, Windows and OS X, as well as various cloud platforms.

  • Flexibility: It supports classification, regression, ranking and user-defined objectives.

  • Performance: It claims to deliver the best performance from a limited set of resources, thanks to its well-optimized backend.

  • Distributed on Cloud: Supports distributed training on multiple machines, including GCE, AWS, Azure, and YARN clusters. It can also be integrated with Flink, Spark and other cloud dataflow systems.

  • Supports Multiple Languages


Model Features and System Features of Library

The implementation of the model supports the features of the scikit-learn and R implementations, along with some additional ones. The important features include the gradient boosting algorithm itself, regularized and stochastic gradient boosting, out-of-core and distributed computing, parallelization of tree construction, and cache optimization of data structures and algorithms.
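As a rough illustration of the stochastic part, each boosting round can train on a random subset of rows and columns. The sketch below only plans which rows and columns one round would see. The parameter names `subsample`, `colsample_bytree` and `colsample_bylevel` mirror XGBoost's parameters, while `sample_fraction`, `plan_round` and the sizes used are hypothetical; the library's internal sampling may differ.

```python
import random


def sample_fraction(items, fraction, rng):
    """Sample a fraction of items without replacement, keeping at least one."""
    k = max(1, round(fraction * len(items)))
    return sorted(rng.sample(items, k))


def plan_round(n_rows, n_cols, depth, subsample, colsample_bytree,
               colsample_bylevel, rng):
    """Choose the rows and columns that one boosting round is allowed to use."""
    # Row subsampling: the tree for this round sees only part of the data.
    rows = sample_fraction(list(range(n_rows)), subsample, rng)
    # First level of column sampling: one subset of columns for the whole tree.
    tree_cols = sample_fraction(list(range(n_cols)), colsample_bytree, rng)
    # Second level: a fresh subset of the tree's columns at each depth level.
    level_cols = [
        sample_fraction(tree_cols, colsample_bylevel, rng) for _ in range(depth)
    ]
    return rows, tree_cols, level_cols


rng = random.Random(0)  # fixed seed so the plan is reproducible
rows, tree_cols, level_cols = plan_round(
    n_rows=10, n_cols=10, depth=3,
    subsample=0.8, colsample_bytree=0.5, colsample_bylevel=0.6, rng=rng,
)
print("rows used:        ", rows)
print("columns for tree: ", tree_cols)
print("columns per level:", level_cols)
```

Because every round draws a different subset, the individual trees are less correlated with one another, which is the source of the extra randomness.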

Comparison of XGBoost and GBM

Even though XGBoost offers such an impressive range of features, the Gradient Boosting Machine (GBM) has broader applicability and certain advantages over it, and vice versa.

  • For each iteration, both GBM and XGBoost have to compute the gradient. XGBoost also needs to calculate the Hessian (the second derivative of the loss), which adds overhead.

  • Another major advantage of GBM is that it only requires a differentiable loss function, whereas XGBoost also relies on the second derivative; GBM can therefore be used in more applications.

  • Looking at the brighter side of XGBoost, it is faster.

  • The two also differ in how the leaf weights are calculated.

  • XGBoost takes the weight to be the sum of gradients divided by the sum of Hessians, whereas for GBM it is simply the average value of the gradients.

  • In terms of regularization, XGBoost provides more options than GBM.

  • XGBoost provides more randomness as compared to GBM due to two levels of column sampling.
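The difference in leaf weights can be made concrete with a small calculation. The sketch below assumes logistic loss on 0/1 labels with raw scores in log-odds. The XGBoost-style weight uses the regularized form, the negated sum of gradients divided by the sum of Hessians plus an L2 term (named `reg_lambda` here after XGBoost's parameter), and the GBM-style weight follows the description in the list above. The function names and the example values are hypothetical, and real GBM implementations may estimate leaf values differently.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logistic_grad_hess(label, raw_score):
    """First and second derivative of the log loss with respect to the raw score."""
    p = sigmoid(raw_score)
    return p - label, p * (1.0 - p)


def xgboost_style_weight(grads, hessians, reg_lambda=1.0):
    """Leaf weight as -(sum of gradients) / (sum of Hessians + lambda)."""
    return -sum(grads) / (sum(hessians) + reg_lambda)


def gbm_style_weight(grads):
    """Leaf weight as the average negative gradient (first-order information only)."""
    return -sum(grads) / len(grads)


# Four examples assumed to fall into the same leaf: labels and current raw scores.
labels = [1, 1, 0, 1]
raw_scores = [0.0, 2.0, -1.0, 0.5]

pairs = [logistic_grad_hess(y, s) for y, s in zip(labels, raw_scores)]
grads = [g for g, _ in pairs]
hessians = [h for _, h in pairs]

print("XGBoost-style weight:", xgboost_style_weight(grads, hessians))
print("GBM-style weight:    ", gbm_style_weight(grads))
```

For squared-error loss every Hessian equals 1, so with `reg_lambda=0` the two formulas give the same value; they diverge for losses such as log loss, where the Hessian varies from example to example. A larger `reg_lambda` shrinks the XGBoost-style weight toward zero, which is one of the extra regularization options mentioned above.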

XGBoost is a library that provides a fast, high-performance gradient boosted tree model. It has been widely adopted and has achieved top-level performance across a range of difficult machine learning tasks. It is easy to download and install, and can then be accessed through a variety of interfaces. The software has the potential to take us toward more work-efficient models.

GitHub Link: XGBoost

Trevor Hastie - Gradient Boosting Machine Learning:
