Things to keep in mind while working with Decision Trees
Anyone working in the field of Machine Learning knows a thing or two about Decision Trees. A Decision Tree is a supervised learning algorithm with a predefined target variable. It is mostly used for classification problems, and it works with both categorical and continuous variables. The sample data is split into two or more homogeneous sets based on the most significant differentiator among the input variables.
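To give a feel for what "homogeneous" means here, the sketch below uses Gini impurity, one common measure of how mixed a set of class labels is. The article does not name a specific measure, so treat Gini as an assumption; the function names and the toy labels are made up for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 for a perfectly homogeneous set, larger for mixed sets."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_impurity(groups):
    """Size-weighted average impurity of the groups produced by a split."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * gini(g) for g in groups)

print(gini(["yes", "yes", "yes", "yes"]))  # 0.0
print(gini(["yes", "yes", "no", "no"]))    # 0.5

# Two hypothetical ways of splitting the same six observations.
split_a = [["yes", "yes", "yes"], ["no", "no", "no"]]   # clean separation
split_b = [["yes", "no", "yes"], ["no", "yes", "no"]]   # still mixed

# The split with the lower weighted impurity is the better differentiator.
print(split_impurity(split_a) < split_impurity(split_b))  # True
```

Under this measure, the input variable whose split gives the lowest weighted impurity would be treated as the most significant differentiator.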
Regression vs Classification Trees
The main features that differentiate regression trees from classification trees are as follows.
Regression Trees: Regression Trees are Decision Trees used when the dependent (target) variable is continuous. The value of a terminal node is the mean response of the training observations that fall in that region, so an unseen observation falling in the region is predicted with that mean value.
Classification Trees: Classification Trees, unlike their Regression counterparts, are used when the dependent (target) variable is categorical. The value of a terminal node is the mode of the training observations that fall in that region, so an unseen observation falling in the region is predicted with that mode, i.e. the most frequent class.
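The mean versus mode distinction can be shown in a few lines. This is only a sketch: the function names and the sample values are hypothetical, chosen to show what a single leaf would predict in each case.

```python
from statistics import mean, mode

def regression_leaf_value(targets):
    """Regression leaf: mean of the training targets that fell in the region."""
    return mean(targets)

def classification_leaf_value(labels):
    """Classification leaf: most frequent training label in the region."""
    return mode(labels)

# Hypothetical training observations that ended up in one leaf.
house_prices = [200.0, 220.0, 240.0]
loan_outcomes = ["repaid", "repaid", "default"]

print(regression_leaf_value(house_prices))       # 220.0
print(classification_leaf_value(loan_outcomes))  # repaid
```

Every unseen observation routed to that leaf would get the same prediction, 220.0 or "repaid" respectively.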
Since both types are Decision Trees, both follow a top-down greedy approach known as Recursive Binary Splitting. They divide the predictor space (the independent variables) into distinct, non-overlapping regions. Splitting continues until a user-defined stopping criterion is reached. In both cases, if the criterion is loose or absent, the tree keeps growing until it is fully grown, which increases the likelihood of overfitting the data.
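A single greedy step of this splitting process might look like the sketch below. It assumes a regression tree with one numeric feature and uses the sum of squared errors as the split criterion; both are assumptions made for illustration, and the names and data are hypothetical.

```python
def sse(values):
    """Sum of squared errors of the values around their mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_split(xs, ys):
    """Greedy step: try the midpoint between each pair of neighbouring distinct
    x values and keep the threshold with the lowest total SSE of the two regions."""
    distinct = sorted(set(xs))
    best = None  # (total_sse, threshold)
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        total = sse(left) + sse(right)
        if best is None or total < best[0]:
            best = (total, threshold)
    return best

xs = [1, 2, 3, 10, 11, 12]
ys = [5.0, 6.0, 5.5, 20.0, 21.0, 19.5]
total, threshold = best_split(xs, ys)
print(threshold)  # 6.5
```

Recursive Binary Splitting would then repeat the same search inside each of the two regions. The step is greedy because it only looks for the best split right now and never reconsiders it later.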
Avoiding Overfitting in Decision Trees
Overfitting is one of the key challenges when building any Machine Learning model. If no limit is set on a decision tree, it can reach 100% accuracy on the training set, because in the worst case the model ends up making one leaf for each observation (assuming no two identical observations carry different targets). Avoiding overfitting is thus pivotal for a model that works well on unseen data, and it can be done in two ways:
1: Setting constraints on tree size
Constraints on the tree size can be set through the parameters of the model. These are:

Minimum samples for a node split: This defines the minimum number of samples required in a node for it to be considered for splitting. Higher values stop the model from learning relations specific to a handful of samples, but values that are too high prevent it from learning the dataset properly and lead to underfitting.

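Minimum samples for terminal node: This denotes the minimum number of samples required in a terminal node. Like the minimum for a split, it limits overfitting. Lower values are usually chosen when the classes are imbalanced, because the regions where the minority class dominates tend to be small.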

Maximum depth of tree: Greater depth allows the model to learn relations very specific to the training data, so limiting the depth also controls overfitting.

Maximum number of terminal nodes: An alternative to Maximum Depth, since a binary tree of depth ‘n’ has at most 2^n leaves.

Maximum features to consider for split: A common rule of thumb is the square root of the total number of features. Higher values tend to overfit the data.
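Three of these constraints can be sketched as stopping rules inside a small tree builder. This is a minimal sketch, assuming a regression tree on a single numeric feature with squared error as the split criterion. The parameter names max_depth, min_samples_split and min_samples_leaf follow the naming used by scikit-learn, but build_tree and count_leaves are hypothetical helpers, not any library's API.

```python
def mean(values):
    return sum(values) / len(values)

def sse(values):
    """Sum of squared errors of the values around their mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def build_tree(points, depth=0, max_depth=None, min_samples_split=2, min_samples_leaf=1):
    """Grow a regression tree on (x, y) pairs, honouring the size constraints."""
    node = {"value": mean([y for _, y in points])}  # leaf prediction: mean target
    if max_depth is not None and depth >= max_depth:
        return node  # maximum depth reached
    if len(points) < min_samples_split:
        return node  # too few samples to consider a split
    best = None  # (total_sse, left_points, right_points, threshold)
    distinct = sorted({x for x, _ in points})
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [p for p in points if p[0] <= threshold]
        right = [p for p in points if p[0] > threshold]
        if min(len(left), len(right)) < min_samples_leaf:
            continue  # a child would be smaller than the allowed leaf size
        total = sse([y for _, y in left]) + sse([y for _, y in right])
        if best is None or total < best[0]:
            best = (total, left, right, threshold)
    if best is None:
        return node  # no admissible split exists
    _, left, right, threshold = best
    limits = dict(max_depth=max_depth, min_samples_split=min_samples_split,
                  min_samples_leaf=min_samples_leaf)
    node["threshold"] = threshold
    node["left"] = build_tree(left, depth + 1, **limits)
    node["right"] = build_tree(right, depth + 1, **limits)
    return node

def count_leaves(node):
    if "threshold" not in node:
        return 1
    return count_leaves(node["left"]) + count_leaves(node["right"])

# Hypothetical data: eight observations with distinct x values.
data = list(zip(range(1, 9), [1.0, 1.2, 0.9, 1.1, 5.0, 5.3, 4.8, 5.1]))

print(count_leaves(build_tree(data)))                      # 8
print(count_leaves(build_tree(data, max_depth=2)))         # 4
print(count_leaves(build_tree(data, min_samples_leaf=3)))  # 2
```

With no limits the tree ends up with one leaf per observation, which is exactly the overfitting case described earlier. A depth limit of 2 caps it at 2^2 = 4 leaves, and a minimum leaf size of 3 leaves room for only one split.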
2: Pruning the tree
Pruning is done to avoid the downsides of the greedy approach to splitting. The greedy approach picks whichever split looks best at the current node, whereas pruning looks ahead at what a split is worth to the tree as a whole. Pruning normally works from the bottom up: leaves that give a negative return compared with the nodes above them are removed and merged back into their parent nodes.
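One way to make bottom-up pruning concrete is reduced-error pruning, sketched below. This is one of several pruning methods and is not necessarily the one a given library uses. The sketch assumes a regression tree stored as nested dictionaries where every node keeps the mean of its training targets under "value", and it judges each subtree on a separate validation set. The tree and the validation points are hypothetical.

```python
def predict(node, x):
    """Follow the thresholds down to a leaf and return its value."""
    while "threshold" in node:
        node = node["left"] if x <= node["threshold"] else node["right"]
    return node["value"]

def validation_sse(node, points):
    return sum((predict(node, x) - y) ** 2 for x, y in points)

def prune(node, points):
    """Bottom-up pruning: children are pruned first, then the node itself is
    collapsed into a leaf if that does not increase the validation error."""
    if "threshold" not in node:
        return node
    left = [p for p in points if p[0] <= node["threshold"]]
    right = [p for p in points if p[0] > node["threshold"]]
    node["left"] = prune(node["left"], left)
    node["right"] = prune(node["right"], right)
    as_leaf = {"value": node["value"]}
    # Note: a node that no validation point reaches is also collapsed (0 <= 0).
    if validation_sse(as_leaf, points) <= validation_sse(node, points):
        return as_leaf
    return node

# Hypothetical tree: the split at 2.5 only fits noise in the training data.
tree = {
    "value": 3.0, "threshold": 4.5,
    "left": {"value": 1.0, "threshold": 2.5,
             "left": {"value": 0.6}, "right": {"value": 1.4}},
    "right": {"value": 5.0},
}
validation = [(1, 1.0), (2, 1.1), (3, 0.9), (4, 1.0), (6, 5.1), (7, 4.9)]

pruned = prune(tree, validation)
print(pruned["left"])        # {'value': 1.0}
print(pruned["threshold"])   # 4.5
```

The noisy split at 2.5 is merged back into its parent because the parent's mean predicts the validation points better, while the split at 4.5 survives because removing it would make the validation error much worse.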
How effective are Tree Based Models compared to Linear Models?
The question finally boils down to the type of problem we have in front of us. The following factors help in judging the problem:

If the relationship between the dependent and independent variables is well approximated by a linear model, linear regression will generally outperform a tree based model. But if there is a high degree of non-linearity and a complex relationship between the dependent and independent variables, tree models are preferred.

Decision tree models are generally easier to explain, visualize and interpret than a linear regression, at least as long as the tree stays reasonably small.
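The first factor can be illustrated with a toy comparison. The sketch below fits an ordinary least squares line and a one-split regression tree (a "stump") to two made-up datasets, one perfectly linear and one with a sharp step. Everything here is hypothetical, and the errors are measured on the training data only, so it shows how well each model shape fits rather than how well it generalizes.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x on a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x

def fit_stump(xs, ys):
    """Depth-1 regression tree: one threshold, mean prediction on each side."""
    best = None  # (error, threshold, left_mean, right_mean)
    distinct = sorted(set(xs))
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, t, ml, mr)
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr

def sse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys))

xs = list(range(10))
linear_ys = [2.0 * x + 1.0 for x in xs]            # straight-line relationship
step_ys = [0.0 if x < 5 else 10.0 for x in xs]     # abrupt jump at x = 5

# On the linear data the line wins; on the step data the stump wins.
print(sse(fit_line(xs, linear_ys), xs, linear_ys) < sse(fit_stump(xs, linear_ys), xs, linear_ys))  # True
print(sse(fit_stump(xs, step_ys), xs, step_ys) < sse(fit_line(xs, step_ys), xs, step_ys))          # True
```

A deeper tree would approximate the straight line with more and more steps, but it needs many splits to do what the linear model does with two coefficients.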
Hope this article is of some help the next time you use Decision Trees.