Getting Started with XGBoost: A Beginner’s Introduction to High-Performance Machine Learning | by Aditya Kumar Manethia | Nov, 2024


Let’s dive into how XGBoost works to address Linear Regression and Classification problems, supported by examples and insights from the original research paper.

XGBoost for Linear Regression

Here, XGBoost works like gradient boosting, but with much better speed and performance. For our example, the base estimator will be the mean of the target, followed by training a decision tree on the residuals. In XGBoost, the process of constructing the decision tree is different from that of standard gradient boosting.

The example uses a dataset of a few students with their expected salary based on their grades (out of 10).

Dataset

Now, to calculate the Similarity Score (SS), take the grade column and the residual column (salary minus the mean prediction).

Similarity score formula

Here, λ is the regularization parameter; for ease, let’s keep it as 0. To start, treat all residuals as a single leaf and calculate the SS of the root node:
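
Written out in the same notation as the rest of this example, the formula in the figure is:

Similarity Score (SS) = (Sum of Residuals)² / (Number of Residuals + λ)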

SS(root) = (-2.8 + 3.7 - 1.3 + 0.7)² / (4 + 0) = 0.02

The goal is to split the leaf node in a way that increases the similarity score after the split. The split is based on the grade column; from the table above, three potential splitting criteria (the midpoints between adjacent grades) are identified. Among these, the one with the highest gain will be selected for the root node.

Take the first splitting criterion: 5.85

Decision tree based on 1st criteria

Calculate the SS for both branches to find the gain.

SS(left) = 0.7² / (1 + 0) = 0.49 | SS(right) = (-2.8 - 1.3 + 3.7)² / (3 + 0) = 0.05

Gain = SS(left) + SS(right) - SS(root)

Gain for 1st criterion = 0.49 + 0.05 - 0.02 = 0.52

Take the second splitting criterion: 7.1

Decision tree based on 2nd criteria

Following the same steps as previously mentioned:

Gain for 2nd criterion = 5.06

Take the third splitting criterion: 8.25

Decision tree based on 3rd criteria

Gain for 3rd criterion = 17.52

Out of all three, the third splitting criterion yields the maximum gain, so it is selected as the root node. This process continues for the remaining nodes, with each split selected based on the highest gain. By repeating this process, XGBoost constructs the decision tree. In XGBoost, the maximum tree depth is set to 6 by default.
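
As a quick sanity check, here is a minimal Python sketch of the split search above. The grade values are hypothetical (chosen only so that their midpoints reproduce the thresholds 5.85, 7.1 and 8.25), and the residuals are listed in grade order as implied by the worked splits:

grades = [5.4, 6.3, 7.9, 8.6]        # hypothetical grades; midpoints are 5.85, 7.1, 8.25
residuals = [0.7, -2.8, -1.3, 3.7]   # residuals in grade order, as used above

def similarity(res, lam=0.0):
    # SS = (sum of residuals)^2 / (number of residuals + lambda)
    return sum(res) ** 2 / (len(res) + lam)

root_ss = similarity(residuals)      # ≈ 0.02

best_split, best_gain = None, float("-inf")
for i in range(1, len(grades)):
    threshold = (grades[i - 1] + grades[i]) / 2
    left, right = residuals[:i], residuals[i:]
    gain = similarity(left) + similarity(right) - root_ss
    print(f"split at {threshold:.2f}: gain = {gain:.2f}")   # 0.52, 5.06, 17.52
    if gain > best_gain:
        best_split, best_gain = threshold, gain

print(f"root split: {best_split} (gain = {best_gain:.2f})")  # 8.25, gain ≈ 17.52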

Output values are then calculated for all leaf nodes; these are the values each leaf contributes to the prediction.

Output = Sum of Residuals / (number of Residuals + λ)
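
Using the chosen root split (8.25), the left leaf would contain residuals 0.7, -2.8 and -1.3 and the right leaf 3.7 (as implied by the gain calculation above), so:

Output(left) = (0.7 - 2.8 - 1.3) / (3 + 0) ≈ -1.13 | Output(right) = 3.7 / (1 + 0) = 3.7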

Let’s assume the combined model performs well after two decision trees; this is how it will look:

Combined Model (LR = Learning Rate)
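
For completeness, here is a minimal sketch of fitting the same kind of model with the xgboost library. The numbers are hypothetical placeholders (salaries chosen so that their residuals around a base prediction of about 40 roughly match the ones above), not the article’s exact table:

import numpy as np
import xgboost as xgb

# Hypothetical grades and expected salaries standing in for the table above
X = np.array([[5.4], [6.3], [7.9], [8.6]])
y = np.array([40.7, 37.2, 38.7, 43.7])

# Two boosting rounds, mirroring the two-tree combined model in the figure
model = xgb.XGBRegressor(
    n_estimators=2,      # number of trees
    learning_rate=0.3,   # LR in the combined-model figure
    max_depth=6,         # XGBoost's default maximum depth
    reg_lambda=0,        # lambda, kept at 0 as in the worked example
)
model.fit(X, y)
print(model.predict(X))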

XGBoost for Classification

Let’s briefly go through classification. In this case, the process remains the same; the only difference is that the base estimator is the logarithm of the odds, log(odds).

Concept of log(odds):

  • Odds: The ratio of the probability of an event occurring to the probability of it not occurring. [Odds = P / (1 - P)]
  • Log(Odds): It transforms probabilities onto a continuous, unbounded scale, which helps with linearization and interpretability (see the short sketch below).
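
A tiny illustration of the transform (plain Python, no XGBoost needed):

import math

def log_odds(p):
    # log(odds) = log(p / (1 - p)): maps probabilities in (0, 1) to the whole real line
    return math.log(p / (1 - p))

print(log_odds(0.5))   # 0.0   -> even odds
print(log_odds(0.9))   # ≈ 2.2  -> strongly in favour
print(log_odds(0.1))   # ≈ -2.2 -> strongly against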

The Similarity Score (SS) formula is different in the classification case.

SS for Classification, P is probability

and the Output Value is calculated by:

Output value for classification
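
Written out, the standard XGBoost formulas behind the two figures above are:

SS (classification) = (Sum of Residuals)² / (Σ Pᵢ(1 - Pᵢ) + λ) | Output = Sum of Residuals / (Σ Pᵢ(1 - Pᵢ) + λ)

where Pᵢ is the previously predicted probability for each row in the leaf.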

Apart from this, every other step is similar to the process in linear regression. By following this approach, we can build a decision tree for a classification problem, and then get a combined model of several decision trees to achieve better performance, ultimately reducing the residuals to nearly zero.
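
As with regression, the library handles all of these steps internally. A minimal hypothetical sketch (pass/fail labels invented purely for illustration):

import numpy as np
import xgboost as xgb

# Hypothetical binary example: predict pass/fail from grade
X = np.array([[4.5], [5.8], [7.2], [8.9]])
y = np.array([0, 0, 1, 1])   # 1 = passed, 0 = failed

model = xgb.XGBClassifier(
    n_estimators=10,
    learning_rate=0.3,
    max_depth=6,
    reg_lambda=1,
)
model.fit(X, y)
print(model.predict_proba(X))   # predicted probability for each class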
