Out-of-bag Evaluation | by Subhasmita Sahoo | Jun, 2024


While working on the Kaggle “House Prices” challenge, I came across a neat technique called “out-of-bag evaluation” (OOB). It’s a way to estimate the accuracy of a Random Forest model without setting aside extra validation data, similar to cross-validation.

Idea: To begin with, a Random Forest is a collection of decision trees. Each tree in the forest is built on only a portion of the training data (a bootstrap sample), which leaves the rest of the training data available as a mini test set for that specific tree. The table below illustrates how this works:

(Table adapted from: https://developers.google.com/machine-learning/decision-forests/out-of-bag)

In the example above, the training data contains 6 examples, from which a Random Forest with 3 trees is built. Each tree is trained on 6 examples drawn from the training data by sampling with replacement, so some examples appear more than once in a tree’s sample while others are left out entirely.
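To make the sampling step concrete, here is a minimal NumPy sketch (not from the article) of drawing one bootstrap sample per tree; the specific in-bag and out-of-bag indices depend on the random seed and won’t match the table exactly:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n_examples = 6  # same size as the toy training set above

for tree_id in range(3):
    # Sample 6 indices with replacement: duplicates are allowed,
    # so some examples are inevitably left out ("out-of-bag").
    in_bag = rng.integers(0, n_examples, size=n_examples)
    out_of_bag = np.setdiff1d(np.arange(n_examples), in_bag)
    print(f"Tree {tree_id + 1}: in-bag {sorted(in_bag.tolist())}, "
          f"out-of-bag {out_of_bag.tolist()}")
```

The out-of-bag set is always exactly the complement of the in-bag set, which is what makes it safe to use for testing that tree.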

  • Tree 1: Uses all houses except house #3, so we can test the tree on house #3.
  • Tree 2: Uses all houses except house #2, #4, and #6, so we can test the tree on house #2, #4, and #6.
  • Tree 3: Uses all houses except house #1 and #5, so we can test the tree on house #1 and #5.
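The bookkeeping in the list above can be sketched in plain Python (a toy illustration, not the article’s code): for each house, we record which trees are allowed to give an out-of-bag prediction, namely the trees that never trained on it.

```python
# Out-of-bag houses per tree, following the table above.
oob_houses = {
    "Tree 1": [3],
    "Tree 2": [2, 4, 6],
    "Tree 3": [1, 5],
}

# Invert the mapping: house -> trees allowed to vote on it.
voters = {}
for tree, houses in oob_houses.items():
    for house in houses:
        voters.setdefault(house, []).append(tree)

for house in sorted(voters):
    print(f"House {house}: OOB prediction from {voters[house]}")
```

In this toy table each house is out-of-bag for exactly one tree; in a real forest with many trees, each example is typically out-of-bag for roughly a third of the trees (a bootstrap sample leaves out about 36.8% of examples on average), so its OOB prediction is an average over many trees.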

This way, each example is evaluated only by trees that have never seen it, and aggregating those predictions across the forest gives us a reliable estimate of how well the model will do in the real world.
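In practice you rarely do this bookkeeping by hand. A minimal scikit-learn sketch (assuming scikit-learn is installed; the synthetic data here just stands in for the house-price features):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data as a stand-in for the Kaggle dataset.
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=42)

# oob_score=True tells the forest to score each training example using
# only the trees that did not see it during bootstrap sampling.
forest = RandomForestRegressor(n_estimators=100, oob_score=True, random_state=42)
forest.fit(X, y)

print(f"OOB R^2: {forest.oob_score_:.3f}")
```

The `oob_score_` attribute is the R² computed entirely from out-of-bag predictions, so you get a validation-style estimate without holding out any data.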

Hope this article helped! Any relevant feedback is appreciated!
