Lag columns can significantly boost your model’s performance. Here’s how you can use them to your advantage.
The nature of time series data is such that past values often affect future values. When there’s any kind of seasonality in your data (in other words, your data follows an hourly, daily, weekly, monthly, or yearly cycle), this relationship is even stronger.
You can capture this relationship with features like hour, day of week, and month, but you can also add lags, which can quickly take your model to the next level.
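As a quick illustration of the calendar features mentioned above, here is a minimal sketch using pandas. The timestamps and the column names (timestamp, hour, day_of_week, month) are made up for this example; your own data will look different.

```python
import pandas as pd

# Hypothetical timestamps, used only for illustration.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2024-01-01 00:00",
        "2024-01-01 12:00",
        "2024-01-02 00:00",
        "2024-01-02 12:00",
    ])
})

# Pull calendar components out of the timestamp column.
df["hour"] = df["timestamp"].dt.hour
df["day_of_week"] = df["timestamp"].dt.dayofweek  # Monday = 0, Sunday = 6
df["month"] = df["timestamp"].dt.month

print(df)
```

Features like these tell the model where a row sits in the cycle, but not what the series was actually doing a few steps earlier. That is the gap lags fill.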
A lag value is simply this: a value that, at one time point or another, preceded your current value.
Let’s say you have a time series dataset that has the following values: [5,10,15,20,25].
25, being your most recent value, is the value at time t.
20 is the value at t-1. 15 is the value at t-2, and so on, until the beginning of the dataset.
This makes intuitive sense, since the word “lag” implies that something is “lagging behind” something else.
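To make this concrete, here is a small sketch of how the example dataset could be turned into lag columns with pandas. It assumes the values are stored oldest to newest, and the column names (value, lag_1, lag_2) are just illustrative choices.

```python
import pandas as pd

# The example series from above, ordered oldest to newest.
# The last row holds the value at time t.
df = pd.DataFrame({"value": [5, 10, 15, 20, 25]})

# shift(k) moves each value down k rows, so every row also
# carries the value observed k steps earlier.
df["lag_1"] = df["value"].shift(1)  # value at t-1
df["lag_2"] = df["value"].shift(2)  # value at t-2

print(df)
```

In the last row, value is 25, lag_1 is 20, and lag_2 is 15, which matches the t, t-1, t-2 labeling above. The first rows have missing lags because nothing precedes them, so you would typically drop or fill those rows before training.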
When we train a model using lag features, we can train it to recognize patterns with regard to how…