In the second part of this tutorial we will be using ChatGPT (o3-mini-high model) and the textbook Forecasting: Principles and Practice by Rob J Hyndman and George Athanasopoulos to understand Time Series Decomposition.
I started with one general prompt to summarize the entire chapter, followed by one prompt per subchapter in which I ask the model to write code to better visualize the concepts in that section.
Main Prompt:
“Give me the main topics of Chapter 3 of the textbook and summarize each topic. Make it thorough but easy to understand.”
Purpose: Time series data often contain multiple patterns. Decomposition is the process of splitting a series into different components so that each type of pattern can be studied separately.
Components:
- Trend-Cycle Component (Tₜ): This represents the long-term movement or direction of the data. It is sometimes called just the trend.
- Seasonal Component (Sₜ): This shows the regular, repeating patterns that occur within fixed periods (for example, months in a year).
- Remainder Component (Rₜ): Also known as the residual, this is what remains after removing the trend-cycle and seasonal components. It includes any random or irregular fluctuations.
Usefulness: Decomposition helps improve understanding of the underlying patterns and can lead to more accurate forecasts.
Follow-up Prompt:
“Write Python code to visualize the components of a generated time series.”
ChatGPT Answer:
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose
# Set random seed for reproducibility
np.random.seed(42)
# Generate synthetic time series data for two years (730 days)
dates = pd.date_range(start='2025-01-01', periods=730, freq='D')
# Create the components:
# Trend: a linear increase with a small random walk to add realism.
trend = np.linspace(10, 20, 730) + np.cumsum(np.random.normal(0, 0.05, 730))
# Seasonality: a sine function to simulate yearly repeating patterns.
seasonality = 5 * np.sin(2 * np.pi * np.arange(730) / 365)
# Noise: random fluctuations added to the data.
noise = np.random.normal(0, 0.5, 730)
# Combine the components to create the final time series
data_values = trend + seasonality + noise
# Create a Pandas DataFrame with the synthetic data
df = pd.DataFrame({'date': dates, 'sales': data_values})
df.set_index('date', inplace=True)
# Decompose the time series using an additive model
decomposition = seasonal_decompose(df['sales'], model='additive', period=365, extrapolate_trend='freq')
# Plot the decomposition components
fig = decomposition.plot()
fig.set_size_inches(12, 9)
plt.suptitle(
"Time Series Decomposition\n"
"Trend-Cycle Component (Tₜ): Long-term movement\n"
"Seasonal Component (Sₜ): Regular repeating patterns\n"
"Remainder Component (Rₜ): Random or irregular fluctuations",
fontsize=12
)
plt.tight_layout(rect=[0, 0.03, 1, 0.95])
plt.show()
Before decomposing a time series, it can be very helpful to adjust or transform the data to make the underlying patterns easier to model.
Remove variations caused by the calendar. For example, monthly sales might differ simply because some months have more trading days than others. Converting total sales to average sales per day can eliminate this effect.
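As a rough illustration of the calendar adjustment described above, here is a minimal sketch. The monthly totals are made-up numbers, `average_daily_sales` is a hypothetical helper, and calendar days stand in for trading days as a simplifying assumption.

```python
import calendar

# Hypothetical monthly sales totals for early 2023 (illustrative numbers only).
monthly_sales = {1: 3100.0, 2: 2800.0, 3: 3100.0, 4: 3000.0}

def average_daily_sales(year, totals_by_month):
    """Divide each monthly total by the number of calendar days in that month."""
    return {
        month: total / calendar.monthrange(year, month)[1]
        for month, total in totals_by_month.items()
    }

daily = average_daily_sales(2023, monthly_sales)
for month, value in daily.items():
    print(f"{calendar.month_abbr[month]}: {value:.1f} per day")
```

In this toy data the monthly totals differ, but the per-day averages are identical, so all of the month-to-month variation came from the calendar.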
When data are affected by population size, it makes sense to use per-capita values. For example, rather than using the total number of hospital beds, you might use the number of beds per 1,000 people.
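# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
# Load the dataset
file_path = "datasets/Global Economy Indicators.csv"
df = pd.read_csv(file_path)
# Strip whitespace from column names and country names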
df.columns = df.columns.str.strip()
df["Country"] = df["Country"].str.strip()
# Filter data for United Arab Emirates
uae_data = df[df["Country"] == "United Arab Emirates"]
# Plot GNI per capita over time
plt.figure(figsize=(10, 5))
plt.plot(uae_data["Year"], uae_data["Per capita GNI"], marker='o', linestyle='-', color='orange')
# Customize plot
plt.xlabel("Year")
plt.ylabel("GNI per Capita")
plt.title("GNI per Capita of United Arab Emirates Over Time")
plt.grid(True)
# Show the plot
plt.show()
Financial data should often be adjusted for inflation. By converting all monetary values to constant dollars (using a price index like the CPI), you can compare values across time more accurately.
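The inflation adjustment can be sketched in a few lines. The prices and index values below are invented for illustration, and `to_constant_dollars` and its `base_position` parameter are hypothetical names, not part of any library.

```python
import numpy as np

# Hypothetical nominal values and a price index (illustrative numbers only).
years = np.array([2020, 2021, 2022, 2023])
nominal = np.array([200.0, 210.0, 231.0, 242.0])
cpi = np.array([100.0, 105.0, 110.0, 121.0])

def to_constant_dollars(nominal_values, price_index, base_position=0):
    """Express every value in the dollars of the period at base_position."""
    return nominal_values / price_index * price_index[base_position]

real = to_constant_dollars(nominal, cpi)
for year, n, r in zip(years, nominal, real):
    print(f"{year}: nominal = {n:.0f}, in 2020 dollars = {r:.0f}")
```

The nominal series rises every year, but in constant dollars only 2022 shows a real increase.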
- Mathematical Transformations:
Sometimes a time series shows variation that changes with the level of the series. For example, when the seasonal fluctuations become larger as the overall level increases, a simple logarithmic transformation may help stabilize the variance. A logarithmic transformation converts each observation yₜ into wₜ = log(yₜ).
This is useful because differences on the log scale correspond to percentage changes on the original scale. In other words, if you use logarithms (commonly the natural logarithm), a difference of 1 unit on the log scale implies a multiplication by e (approximately 2.718) on the original scale.
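To make the link between log differences and percentage changes concrete, here is a small sketch on a made-up series that grows by 10% per step. It writes the log-transformed value as w = log(y).

```python
import numpy as np

# Hypothetical series that grows by exactly 10% at every step.
y = np.array([100.0, 110.0, 121.0, 133.1])

w = np.log(y)                      # log-transformed series
log_diffs = np.diff(w)             # differences on the log scale
pct_changes = y[1:] / y[:-1] - 1   # relative changes on the original scale

# exp(log difference) - 1 recovers the relative change exactly.
recovered = np.exp(log_diffs) - 1

print("log differences:", np.round(log_diffs, 4))
print("percentage changes:", np.round(pct_changes, 4))
print("recovered from logs:", np.round(recovered, 4))
```

Every log difference is the same constant, log(1.1), even though the absolute increases on the original scale keep growing. For small changes the log difference is also close to the percentage change itself.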
Other power transformations, such as square roots or cube roots, can also be applied. These are known as power transformations because they can be written in the form wₜ = yₜ^p.
A particularly useful family of transformations, which includes both logarithms and other power transformations, is the Box-Cox transformation (Box & Cox, 1964). The Box-Cox transformation depends on a parameter lambda (λ) and is defined as follows:
wₜ = log(yₜ) if λ = 0
wₜ = (yₜ^λ − 1) / λ otherwise
In most applications (and in the version discussed in this textbook), the transformation is applied only to positive data. A good value of lambda is one that makes the seasonal variation approximately constant across the series, which simplifies the forecasting model. Methods such as the Guerrero method can help select an optimal lambda; for example, in some cases, a value around 0.10–0.11 is recommended.
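Before handing things over to a library, it can help to see the transformation written out by hand. The sketch below assumes the standard Box-Cox form for positive data: log(y) when lambda is 0, and (y^lambda − 1) / lambda otherwise. The names `box_cox` and `inverse_box_cox` are my own, not library functions.

```python
import numpy as np

def box_cox(y, lam):
    """Standard Box-Cox transformation for strictly positive data."""
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("this sketch assumes strictly positive data")
    if lam == 0:
        return np.log(y)
    return (y ** lam - 1.0) / lam

def inverse_box_cox(w, lam):
    """Map transformed values back to the original scale."""
    w = np.asarray(w, dtype=float)
    if lam == 0:
        return np.exp(w)
    return (lam * w + 1.0) ** (1.0 / lam)

y = np.array([1.0, 10.0, 100.0])
for lam in (0, 0.5, 1):
    w = box_cox(y, lam)
    print(f"lambda = {lam}: transformed = {np.round(w, 3)}")
```

With lambda = 1 the series is only shifted down by 1, so its shape is unchanged. As lambda approaches 0 the result approaches the natural log, which is why the log is treated as the lambda = 0 member of the family.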
Prompt:
“Write Python code for the Box-Cox method on a synthetic dataset. Make it interactive”
ChatGPT Answer:
# Enable an interactive matplotlib backend.
# If using Jupyter Notebook, you may try:
# %matplotlib widget
# If that doesn't work, fall back to the inline backend:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import boxcox, boxcox_normmax
import ipywidgets as widgets
from ipywidgets import interact
# -------------------------------
# Generate Synthetic Time Series Data
# -------------------------------
np.random.seed(42) # Set random seed for reproducibility
# Create a date range for two years (730 days)
dates = pd.date_range(start='2025-01-01', periods=730, freq='D')
# Generate components of the time series:
# Trend: linear increase with a small random walk for realism
trend = np.linspace(10, 20, 730) + np.cumsum(np.random.normal(0, 0.05, 730))
# Seasonality: yearly seasonality using a sine function
seasonality = 5 * np.sin(2 * np.pi * np.arange(730) / 365)
# Noise: random noise added to the data
noise = np.random.normal(0, 0.5, 730)
# Combine the components
data_values = trend + seasonality + noise
# Ensure that all values are positive (required for Box-Cox transformation)
if (data_values <= 0).any():
    offset = abs(data_values.min()) + 1
    data_values += offset
    print("Data shifted by offset:", offset)
# Create a DataFrame with the synthetic data
df = pd.DataFrame({'date': dates, 'sales': data_values})
df.set_index('date', inplace=True)
# -------------------------------
# Box-Cox Transformation Setup
# -------------------------------
# Choose the data for Box-Cox transformation
data_for_boxcox = df['sales']
# Compute the optimal lambda using maximum likelihood estimation (MLE)
optimal_lambda = boxcox_normmax(data_for_boxcox, method='mle')
print("Optimal lambda (from MLE):", optimal_lambda)
# -------------------------------
# Interactive Box-Cox Transformation Visualization
# -------------------------------
def update_boxcox(lmbda_value):
    """
    Apply the Box-Cox transformation with the provided lambda value and
    plot the original and transformed series side by side.
    """
    # Apply Box-Cox transformation (using the provided lambda)
    transformed = boxcox(data_for_boxcox, lmbda=lmbda_value)
    plt.figure(figsize=(14, 6))
    # Plot Original Series
    plt.subplot(1, 2, 1)
    plt.plot(df.index, df['sales'], color='blue', label='Original Sales')
    plt.title("Original Sales Data")
    plt.xlabel("Date")
    plt.ylabel("Sales")
    plt.legend()
    # Plot Transformed Series
    plt.subplot(1, 2, 2)
    plt.plot(df.index, transformed, color='orange', label=f"Transformed (λ = {lmbda_value:.2f})")
    plt.title(f"Box-Cox Transformed Sales Data (λ = {lmbda_value:.2f})")
    plt.xlabel("Date")
    plt.ylabel("Transformed Sales")
    plt.legend()
    plt.tight_layout()
    plt.show()
# Create an interactive slider for lambda.
interact(update_boxcox,
lmbda_value=widgets.FloatSlider(
value=optimal_lambda, min=-2, max=2, step=0.01, description='Lambda:')
);
Moving averages are a basic tool for estimating the trend-cycle component.
- Simple Moving Average (SMA):
An m-MA averages the m = 2k+1 observations centered on time t (k on each side) to smooth out short-term fluctuations.
- Weighted Moving Averages:
Different weights are assigned to observations (often giving more weight to the center) to create a smoother estimate. When the order is even, a moving average is not symmetric around time t, so a second moving average is applied to the first to center it (for example, a 2×4-MA, which is a 4-MA followed by a 2-MA).
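A quick way to see how a 2×4-MA works is to compute it two ways: as a 4-MA followed by a 2-MA, and as a single weighted 5-term average with weights [1/8, 1/4, 1/4, 1/4, 1/8]. The sketch below uses a short made-up series, and `centered_ma` is a hypothetical helper name.

```python
import numpy as np

def centered_ma(y, weights):
    """Weighted moving average; returns only positions with a full window."""
    weights = np.asarray(weights, dtype=float)
    # np.convolve flips its kernel, so reverse the weights to apply them in order.
    return np.convolve(y, weights[::-1], mode="valid")

# Short hypothetical series (illustrative numbers only).
y = np.array([5.0, 3.0, 8.0, 6.0, 9.0, 4.0, 7.0, 10.0])

# Simple 5-MA: equal weights of 1/5.
five_ma = centered_ma(y, np.full(5, 1 / 5))

# 2x4-MA computed in two passes: a 4-MA, then a 2-MA of the result.
four_ma = centered_ma(y, np.full(4, 1 / 4))
two_by_four = centered_ma(four_ma, np.full(2, 1 / 2))

# The same 2x4-MA written as one weighted 5-term average.
direct = centered_ma(y, [1 / 8, 1 / 4, 1 / 4, 1 / 4, 1 / 8])

print("5-MA:            ", np.round(five_ma, 3))
print("2x4-MA (2 steps):", np.round(two_by_four, 3))
print("2x4-MA (weights):", np.round(direct, 3))
```

Both routes give the same numbers, which shows that a moving average of a moving average is just a weighted moving average with symmetric weights that sum to 1.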
Follow-up Prompt:
“Write an interactive Python script similar to the Box-Cox method one, where using several sliders I can change the weights of moving averages on a sample time series. Make sure they all add to 1 and are symmetrical.”
ChatGPT Answer:
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from ipywidgets import interact, FloatSlider
%matplotlib inline
# -------------------------------
# Generate Synthetic Time Series Data
# -------------------------------
np.random.seed(42) # Set random seed for reproducibility
# Create a date range for two years (730 days)
dates = pd.date_range(start='2025-01-01', periods=730, freq='D')
# Generate components of the time series:
# Trend: linear increase with a small random walk for realism
trend = np.linspace(10, 20, 730) + np.cumsum(np.random.normal(0, 0.05, 730))
# Seasonality: yearly seasonality using a sine function
seasonality = 5 * np.sin(2 * np.pi * np.arange(730) / 365)
# Noise: random noise added to the data
noise = np.random.normal(0, 0.5, 730)
# Combine the components
data_values = trend + seasonality + noise
# Ensure that all values are positive (not required for moving averages; kept for consistency with the Box-Cox example)
if (data_values <= 0).any():
    offset = abs(data_values.min()) + 1
    data_values += offset
    print("Data shifted by offset:", offset)
# Create a DataFrame with the synthetic data
df = pd.DataFrame({'date': dates, 'sales': data_values})
df.set_index('date', inplace=True)
# -------------------------------
# Interactive Moving Average Smoothing
# -------------------------------
# For a 5-term symmetric moving average, we define the kernel as:
# [w, v, c, v, w] where c = 1 - 2*(w+v)
# To ensure all weights are nonnegative, we must have w+v <= 0.5.
def update_ma(w, v):
    # Check the constraint on the weights:
    if w + v > 0.5:
        print(f"The sum of outer and inner weights must be <= 0.5. Current sum = {w+v:.2f}")
        return
    # Compute the center weight
    c = 1 - 2 * (w + v)
    # Create the symmetric kernel
    kernel = np.array([w, v, c, v, w])
    print("Kernel weights:", kernel)
    # Apply the moving average smoothing using convolution in 'valid' mode.
    # 'valid' mode returns only the values where the kernel fully overlaps the data.
    smoothed_valid = np.convolve(df['sales'], kernel, mode='valid')
    # Create a new index that corresponds to the "valid" part of the convolution.
    # For a 5-term kernel, skip the first 2 and last 2 observations.
    new_index = df.index[2:-2]
    # Plot original and smoothed data on the same axes
    plt.figure(figsize=(14, 6))
    plt.plot(df.index, df['sales'], label="Original Data", alpha=0.5, color="blue")
    plt.plot(new_index, smoothed_valid, label="Smoothed Data", color="orange", linewidth=2)
    plt.title("Moving Average Smoothing with Custom Kernel (Edges Skipped)")
    plt.xlabel("Date")
    plt.ylabel("Sales")
    plt.legend()
    plt.show()
# Create interactive sliders for 'w' (outer weight) and 'v' (inner weight)
interact(update_ma,
w=FloatSlider(min=0, max=0.5, step=0.01, value=0.1, description='Outer Weight (w):'),
v=FloatSlider(min=0, max=0.5, step=0.01, value=0.1, description='Inner Weight (v):'));
Classical decomposition breaks down a time series into its components using a simple, step-by-step method.
• Additive Decomposition:
For a series assumed to be the sum of its components:
1. Estimate the Trend-Cycle: Use a moving average. If the seasonal period m is even, use a 2×m-MA; if m is odd, use an m-MA.
2. Detrend the Series: Subtract the estimated trend-cycle from the original series.
3. Estimate the Seasonal Component: Average the detrended values for each season (e.g., average all the detrended values for March in monthly data).
4. Calculate the Remainder: Subtract both the trend-cycle and seasonal components from the original data.
• Multiplicative Decomposition:
For series where seasonal fluctuations change proportionally with the level of the series, the components are multiplied rather than added.
• Limitations:
Classical methods are simple to understand but have drawbacks. They do not produce trend estimates for the first and last few observations, may oversmooth rapid changes, and assume that the seasonal pattern remains exactly the same each period.
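The four additive steps can be sketched by hand with NumPy. This is a minimal illustration, not how statsmodels implements it: `classical_additive` is a hypothetical function name, the monthly series is synthetic and noise-free so the recovered components are easy to check, and the seasonal averages are centered to sum to zero, as the textbook does.

```python
import numpy as np

def classical_additive(y, m):
    """Classical additive decomposition with seasonal period m."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Step 1: trend-cycle via a 2xm-MA (even m) or an m-MA (odd m).
    if m % 2 == 0:
        weights = np.r_[0.5, np.ones(m - 1), 0.5] / m
    else:
        weights = np.ones(m) / m
    half = len(weights) // 2
    trend = np.full(n, np.nan)  # no estimate at the first/last `half` points
    trend[half:n - half] = np.convolve(y, weights, mode="valid")
    # Step 2: detrend the series.
    detrended = y - trend
    # Step 3: average the detrended values for each season, then center them.
    seasonal_means = np.array([np.nanmean(detrended[i::m]) for i in range(m)])
    seasonal_means -= seasonal_means.mean()
    seasonal = np.tile(seasonal_means, n // m + 1)[:n]
    # Step 4: the remainder is what is left over.
    remainder = y - trend - seasonal
    return trend, seasonal, remainder

# Synthetic monthly series: linear trend plus a sine-shaped seasonal pattern.
t = np.arange(48)
true_trend = 10 + 0.5 * t
true_seasonal = 3 * np.sin(2 * np.pi * t / 12)
y = true_trend + true_seasonal

trend, seasonal, remainder = classical_additive(y, m=12)
print("Missing trend values at each end:", int(np.isnan(trend[:6]).sum()))
print("First year of seasonal estimates:", np.round(seasonal[:12], 2))
```

The NaN values at both ends of the trend are the first limitation listed above. The single repeated seasonal pattern is the third: every year gets exactly the same seasonal values.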
Follow-up Prompt:
“Write a Python script to exemplify the additive and multiplicative time series decomposition methods with plots. The plots should be next to each other and on the same time series.”
ChatGPT Answer:
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose
# Set random seed for reproducibility
np.random.seed(42)
# -------------------------------
# Generate Synthetic Time Series Data
# -------------------------------
# Create a date range for two years (730 days)
dates = pd.date_range(start='2025-01-01', periods=730, freq='D')
# Generate components of the time series:
# Trend: a linear increase with a small random walk for realism
trend = np.linspace(10, 20, 730) + np.cumsum(np.random.normal(0, 0.05, 730))
# Seasonality: yearly seasonality using a sine function
seasonality = 5 * np.sin(2 * np.pi * np.arange(730) / 365)
# Noise: random noise added to the data
noise = np.random.normal(0, 0.5, 730)
# Combine the components to form the synthetic series
data_values = trend + seasonality + noise
# Ensure all values are positive (required for multiplicative decomposition)
if (data_values <= 0).any():
    offset = abs(data_values.min()) + 1
    data_values += offset
    print("Data shifted by offset:", offset)
# Create a DataFrame with the synthetic data
df = pd.DataFrame({'date': dates, 'value': data_values})
df.set_index('date', inplace=True)
# -------------------------------
# Perform Decomposition
# -------------------------------
# For this demonstration, we use a seasonal period of 365 (yearly seasonality).
# Note: With 730 days of data, we have exactly 2 cycles.
add_decomp = seasonal_decompose(df['value'], model='additive', period=365, extrapolate_trend='freq')
mult_decomp = seasonal_decompose(df['value'], model='multiplicative', period=365, extrapolate_trend='freq')
# -------------------------------
# Plot Additive vs. Multiplicative Decomposition Side by Side
# -------------------------------
# Create a figure with 4 rows (for each component) and 2 columns (additive & multiplicative)
fig, axes = plt.subplots(nrows=4, ncols=2, figsize=(14, 12), sharex=True)
# Titles for the columns
axes[0,0].set_title("Additive Decomposition")
axes[0,1].set_title("Multiplicative Decomposition")
# Row labels for each component
row_labels = ['Observed', 'Trend', 'Seasonal', 'Residual']
components = ['observed', 'trend', 'seasonal', 'resid']
# Plot each component for both decomposition methods
for i, comp in enumerate(components):
    # Additive decomposition plot
    axes[i, 0].plot(add_decomp.observed.index, getattr(add_decomp, comp), color='blue')
    axes[i, 0].set_ylabel(row_labels[i])
    # Multiplicative decomposition plot
    axes[i, 1].plot(mult_decomp.observed.index, getattr(mult_decomp, comp), color='red')
    axes[i, 1].set_ylabel(row_labels[i])
# Set x-axis label on the bottom row
axes[3,0].set_xlabel("Date")
axes[3,1].set_xlabel("Date")
plt.suptitle("Additive vs. Multiplicative Decomposition", fontsize=16)
plt.tight_layout(rect=[0, 0.03, 1, 0.95])
plt.show()
Beyond classical methods, more advanced techniques are available to handle various types of seasonality and to overcome some of the limitations of simple moving averages.
X-11 Method:
Developed by the US Census Bureau, this method refines classical decomposition by adjusting for calendar effects and by providing trend estimates for all periods, including the endpoints.
SEATS Method:
Developed at the Bank of Spain, SEATS (Seasonal Extraction in ARIMA Time Series) is a seasonal adjustment method based on ARIMA models. It works only with quarterly and monthly data.
STL (Seasonal and Trend decomposition using Loess):
STL is a flexible and robust method that uses local regression (loess) to estimate the trend-cycle and seasonal components.
Advantages:
- It can handle any type of seasonality (not just monthly or quarterly).
- The seasonal component can change over time, and the user can control the smoothness of both the trend and seasonal components.
- It can be made robust to outliers.
Disadvantages:
- It does not automatically adjust for trading-day effects or other calendar variations.
• Choosing Parameters:
For methods like STL, the trend window (how many observations to use for the trend) and the seasonal window (how many seasons to average) are key parameters. Adjusting these can improve the model’s ability to capture rapid changes, such as those during a financial crisis.
Follow-up Prompt:
“Write some Python code to illustrate the difference between STL and classical decomposition methods. Make choosing STL parameters interactive.”
ChatGPT Answer:
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import STL, seasonal_decompose
import ipywidgets as widgets
from ipywidgets import interact
%matplotlib inline
# -------------------------------
# Generate Synthetic Time Series Data
# -------------------------------
np.random.seed(42) # For reproducibility
# Create a date range for two years (730 days)
dates = pd.date_range(start='2025-01-01', periods=730, freq='D')
# Generate synthetic components:
# Trend: Linear increase with a small random walk for realism
trend = np.linspace(10, 20, 730) + np.cumsum(np.random.normal(0, 0.05, 730))
# Seasonality: Yearly seasonality using a sine function
seasonality = 5 * np.sin(2 * np.pi * np.arange(730) / 365)
# Noise: Random noise added to the data
noise = np.random.normal(0, 0.5, 730)
# Combine components to form the synthetic series
data_values = trend + seasonality + noise
# Ensure all values are positive (required for multiplicative methods)
if (data_values <= 0).any():
    offset = abs(data_values.min()) + 1
    data_values += offset
    print("Data shifted by offset:", offset)
# Create a DataFrame with the synthetic data
df = pd.DataFrame({'date': dates, 'sales': data_values})
df.set_index('date', inplace=True)
# -------------------------------
# Interactive Decomposition Comparison
# -------------------------------
def update_decompositions(trend_window, seasonal_window):
    """
    Update and plot STL and classical additive decompositions for the synthetic series,
    using interactive sliders for the STL trend and seasonal window parameters.
    For daily data with period=365, trend_window must be > 365.
    """
    # Ensure the provided parameters are integers and odd.
    trend_window = int(trend_window)
    seasonal_window = int(seasonal_window)
    if trend_window % 2 == 0:
        trend_window += 1
    if seasonal_window % 2 == 0:
        seasonal_window += 1
    # Print the parameters for clarity.
    print(f"Using STL parameters: trend window = {trend_window}, seasonal window = {seasonal_window}")
    # Perform STL decomposition with the chosen parameters.
    stl = STL(df['sales'], period=365, trend=trend_window, seasonal=seasonal_window, robust=True)
    stl_result = stl.fit()
    # Perform classical additive decomposition using seasonal_decompose.
    classical_result = seasonal_decompose(df['sales'], model='additive', period=365, extrapolate_trend='freq')
    # Create a figure with 4 rows (for each component) and 2 columns (STL vs. Classical)
    fig, axes = plt.subplots(nrows=4, ncols=2, figsize=(14, 12), sharex=True)
    plt.subplots_adjust(hspace=0.4)
    # Titles for each column
    axes[0, 0].set_title(f"STL Decomposition\n(trend window = {trend_window}, seasonal window = {seasonal_window})")
    axes[0, 1].set_title("Classical Decomposition")
    # Define component labels and mapping for STL
    components = ['Observed', 'Trend', 'Seasonal', 'Residual']
    stl_components = {
        'Observed': stl_result.observed,
        'Trend': stl_result.trend,
        'Seasonal': stl_result.seasonal,
        'Residual': stl_result.resid
    }
    # For classical decomposition, the attributes are: observed, trend, seasonal, resid
    classical_components = {
        'Observed': classical_result.observed,
        'Trend': classical_result.trend,
        'Seasonal': classical_result.seasonal,
        'Residual': classical_result.resid
    }
    # Plot each component for both decompositions
    for i, comp in enumerate(components):
        # Plot STL decomposition (left column)
        axes[i, 0].plot(stl_components[comp].index, stl_components[comp], label=comp, color=f"C{i}")
        axes[i, 0].set_ylabel(comp)
        axes[i, 0].legend(loc='upper left')
        # Plot Classical decomposition (right column)
        axes[i, 1].plot(classical_components[comp].index, classical_components[comp], label=comp, color=f"C{i}")
        axes[i, 1].set_ylabel(comp)
        axes[i, 1].legend(loc='upper left')
    # Set x-axis labels on the bottom row for both columns
    axes[3, 0].set_xlabel("Date")
    axes[3, 1].set_xlabel("Date")
    fig.suptitle("Comparison of STL vs. Classical Additive Decomposition", fontsize=16)
    plt.tight_layout(rect=[0, 0.03, 1, 0.95])
    plt.show()
# Create interactive sliders:
# For daily data with period=365, the STL trend window must be greater than 365.
# We set the trend window slider from 367 to 701 (odd values only).
# The seasonal window slider can be from 3 to 31.
interact(update_decompositions,
trend_window=widgets.IntSlider(min=367, max=701, step=2, value=367, description='Trend Window'),
seasonal_window=widgets.IntSlider(min=3, max=31, step=2, value=11, description='Seasonal Window'));
Once the seasonal component is removed from the original data, the remaining values are called seasonally adjusted data.
- Additive: yₜ − Sₜ
- Multiplicative: yₜ / Sₜ
- Usage:
Seasonally adjusted series are often used when the focus is on understanding the underlying trend and cyclic behavior, without the distraction of regular seasonal effects.
Follow-up Prompt:
“Write some Python code to illustrate the difference between a seasonally adjusted time series versus a non-adjusted time series.”
ChatGPT Answer:
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose
%matplotlib inline
# =============================================================================
# 6. Seasonally Adjusted Data
#
# Definition:
# Once the seasonal component is removed from the original data, the remaining
# values are called seasonally adjusted data.
# Additive: yₜ − Sₜ
# Multiplicative: yₜ / Sₜ
#
# Usage:
# Seasonally adjusted series are often used when the focus is on understanding
# the underlying trend and cyclic behavior without the distraction of regular
# seasonal effects.
# =============================================================================
# -------------------------------
# Part A: Additive Series Example
# -------------------------------
# Generate synthetic additive series:
np.random.seed(42) # For reproducibility
dates = pd.date_range(start='2025-01-01', periods=730, freq='D')
# Components for additive series
trend_add = np.linspace(50, 60, 730) # Linear trend
seasonal_add = 10 * np.sin(2 * np.pi * np.arange(730) / 365) # Seasonal (sine wave) component
noise_add = np.random.normal(0, 2, 730) # Random noise
# Create additive series: y = trend + seasonal + noise
y_add = trend_add + seasonal_add + noise_add
# Create DataFrame for additive series
df_add = pd.DataFrame({'date': dates, 'y': y_add})
df_add.set_index('date', inplace=True)
# Perform additive decomposition (using a seasonal period of 365 days)
decomp_add = seasonal_decompose(df_add['y'], model='additive', period=365, extrapolate_trend='freq')
# Compute seasonally adjusted data: Remove the seasonal component
seasonally_adjusted_add = decomp_add.observed - decomp_add.seasonal
# -------------------------------
# Part B: Multiplicative Series Example
# -------------------------------
# For a multiplicative series, the seasonal effect is proportional to the level.
# Generate synthetic multiplicative series:
np.random.seed(42)
# Trend remains similar
trend_mult = np.linspace(50, 60, 730)
# Seasonal multiplicative component: oscillates around 1 (e.g., 1 ± 0.2)
seasonal_mult = 1 + 0.2 * np.sin(2 * np.pi * np.arange(730) / 365)
# Multiplicative noise around 1 (small variation)
noise_mult = np.random.normal(1, 0.05, 730)
# Create multiplicative series: y = trend * seasonal * noise
y_mult = trend_mult * seasonal_mult * noise_mult
# Create DataFrame for multiplicative series
df_mult = pd.DataFrame({'date': dates, 'y': y_mult})
df_mult.set_index('date', inplace=True)
# Perform multiplicative decomposition
decomp_mult = seasonal_decompose(df_mult['y'], model='multiplicative', period=365, extrapolate_trend='freq')
# Compute seasonally adjusted data: Divide out the seasonal component
seasonally_adjusted_mult = decomp_mult.observed / decomp_mult.seasonal
# -------------------------------
# Plotting the Results (Stacked Vertically)
# -------------------------------
fig, axes = plt.subplots(nrows=2, ncols=1, figsize=(14, 10), sharex=True)
# Plot for the Additive Series
axes[0].plot(df_add.index, df_add['y'], label='Original Additive Series', color='blue', alpha=0.6)
axes[0].plot(df_add.index, seasonally_adjusted_add, label='Seasonally Adjusted (y - Sₜ)', color='orange', linewidth=2)
axes[0].set_title("Additive Series: Original vs. Seasonally Adjusted")
axes[0].set_ylabel("Value")
axes[0].legend()
axes[0].grid(True)
# Plot for the Multiplicative Series
axes[1].plot(df_mult.index, df_mult['y'], label='Original Multiplicative Series', color='green', alpha=0.6)
axes[1].plot(df_mult.index, seasonally_adjusted_mult, label='Seasonally Adjusted (y / Sₜ)', color='red', linewidth=2)
axes[1].set_title("Multiplicative Series: Original vs. Seasonally Adjusted")
axes[1].set_xlabel("Date")
axes[1].set_ylabel("Value")
axes[1].legend()
axes[1].grid(True)
plt.tight_layout()
plt.show()
• Purpose of Decomposition:
Decomposing a time series helps to understand and isolate the different factors affecting the data. This improves both analysis and forecasting accuracy.
• Method Choice:
The choice between classical, X-11, SEATS, STL, or other decomposition methods depends on the data characteristics (e.g., frequency of observations, presence of multiple seasonal patterns) and the specific needs of the forecast.
• Application in Forecasting:
Once the series is decomposed, the trend-cycle component is especially useful for long-term forecasts, while seasonally adjusted data is often used to analyze short-term changes without seasonal noise.
In our exploration of time series decomposition, we’ve seen how breaking down a series into its constituent components — trend-cycle, seasonal, and residual — can offer powerful insights into underlying patterns. By applying both classical and advanced methods such as STL and X‑11/SEATS (or their practical alternatives), you gain a clearer view of how your data behaves, allowing for more accurate forecasting and more informed decision-making.
Understanding seasonal adjustment is especially valuable. Whether you’re subtracting the seasonal component in an additive model or dividing it out in a multiplicative model, the goal is the same: to remove regular seasonal fluctuations so that the true trend and cyclic behavior of the data stand out. This is essential when the seasonal effects might otherwise mask important shifts or turning points in the underlying data.
One of the exciting aspects of modern data analysis is how tools like large language models (LLMs) can accelerate our learning process. With LLMs such as ChatGPT, complex textbook content can be summarized quickly, and example code can be generated and customized in real time. Interactive visualizations further empower us to experiment with different parameters and immediately see their impact, bridging the gap between theory and practice.
As you continue to explore time series forecasting, remember that the key lies in understanding your data deeply. Whether you’re dealing with economic indicators, sales data, or any other sequential information, effective decomposition and seasonal adjustment are foundational techniques that pave the way for accurate and actionable forecasts.
You can find the code snippets arranged in a notebook in this repo. Alternatively, you can copy and paste each code block into your IPython Notebook, run them in sequence, and see how each visualization technique reveals different aspects of your time series data.
See you in Part 3 of the tutorial!