Estimate the Unobserved: Moving-Average Model Estimation with Maximum Likelihood in Python

by Daniel Pollak


MLE provides a framework that tackles this question precisely. It introduces a likelihood function: a function that takes a vector of parameters, often denoted as theta, and produces a probability density function (PDF) that depends on theta.

The general definition of the likelihood function

The probability density function (PDF) of a distribution is a function that takes a value, x, and returns the probability density of that value under the distribution. Therefore, likelihood functions are typically expressed as follows:

Likelihood as a function of theta given x

The value of this function indicates the likelihood of observing x from the distribution defined by the PDF with theta as its parameters.
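Written out, with f denoting the PDF, this is typically:

L(\theta \mid x) = f(x; \theta)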

The goal

When constructing a forecast model, we have data samples and a parameterized model, and our goal is to estimate the model’s parameters. In our examples, such as Regression and MA models, these parameters are the coefficients in the respective model formulas.

Statistical model estimation process

The equivalent in MLE is that we have observations and a PDF for a distribution defined over a set of parameters, theta, which are unknown and not directly observable. Our goal is to estimate theta.

The MLE approach involves finding the set of parameters, theta, that maximizes the likelihood function given the observable data, x.

Maximization of the likelihood function

We assume our samples, x, were drawn from a distribution with a known PDF that depends on a set of parameters, theta. Because the data were actually generated under the true theta, the observed samples should be comparatively likely under that parameter setting. Therefore, the theta values that make the likelihood of our samples as large as possible should be close to the true parameter values.
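In symbols, the maximum-likelihood estimate is usually written as:

\hat{\theta} = \arg\max_{\theta} L(\theta \mid x)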

Conditional likelihood

Notice that we haven’t made any assumptions about the distribution (PDF) on which the likelihood function is based. Now, let’s assume our observation X is a vector (x_1, x_2, …, x_n), and consider the probability of observing x_n conditional on having already observed (x_1, x_2, …, x_{n-1}) —

This represents the likelihood of observing just x_n given the previous values (and theta, the set of parameters). Now, we define the conditional likelihood function as follows:

Conditional likelihood function
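One common way to write it, conditioning each observation on the ones that precede it (and, in practice, on some assumed initial values), is:

L(\theta \mid x_1, \ldots, x_n) = \prod_{t=1}^{n} f(x_t \mid x_{t-1}, \ldots, x_1; \theta)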

Later, we will see why it is useful to employ the conditional likelihood function rather than the exact likelihood function.

Log-Likelihood

In practice, it is often convenient to use the natural logarithm of the likelihood function, referred to as the log-likelihood function:

Maximize the log-likelihood function

This is more convenient because the likelihood is typically a product of many per-observation densities — either because the observations are independent or because we use the conditional factorization above. Taking the logarithm converts this product into a sum, which is easier to differentiate and numerically more stable. And since the logarithm is monotonically increasing, maximizing the log-likelihood yields the same theta as maximizing the likelihood itself.
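With the conditional factorization, the log-likelihood takes the form:

\ell(\theta \mid x) = \log L(\theta \mid x) = \sum_{t=1}^{n} \log f(x_t \mid x_{t-1}, \ldots, x_1; \theta)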

For simplicity, I’ll demonstrate how to estimate the most basic moving average model — MA(1):

MA(1) model

Here, x_t represents the time-series observations, alpha and beta are the model parameters to be estimated, and the epsilons are random noise drawn from a normal distribution with zero mean and standard deviation sigma, which will also be estimated. Therefore, our “theta” is (alpha, beta, sigma).
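Written out, the MA(1) model is:

x_t = \alpha + \beta\,\varepsilon_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma^2)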

Let’s define our parameters and generate some synthetic data using Python:

import pandas as pd
import numpy as np

STD = 3.3
MEAN = 0
ALPHA = 18
BETA = 0.7
N = 1000

# draw the noise terms e_t ~ N(0, STD^2)
df = pd.DataFrame({"et": np.random.normal(loc=MEAN, scale=STD, size=N)})
# lag the noise by one step, using e_0 = 0 for the first observation
df["et-1"] = df["et"].shift(1, fill_value=0)
# build the MA(1) series: x_t = ALPHA + BETA * e_{t-1} + e_t
df["xt"] = ALPHA + (BETA*df["et-1"]) + df["et"]

Note that we have set the standard deviation of the error distribution to 3.3, with alpha at 18 and beta at 0.7. The data looks like this —

Simulation of MA(1) DGP
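To take a quick look at the simulated series yourself, a minimal plotting sketch (assuming matplotlib is available; it is not used anywhere else in this post) could be:

import matplotlib.pyplot as plt

# plot the simulated MA(1) series from the df built above
plt.plot(df.index, df["xt"])
plt.xlabel("t")
plt.ylabel("x_t")
plt.title("Simulated MA(1) series")
plt.show()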

Likelihood function for MA(1)

Our objective is to construct a likelihood function that addresses the question: how likely is it to observe our time series X=(x_1, …, x_n) assuming they are generated by the MA(1) process described earlier?

Likelihood for observing X

The challenge in computing this probability lies in the mutual dependence among our samples — as is evident from the fact that both x_t and x_{t-1} depend on e_{t-1} — making it non-trivial to determine the joint probability of observing all samples (referred to as the exact likelihood).

Therefore, as discussed previously, instead of computing the exact likelihood, we’ll work with a conditional likelihood. Let’s begin with the likelihood of observing a single sample given all previous samples:

Conditional likelihood for observing x_n given the rest

This is much simpler to calculate because —

PDF of normal distribution
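Once all previous observations are known (and we fix e_0 = 0), e_{t-1} is fully determined by the recursion e_t = x_t - alpha - beta * e_{t-1}. Conditional on the past, x_t is therefore just a normal random variable centered at alpha + beta * e_{t-1}, so the one-step density is:

f(x_t \mid x_{t-1}, \ldots, x_1; \theta) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left( -\frac{(x_t - \alpha - \beta\,\varepsilon_{t-1})^2}{2\sigma^2} \right), \qquad \varepsilon_t = x_t - \alpha - \beta\,\varepsilon_{t-1}, \quad \varepsilon_0 = 0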

All that remains is to calculate the conditional likelihood of observing all samples:

Applying a natural logarithm gives:

Final likelihood function to maximize

which is the function we should maximize.
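Collecting the per-observation terms, the conditional log-likelihood that the code below maximizes can be written as:

\ell(\alpha, \beta, \sigma \mid x) = \sum_{t=1}^{n} \log f(x_t \mid x_{t-1}, \ldots, x_1; \theta) = -\frac{n}{2}\log\!\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2} \sum_{t=1}^{n} \varepsilon_t^2

with the \varepsilon_t computed recursively from the data as above.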

Code

We’ll utilize the GenericLikelihoodModel class from statsmodels for our MLE estimation implementation. As outlined in the tutorial on statsmodels’ website, we simply need to subclass this class and include our likelihood function calculation:

from scipy import stats
from statsmodels.base.model import GenericLikelihoodModel
import statsmodels.api as sm

class MovingAverageMLE(GenericLikelihoodModel):
    def initialize(self):
        super().initialize()
        # register beta and std as extra parameters on top of the intercept (alpha)
        extra_params_names = ['beta', 'std']
        self._set_extra_params_names(extra_params_names)

        # starting values for the optimizer: [alpha, beta, std]
        self.start_params = np.array([0.1, 0.1, 0.1])

    def calc_conditional_et(self, intercept, beta):
        # recursively recover the noise terms, assuming e_0 = 0:
        # e_t = x_t - alpha - beta * e_{t-1}
        df = pd.DataFrame({"xt": self.endog})
        ets = [0.0]
        for i in range(1, len(df)):
            ets.append(df.iloc[i]["xt"] - intercept - (beta * ets[i - 1]))

        return ets

    def loglike(self, params):
        # conditional log-likelihood: sum of normal log-densities of the noise terms
        ets = self.calc_conditional_et(params[0], params[1])
        return stats.norm.logpdf(
            ets,
            scale=params[2],
        ).sum()

The key method to implement is loglike. Given the current parameter values params and the dependent variable (in this case, the time-series samples), which is stored as the class member self.endog, it calculates the conditional log-likelihood value, as we discussed earlier.

Now let’s create the model and fit on our simulated data:

df = sm.add_constant(df) # add intercept for estimation (alpha)
model = MovingAverageMLE(df["xt"], df["const"])
r = model.fit()
r.summary()

and the output is:

MLE results from python
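Beyond the summary table, the fitted parameter values are also available programmatically via r.params (ordered as the intercept alpha, then the extra parameters beta and std):

# should come out close to the true values ALPHA=18, BETA=0.7, STD=3.3
print(r.params)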

And that’s it! As demonstrated, MLE successfully estimated the parameters we selected for simulation.

Estimating even a simple MA(1) model with maximum likelihood demonstrates the power of this method, which not only allows us to make efficient use of our data but also provides a solid statistical foundation for understanding and interpreting the dynamics of time series data.

Hope you liked it!


Unless otherwise noted, all images are by the author
