Mathematics for Machine Learning: Variance, Covariance, and Covariance Matrices | by Gary Drocella | Feb, 2025


Mark Rimmel

In machine learning, it is common to perform dimensionality reduction on a dataset so that it becomes easier to visualize and reason about. Dimensionality reduction can take a dataset with thousands of features and narrow it down to one or several features. Computing the eigenvalues and eigenvectors of a covariance matrix is an important step in dimensionality reduction, which motivates us to learn about variance, covariance, and covariance matrices.

Finding the Mean of a Dataset

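Given a dataset of $n$ points $(x_i, y_i)$, you can compute the average of each coordinate with the following formulas:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$

Together, these give the middle point of the data, $(\bar{x}, \bar{y})$.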
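As a minimal sketch of the mean computation, the Python below averages each coordinate and pairs the results into a middle point. The helper name `mean` and the sample values in `xs` and `ys` are assumptions made up for illustration; they do not come from the article.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


# Hypothetical sample data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

# The middle point of the data is the pair of per-coordinate means.
center = (mean(xs), mean(ys))
print(center)  # (3.0, 4.0)
```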

Finding the Variance of a Dataset

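The variance of a dataset measures how spread out the data is around its mean. For $n$ values $x_1, \dots, x_n$ with mean $\bar{x}$, it can be computed with the following formula:

$$\mathrm{Var}(x) = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

It is the average squared distance from the mean.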
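The description "average squared distance from the mean" can be sketched in a few lines of Python. This version divides by the number of values (the population variance), which is an assumption based on the word "average"; the sample variance divides by one less than the number of values instead. The function names and the data are hypothetical.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def variance(values):
    """Population variance: the average squared distance from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


# Hypothetical sample data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]

# Squared distances from the mean 3.0 are 4, 1, 0, 1, 4; their average is 2.
print(variance(xs))  # 2.0
```

A dataset whose values all equal the mean has a variance of zero, and the variance grows as the values move farther from the mean.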

Finding the Covariance of Two Features of a Dataset

The covariance measures how two features of a dataset vary together. A positive covariance means that one feature tends to increase as the other increases (the data is trending upwards), a negative covariance means that one feature tends to decrease as the other increases (the data is trending downwards), and a…
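To make the sign of the covariance concrete, here is a minimal Python sketch. It assumes the common definition of covariance as the average product of each feature's deviation from its own mean, dividing by the number of points to match the population convention. The function names and both datasets are hypothetical.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def covariance(xs, ys):
    """Population covariance: average product of paired deviations."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical sample data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys_up = [2.0, 4.0, 5.0, 4.0, 5.0]    # tends to rise as x rises
ys_down = [5.0, 4.0, 5.0, 4.0, 2.0]  # tends to fall as x rises

print(covariance(xs, ys_up))    # 1.2
print(covariance(xs, ys_down))  # -1.2
```

Under this definition, the covariance of a feature with itself is that feature's variance, so `covariance(xs, xs)` gives 2.0 for the data above.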
