In machine learning, it is common to want to perform dimensionality reduction on a dataset so that the dataset becomes easier to visualize and reason about. Performing dimensionality reduction can take datasets with thousands of features and narrow it down to one or several features. Being able to compute the Eigenvalues and Eigenvectors of a covariance matrix is an important step in dimensionality reduction, which motivates us to learn about variance, covariance, and covariance matrices.
Finding the Mean of a Dataset
Given a dataset of x and y points, you can compute the average of the data with the following formulas:
This will calculate a middle point of the data:
Finding the Variance of a Dataset
The variance of a dataset measures how spread out the data is, and it can be computed given the following formula:
It is a measure of the average square distance from the mean.
Finding the Covariance of Two Features of a Dataset
The covariance measures the relationship between two features of a dataset. A positive covariance means that the data is trending upwards, a negative covariance means that the data is trending downwards, and a…