An Illustrated Overview of Mathematical Logic Used in AI Training
It’s always beneficial to understand how things work. In this article, I will provide a very simple overview of the basic mathematical logic used in training AI models. I promise that if you have a basic education, the following examples will be understandable, and you’ll gain a slightly better understanding of the field of artificial intelligence.
Let’s assume we want to create a new AI model to forecast our company’s sales revenue. For the past two months, we have data on sales revenue, advertising costs, and product prices.
In other words, we want to create a model that tells us how our sales revenue depends on the price of our product and the advertising expenses. Using such a tool, a marketing specialist could, for example, calculate the expected sales revenue if they spend €50 on advertising and set the product price at €6.
At its core, AI is nothing more than a mathematical formula (or a set of formulas). Our sales forecasting example could be presented as a mathematical formula like this:
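Written out, the formula could look like the sketch below. It assumes that m is the parameter for advertising cost and n the parameter for product price (the text does not say which is which), and that there is no constant term, which matches the later step of simply multiplying each input by its parameter.

```latex
\text{sales revenue} = m \cdot \text{advertising cost} + n \cdot \text{product price}
```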
The formula exists, but we don’t know the values to assign to the model’s parameters m and n. In other words, we don’t know how much increasing advertising costs and adjusting the product price affect our sales revenue.
When we start training the AI, we can assign random values to the model’s parameters. For example, we initially set the advertising cost parameter to 2 and the price parameter to -2.
Now, we simply try it out. If we multiply the advertising cost and product price by their respective parameter values, we see that our initial model is overly optimistic. In the first month, the actual sales revenue was €5, but our model predicted €30. In the second month, the actual sales revenue was €18, and our model predicted €52.
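This check can be reproduced in a few lines of Python. The inputs below are reconstructed, not quoted: advertising costs of €20 and €30 and prices of €5 and €4 are the only values consistent with the predictions this article reports for both the initial and the adjusted parameters. The names `predict` and `training_data` are illustrative, and the error is taken here as prediction minus actual value, so that a positive error means the model was too optimistic.

```python
# Reconstructed training data (an assumption, see the note above):
# (advertising cost, product price, actual sales revenue) in euros.
training_data = [
    (20, 5, 5),   # first month
    (30, 4, 18),  # second month
]


def predict(advertising, price, ad_weight, price_weight):
    """Linear model: multiply each input by its parameter and add the results."""
    return ad_weight * advertising + price_weight * price


# Initial parameter values from the text: 2 for advertising, -2 for price.
initial_predictions = [predict(ad, price, 2, -2) for ad, price, _ in training_data]

# Error = prediction minus actual value (positive means too optimistic).
errors = [
    predicted - actual
    for predicted, (_, _, actual) in zip(initial_predictions, training_data)
]

print(initial_predictions)  # [30, 52]
print(errors)               # [25, 34]
```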
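To improve the model, we look at the error, which is the predicted value minus the actual value, and adjust the parameters according to a simple learning rule:
If the error is 0, the model is perfect, and no adjustment is needed.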
If the error is > 0, the model gave an overly optimistic result:
- Decrease the weights (parameters) if the corresponding input feature (e.g., advertising cost or product price) has a positive value.
- Increase the weights (parameters) if the corresponding input feature has a negative value.
If the error is < 0, the model was too pessimistic:
- Increase the weights (parameters) if the corresponding input feature has a positive value.
- Decrease the weights (parameters) if the corresponding input feature has a negative value.
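The rule above can be sketched as a small Python function. The function name `adjust_weight`, the fixed step size of 1, and the choice to leave a weight unchanged when its input feature is exactly 0 are assumptions made for illustration; the rule itself only covers positive and negative feature values.

```python
def adjust_weight(weight, feature, error, step=1):
    """Apply the sign-based learning rule to a single weight.

    error is the prediction minus the actual value.
    step is an assumed fixed adjustment size.
    """
    if error == 0 or feature == 0:
        # Perfect prediction, or a feature that gives no direction (assumption).
        return weight
    if (error > 0) == (feature > 0):
        # Too optimistic with a positive input, or too pessimistic with a negative one.
        return weight - step
    # Too optimistic with a negative input, or too pessimistic with a positive one.
    return weight + step


# Illustrative call: positive error, positive input, so the weight goes down.
print(adjust_weight(2, feature=20, error=25))  # 1
```

Real training procedures usually scale the adjustment by the size of the error and of the input instead of using a fixed step, but the direction of the change follows the same logic.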
Following the learning rule, we need to decrease both parameters because the error is positive and both advertising cost and product price have positive values. For instance, we reduce the advertising cost weight from 2 to 1 and the price parameter from -2 to -3.
If we recalculate, we see that our model now predicts accurately. Great, our first manually trained AI model is ready.
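The manual adjustment can also be automated. The following is a minimal sketch, not a general-purpose training routine: it uses reconstructed inputs (advertising costs of €20 and €30 and prices of €5 and €4, the only values consistent with the predictions quoted in this article), an assumed fixed step size of 1, and an arbitrary cap of 10 passes over the data.

```python
# Reconstructed training data (assumption):
# (advertising cost, product price, actual sales revenue) in euros.
training_data = [(20, 5, 5), (30, 4, 18)]

weights = [2, -2]  # initial values from the text: advertising, price
STEP = 1           # assumed fixed step size, matching the manual adjustment

for epoch in range(10):  # small cap so the loop always terminates
    had_error = False
    for advertising, price, actual in training_data:
        features = (advertising, price)
        predicted = sum(w * x for w, x in zip(weights, features))
        error = predicted - actual  # positive means too optimistic
        if error == 0:
            continue
        had_error = True
        for i, x in enumerate(features):
            if x == 0:
                continue  # a zero input gives no direction (assumption)
            # Same sign of error and input: decrease; opposite signs: increase.
            weights[i] += -STEP if (error > 0) == (x > 0) else STEP
    if not had_error:
        break  # every prediction matches the training data

print(weights)  # [1, -3]
```

With these numbers the loop stops after a single adjustment. A fixed step size will not settle this neatly in general, which is one reason practical methods scale each adjustment by the size of the error.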
If you think the above model is too good to be true, you are correct. Our model worked perfectly, but only on the training data it was fitted to. To assess the model’s accuracy, it must be tested on data that wasn’t used in the training process.
We trained our model on data from January and February. Now, let’s check how well the model can predict the sales revenue for March and April.
From the table above, we see that the model predicts the sales revenue for March as €28 (actual €24) and for April as €21 (actual €18). The model is off by €4 in March and €3 in April, so on average it makes an error of €3.5 on new data. This average, known as the mean absolute error, is a simple measure of our model’s accuracy.
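The average error can be checked with a short calculation. The variable names are illustrative; the numbers are the March and April figures given above.

```python
# Predicted and actual sales revenue for March and April, in euros.
predicted = [28, 21]
actual = [24, 18]

# Absolute error per month, then the average over both months.
absolute_errors = [abs(p - a) for p, a in zip(predicted, actual)]
mean_absolute_error = sum(absolute_errors) / len(absolute_errors)

print(absolute_errors)      # [4, 3]
print(mean_absolute_error)  # 3.5
```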
In conclusion, AI at its core is a mathematical formula. In our example, the formula had two parameters; the GPT-4 model is reported to have more than a trillion (1 trillion = 1,000,000,000,000) parameters, although the exact figure has not been officially published. Both are trained on the same principle: gradually adjusting the model’s parameters to reduce the error.
It is also important to remember that AI learns on training data, but its accuracy can only be assessed using data not used during training (test data).