Or why you should not trust predict_proba methods
In previous articles, I pointed out the importance of knowing how sure a model is about its predictions.
For classification problems, knowing only the final class is not enough. We need more information to make well-informed decisions in downstream processes. A classification model that only outputs the final class hides important information: we do not know how sure the model is or how much we can trust its prediction.
How can we achieve more trust in the model?
Two approaches can give us more insight into classification problems.
First, we could turn our point prediction into a prediction set. The goal of the prediction set is to guarantee that it contains the true class with a given probability. The size of the prediction set then tells us how certain the model is about its prediction: the fewer classes the set contains, the more certain the model is.
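One common way to build such sets is split conformal prediction. The sketch below is a minimal illustration under stated assumptions, not the method this article necessarily uses: it assumes we already have class probabilities (for example from a predict_proba call) for a held-out calibration set, uses "1 minus the probability of the true class" as the nonconformity score, and relies on the calibration and test points being exchangeable. The function names `conformal_threshold` and `prediction_set` and all probability values are hypothetical, made up for illustration.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Compute the conformal threshold q_hat from a calibration set.

    Hypothetical helper. Nonconformity score: 1 - probability of the true class.
    Under exchangeability, sets built with q_hat contain the true class
    with probability at least 1 - alpha (marginal coverage).
    """
    scores = sorted(1.0 - probs[y] for probs, y in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha))
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: include every class
        return 1.0
    return scores[k - 1]


def prediction_set(probs, q_hat):
    """Return every class index whose score 1 - p is at most q_hat."""
    return [c for c, p in enumerate(probs) if 1.0 - p <= q_hat]


# Made-up calibration probabilities for a 3-class problem and their true labels
cal_probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.1, 0.8],
    [0.5, 0.4, 0.1],
    [0.2, 0.2, 0.6],
    [0.4, 0.4, 0.2],
]
cal_labels = [0, 1, 2, 0, 1, 2, 1, 2, 0]

q_hat = conformal_threshold(cal_probs, cal_labels, alpha=0.25)
print(q_hat)                                        # 0.6
print(prediction_set([0.5, 0.3, 0.2], q_hat))       # [0]     confident: one class
print(prediction_set([0.45, 0.42, 0.13], q_hat))    # [0, 1]  less sure: two classes
```

The two test predictions illustrate the point made above: when the model spreads its probability between two classes, the set grows, which signals lower certainty. Note that the guarantee concerns coverage on average over many predictions, not for each individual set.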