Maximum A Posteriori Learning

Instead of full Bayesian learning, we try to choose the MAP hypothesis that maximizes the posterior:

h_MAP = argmax_h P(h | D)

Using Bayes' rule this can be rewritten as

h_MAP = argmax_h P(D | h) P(h)

(we can leave out P(D) as it is constant for all hypotheses. We could in theory even leave out the prior P(h) when the dataset is large → ML learning).

or, even better, by utilizing the properties of the logarithm:

h_MAP = argmax_h [log P(D | h) + log P(h)]

So we get a single hypothesis, which requires solving an optimization problem instead of summing over the whole hypothesis space.
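The argmax over the log posterior can be sketched for a small discrete hypothesis space. The coin-bias setup below (hypotheses, prior, data) is an illustrative assumption, not from the notes:

```python
import math

# Toy MAP example: choose a coin-bias hypothesis h (= P(heads))
# from a small discrete hypothesis space.
prior = {0.1: 0.1, 0.3: 0.2, 0.5: 0.4, 0.7: 0.2, 0.9: 0.1}  # P(h), sums to 1
data = [1, 1, 0, 1, 1, 1, 0, 1]  # observed flips, 1 = heads

def log_posterior(h):
    # log P(D|h) + log P(h); P(D) is dropped since it is constant in h
    log_lik = sum(math.log(h) if x else math.log(1 - h) for x in data)
    return log_lik + math.log(prior[h])

h_map = max(prior, key=log_posterior)
print(h_map)
```

Working in log space avoids numerical underflow when the dataset is large, since the likelihood is a product of many probabilities.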

Predictions

The predictions made this way are approximately Bayesian to the extent that

P(x | D) ≈ P(x | h_MAP)

This means that for predictions we only have to compute this one probability and do not have to sum over all possible hypotheses.
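The approximation can be contrasted with the full Bayesian prediction, which sums P(x | h) P(h | D) over all hypotheses. The hypothesis space, prior, and data below are illustrative assumptions:

```python
import math

# Full Bayesian prediction vs. the MAP approximation for the
# probability that the next coin flip is heads.
prior = {0.3: 0.2, 0.5: 0.4, 0.7: 0.4}  # hypothetical P(h), h = P(heads)
data = [1, 1, 1, 0]                     # observed flips, 1 = heads

def likelihood(h):
    # P(D|h) for i.i.d. flips
    return math.prod(h if x else 1 - h for x in data)

# Posterior P(h|D) ∝ P(D|h) P(h), normalized over the hypothesis space
post = {h: likelihood(h) * p for h, p in prior.items()}
z = sum(post.values())
post = {h: v / z for h, v in post.items()}

# Full Bayesian: P(heads | D) = sum_h P(heads | h) P(h | D)
p_bayes = sum(h * p for h, p in post.items())

# MAP approximation: P(heads | D) ≈ P(heads | h_MAP)
h_map = max(post, key=post.get)
p_map = h_map

print(p_bayes, p_map)
```

The MAP prediction here is a single lookup, while the Bayesian prediction still needs the whole posterior; the two agree closely when the posterior is sharply peaked at h_MAP.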