Minimum Description Length Learning

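MDL Learning is another interpretation of MAP Learning. As we did with Information Gain in Decision Tree Induction, we interpret the log terms as code lengths in bits:

$$h_{MAP} = \arg\max_{h \in H} P(D \mid h)\,P(h) = \arg\min_{h \in H} \big[ -\log_2 P(D \mid h) - \log_2 P(h) \big] = h_{MDL}$$

Here $-\log_2 P(D \mid h)$ is the number of bits needed to encode the data given the hypothesis, and $-\log_2 P(h)$ is the number of additional bits needed to encode the hypothesis itself. So to get the best hypothesis we minimize the total description length, i.e. the sum of these two code lengths.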
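To make the equivalence concrete, here is a minimal sketch over a tiny, hypothetical hypothesis space. The hypothesis names and the prior/likelihood numbers are made-up assumptions for illustration only. It computes each hypothesis's total description length in bits and checks that minimizing it picks the same hypothesis as maximizing $P(D \mid h)\,P(h)$.

```python
import math

# Hypothetical toy hypothesis space (illustrative numbers, not real data):
# name -> (prior P(h), likelihood P(D|h)).
HYPOTHESES = {
    "h_simple":  (0.50, 0.10),
    "h_medium":  (0.30, 0.40),
    "h_complex": (0.20, 0.45),
}

def description_length(prior: float, likelihood: float) -> float:
    """Total bits: -log2 P(h) to encode h plus -log2 P(D|h) to encode D given h."""
    return -math.log2(prior) - math.log2(likelihood)

def mdl_hypothesis(hyps: dict) -> str:
    """Hypothesis with the shortest total description length."""
    return min(hyps, key=lambda h: description_length(*hyps[h]))

def map_hypothesis(hyps: dict) -> str:
    """Hypothesis that maximizes the unnormalized posterior P(D|h) * P(h)."""
    return max(hyps, key=lambda h: hyps[h][0] * hyps[h][1])

if __name__ == "__main__":
    for name, (prior, likelihood) in HYPOTHESES.items():
        print(f"{name}: {description_length(prior, likelihood):.3f} bits")
    print("MDL choice:", mdl_hypothesis(HYPOTHESES))
    print("MAP choice:", map_hypothesis(HYPOTHESES))
```

Because $-\log_2$ is strictly decreasing, the argmin of the summed code lengths and the argmax of the product always coincide; with these toy numbers both pick `h_medium`.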

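Intuition: If the hypothesis predicts the data exactly, then $P(D \mid h) = 1$, which yields $-\log_2 P(D \mid h) = 0$ bits for encoding the data. So, among hypotheses that cost the same number of bits to encode, this is the preferred hypothesis.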
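A small sketch of the data term alone shows why an exact fit costs nothing. The helper name `data_bits` is a hypothetical illustration, not part of any library.

```python
import math

def data_bits(likelihood: float) -> float:
    """Bits needed to encode the data D given hypothesis h: -log2 P(D|h)."""
    if not 0.0 < likelihood <= 1.0:
        raise ValueError("P(D|h) must be in (0, 1]")
    # log2(1 / p) equals -log2(p) but avoids returning -0.0 when p == 1.
    return math.log2(1.0 / likelihood)

# A hypothesis that predicts D exactly: P(D|h) = 1, so no bits are needed for D.
exact_fit = data_bits(1.0)
# A hypothesis that assigns probability 1/8 to the observed data needs 3 bits.
partial_fit = data_bits(0.125)

if __name__ == "__main__":
    print(exact_fit, partial_fit)
```

An exact fit only zeroes the data term; the total description length still includes $-\log_2 P(h)$, so a very complex hypothesis that fits exactly can still lose to a simpler one that fits slightly worse.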