Minimum Description Length Learning

MDL Learning is essentially another interpretation of MAP Learning. Just as we did with Information Gain in Decision Tree Induction, we read the log terms of the MAP objective $\log_2 P(d \mid h_i) + \log_2 P(h_i)$ as code lengths: $-\log_2 P(d \mid h_i)$ is the number of bits needed to encode the data given the hypothesis, and $-\log_2 P(h_i)$ is the additional number of bits needed to encode the hypothesis itself. Maximizing the MAP objective is the same as minimizing its negation, so the best hypothesis $h_{MDL}$ is the one with the shortest total description length:

$$h_{MDL} = \arg\min_{h_i} \left( -\log_2 P(d \mid h_i) - \log_2 P(h_i) \right)$$
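A minimal sketch of MDL hypothesis selection in Python, assuming each hypothesis is given as a pair $(P(d \mid h_i), P(h_i))$. The helper names `description_length`, `h_mdl` and `h_map` are hypothetical, and the numbers are an illustrative candy-bag setup (five hypotheses about the fraction of lime candies, data $d$ = 10 lime candies drawn independently), not results from these notes. The sketch also checks that the MDL choice coincides with the MAP choice.

```python
import math


def description_length(likelihood: float, prior: float) -> float:
    """Total code length in bits: -log2 P(d|h) - log2 P(h).

    A hypothesis that assigns zero probability to the data (or has zero
    prior) would need infinitely many bits, so it can never be selected.
    """
    if likelihood == 0.0 or prior == 0.0:
        return math.inf
    return -math.log2(likelihood) - math.log2(prior)


def h_mdl(hypotheses: dict[str, tuple[float, float]]) -> str:
    """Return the hypothesis with the shortest total description length.

    `hypotheses` maps a (hypothetical) name to (P(d|h), P(h)).
    """
    return min(hypotheses, key=lambda h: description_length(*hypotheses[h]))


def h_map(hypotheses: dict[str, tuple[float, float]]) -> str:
    """Return the hypothesis maximizing P(d|h) * P(h), for comparison."""
    return max(hypotheses, key=lambda h: hypotheses[h][0] * hypotheses[h][1])


# Illustrative (assumed) numbers: priors over five hypotheses and the
# fraction of lime candies each one predicts. With 10 i.i.d. lime draws,
# the likelihood is P(d|h) = q ** 10.
priors = [0.1, 0.2, 0.4, 0.2, 0.1]
lime_fraction = [0.0, 0.25, 0.5, 0.75, 1.0]
hyps = {
    f"h{i + 1}": (q ** 10, p)
    for i, (q, p) in enumerate(zip(lime_fraction, priors))
}

for name, (lik, prior) in hyps.items():
    print(f"{name}: {description_length(lik, prior):.2f} bits")

print("h_MDL =", h_mdl(hyps))
print("h_MAP =", h_map(hyps))
```

Here `h5` (all lime) predicts the data exactly, so its data term costs 0 bits and its total is only $-\log_2 0.1 \approx 3.32$ bits for the hypothesis. Both `h_mdl` and `h_map` pick it, since minimizing the summed bits is the same as maximizing $P(d \mid h_i)\,P(h_i)$.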

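Intuition: If the hypothesis predicts the data exactly, then $P(d \mid h_i) = 1$, so encoding the data given the hypothesis costs $-\log_2(1) = 0$ bits. Such a hypothesis is therefore preferred, as long as the bits needed to encode the hypothesis itself, $-\log_2 P(h_i)$, do not outweigh the savings.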
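A small worked comparison with hypothetical probabilities shows how the two terms add up. Here $L(h)$ is shorthand introduced only for this example for $-\log_2 P(d \mid h) - \log_2 P(h)$; $h_a$ predicts the data exactly with $P(h_a) = \tfrac{1}{4}$, while $h_b$ has $P(d \mid h_b) = \tfrac{1}{8}$ and $P(h_b) = \tfrac{1}{2}$:

$$
\begin{aligned}
L(h_a) &= -\log_2 1 - \log_2 \tfrac{1}{4} = 0 + 2 = 2 \text{ bits} \\
L(h_b) &= -\log_2 \tfrac{1}{8} - \log_2 \tfrac{1}{2} = 3 + 1 = 4 \text{ bits}
\end{aligned}
$$

So $h_a$, which spends no bits on the data, has the shorter total description and would be chosen as $h_{MDL}$.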

Marcs Notes
