Maximum Likelihood Estimation

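From all possible parameters, we select the ones under which the training set is most likely, i.e. θ_ML = argmax_θ p(D | θ) for the dataset D. This is the same as MAP learning with a uniform prior: if p(θ) is constant, maximizing p(D | θ) p(θ) gives the same θ as maximizing p(D | θ) alone.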
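A minimal sketch of this idea, assuming a hypothetical coin-flip dataset with a Bernoulli model and a grid of candidate parameters standing in for "all possible parameters" (the data, grid, and function names are illustrative, not from the notes). It shows that the MLE and the MAP estimate under a uniform prior pick the same parameter:

```python
# Hypothetical training set: 7 heads (1) and 3 tails (0) from a coin.
data = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1]

def likelihood(p, xs):
    """p(D | p) for i.i.d. Bernoulli observations."""
    result = 1.0
    for x in xs:
        result *= p if x == 1 else (1.0 - p)
    return result

def uniform_prior(p):
    """Uniform prior on [0, 1]: the density is constant."""
    return 1.0

# Grid of candidate parameters approximating the parameter space (0, 1).
grid = [i / 1000 for i in range(1, 1000)]

p_ml = max(grid, key=lambda p: likelihood(p, data))
p_map = max(grid, key=lambda p: likelihood(p, data) * uniform_prior(p))

print(p_ml, p_map)  # → 0.7 0.7 (the fraction of heads)
```

Because the prior only multiplies every candidate by the same constant, it cannot change which candidate wins; for this model the answer also matches the closed form k/n = 7/10.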

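This method uses the log-likelihood function ℓ(θ) = log p(D | θ), which for i.i.d. training examples turns the product of per-example probabilities into a sum. Maximizing ℓ(θ) is equivalent to minimizing the negative log-likelihood −ℓ(θ), which is why the problem is often written as a minimization (with a negative sign).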
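As an illustration, here is a small sketch assuming a hypothetical Bernoulli dataset of 2000 observations (the data and the helper name `nll` are made up for the example). It shows why the log is used in practice: the raw likelihood underflows to zero in floating point, while the negative log-likelihood stays finite and its minimizer is the maximum-likelihood estimate:

```python
import math

def nll(p, xs):
    """Negative log-likelihood of i.i.d. Bernoulli data."""
    return -sum(math.log(p) if x == 1 else math.log(1.0 - p) for x in xs)

# Hypothetical dataset: 2000 observations, 1400 of them ones.
data = [1] * 1400 + [0] * 600

# The raw likelihood is a product of 2000 probabilities and underflows.
raw = 1.0
for x in data:
    raw *= 0.7 if x == 1 else 0.3
print(raw)  # → 0.0

# The log-likelihood (a sum) stays a finite, usable number.
print(-nll(0.7, data))

# Minimizing the negative log-likelihood = maximizing the log-likelihood.
grid = [i / 1000 for i in range(1, 1000)]
p_hat = min(grid, key=lambda p: nll(p, data))
print(p_hat)  # → 0.7
```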

Solve with Newton's method:

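So we have:

$$\theta^{(t+1)} = \theta^{(t)} - H^{-1}\,\nabla_\theta \ell\big(\theta^{(t)}\big), \qquad H = \nabla_\theta^2 \ell\big(\theta^{(t)}\big)$$

where ℓ(θ) is the log-likelihood. The step is identical whether ℓ is maximized or −ℓ is minimized, because the gradient and the Hessian both change sign.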

In words: the new parameters are the current parameters minus the inverse of the Hessian matrix times the gradient.
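A minimal sketch of the Newton update, assuming a logistic-regression model with a bias and a single feature (the notes do not specify a model; this one is chosen because its Hessian is 2×2 and can be inverted by hand in plain Python). The data and all function names are hypothetical:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad_and_hessian(w, xs, ys):
    """Gradient and Hessian of the logistic-regression log-likelihood l(w)."""
    g = [0.0, 0.0]
    h = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in zip(xs, ys):
        phi = (1.0, x)  # features: bias term and x
        p = sigmoid(w[0] * phi[0] + w[1] * phi[1])
        for i in range(2):
            g[i] += (y - p) * phi[i]
            for j in range(2):
                h[i][j] -= p * (1.0 - p) * phi[i] * phi[j]
    return g, h

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def newton(xs, ys, iters=20, tol=1e-10):
    w = [0.0, 0.0]
    for _ in range(iters):
        g, h = grad_and_hessian(w, xs, ys)
        h_inv = inverse_2x2(h)
        # Newton step: w_new = w - H^{-1} g
        step = [h_inv[0][0] * g[0] + h_inv[0][1] * g[1],
                h_inv[1][0] * g[0] + h_inv[1][1] * g[1]]
        w = [w[0] - step[0], w[1] - step[1]]
        if max(abs(s) for s in step) < tol:
            break
    return w

# Hypothetical non-separable toy data (separable data has no finite MLE).
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
w = newton(xs, ys)
g, _ = grad_and_hessian(w, xs, ys)
print(w, g)  # the gradient is ~0 at the maximum-likelihood estimate
```

For more than a few parameters, it is usually better to solve the linear system H s = ∇ℓ for the step s instead of forming H⁻¹ explicitly; the update is the same, but solving is cheaper and numerically more stable.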

One can use different methods to calculate the minimum: