Maximum Likelihood Estimation
From all possible parameter values we select the ones that most likely generated the training set, i.e. the parameters that maximize the likelihood of the dataset. This is equivalent to MAP learning with a uniform prior.
This method uses the log-likelihood function. Maximizing the log-likelihood is the same as minimizing the negative log-likelihood (hence the negative sign), so the problem can be treated as a minimization.
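As a sketch of the statements above in formulas: the notation is assumed here, with \theta for the parameters, D = \{x_1, \dots, x_N\} for the training set, and the samples taken to be independent.

```latex
\hat{\theta}_{\mathrm{ML}}
  = \arg\max_{\theta} \; p(D \mid \theta)
  = \arg\max_{\theta} \; \sum_{i=1}^{N} \log p(x_i \mid \theta)
  = \arg\min_{\theta} \; \Bigl( -\sum_{i=1}^{N} \log p(x_i \mid \theta) \Bigr)

\hat{\theta}_{\mathrm{MAP}}
  = \arg\max_{\theta} \; p(D \mid \theta)\, p(\theta)
  = \hat{\theta}_{\mathrm{ML}}
  \quad \text{if } p(\theta) \text{ is constant (uniform prior)}
```

The logarithm does not move the maximum because it is strictly increasing, and it turns the product over samples into a sum.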
Solve with Newton's method: each update takes the current parameters minus the inverse of the Hessian matrix times the gradient, both evaluated at the current parameters.
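One common way to write this update is sketched below. The symbols are assumptions made for illustration: E(\theta) is the negative log-likelihood, \nabla E its gradient, H its Hessian, and t the iteration index.

```latex
E(\theta) = -\sum_{i=1}^{N} \log p(x_i \mid \theta),
\qquad
H(\theta) = \nabla_{\theta}^{2} E(\theta)

\theta^{(t+1)} = \theta^{(t)} - H\bigl(\theta^{(t)}\bigr)^{-1} \, \nabla_{\theta} E\bigl(\theta^{(t)}\bigr)
```

If E is exactly quadratic in \theta, the Hessian is constant and a single step lands on the minimum; otherwise the step is repeated until the gradient is close to zero.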
Different methods can be used to find the minimum of the negative log-likelihood:
- Gradient Descent
- Newton's Method
- Analytical solution
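The three methods listed above can be compared on a small example. The following is a minimal sketch, not taken from the notes: it assumes a linear model y = w0 + w1 * x with Gaussian noise of unit variance, so the negative log-likelihood is 0.5 * sum((w0 + w1*x - y)^2) up to a constant. The data values and all function names are made up for illustration.

```python
# Toy data (illustrative values only), assumed to follow y = w0 + w1 * x + noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]


def gradient(w):
    """Gradient of NLL(w) = 0.5 * sum((w0 + w1*x - y)^2) with respect to w."""
    residuals = [w[0] + w[1] * x - y for x, y in zip(xs, ys)]
    g0 = sum(residuals)
    g1 = sum(r * x for r, x in zip(residuals, xs))
    return [g0, g1]


def hessian():
    """Hessian of the NLL; constant here because the NLL is quadratic in w."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    return [[n, sx], [sx, sxx]]


def solve_2x2(a, b):
    """Solve a @ z = b for a 2x2 matrix a using Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    z0 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    z1 = (a[0][0] * b[1] - a[1][0] * b[0]) / det
    return [z0, z1]


def analytical():
    """Closed form: set the gradient to zero, i.e. H w = [sum(y), sum(x*y)]."""
    rhs = [sum(ys), sum(x * y for x, y in zip(xs, ys))]
    return solve_2x2(hessian(), rhs)


def newton(w, steps=1):
    """Newton update: w <- w - H^{-1} * gradient(w)."""
    for _ in range(steps):
        step = solve_2x2(hessian(), gradient(w))
        w = [w[0] - step[0], w[1] - step[1]]
    return w


def gradient_descent(w, lr=0.02, steps=2000):
    """Gradient descent: w <- w - lr * gradient(w), with a fixed step size."""
    for _ in range(steps):
        g = gradient(w)
        w = [w[0] - lr * g[0], w[1] - lr * g[1]]
    return w


if __name__ == "__main__":
    start = [0.0, 0.0]
    print("analytical:      ", analytical())
    print("newton (1 step): ", newton(start))
    print("gradient descent:", gradient_descent(start))
```

In this quadratic case all three approaches should agree up to numerical precision: Newton's method needs a single step, while gradient descent needs many small steps and a step size chosen small enough to converge. For models where the negative log-likelihood is not quadratic, an analytical solution may not exist and Newton's method has to be iterated.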