hyzhak/mle


MLE and MAP

Machine Learning: Maximum Likelihood Estimation (MLE) and Maximum a Posteriori (MAP) Estimation

Subtleties

MLE doesn't work well with sparse data, because P(Xi | Y) might be estimated as zero for values never seen in training (for example, Xi = birthdate with the value Jan_25_1992), which zeroes out the whole product:

P(Y=1 | X1...Xn) = (P(Y=1) \prod_i P(Xi | Y=1)) / P(X1...Xn)

We can solve this by putting a prior on the parameters and using MAP estimation.
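A minimal sketch of the problem and the fix in Python (the birthdate counts and the Laplace/Dirichlet smoothing constant are illustrative assumptions, not from the source):

```python
from collections import Counter

def mle_prob(counts, x):
    """MLE of P(X = x | Y): relative frequency, exactly zero for unseen values."""
    return counts[x] / sum(counts.values())

def map_prob(counts, x, n_values, alpha=1.0):
    """MAP estimate under a symmetric Dirichlet prior (Laplace smoothing
    when alpha = 1): never exactly zero."""
    return (counts[x] + alpha) / (sum(counts.values()) + alpha * n_values)

# Hypothetical sparse feature: birthdates observed for class Y = 1
observed = Counter(["Jan_25_1992", "Feb_03_1990"])

print(mle_prob(observed, "Mar_14_1988"))  # 0.0 -- zeroes out the product over i
print(map_prob(observed, "Mar_14_1988", n_values=365 * 50))  # small but nonzero
```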

MLE

Pros

  • invariant under reparameterization: for any function g, the MLE of g(θ) is g(\theta_{MLE}), so we can wrap \theta_{MLE} in any function without re-optimizing.
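A quick numerical illustration of this invariance (the sample values are made up): fit the Gaussian variance by MLE, then obtain the MLE of the standard deviation by simply applying sqrt, with no second optimization in the new parameterization:

```python
import math

# Hypothetical sample
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(data)
mean = sum(data) / n

# MLE of the Gaussian variance sigma^2 (denominator n, not n - 1)
var_mle = sum((x - mean) ** 2 for x in data) / n

# Invariance: the MLE of sigma is just sqrt of the MLE of sigma^2
sigma_mle = math.sqrt(var_mle)

print(var_mle, sigma_mle)  # 4.0 2.0
```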

MAP

Pros

  • avoids overfitting (acts as regularization / shrinkage)
  • tends toward the MLE asymptotically (the prior is overwhelmed as the data grows)

Cons

  • point estimate (no representation of uncertainty in θ): it can pick a narrow spike of the posterior just because it has the highest density, even if most of the probability mass lies elsewhere
  • not invariant under reparameterization
  • must assume a prior on θ

Examples

Univariate Gaussian mean

\theta_{MAP} = \overline{x} \cdot \frac{n}{n + \sigma^2} + \mu \cdot \frac{\sigma^2}{n + \sigma^2}

in other words, it is a weighted average of the sample mean and the prior mean.

\theta_{MLE} = \overline{x}

so when n \rightarrow 0 we get

\theta_{MAP} \rightarrow \mu

but when n \rightarrow \infty we get

\theta_{MAP} \rightarrow \theta_{MLE}
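A sketch of both limits, assuming a N(μ, 1) prior on the mean and known likelihood variance σ², so the weights are n/(n + σ²) and σ²/(n + σ²); the data-generating mean and the seed are arbitrary choices:

```python
import random

random.seed(0)
mu = 0.0          # prior mean
sigma2 = 1.0      # known variance of the likelihood
true_mean = 3.0   # hypothetical data-generating mean

def theta_map(xs):
    n = len(xs)
    xbar = sum(xs) / n  # this is theta_MLE
    return xbar * n / (n + sigma2) + mu * sigma2 / (n + sigma2)

data = [random.gauss(true_mean, sigma2 ** 0.5) for _ in range(10_000)]

for n in (1, 10, 100, 10_000):
    xbar = sum(data[:n]) / n
    print(n, round(theta_map(data[:n]), 4), round(xbar, 4))
# small n: theta_MAP is pulled toward the prior mean mu
# large n: theta_MAP approaches theta_MLE = xbar
```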

Related Topics

Videos

Jeff Miller (mathematicalmonk)
