# The Expectation-Maximization Algorithm

This post covers the Expectation-Maximization (EM) algorithm, a widely used
method for finding the maximum likelihood (MLE) or maximum a posteriori (MAP)
estimate of parameters in latent variable models. You have probably heard of
its most famous relative, the k-means algorithm for clustering, which can be
viewed as a hard-assignment special case of EM.

Even though it's so ubiquitous, whenever I've tried to understand *why* this
algorithm works, I never quite got the intuition right. Now that I've taken
the time to work through the math, I'm going to *attempt* to explain the
algorithm, hopefully with a bit more clarity. We'll start by going back to the
basics with latent variable models and their likelihood functions, then move on
to the math using a simple Gaussian mixture model<sup>1</sup>.