ML Math Derivations (13): EM Algorithm and GMM

Sun, 01 Feb 2026 09:00:00 +0000

When data has hidden structure, such as an unobserved cluster label, a missing feature, or an unseen topic, maximum likelihood becomes hard. The marginal log-likelihood contains the log of a sum over the latent variables, so setting its gradient to zero yields no closed-form solution, and every parameter stays coupled to the others through that sum. The EM algorithm sidesteps the difficulty with a deceptively simple idea: alternate between computing the posterior over the hidden variables under the current parameters (E-step) and refitting the parameters as if those soft guesses were the truth (M-step). Each iteration is guaranteed never to decrease the likelihood. This post derives EM from first principles, proves the monotone-ascent property using Jensen’s inequality, and explores its most famous application: Gaussian Mixture Models (GMM), the soft, elliptical generalization of K-means.
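To make the E-step/M-step alternation concrete before the derivation, here is a minimal sketch of EM for a one-dimensional, two-component Gaussian mixture. It is an illustration under assumptions, not the post's reference implementation: the function name `em_gmm_1d`, the quantile-based initialization, the variance floor, and the synthetic data are all choices made for this example.

```python
import numpy as np

def em_gmm_1d(x, k=2, n_iter=50):
    """Illustrative EM for a 1D Gaussian mixture (hypothetical helper)."""
    n = x.shape[0]
    pi = np.full(k, 1.0 / k)                       # mixing weights
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)  # assumed init: spread over quantiles
    var = np.full(k, np.var(x))                    # component variances
    log_liks = []
    for _ in range(n_iter):
        # E-step: log of pi_k * N(x_i | mu_k, var_k), shape (n, k)
        log_p = (np.log(pi)
                 - 0.5 * np.log(2.0 * np.pi * var)
                 - 0.5 * (x[:, None] - mu) ** 2 / var)
        log_norm = np.logaddexp.reduce(log_p, axis=1)  # log p(x_i) per point
        log_liks.append(log_norm.sum())                # log-likelihood at current params
        resp = np.exp(log_p - log_norm[:, None])       # posterior responsibilities

        # M-step: weighted maximum-likelihood updates
        nk = resp.sum(axis=0)
        pi = nk / n
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        var = np.maximum(var, 1e-6)  # assumed floor to avoid collapsed components
    return pi, mu, var, np.array(log_liks)

# Synthetic data: two well-separated clusters (made up for this sketch).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 0.5, 200), rng.normal(3.0, 1.0, 300)])
pi, mu, var, log_liks = em_gmm_1d(x)
print("weights:", pi, "means:", mu, "variances:", var)
```

The recorded `log_liks` sequence should be non-decreasing up to floating-point error, which is the monotone-ascent property the post goes on to prove.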

Expectation Maximization on Chen Kai Blog
