ML Math Derivations (14): Variational Inference and Variational EM

Mon, 02 Feb 2026 09:00:00 +0000

When the posterior $p(\mathbf{z}\mid\mathbf{x})$ is intractable, you have two roads. Sampling (MCMC) runs a Markov chain whose stationary distribution is the posterior: exact in the limit of infinitely many steps, but slow and hard to diagnose. Variational inference (VI) instead picks a simple family $\mathcal{Q}$ of distributions and finds the member $q^\star\in\mathcal{Q}$ closest to the true posterior, usually measured by the KL divergence $\mathrm{KL}\big(q\,\|\,p(\cdot\mid\mathbf{x})\big)$. Inference becomes optimization, and the same machinery that fits a neural network now fits a Bayesian model.

This post derives VI from a single identity, builds the mean-field approximation and its coordinate-ascent algorithm (CAVI) from that identity, recovers EM and variational EM as special cases, and ends with the reparameterization trick, which turns the ELBO into a stochastic objective compatible with autodiff and is the engine inside every VAE.
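
The summary does not spell the identity out. A reasonable guess, stated here as an assumption rather than as the post's own formula, is the standard decomposition of the log evidence, which holds for any density $q(\mathbf{z})$ whose support lies inside that of the posterior:

$$
\log p(\mathbf{x})
= \underbrace{\mathbb{E}_{q(\mathbf{z})}\big[\log p(\mathbf{x},\mathbf{z}) - \log q(\mathbf{z})\big]}_{\mathrm{ELBO}(q)}
+ \mathrm{KL}\big(q(\mathbf{z})\,\|\,p(\mathbf{z}\mid\mathbf{x})\big).
$$

The left side does not depend on $q$ and the KL term is nonnegative, so maximizing $\mathrm{ELBO}(q)$ over $\mathcal{Q}$ is equivalent to minimizing the KL divergence from $q$ to the posterior, and $\mathrm{ELBO}(q)\le\log p(\mathbf{x})$ with equality exactly when $q$ is the posterior.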

PDE and ML (4): Variational Inference and the Fokker-Planck Equation

Sat, 15 Jun 2024 09:00:00 +0000

Why do variational inference (a method that looks like pure optimization) and Langevin MCMC (a method that looks like pure sampling) end up at the same partial differential equation?

That question is the heart of this article. In continuous time, the two methods are two faces of the same Fokker–Planck PDE: one face is the evolution of a density, the other is the Wasserstein gradient flow of the KL divergence. Once you see this, several seemingly unrelated tools (the SVGD particle algorithm, the exponential convergence rate from a log-Sobolev inequality, the training of Bayesian neural networks) all snap into a single picture.
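
As a sketch of that correspondence in standard notation (the symbols $U$, $\pi$, $\rho_t$ and $\lambda$ are assumptions introduced here, not taken from the article): let the target be $\pi(\mathbf{z})\propto e^{-U(\mathbf{z})}$ and run the overdamped Langevin diffusion $d\mathbf{Z}_t = -\nabla U(\mathbf{Z}_t)\,dt + \sqrt{2}\,d\mathbf{W}_t$. The density $\rho_t$ of $\mathbf{Z}_t$ then satisfies the Fokker–Planck equation

$$
\partial_t \rho_t
= \nabla\cdot\big(\rho_t\,\nabla U\big) + \Delta\rho_t
= \nabla\cdot\Big(\rho_t\,\nabla\log\frac{\rho_t}{\pi}\Big).
$$

The second form is the Wasserstein gradient flow of $\mathrm{KL}(\rho\,\|\,\pi)$, whose first variation is $\log(\rho/\pi)+1$. Along this flow,

$$
\frac{d}{dt}\,\mathrm{KL}(\rho_t\,\|\,\pi)
= -\int \rho_t\,\Big\|\nabla\log\frac{\rho_t}{\pi}\Big\|^2 d\mathbf{z},
$$

so if $\pi$ satisfies a log-Sobolev inequality in the form $\mathrm{KL}(\rho\,\|\,\pi)\le\frac{1}{2\lambda}\int\rho\,\|\nabla\log(\rho/\pi)\|^2\,d\mathbf{z}$, Grönwall's inequality gives $\mathrm{KL}(\rho_t\,\|\,\pi)\le e^{-2\lambda t}\,\mathrm{KL}(\rho_0\,\|\,\pi)$.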

ELBO on Chen Kai Blog
