PDE and ML (4): Variational Inference and the Fokker-Planck Equation

Sat, 15 Jun 2024 09:00:00 +0000

Why do variational inference (a method that looks like pure optimization) and Langevin MCMC (a method that looks like pure sampling) end up at the same partial differential equation?

That is the heart of this article. In continuous time, they are two faces of the same Fokker–Planck PDE. On one face, Langevin dynamics moves samples whose density evolves according to that PDE. On the other, the same PDE is the Wasserstein gradient flow of the KL divergence to the target, which is exactly the objective variational inference minimizes. Once you see this, several seemingly unrelated tools (the SVGD particle algorithm, the exponential convergence rate from a log-Sobolev inequality, the training of Bayesian neural networks) all fit into a single picture.
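To make the "two faces" concrete before any theory, here is a minimal numerical sketch under strong simplifying assumptions. The target is a one-dimensional standard normal, so that grad log pi(x) = -x and the Fokker–Planck solution started from a Gaussian is known in closed form. The function names, the initial distribution N(3, 0.5^2), the step size, and the Gaussian-fit KL estimate are all illustrative choices, not anything prescribed by the article. The sampling face is a plain Euler-Maruyama discretization of the Langevin SDE; the density face is the closed-form mean and variance of the PDE solution.

```python
import numpy as np


def gaussian_kl_to_standard_normal(mean, var):
    """Closed-form KL( N(mean, var) || N(0, 1) )."""
    return 0.5 * (var + mean**2 - 1.0 - np.log(var))


def simulate_langevin(n_particles=20000, step=0.01, n_steps=300,
                      m0=3.0, s0=0.5, seed=0):
    """Compare Langevin particles with the exact Fokker-Planck solution.

    Assumed target: pi = N(0, 1), so grad log pi(x) = -x.
    Assumed initial law: N(m0, s0^2).
    Returns (times, kl_particles, kl_pde) as NumPy arrays.
    """
    rng = np.random.default_rng(seed)
    x = m0 + s0 * rng.standard_normal(n_particles)

    times, kl_particles, kl_pde = [], [], []
    for k in range(n_steps + 1):
        t = k * step

        # Density face: for this target the Fokker-Planck solution stays
        # Gaussian, with mean and variance relaxing exponentially.
        mean_t = m0 * np.exp(-t)
        var_t = 1.0 + (s0**2 - 1.0) * np.exp(-2.0 * t)

        times.append(t)
        # Sampling face: KL estimated by fitting a Gaussian to the particles
        # (only reasonable here because the true law is Gaussian).
        kl_particles.append(gaussian_kl_to_standard_normal(x.mean(), x.var()))
        kl_pde.append(gaussian_kl_to_standard_normal(mean_t, var_t))

        # Euler-Maruyama step of dX = grad log pi(X) dt + sqrt(2) dW.
        noise = rng.standard_normal(n_particles)
        x = x + step * (-x) + np.sqrt(2.0 * step) * noise

    return np.array(times), np.array(kl_particles), np.array(kl_pde)
```

Calling `simulate_langevin()` and plotting `kl_particles` against `kl_pde` should show two nearly overlapping curves that decrease toward zero. The particles never see a KL objective, yet their empirical law descends it, which is the claim the rest of the article makes precise. The small remaining gap comes from the finite step size and the finite number of particles.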
