<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>ELBO on Chen Kai Blog</title><link>https://www.chenk.top/en/tags/elbo/</link><description>Recent content in ELBO on Chen Kai Blog</description><generator>Hugo</generator><language>en</language><lastBuildDate>Mon, 02 Feb 2026 09:00:00 +0000</lastBuildDate><atom:link href="https://www.chenk.top/en/tags/elbo/index.xml" rel="self" type="application/rss+xml"/><item><title>ML Math Derivations (14): Variational Inference and Variational EM</title><link>https://www.chenk.top/en/ml-math-derivations/14-variational-inference-and-variational-em/</link><pubDate>Mon, 02 Feb 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/ml-math-derivations/14-variational-inference-and-variational-em/</guid><description>&lt;p>When the posterior &lt;span class="math-inline">$p(\mathbf{z}\mid\mathbf{x})$&lt;/span>
 is intractable, you have two roads. &lt;strong>Sampling&lt;/strong> (MCMC) walks a Markov chain whose stationary distribution is the posterior; it is exact in the limit but slow, and its convergence is hard to diagnose. &lt;strong>Variational inference&lt;/strong> (VI) instead picks a simple family &lt;span class="math-inline">$\mathcal{Q}$&lt;/span>
 of distributions and finds the member &lt;span class="math-inline">$q^\star\in\mathcal{Q}$&lt;/span>
 that lies closest to the true posterior. Inference becomes optimization, and the same machinery that fits a neural network now fits a Bayesian model.&lt;/p>
&lt;p>This post derives VI from a single identity, builds the mean-field algorithm and CAVI from that identity, connects EM and variational EM as special cases, and ends with the reparameterization trick that turns the ELBO into a stochastic objective compatible with autodiff — the engine inside every VAE.&lt;/p></description></item><item><title>PDE and ML (4): Variational Inference and the Fokker-Planck Equation</title><link>https://www.chenk.top/en/pde-ml/04-variational-inference/</link><pubDate>Sat, 15 Jun 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/pde-ml/04-variational-inference/</guid><description>&lt;p>&lt;figure class="article-figure">
 &lt;img src="https://blog-pic-ck.oss-cn-beijing.aliyuncs.com/posts/en/pde-ml/04-Variational-Inference/illustration_1.png" alt="PDE and ML (4): Variational Inference and the Fokker-Planck Equation — Chapter overview" loading="lazy" decoding="async" class="content-image">
 
&lt;/figure>
&lt;/p>
&lt;hr>
&lt;p>Why do variational inference (a method that looks like pure optimization) and Langevin MCMC (a method that looks like pure sampling) end up at the same partial differential equation?&lt;/p>
&lt;p>That is the heart of this article. In continuous time, they are &lt;strong>two faces of the same Fokker–Planck PDE&lt;/strong>: one face is the evolution of a density, the other is the Wasserstein gradient flow of KL divergence. Once you see this, several seemingly unrelated tools — the SVGD particle algorithm, the exponential convergence rate from a log-Sobolev inequality, the training of Bayesian neural networks — all snap onto a single picture.&lt;/p></description></item></channel></rss>