<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Mean-Field on Chen Kai Blog</title><link>https://www.chenk.top/en/tags/mean-field/</link><description>Recent content in Mean-Field on Chen Kai Blog</description><generator>Hugo</generator><language>en</language><lastBuildDate>Mon, 02 Feb 2026 09:00:00 +0000</lastBuildDate><atom:link href="https://www.chenk.top/en/tags/mean-field/index.xml" rel="self" type="application/rss+xml"/><item><title>ML Math Derivations (14): Variational Inference and Variational EM</title><link>https://www.chenk.top/en/ml-math-derivations/14-variational-inference-and-variational-em/</link><pubDate>Mon, 02 Feb 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/ml-math-derivations/14-variational-inference-and-variational-em/</guid><description>&lt;p>When the posterior &lt;span class="math-inline">$p(\mathbf{z}\mid\mathbf{x})$&lt;/span>
 is intractable, you have two roads. &lt;strong>Sampling&lt;/strong> (MCMC) walks a Markov chain whose stationary distribution is the posterior: asymptotically exact, but slow and hard to diagnose. &lt;strong>Variational inference&lt;/strong> (VI) instead picks a simple family &lt;span class="math-inline">$\mathcal{Q}$&lt;/span>
 of distributions and finds the member &lt;span class="math-inline">$q^\star\in\mathcal{Q}$&lt;/span>
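 that lies closest to the true posterior, in the sense of minimizing &lt;span class="math-inline">$\mathrm{KL}(q\,\|\,p(\mathbf{z}\mid\mathbf{x}))$&lt;/span>. Inference becomes optimization, and the same gradient-based machinery that fits a neural network now fits a Bayesian model.&lt;/p>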
&lt;p>This post derives VI from a single identity, uses it to build the mean-field approximation and its coordinate-ascent algorithm (CAVI), recovers EM and variational EM as special cases of ELBO maximization, and ends with the reparameterization trick, which turns the ELBO into a stochastic objective compatible with autodiff and is the engine behind VAE training.&lt;/p></description></item><item><title>PDE and ML (3): Variational Principles and Optimization</title><link>https://www.chenk.top/en/pde-ml/03-variational-principles/</link><pubDate>Fri, 31 May 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/pde-ml/03-variational-principles/</guid><description>&lt;p>What is the essence of neural-network training? When we run gradient descent in a high-dimensional parameter space, are deeper continuous-time dynamics at work? As the network width tends to infinity, do the discrete parameter updates converge to some elegant partial differential equation? The answers lie at the intersection of the calculus of variations, optimal transport, and PDE theory.&lt;/p>
&lt;p>The last decade of deep-learning success has rested mostly on engineering intuition. Recently, however, mathematicians have made a striking observation: &lt;strong>viewing a neural network as a particle system on the space of probability measures&lt;/strong>, and studying its evolution under Wasserstein geometry, exposes the global structure of training (convergence guarantees, the role of over-parameterization, the meaning of initialization). The tool that makes this visible is &lt;strong>the variational principle&lt;/strong>, which runs from least action in physics, through the JKO scheme of modern optimal transport, to the mean-field limit of neural networks.&lt;/p></description></item></channel></rss>