<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Langevin Dynamics on Chen Kai Blog</title><link>https://www.chenk.top/en/tags/langevin-dynamics/</link><description>Recent content in Langevin Dynamics on Chen Kai Blog</description><generator>Hugo</generator><language>en</language><lastBuildDate>Sat, 15 Jun 2024 09:00:00 +0000</lastBuildDate><atom:link href="https://www.chenk.top/en/tags/langevin-dynamics/index.xml" rel="self" type="application/rss+xml"/><item><title>PDE and ML (4): Variational Inference and the Fokker-Planck Equation</title><link>https://www.chenk.top/en/pde-ml/04-variational-inference/</link><pubDate>Sat, 15 Jun 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/pde-ml/04-variational-inference/</guid><description>&lt;p>&lt;figure class="article-figure">
 &lt;img src="https://blog-pic-ck.oss-cn-beijing.aliyuncs.com/posts/en/pde-ml/04-Variational-Inference/illustration_1.png" alt="PDE and ML (4): Variational Inference and the Fokker-Planck Equation — Chapter overview" loading="lazy" decoding="async" class="content-image">
 
&lt;/figure>
&lt;/p>
&lt;hr>
&lt;p>Why do variational inference (a method that looks like pure optimization) and Langevin MCMC (a method that looks like pure sampling) end up at the same partial differential equation?&lt;/p>
&lt;p>That question is the heart of this article. In continuous time, the two methods are &lt;strong>two faces of the same Fokker–Planck PDE&lt;/strong>: one face is the evolution of the sampler's density, the other is the Wasserstein gradient flow of the KL divergence. Once you see this, several seemingly unrelated ideas (the SVGD particle algorithm, the exponential convergence rate from a log-Sobolev inequality, the training of Bayesian neural networks) all snap into a single picture.&lt;/p></description></item></channel></rss>