<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>SDE on Chen Kai Blog</title><link>https://www.chenk.top/en/tags/sde/</link><description>Recent content in SDE on Chen Kai Blog</description><generator>Hugo</generator><language>en</language><lastBuildDate>Tue, 30 Jul 2024 09:00:00 +0000</lastBuildDate><atom:link href="https://www.chenk.top/en/tags/sde/index.xml" rel="self" type="application/rss+xml"/><item><title>PDE and ML (7): Diffusion Models and Score Matching</title><link>https://www.chenk.top/en/pde-ml/07-diffusion-models/</link><pubDate>Tue, 30 Jul 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/pde-ml/07-diffusion-models/</guid><description>&lt;p>&lt;figure class="article-figure">
 &lt;img src="https://blog-pic-ck.oss-cn-beijing.aliyuncs.com/posts/en/pde-ml/07-Diffusion-Models/illustration_1.png" alt="PDE and ML (7): Diffusion Models and Score Matching — Chapter overview" loading="lazy" decoding="async" class="content-image">
 
&lt;/figure>
&lt;/p>
&lt;hr>
&lt;p>The output side of a diffusion model is familiar: a high-quality image. The training objective, on the other hand, looks counter-intuitive at first sight: &lt;strong>add noise to the data until it is indistinguishable from Gaussian noise, then learn to denoise step by step&lt;/strong>. Why is this detour more effective than learning the data distribution directly?&lt;/p>
&lt;p>The answer is hidden in PDEs. Under the forward noising process, the data density evolves according to a &lt;strong>heat equation&lt;/strong> (or, more generally, a Fokker–Planck equation), and this evolution admits a reverse-time version, provided we know the score (the gradient of the log-density) at every time. &lt;strong>Score matching&lt;/strong> is the standard way to learn that score. From this angle, DDPM, DDIM, and score-based SDEs are not three different algorithms but three discretizations of the same PDE story.&lt;/p></description></item></channel></rss>