<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Generative Models on Chen Kai Blog</title><link>https://www.chenk.top/en/tags/generative-models/</link><description>Recent content in Generative Models on Chen Kai Blog</description><generator>Hugo</generator><language>en</language><lastBuildDate>Wed, 30 Jul 2025 09:00:00 +0000</lastBuildDate><atom:link href="https://www.chenk.top/en/tags/generative-models/index.xml" rel="self" type="application/rss+xml"/><item><title>Reparameterization Trick &amp; Gumbel-Softmax: A Deep Dive</title><link>https://www.chenk.top/en/standalone/reparameterization-gumbel-softmax/</link><pubDate>Wed, 30 Jul 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/standalone/reparameterization-gumbel-softmax/</guid><description>&lt;p>The moment your model contains a sampling step, training hits a hard wall: &lt;strong>how do gradients flow through a random node?&lt;/strong>&lt;/p>
&lt;p>The reparameterization trick has a clean answer — rewrite &lt;span class="math-inline">$z\sim p_\theta(z)$&lt;/span>
 as &lt;span class="math-inline">$z=g_\theta(\epsilon)$&lt;/span>
, isolating the randomness in a parameter-free noise variable &lt;span class="math-inline">$\epsilon$&lt;/span>
, so backprop can flow through &lt;span class="math-inline">$g_\theta$&lt;/span>
. The trouble starts with discrete variables: operations like &lt;span class="math-inline">$\arg\max$&lt;/span>
 are not differentiable. &lt;strong>Gumbel-Softmax&lt;/strong> (a.k.a. the Concrete distribution) replaces the discrete sample with a tempered softmax over Gumbel-perturbed logits, giving you a smooth, differentiable surrogate that you can train end-to-end.&lt;/p></description></item><item><title>PDE and ML (7): Diffusion Models and Score Matching</title><link>https://www.chenk.top/en/pde-ml/07-diffusion-models/</link><pubDate>Tue, 30 Jul 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/pde-ml/07-diffusion-models/</guid><description>&lt;p>&lt;figure class="article-figure">
 &lt;img src="https://blog-pic-ck.oss-cn-beijing.aliyuncs.com/posts/en/pde-ml/07-Diffusion-Models/illustration_1.png" alt="PDE and ML (7): Diffusion Models and Score Matching — Chapter overview" loading="lazy" decoding="async" class="content-image">
 
&lt;/figure>
&lt;/p>
&lt;hr>
&lt;p>The output side of a diffusion model is familiar: a high-quality image. The training objective, on the other hand, looks counter-intuitive at first sight: &lt;strong>add noise to the data until it is essentially pure Gaussian noise, then learn to denoise step by step&lt;/strong>. Why is this detour more effective than learning the data distribution directly?&lt;/p>
&lt;p>The answer is hidden in PDEs. Under the forward noising process, the data density evolves according to a &lt;strong>heat equation&lt;/strong> (or, more generally, a Fokker–Planck equation), and this evolution admits a reverse-time version, provided we know the score (the gradient of the log-density) at every time. &lt;strong>Score matching&lt;/strong> is the standard way to learn that score. From this angle, DDPM, DDIM, and score-based SDEs are not three different algorithms but three discretizations of the same PDE story.&lt;/p></description></item><item><title>Variational Autoencoder (VAE): From Intuition to Implementation and Troubleshooting</title><link>https://www.chenk.top/en/standalone/vae-guide/</link><pubDate>Tue, 27 Jun 2023 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/standalone/vae-guide/</guid><description>&lt;p>A plain autoencoder compresses and reconstructs. A variational autoencoder learns something far more useful: a smooth, structured latent space you can &lt;em>sample&lt;/em> from to generate genuinely new data. That single change, making the encoder output a &lt;em>distribution&lt;/em> instead of a vector, turns the network from a fancy compressor into a generative model with a tractable likelihood lower bound.&lt;/p>
&lt;p>This guide walks the full path: why autoencoders fail at generation, how the ELBO derivation leads to the loss function, why the reparameterization trick is what makes the model trainable by backpropagation, a complete PyTorch implementation, and a tour of the common failure modes with concrete fixes.&lt;/p></description></item></channel></rss>