Generative Models on Chen Kai Blog

Reparameterization Trick & Gumbel-Softmax: A Deep Dive

Wed, 30 Jul 2025 09:00:00 +0000

The moment your model contains a sampling step, training hits a hard wall: how do gradients flow through a random node?

The reparameterization trick has a clean answer: rewrite $z\sim p_\theta(z)$ as $z=g_\theta(\epsilon)$, isolating the randomness in a parameter-free noise variable $\epsilon$, so backprop can flow through $g_\theta$. The trouble starts with discrete variables: operations like $\arg\max$ are piecewise constant, so their gradient is zero almost everywhere and undefined at the jumps. Gumbel-Softmax (a.k.a. the Concrete distribution) replaces the discrete sample with a tempered softmax over Gumbel-perturbed logits, giving you a smooth, differentiable surrogate that you can train end-to-end.
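As a rough illustration of the Gumbel-Softmax idea, here is a minimal forward-pass sketch using only the Python standard library. The function names `sample_gumbel` and `gumbel_softmax`, the example probabilities, and the temperature values are assumptions made for this note, not code from the post. Because it uses no autodiff framework, it only shows how the relaxed sample is built; in a framework with automatic differentiation, the same expression would be differentiable with respect to the logits.

```python
import math
import random

def sample_gumbel(rng):
    """Draw one Gumbel(0, 1) sample via the inverse CDF: -log(-log(u))."""
    u = rng.random()
    # Clamp away from 0 and 1 so both logarithms stay finite.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))

def gumbel_softmax(logits, temperature, rng):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / temperature)."""
    perturbed = [(l + sample_gumbel(rng)) / temperature for l in logits]
    m = max(perturbed)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in perturbed]
    total = sum(exps)
    return [e / total for e in exps]

rng = random.Random(0)
probs = [0.7, 0.2, 0.1]                # illustrative categorical distribution
logits = [math.log(p) for p in probs]

soft = gumbel_softmax(logits, temperature=1.0, rng=rng)    # smoother sample
sharp = gumbel_softmax(logits, temperature=0.05, rng=rng)  # closer to one-hot

# The argmax of a relaxed sample is an exact categorical sample (Gumbel-max),
# so the empirical argmax frequencies should approach `probs`.
n = 20000
counts = [0, 0, 0]
for _ in range(n):
    y = gumbel_softmax(logits, temperature=0.5, rng=rng)
    counts[y.index(max(y))] += 1
freqs = [c / n for c in counts]
```

Lower temperatures push the sample toward a one-hot vector at the cost of noisier gradients; higher temperatures give smoother but more biased samples.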

PDE and ML (7): Diffusion Models and Score Matching

Tue, 30 Jul 2024 09:00:00 +0000

The output side of a diffusion model is familiar: a high-quality image. The training objective, on the other hand, looks counter-intuitive at first sight — add noise to the data until it is fully Gaussian, then learn to denoise step by step. Why is this detour more effective than learning the data distribution directly?

The answer is hidden in PDEs. Under forward noising, the data density evolves according to a heat equation (or, more generally, a Fokker–Planck equation), and that evolution admits a reverse-time version, provided we know the score (the gradient of the log-density) at every time. Score matching is the standard way to learn that score. From this angle, DDPM, DDIM, and score-based SDEs are not three different algorithms but three discretizations of the same PDE story.
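To make the PDE link concrete, the standard continuous-time picture can be written as follows. The notation is an assumption of this note rather than the post's own: $f$ is the drift, $g(t)$ the noise scale, $p_t$ the density at time $t$, and $W_t$, $\bar W_t$ are forward and reverse-time Brownian motions.

$$
\begin{aligned}
\text{forward SDE:}\quad & dx = f(x,t)\,dt + g(t)\,dW_t \\
\text{Fokker--Planck:}\quad & \partial_t p_t = -\nabla\cdot\big(f\,p_t\big) + \tfrac{1}{2}\,g(t)^2\,\Delta p_t \\
\text{reverse-time SDE:}\quad & dx = \big[f(x,t) - g(t)^2\,\nabla_x \log p_t(x)\big]\,dt + g(t)\,d\bar W_t
\end{aligned}
$$

With $f=0$ and constant $g$, the Fokker–Planck equation reduces to the heat equation $\partial_t p_t = \tfrac{1}{2}g^2\,\Delta p_t$. The only unknown in the reverse-time equation is the score $\nabla_x \log p_t(x)$, which is the quantity score matching estimates.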

Variational Autoencoder (VAE): From Intuition to Implementation and Troubleshooting

Tue, 27 Jun 2023 09:00:00 +0000

A plain autoencoder compresses and reconstructs. A variational autoencoder learns something far more useful: a smooth, structured latent space you can sample from to generate genuinely new data. That single change — making the encoder output a distribution instead of a vector — turns the network from a fancy compressor into a generative model with a tractable likelihood lower bound.

This guide walks the full path: why autoencoders fail at generation, how the ELBO derivation gets you to the loss function, why the reparameterization trick is the trick that makes everything trainable, a complete PyTorch implementation, and a tour of every common failure mode with concrete fixes.
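The two ingredients of the VAE loss mentioned above can be sketched in a few lines. This is a scalar, standard-library-only illustration under the usual assumptions of a diagonal Gaussian encoder and a standard normal prior; it is not the PyTorch implementation from the guide, and the names `reparameterize`, `kl_to_standard_normal`, and `negative_elbo` are hypothetical.

```python
import math
import random

def reparameterize(mu, log_var, rng):
    """Sample z ~ N(mu, sigma^2) as z = mu + sigma * eps with eps ~ N(0, 1).

    All randomness lives in eps, so z is a deterministic function of
    (mu, log_var), which is what lets gradients reach the encoder.
    """
    eps = rng.gauss(0.0, 1.0)
    sigma = math.exp(0.5 * log_var)
    return mu + sigma * eps

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, sigma^2) || N(0, 1) ) for one latent dimension."""
    return 0.5 * (mu * mu + math.exp(log_var) - 1.0 - log_var)

def negative_elbo(recon_nll, mus, log_vars):
    """Negative ELBO = reconstruction NLL + KL summed over latent dimensions."""
    kl = sum(kl_to_standard_normal(m, lv) for m, lv in zip(mus, log_vars))
    return recon_nll + kl

rng = random.Random(0)
# Illustrative encoder output: mean 2.0, standard deviation 0.5.
samples = [reparameterize(2.0, math.log(0.25), rng) for _ in range(20000)]
sample_mean = sum(samples) / len(samples)

# recon_nll would come from the decoder; 3.0 is a placeholder value.
loss = negative_elbo(3.0, mus=[1.0, 0.0], log_vars=[0.0, 0.0])
```

The KL term vanishes exactly when the encoder outputs the prior ($\mu=0$, $\sigma=1$), which is one way to see why a KL term that collapses to zero signals that the latent code carries no information about the input.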