<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>PDE &amp; ML on Chen Kai Blog</title><link>https://www.chenk.top/en/pde-ml/</link><description>Recent content in PDE &amp; ML on Chen Kai Blog</description><generator>Hugo</generator><language>en</language><lastBuildDate>Wed, 14 Aug 2024 09:00:00 +0000</lastBuildDate><atom:link href="https://www.chenk.top/en/pde-ml/index.xml" rel="self" type="application/rss+xml"/><item><title>PDE and Machine Learning (8): Reaction-Diffusion Systems and Graph Neural Networks</title><link>https://www.chenk.top/en/pde-ml/08-reaction-diffusion-systems/</link><pubDate>Wed, 14 Aug 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/pde-ml/08-reaction-diffusion-systems/</guid><description>&lt;h2 id="what-this-article-covers">What This Article Covers&lt;/h2>
&lt;p>Stack 32 layers of GCN on a citation graph and accuracy collapses from 81% to 20%. Every node converges to the same vector. This is &lt;strong>over-smoothing&lt;/strong>, the GNN equivalent of heat death — and the diagnosis comes straight from PDE theory. &lt;strong>A GCN layer is one explicit-Euler step of the heat equation on a graph&lt;/strong>, and the heat equation has exactly one fixed point: the constant. The cure was published in 1952. Alan Turing showed that adding a &lt;em>reaction&lt;/em> term to a diffusion equation can make a uniform state spontaneously break apart into stripes, spots, or labyrinths. The same trick — a learned reaction term — keeps deep GNNs alive.&lt;/p></description></item><item><title>PDE and Machine Learning (7): Diffusion Models and Score Matching</title><link>https://www.chenk.top/en/pde-ml/07-diffusion-models/</link><pubDate>Tue, 30 Jul 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/pde-ml/07-diffusion-models/</guid><description>&lt;h2 id="what-this-article-covers">What This Article Covers&lt;/h2>
&lt;p>Since 2020, &lt;strong>diffusion models&lt;/strong> have become the dominant paradigm in generative AI. From DALL·E 2 to Stable Diffusion to Sora, their generation quality and training stability are unmatched by GANs and VAEs. Beneath this success lies a remarkably clean mathematical structure: &lt;strong>diffusion models are numerical solvers for partial differential equations&lt;/strong>.&lt;/p>
&lt;ul>
&lt;li>Adding Gaussian noise corresponds to integrating the &lt;strong>Fokker–Planck equation&lt;/strong> forward in time.&lt;/li>
&lt;li>Learning to denoise is equivalent to learning the &lt;strong>score function&lt;/strong> $\nabla\log p_t$.&lt;/li>
&lt;li>DDPM is a discretised &lt;strong>reverse SDE&lt;/strong>; DDIM is the corresponding &lt;strong>probability-flow ODE&lt;/strong>.&lt;/li>
&lt;li>Stable Diffusion is the same machinery, executed in a low-dimensional latent space.&lt;/li>
&lt;/ul>
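&lt;p>The first bullet above can be checked numerically. A minimal sketch, assuming a DDPM-style linear schedule (the names &lt;code>betas&lt;/code> and &lt;code>alpha_bar&lt;/code>, the schedule endpoints, and the sample sizes are this sketch's own choices, not from the article): repeated Gaussian noising drives an arbitrary initial distribution toward a standard Gaussian, exactly what forward integration of the Fokker–Planck equation predicts.&lt;/p>

```python
import numpy as np

# Forward noising in closed form: x_t = sqrt(alpha_bar_t) x_0 + sqrt(1 - alpha_bar_t) eps.
# All schedule constants below are illustrative assumptions, not the article's.
rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)       # DDPM-style linear noise schedule
alpha_bar = np.cumprod(1.0 - betas)      # \bar{alpha}_t = prod_s (1 - beta_s)

x0 = rng.uniform(-1.0, 1.0, size=100_000)          # deliberately non-Gaussian data
eps = rng.standard_normal(x0.shape)
xT = np.sqrt(alpha_bar[-1]) * x0 + np.sqrt(1.0 - alpha_bar[-1]) * eps

# After T steps the marginal is essentially N(0, 1), regardless of x0.
print(xT.mean(), xT.std())               # close to 0 and 1
```

&lt;p>For the conditional law given $x_0$ the perturbed score is available in closed form, which is what turns learning $\nabla\log p_t$ into a plain regression problem.&lt;/p>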
&lt;p>&lt;strong>What you will learn&lt;/strong>&lt;/p></description></item><item><title>PDE and Machine Learning (6): Continuous Normalizing Flows and Neural ODE</title><link>https://www.chenk.top/en/pde-ml/06-continuous-normalizing-flows/</link><pubDate>Mon, 15 Jul 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/pde-ml/06-continuous-normalizing-flows/</guid><description>&lt;h2 id="what-this-article-covers">What This Article Covers&lt;/h2>
&lt;p>Generative modeling reduces to one geometric question: &lt;strong>how do you transform a simple distribution (a Gaussian) into a complex one (faces, molecules, motion)?&lt;/strong> Discrete normalizing flows stack invertible blocks, but each block needs a Jacobian determinant at $O(d^3)$ cost. &lt;strong>Neural ODEs&lt;/strong> replace discrete depth with a continuous ODE; &lt;strong>Continuous Normalizing Flows (CNF)&lt;/strong> then push densities through that ODE using the &lt;em>instantaneous&lt;/em> change-of-variables formula, dropping density computation to $O(d)$. &lt;strong>Flow Matching&lt;/strong> removes the divergence integral altogether and turns training into plain regression on a target velocity field.&lt;/p></description></item><item><title>PDE and Machine Learning (5): Symplectic Geometry and Structure-Preserving Networks</title><link>https://www.chenk.top/en/pde-ml/05-symplectic-geometry/</link><pubDate>Sun, 30 Jun 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/pde-ml/05-symplectic-geometry/</guid><description>&lt;h2 id="what-this-article-covers">What this article covers&lt;/h2>
&lt;p>Train an unconstrained neural network on pendulum data and ask it to extrapolate. After a few seconds of integration the prediction is fine; after a minute the pendulum has either crept to a halt or, more often, accelerated to escape velocity. Energy was supposed to be conserved, but the network has no idea what energy is. The bug is not in the data, the optimizer, or the depth of the network. &lt;strong>The bug is in the architecture.&lt;/strong> A standard MLP can represent any vector field, including unphysical ones, and a tiny systematic bias in that vector field is amplified into macroscopic energy drift over a long rollout.&lt;/p></description></item><item><title>PDE and Machine Learning (4): Variational Inference and the Fokker-Planck Equation</title><link>https://www.chenk.top/en/pde-ml/04-variational-inference/</link><pubDate>Sat, 15 Jun 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/pde-ml/04-variational-inference/</guid><description>&lt;h2 id="seven-dimensions-of-this-article">Seven Dimensions of This Article&lt;/h2>
&lt;ol>
&lt;li>&lt;strong>Motivation&lt;/strong>: why VI and MCMC look different but solve the same PDE.&lt;/li>
&lt;li>&lt;strong>Theory&lt;/strong>: derivation of the Fokker-Planck equation from the SDE.&lt;/li>
&lt;li>&lt;strong>Geometry&lt;/strong>: KL divergence as a Wasserstein gradient flow.&lt;/li>
&lt;li>&lt;strong>Algorithms&lt;/strong>: Langevin Monte Carlo, mean-field VI, and SVGD.&lt;/li>
&lt;li>&lt;strong>Convergence&lt;/strong>: log-Sobolev inequality and exponential KL decay.&lt;/li>
&lt;li>&lt;strong>Numerical experiments&lt;/strong>: 7 figures with reproducible code.&lt;/li>
&lt;li>&lt;strong>Application&lt;/strong>: Bayesian neural networks via posterior sampling.&lt;/li>
&lt;/ol>
&lt;h2 id="what-you-will-learn">What You Will Learn&lt;/h2>
&lt;ul>
&lt;li>How the Fokker-Planck equation governs probability density evolution from any Itô SDE.&lt;/li>
&lt;li>Langevin dynamics as a practical sampling algorithm and its discretization error.&lt;/li>
&lt;li>Why the Wasserstein gradient flow of $\mathrm{KL}(q\|p^\star)$ &lt;em>is&lt;/em> the Fokker-Planck PDE.&lt;/li>
&lt;li>The deep equivalence between variational inference and Langevin MCMC in continuous time.&lt;/li>
&lt;li>Stein Variational Gradient Descent (SVGD): a deterministic particle method that bridges both worlds.&lt;/li>
&lt;li>Practical posterior inference for Bayesian neural networks.&lt;/li>
&lt;/ul>
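&lt;p>As a taste of the Langevin side of this list, here is a minimal unadjusted Langevin sketch (the step size &lt;code>h&lt;/code>, particle count, and Gaussian target are illustrative assumptions, not the article's experiments): an Euler-Maruyama discretisation of the overdamped Langevin SDE, whose Fokker-Planck equation has the target $p^\star$ as its stationary density.&lt;/p>

```python
import numpy as np

# Unadjusted Langevin algorithm:
#   x_{k+1} = x_k + h * grad_log_p(x_k) + sqrt(2h) * xi_k,   xi_k ~ N(0, 1)
# Each particle simulates the SDE dx = grad log p*(x) dt + sqrt(2) dW,
# so the particle cloud's density follows the Fokker-Planck equation.
rng = np.random.default_rng(1)

mu, sigma = 2.0, 0.5
def grad_log_p(x):
    # Score of the (assumed) target N(mu, sigma^2).
    return -(x - mu) / sigma**2

h = 1e-3                                 # step size: illustrative choice
x = rng.standard_normal(10_000)          # particles initialised from N(0, 1)
for _ in range(5_000):
    x = x + h * grad_log_p(x) + np.sqrt(2.0 * h) * rng.standard_normal(x.shape)

print(x.mean(), x.std())                 # approaches mu = 2.0, sigma = 0.5
```

&lt;p>At finite $h$ the stationary variance is slightly inflated relative to $\sigma^2$; this bias, vanishing as $h \to 0$, is the discretization error discussed in the article.&lt;/p>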
&lt;h2 id="prerequisites">Prerequisites&lt;/h2>
&lt;ul>
&lt;li>Probability theory (Bayes&amp;rsquo; rule, KL divergence, expectations).&lt;/li>
&lt;li>Wasserstein gradient flows from Part 3.&lt;/li>
&lt;li>Light stochastic calculus intuition (Brownian motion, Itô integral).&lt;/li>
&lt;li>Python / PyTorch for the experiments.&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="1-the-inference-problem">1. The Inference Problem&lt;/h2>
&lt;p>Bayesian inference asks for the posterior&lt;/p></description></item><item><title>PDE and Machine Learning (3): Variational Principles and Optimization</title><link>https://www.chenk.top/en/pde-ml/03-variational-principles/</link><pubDate>Fri, 31 May 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/pde-ml/03-variational-principles/</guid><description>&lt;p>What is the essence of neural-network training? When we run gradient descent in a high-dimensional parameter space, is there a deeper continuous-time dynamics at work? As the network width tends to infinity, does discrete parameter updating converge to some elegant partial differential equation? The answers live at the intersection of the calculus of variations, optimal transport, and PDE theory.&lt;/p>
&lt;p>The last decade of deep-learning success has rested mostly on engineering intuition. Recently, however, mathematicians have made a striking observation: &lt;strong>viewing a neural network as a particle system on the space of probability measures&lt;/strong>, and studying its evolution under Wasserstein geometry, exposes the global structure of training — convergence guarantees, the role of over-parameterization, the meaning of initialization. The tool that makes this visible is &lt;strong>the variational principle&lt;/strong> — from least action in physics, through the JKO scheme of modern optimal transport, to the mean-field limit of neural networks.&lt;/p></description></item><item><title>PDE and Machine Learning (2) — Neural Operator Theory</title><link>https://www.chenk.top/en/pde-ml/02-neural-operator-theory/</link><pubDate>Thu, 16 May 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/pde-ml/02-neural-operator-theory/</guid><description>&lt;p>A classical PDE solver — finite difference, finite element, spectral — is a function: feed it one initial condition and one set of coefficients, get back one solution. A PINN is the same kind of object dressed in neural-network clothes: each new initial condition demands a fresh round of training. Switch the inflow velocity on a wing or move a single sensor reading in a forecast and you reset the clock.&lt;/p></description></item><item><title>PDE and Machine Learning (1): Physics-Informed Neural Networks</title><link>https://www.chenk.top/en/pde-ml/01-physics-informed-neural-networks/</link><pubDate>Wed, 01 May 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/pde-ml/01-physics-informed-neural-networks/</guid><description>&lt;blockquote>
&lt;p>&lt;strong>Series chapter 1 — about a 35-minute read.&lt;/strong> This is the foundation of the entire series. Neural operators, variational principles, score matching — every later chapter is, at heart, &lt;em>the same idea&lt;/em>: how do we encode physical or mathematical constraints directly into the optimisation objective of a neural network? Get PINNs right and the rest is &amp;ldquo;swap one constraint for another&amp;rdquo;.&lt;/p>
&lt;/blockquote>
&lt;hr>
&lt;h2 id="1-prologue-a-metal-rod">1 Prologue: a metal rod&lt;/h2>
&lt;p>Suppose you want the temperature distribution $u(x,t)$ along a metal rod. Half a century of numerical analysis offers two standard answers:&lt;/p></description></item></channel></rss>