PDE on Chen Kai Blog

Symplectic Geometry and Structure-Preserving Neural Networks

Mon, 28 Jul 2025 09:00:00 +0000

Train a vanilla feedforward network to predict a one-dimensional harmonic oscillator. Validate it on the next ten time steps — the error is fine. Now roll it out for a thousand steps. The orbit no longer closes, the energy creeps upward, and what should be periodic motion turns into a slow spiral. The network learned to fit data points but never learned the physics. Structure-preserving networks fix this by incorporating geometric invariants — energy conservation, the symplectic 2-form, and the Euler-Lagrange equations — directly into the architecture, ensuring the learned model cannot violate them no matter how long you integrate.
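The drift described above can be reproduced without training anything. The sketch below is an illustration under stated assumptions, not the post's experiment: two hand-written one-step maps stand in for a learned model of the unit harmonic oscillator with $H = (q^2 + p^2)/2$. Explicit Euler plays the role of an unconstrained update, symplectic Euler the role of a structure-preserving one. The function names, step size, and initial state are all hypothetical choices.

```python
import numpy as np

def explicit_euler(q, p, dt):
    # Non-symplectic update: both components use the old state.
    return q + dt * p, p - dt * q

def symplectic_euler(q, p, dt):
    # Symplectic update: advance p first, then advance q with the new p.
    p_new = p - dt * q
    return q + dt * p_new, p_new

def rollout_energy(step, n_steps=1000, dt=0.1):
    # Roll the one-step map forward and record H = (q^2 + p^2) / 2.
    q, p = 1.0, 0.0
    energies = np.empty(n_steps + 1)
    energies[0] = 0.5 * (q * q + p * p)
    for k in range(n_steps):
        q, p = step(q, p, dt)
        energies[k + 1] = 0.5 * (q * q + p * p)
    return energies

energy_euler = rollout_energy(explicit_euler)
energy_symp = rollout_energy(symplectic_euler)

print("explicit Euler, final energy:", energy_euler[-1])
print("symplectic Euler, max |H - H0|:", np.abs(energy_symp - 0.5).max())
```

Explicit Euler multiplies the energy by $1 + \Delta t^2$ at every step, so the orbit spirals outward. Symplectic Euler exactly conserves the nearby quantity $q^2 + p^2 - \Delta t\, q p$, which keeps the energy error bounded for all time.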

PDE and ML (8): Reaction-Diffusion Systems and Graph Neural Networks

Wed, 14 Aug 2024 09:00:00 +0000

Anyone who has trained a deep GNN has seen it collapse: past a dozen or so layers, every node’s embedding becomes nearly identical and the model turns to mush. The phenomenon has a name, over-smoothing, and the underlying math is surprisingly clean: GNN message passing is essentially a diffusion equation on the graph, and the long-time behavior of diffusion is to flatten everything to a constant.
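A minimal sketch of the diffusion picture, assuming the simplest possible setting: linear message passing with no learned weights and no nonlinearity, on a hypothetical 6-node ring graph with self-loops. Each "layer" is one multiplication by the row-normalized adjacency matrix, which is a discrete heat-equation step on the graph.

```python
import numpy as np

n = 6
A = np.zeros((n, n))
for i in range(n):
    # Ring graph: node i is connected to node i + 1 (mod n).
    A[i, (i + 1) % n] = 1.0
    A[(i + 1) % n, i] = 1.0

A_hat = A + np.eye(n)                          # add self-loops
P = A_hat / A_hat.sum(axis=1, keepdims=True)   # row-stochastic averaging operator

rng = np.random.default_rng(0)
X = rng.normal(size=(n, 4))                    # random initial node features

def node_spread(X):
    # Largest deviation of any node feature from the across-node mean.
    return float(np.abs(X - X.mean(axis=0)).max())

spreads = [node_spread(X)]
for layer in range(50):
    X = P @ X                                  # one round of neighbor averaging
    spreads.append(node_spread(X))

print("spread at layer 0: ", spreads[0])
print("spread at layer 12:", spreads[12])
print("spread at layer 50:", spreads[50])
```

The spread shrinks geometrically at the rate of the second-largest eigenvalue magnitude of `P` (here 2/3), so after a dozen layers the nodes are already nearly indistinguishable. Real GNN layers add weights and nonlinearities, which changes the details but not the averaging mechanism.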

PDE and ML (7): Diffusion Models and Score Matching

Tue, 30 Jul 2024 09:00:00 +0000

The output side of a diffusion model is familiar: a high-quality image. The training objective, on the other hand, looks counter-intuitive at first sight — add noise to the data until it is fully Gaussian, then learn to denoise step by step. Why is this detour more effective than learning the data distribution directly?

The answer is hidden in PDEs. The forward noising process is a heat equation (or, more generally, a Fokker–Planck equation), and it admits a reverse-time version — provided we know the score (the gradient of the log-density) at every time. Score matching is the standard way to learn that score. From this angle, DDPM, DDIM, and score-based SDEs are not three different algorithms but three discretizations of the same PDE story.
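For reference, the three objects named above can be written side by side. This is the standard formulation, stated here under the assumption of a drift $f(x,t)$ and a scalar noise scale $g(t)$; the notation is chosen for this sketch and may differ from the article's.

$$
\begin{aligned}
\text{forward SDE:}\quad & dx = f(x,t)\,dt + g(t)\,dW_t \\
\text{Fokker–Planck:}\quad & \partial_t p_t = -\nabla\cdot\big(f\,p_t\big) + \tfrac{1}{2}\,g(t)^2\,\Delta p_t \\
\text{reverse-time SDE:}\quad & dx = \big[f(x,t) - g(t)^2\,\nabla_x \log p_t(x)\big]\,dt + g(t)\,d\bar{W}_t
\end{aligned}
$$

With $f = 0$ and $g = \sqrt{2}$ the Fokker–Planck equation reduces to the heat equation $\partial_t p_t = \Delta p_t$. The only unknown in the reverse-time equation is the score $\nabla_x \log p_t(x)$, which is what score matching estimates.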

PDE and ML (6): Continuous Normalizing Flows and Neural ODE

Mon, 15 Jul 2024 09:00:00 +0000

How do you turn an isotropic Gaussian into a photograph of a cat?

Normalizing flows give the most direct answer: stack a sequence of invertible transformations and let them push the simple distribution into the complex one. This article’s continuous version (CNF) takes that idea to the limit — let the step size go to zero and the discrete chain becomes an ODE. Invertibility is automatic, and the change of density is governed by the instantaneous change of variables formula.
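The instantaneous change of variables formula says that along a trajectory of $\dot z = f(z,t)$ the log-density evolves as $\frac{d}{dt}\log p(z(t)) = -\operatorname{tr}\big(\partial f/\partial z\big)$. The sketch below checks this numerically in a case where the answer is known in closed form. The setup is an assumption made for illustration: a linear field $f(z) = Az$ with a hypothetical symmetric matrix `A`, a standard normal base density, and a hand-rolled RK4 integrator.

```python
import numpy as np

A = np.array([[0.3, 0.1],
              [0.1, -0.5]])   # symmetric, so eigh can build exp(tA) below

def f(z):
    # Linear velocity field; its Jacobian is A everywhere.
    return A @ z

def log_standard_normal(z):
    return -0.5 * z @ z - 0.5 * z.size * np.log(2.0 * np.pi)

z0 = np.array([0.7, -1.2])
T, n_steps = 1.0, 1000
dt = T / n_steps

z, logp = z0.copy(), log_standard_normal(z0)
for _ in range(n_steps):
    # RK4 step for the state z.
    k1 = f(z)
    k2 = f(z + 0.5 * dt * k1)
    k3 = f(z + 0.5 * dt * k2)
    k4 = f(z + dt * k3)
    z = z + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    # Instantaneous change of variables: d log p / dt = -tr(df/dz).
    logp = logp - dt * np.trace(A)

# Reference: pushing N(0, I) through z -> exp(TA) z gives N(0, M M^T).
w, V = np.linalg.eigh(A)
M = V @ np.diag(np.exp(T * w)) @ V.T
Sigma = M @ M.T
_, logdet = np.linalg.slogdet(2.0 * np.pi * Sigma)
logp_exact = -0.5 * z @ np.linalg.solve(Sigma, z) - 0.5 * logdet

print("integrated log-density:", logp)
print("closed-form log-density:", logp_exact)
```

In a CNF the field `f` is a neural network and the trace is usually estimated stochastically rather than computed exactly; the linear case is used here only because the pushed-forward density stays Gaussian and can be checked.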

PDE and ML (5): Symplectic Geometry and Structure-Preserving Networks

Sun, 30 Jun 2024 09:00:00 +0000

A pendulum keeps swinging for a very long time without slowly winding down — energy is conserved. The Earth orbits the Sun for billions of years without flying off — angular momentum is conserved. Behind every “this quantity stays constant” lurks a piece of geometry called symplectic structure.

Train a vanilla Neural ODE on pendulum data: after a few hundred steps the energy drifts. The network can fit the short-term trajectory just fine; what it can’t fit is the long-time conservation law. Structure-preserving networks (HNN, LNN, SympNet) take a different approach: bake the conservation law into the architecture so the network cannot violate it.
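Why a Hamiltonian network cannot drift in energy takes one line to see. As a sketch, assume the network outputs a scalar $H_\theta(z)$ with $z = (q, p)$ and the dynamics are defined through the canonical matrix $J$ (notation chosen here for illustration):

$$
\dot z = J\,\nabla H_\theta(z), \qquad
J = \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix}, \qquad
\frac{dH_\theta}{dt} = \nabla H_\theta^{\top}\,\dot z = \nabla H_\theta^{\top} J\,\nabla H_\theta = 0 .
$$

The last equality holds because $J^{\top} = -J$, so the quadratic form vanishes for any gradient. Conservation of the learned $H_\theta$ therefore holds along the exact flow for every value of the weights; a numerical integrator can still introduce drift unless it is itself symplectic.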

PDE and ML (4): Variational Inference and the Fokker-Planck Equation

Sat, 15 Jun 2024 09:00:00 +0000

Why do variational inference (a method that looks like pure optimization) and Langevin MCMC (a method that looks like pure sampling) end up at the same partial differential equation?

That is the heart of this article. In continuous time, they are two faces of the same Fokker–Planck PDE: one face is the evolution of a density, the other is the Wasserstein gradient flow of KL divergence. Once you see this, several seemingly unrelated tools — the SVGD particle algorithm, the exponential convergence rate from a log-Sobolev inequality, the training of Bayesian neural networks — all snap onto a single picture.
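The sampling face of the picture can be sketched in a few lines. Assume a one-dimensional target $\pi(x) \propto e^{-V(x)}$ with $V(x) = x^2/2$, so the target is a standard normal. The unadjusted Langevin iteration below is an Euler–Maruyama discretization of $dX = -\nabla V(X)\,dt + \sqrt{2}\,dW$, whose density obeys the Fokker–Planck equation $\partial_t \rho = \nabla\cdot(\rho\,\nabla V) + \Delta\rho$. The particle count, step size, and starting point are hypothetical choices.

```python
import numpy as np

def grad_V(x):
    # V(x) = x^2 / 2, so the target density is N(0, 1).
    return x

rng = np.random.default_rng(0)
n_particles, n_steps, h = 20_000, 1_000, 0.01

# Start every particle far from the target's mass.
x = np.full(n_particles, 3.0)

for _ in range(n_steps):
    noise = rng.normal(size=n_particles)
    # Unadjusted Langevin step: gradient drift plus scaled Gaussian noise.
    x = x - h * grad_V(x) + np.sqrt(2.0 * h) * noise

sample_mean = float(x.mean())
sample_var = float(x.var())
print("sample mean:", sample_mean)
print("sample variance:", sample_var)
```

The particle cloud relaxes from a point mass at 3 toward the target. The finite step size leaves a small bias in the variance (for this quadratic potential the stationary variance of the discrete chain is $1/(1 - h/2)$), which is why the scheme is called unadjusted.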

PDE and ML (3): Variational Principles and Optimization

Fri, 31 May 2024 09:00:00 +0000

What is the essence of neural-network training? When we run gradient descent in a high-dimensional parameter space, is there a deeper continuous-time dynamics at work? As the network width tends to infinity, does discrete parameter updating converge to some elegant partial differential equation? The answers live at the intersection of the calculus of variations, optimal transport, and PDE theory.

The last decade of deep-learning success has rested mostly on engineering intuition. Recently, however, mathematicians have made a striking observation: viewing a neural network as a particle system on the space of probability measures, and studying its evolution under Wasserstein geometry, exposes the global structure of training — convergence guarantees, the role of over-parameterization, the meaning of initialization. The tool that makes this visible is the variational principle — from least action in physics, through the JKO scheme of modern optimal transport, to the mean-field limit of neural networks.
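For readers who have not met it, the JKO scheme mentioned above is the Wasserstein analogue of an implicit gradient-descent step. In a standard formulation, with an energy functional $F$ on probability measures and a step size $\tau > 0$ (symbols chosen here for illustration):

$$
\rho^{k+1} \in \arg\min_{\rho}\; \Big\{ F(\rho) + \frac{1}{2\tau}\, W_2^2\big(\rho, \rho^{k}\big) \Big\}.
$$

Compare this with the Euclidean proximal step $x^{k+1} = \arg\min_x \{ f(x) + \frac{1}{2\tau}\lVert x - x^k\rVert^2 \}$: the only change is that the squared distance is the 2-Wasserstein distance between measures. As $\tau \to 0$ the iterates trace out the Wasserstein gradient flow of $F$.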

PDE and ML (2): Neural Operator Theory

Thu, 16 May 2024 09:00:00 +0000

A classical PDE solver — finite difference, finite element, spectral — is a function: feed it one initial condition and one set of coefficients, get back one solution. A PINN is the same kind of object dressed in neural-network clothes: each new initial condition demands a fresh round of training. Switch the inflow velocity on a wing or move a single sensor reading in a forecast and you reset the clock.

PDE and ML (1): Physics-Informed Neural Networks

Wed, 01 May 2024 09:00:00 +0000

Series chapter 1 — about a 35-minute read. This is the foundation of the entire series. Neural operators, variational principles, score matching — every later chapter is, at heart, the same idea: how to encode physical or mathematical constraints directly into the neural network’s optimization objective. Master PINNs, and the rest is just swapping one constraint for another.
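As a sketch of what "encoding the constraint in the objective" means, consider a generic PDE $\mathcal{N}[u] = 0$ on a domain with data $u = g$ on its boundary (including the initial time slice). A typical PINN objective, written with hypothetical collocation points $x_i$, boundary points $x_j$, and weight $\lambda$, is

$$
\mathcal{L}(\theta) = \frac{1}{N_r}\sum_{i=1}^{N_r} \big|\mathcal{N}[u_\theta](x_i)\big|^2
\;+\; \lambda\,\frac{1}{N_b}\sum_{j=1}^{N_b} \big|u_\theta(x_j) - g(x_j)\big|^2 .
$$

The first term penalizes the PDE residual of the network $u_\theta$ at interior points, with derivatives obtained by automatic differentiation; the second penalizes mismatch with the boundary and initial data. Swapping $\mathcal{N}$ or the data term is what "swapping one constraint for another" amounts to.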

Prologue: a metal rod

Suppose you want the temperature distribution $u(x,t)$ along a metal rod. Half a century of numerical analysis offers two standard answers:

Functional Analysis (12): Functional Analysis in Action — PDE and Quantum Mechanics

Sat, 23 Oct 2021 09:00:00 +0000

Eleven articles is a long time to spend on infrastructure. Normed spaces, Banach and Hilbert structure, dual spaces, weak topologies, bounded and unbounded operators, the spectral theorem, semigroups, distributions, Sobolev spaces — every one of those chapters paid for itself with a clean abstract result, but a reader could be forgiven for wondering when the abstraction was going to do anything. This final article is where I make good on the implicit promise of the series: every theorem we built was built because some concrete problem demanded it, and pulling those threads together gives us the modern toolkit for partial differential equations and quantum mechanics.

Functional Analysis (10): Semigroups of Operators — Evolution Equations in Infinite Dimensions

Tue, 19 Oct 2021 09:00:00 +0000

The simplest interesting differential equation is $u' = a u$, with $a \in \mathbb{R}$. The solution $u(t) = e^{at} u_0$ is so familiar that it is easy to forget it is a piece of structure: the map $T(t) = e^{at}$ is a one-parameter family of operators on $\mathbb{R}$ satisfying $T(0) = I$, $T(t + s) = T(t) T(s)$, and continuity in $t$. Replace $a$ with a self-adjoint matrix $A$ and you have $T(t) = e^{tA}$, the matrix exponential, which solves the system $u' = Au$. Replace $A$ with an unbounded operator on a Hilbert space (the Laplacian, the Schrödinger Hamiltonian, a Fokker-Planck operator) and you would like to do the same thing. But the matrix-exponential power series may not converge, the operator may not be defined on all of $H$, and ordinary calculus stops working.
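In the finite-dimensional case the semigroup laws are easy to check numerically. The sketch below assumes a hypothetical $2 \times 2$ symmetric matrix `A` and builds $e^{tA}$ from its eigendecomposition, which is valid precisely because `A` is self-adjoint.

```python
import numpy as np

A = np.array([[-2.0, 1.0],
              [1.0, -2.0]])   # self-adjoint (real symmetric)

# Spectral decomposition A = V diag(w) V^T with orthonormal V.
w, V = np.linalg.eigh(A)

def T(t):
    # e^{tA} = V diag(e^{t w}) V^T for a symmetric matrix A.
    return V @ np.diag(np.exp(t * w)) @ V.T

t, s = 0.3, 0.7
identity_ok = np.allclose(T(0.0), np.eye(2))           # T(0) = I
semigroup_ok = np.allclose(T(t + s), T(t) @ T(s))      # T(t + s) = T(t) T(s)

# u(t) = T(t) u0 should satisfy u' = A u; check with a central difference.
u0 = np.array([1.0, 0.0])
h = 1e-6
du_dt = (T(t + h) @ u0 - T(t - h) @ u0) / (2.0 * h)
ode_ok = np.allclose(du_dt, A @ (T(t) @ u0), atol=1e-6)

print(identity_ok, semigroup_ok, ode_ok)
```

The same spectral recipe, applying $e^{t\lambda}$ to the spectrum, is what the spectral theorem extends to unbounded self-adjoint operators, where the power series is no longer available.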