<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Pde-Ml on Chen Kai Blog</title><link>https://www.chenk.top/en/series/pde-ml/</link><description>Recent content in Pde-Ml on Chen Kai Blog</description><generator>Hugo</generator><language>en</language><lastBuildDate>Wed, 14 Aug 2024 09:00:00 +0000</lastBuildDate><atom:link href="https://www.chenk.top/en/series/pde-ml/index.xml" rel="self" type="application/rss+xml"/><item><title>PDE and ML (8): Reaction-Diffusion Systems and Graph Neural Networks</title><link>https://www.chenk.top/en/pde-ml/08-reaction-diffusion-systems/</link><pubDate>Wed, 14 Aug 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/pde-ml/08-reaction-diffusion-systems/</guid><description>&lt;p>&lt;figure class="article-figure">
 &lt;img src="https://blog-pic-ck.oss-cn-beijing.aliyuncs.com/posts/en/pde-ml/08-Reaction-Diffusion-Systems/illustration_1.png" alt="PDE and ML (8): Reaction-Diffusion Systems and Graph Neural Networks — Chapter overview" loading="lazy" decoding="async" class="content-image">
 
&lt;/figure>
&lt;/p>
&lt;hr>
&lt;p>Anyone who has trained a deep GNN has probably seen it collapse: past a dozen or so layers, every node&amp;rsquo;s embedding becomes nearly identical and the model turns to mush. The phenomenon is called &lt;strong>over-smoothing&lt;/strong>, and the underlying math is surprisingly clean. &lt;strong>GNN message passing is essentially a diffusion equation on the graph&lt;/strong>, and diffusion&amp;rsquo;s long-time behavior is to flatten everything to a constant.&lt;/p></description></item><item><title>PDE and ML (7): Diffusion Models and Score Matching</title><link>https://www.chenk.top/en/pde-ml/07-diffusion-models/</link><pubDate>Tue, 30 Jul 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/pde-ml/07-diffusion-models/</guid><description>&lt;p>&lt;figure class="article-figure">
 &lt;img src="https://blog-pic-ck.oss-cn-beijing.aliyuncs.com/posts/en/pde-ml/07-Diffusion-Models/illustration_1.png" alt="PDE and ML (7): Diffusion Models and Score Matching — Chapter overview" loading="lazy" decoding="async" class="content-image">
 
&lt;/figure>
&lt;/p>
&lt;hr>
&lt;p>The output side of a diffusion model is familiar: a high-quality image. The training objective, on the other hand, looks counter-intuitive at first sight — &lt;strong>add noise to the data until it is fully Gaussian, then learn to denoise step by step&lt;/strong>. Why is this detour more effective than learning the data distribution directly?&lt;/p>
&lt;p>The answer is hidden in PDEs. Under the forward noising process, the data density evolves according to a &lt;strong>heat equation&lt;/strong> (or, more generally, a Fokker–Planck equation), and this evolution admits a reverse-time version, provided we know the score (the gradient of the log-density) at every time. &lt;strong>Score matching&lt;/strong> is the standard way to learn that score. From this angle, DDPM, DDIM, and score-based SDEs are not three different algorithms but three discretizations of the same PDE story.&lt;/p></description></item><item><title>PDE and ML (6): Continuous Normalizing Flows and Neural ODE</title><link>https://www.chenk.top/en/pde-ml/06-continuous-normalizing-flows/</link><pubDate>Mon, 15 Jul 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/pde-ml/06-continuous-normalizing-flows/</guid><description>&lt;p>&lt;figure class="article-figure">
 &lt;img src="https://blog-pic-ck.oss-cn-beijing.aliyuncs.com/posts/en/pde-ml/06-Continuous-Normalizing-Flows/illustration_1.png" alt="PDE and ML (6): Continuous Normalizing Flows and Neural ODE — Chapter overview" loading="lazy" decoding="async" class="content-image">
 
&lt;/figure>
&lt;/p>
&lt;hr>
&lt;p>How do you turn an isotropic Gaussian into a photograph of a cat?&lt;/p>
&lt;p>Normalizing flows give the most direct answer: stack a sequence of invertible transformations and let them push the simple distribution into the complex one. The continuous version covered in this article, the continuous normalizing flow (CNF), takes that idea to the limit: let the step size go to zero and the discrete chain becomes an ODE. Invertibility is automatic, and the change of density is governed by the instantaneous change-of-variables formula.&lt;/p></description></item><item><title>PDE and ML (5): Symplectic Geometry and Structure-Preserving Networks</title><link>https://www.chenk.top/en/pde-ml/05-symplectic-geometry/</link><pubDate>Sun, 30 Jun 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/pde-ml/05-symplectic-geometry/</guid><description>&lt;p>&lt;figure class="article-figure">
 &lt;img src="https://blog-pic-ck.oss-cn-beijing.aliyuncs.com/posts/en/pde-ml/05-Symplectic-Geometry/illustration_1.png" alt="PDE and ML (5): Symplectic Geometry and Structure-Preserving Networks — Chapter overview" loading="lazy" decoding="async" class="content-image">
 
&lt;/figure>
&lt;/p>
&lt;hr>
&lt;p>A pendulum keeps swinging for a very long time without slowly winding down — energy is conserved. The Earth orbits the Sun for billions of years without flying off — angular momentum is conserved. Behind every &amp;ldquo;this quantity stays constant&amp;rdquo; lurks a piece of geometry called &lt;strong>symplectic structure&lt;/strong>.&lt;/p>
&lt;p>Train a vanilla Neural ODE on pendulum data: after a few hundred steps the energy drifts. The network can fit the short-term trajectory just fine; what it can&amp;rsquo;t fit is the long-time conservation law. &lt;strong>Structure-preserving networks&lt;/strong> (HNN, LNN, SympNet) take a different approach: bake the conservation law into the architecture so the network &lt;em>cannot&lt;/em> violate it.&lt;/p></description></item><item><title>PDE and ML (4): Variational Inference and the Fokker-Planck Equation</title><link>https://www.chenk.top/en/pde-ml/04-variational-inference/</link><pubDate>Sat, 15 Jun 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/pde-ml/04-variational-inference/</guid><description>&lt;p>&lt;figure class="article-figure">
 &lt;img src="https://blog-pic-ck.oss-cn-beijing.aliyuncs.com/posts/en/pde-ml/04-Variational-Inference/illustration_1.png" alt="PDE and ML (4): Variational Inference and the Fokker-Planck Equation — Chapter overview" loading="lazy" decoding="async" class="content-image">
 
&lt;/figure>
&lt;/p>
&lt;hr>
&lt;p>Why do variational inference (a method that looks like pure optimization) and Langevin MCMC (a method that looks like pure sampling) end up at the same partial differential equation?&lt;/p>
&lt;p>That is the heart of this article. In continuous time, they are &lt;strong>two faces of the same Fokker–Planck PDE&lt;/strong>: one face is the evolution of a density, the other is the Wasserstein gradient flow of KL divergence. Once you see this, several seemingly unrelated tools (the SVGD particle algorithm, the exponential convergence rate from a log-Sobolev inequality, the training of Bayesian neural networks) all fit into a single picture.&lt;/p></description></item><item><title>PDE and ML (3): Variational Principles and Optimization</title><link>https://www.chenk.top/en/pde-ml/03-variational-principles/</link><pubDate>Fri, 31 May 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/pde-ml/03-variational-principles/</guid><description>&lt;p>What is the essence of neural-network training? When we run gradient descent in a high-dimensional parameter space, is there a deeper continuous-time dynamics at work? As the network width tends to infinity, does discrete parameter updating converge to some elegant partial differential equation? The answers live at the intersection of the calculus of variations, optimal transport, and PDE theory.&lt;/p>
&lt;p>The last decade of deep-learning success has rested mostly on engineering intuition. Recently, however, mathematicians have made a striking observation: &lt;strong>viewing a neural network as a particle system on the space of probability measures&lt;/strong>, and studying its evolution under Wasserstein geometry, exposes the global structure of training — convergence guarantees, the role of over-parameterization, the meaning of initialization. The tool that makes this visible is &lt;strong>the variational principle&lt;/strong> — from least action in physics, through the JKO scheme of modern optimal transport, to the mean-field limit of neural networks.&lt;/p></description></item><item><title>PDE and ML (2): Neural Operator Theory</title><link>https://www.chenk.top/en/pde-ml/02-neural-operator-theory/</link><pubDate>Thu, 16 May 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/pde-ml/02-neural-operator-theory/</guid><description>&lt;p>A classical PDE solver — finite difference, finite element, spectral — is a function: feed it one initial condition and one set of coefficients, get back one solution. A PINN is the same kind of object dressed in neural-network clothes: each new initial condition demands a fresh round of training. Switch the inflow velocity on a wing or move a single sensor reading in a forecast and you reset the clock.&lt;/p></description></item><item><title>PDE and ML (1): Physics-Informed Neural Networks</title><link>https://www.chenk.top/en/pde-ml/01-physics-informed-neural-networks/</link><pubDate>Wed, 01 May 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/pde-ml/01-physics-informed-neural-networks/</guid><description>&lt;blockquote>
&lt;p>&lt;strong>Series chapter 1 — about a 35-minute read.&lt;/strong> This is the foundation of the entire series. Neural operators, variational principles, score matching — every later chapter is, at heart, &lt;em>the same idea&lt;/em>: how to encode physical or mathematical constraints directly into the neural network&amp;rsquo;s optimization objective. Master PINNs, and the rest is just swapping one constraint for another.&lt;/p>
&lt;/blockquote>
&lt;hr>
&lt;p>&lt;figure class="article-figure">
 &lt;img src="https://blog-pic-ck.oss-cn-beijing.aliyuncs.com/posts/en/pde-ml/01-Physics-Informed-Neural-Networks/illustration_1.png" alt="PDE and ML (1): Physics-Informed Neural Networks — Chapter overview" loading="lazy" decoding="async" class="content-image">
 
&lt;/figure>
&lt;/p>
&lt;h2 id="prologue-a-metal-rod" class="heading-anchor">Prologue: a metal rod
&lt;/h2>&lt;p>Suppose you want the temperature distribution &lt;span class="math-inline">$u(x,t)$&lt;/span>
 along a metal rod. Half a century of numerical analysis offers two standard answers:&lt;/p></description></item></channel></rss>