Neural Networks on Chen Kai Blog

ML Math Derivations (19): Neural Networks and Backpropagation

Sat, 07 Feb 2026 09:00:00 +0000

Hook. In 1969 Minsky and Papert showed that a single-layer perceptron cannot represent XOR, and connectionist research went into a freeze that lasted roughly fifteen years. The thaw came when Rumelhart, Hinton and Williams showed that stacking perceptrons into layers makes the problem disappear, and that the same chain rule everyone learns in calculus, applied carefully, computes every gradient in a multilayer network for roughly the cost of one extra forward pass. That algorithm is backpropagation. Every Transformer, every diffusion model, every GPT trained today still computes its gradients with it.
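To make the chain-rule claim concrete, here is a minimal sketch of backpropagation on XOR. It is an illustration and not the post's own derivation. The layer sizes, the tanh and sigmoid activations, the cross-entropy loss, the learning rate, and every function name are assumptions chosen for brevity. The backward pass reuses the forward activations and costs a handful of matrix products, and a finite-difference check confirms the gradients.

```python
import numpy as np

# XOR: the four inputs and their labels.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [1.0], [0.0]])

def forward(params, X):
    W1, b1, W2, b2 = params
    H = np.tanh(X @ W1 + b1)                  # hidden activations
    P = 1.0 / (1.0 + np.exp(-(H @ W2 + b2)))  # predicted probability
    return H, P

def loss(params, X, y):
    _, P = forward(params, X)
    return float(np.mean(-(y * np.log(P) + (1 - y) * np.log(1 - P))))

def backprop(params, X, y):
    """Chain rule applied layer by layer, from the output back to the input."""
    W1, b1, W2, b2 = params
    H, P = forward(params, X)
    dZ2 = (P - y) / len(X)                    # sigmoid + cross-entropy shortcut
    dW2, db2 = H.T @ dZ2, dZ2.sum(axis=0)
    dZ1 = (dZ2 @ W2.T) * (1.0 - H ** 2)       # tanh'(z) = 1 - tanh(z)^2
    dW1, db1 = X.T @ dZ1, dZ1.sum(axis=0)
    return [dW1, db1, dW2, db2]

def numerical_grad(params, X, y, eps=1e-6):
    """Central finite differences: slow, but independent of the chain rule."""
    grads = []
    for p in params:
        g = np.zeros_like(p)
        for idx in np.ndindex(p.shape):
            old = p[idx]
            p[idx] = old + eps
            up = loss(params, X, y)
            p[idx] = old - eps
            down = loss(params, X, y)
            p[idx] = old
            g[idx] = (up - down) / (2 * eps)
        grads.append(g)
    return grads

rng = np.random.default_rng(0)
params = [rng.normal(size=(2, 4)), np.zeros(4),
          rng.normal(size=(4, 1)), np.zeros(1)]

# Largest disagreement between backprop and finite differences.
max_err = max(np.max(np.abs(a - n))
              for a, n in zip(backprop(params, X, y),
                              numerical_grad(params, X, y)))

# Plain full-batch gradient descent (learning rate 0.5 is an arbitrary choice).
initial_loss = loss(params, X, y)
for _ in range(2000):
    for p, g in zip(params, backprop(params, X, y)):
        p -= 0.5 * g
final_loss = loss(params, X, y)

print("max gradient error:", max_err)
print("loss before and after:", initial_loss, final_loss)
```

The finite-difference routine needs two loss evaluations per parameter, while `backprop` returns all gradients from one forward and one backward sweep. That gap is the efficiency the paragraph above describes.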

Recommendation Systems (3): Deep Learning Foundations

Sun, 07 Dec 2025 09:00:00 +0000

In June 2016, Google published a short paper that quietly redrew the map of recommendation systems. The paper described Wide & Deep Learning, the model then powering app recommendations inside Google Play, a billion-user product. Within a year, every major tech company had a deep model in production. By 2019, the industry standard had shifted: matrix factorization was a baseline, not a system.

What changed? Multi-layer neural networks brought four capabilities classical methods could not deliver:

Essence of Linear Algebra (16): Linear Algebra in Deep Learning

Wed, 16 Apr 2025 09:00:00 +0000

Strip away the marketing and a deep network is one thing: a long pipeline of matrix multiplications glued together by elementwise nonlinearities. Forward pass, backward pass, convolution, attention, normalization, fine-tuning — every “trick” is a small twist on the same algebraic theme. Once you see the matrices, the field stops looking like a bag of recipes and starts looking like a single language.

This chapter rebuilds the modern stack from that single language. We follow one signal — a vector $\mathbf{x}$ — as it flows through linear layers, gets convolved, gets attended to, gets normalized, and gets adapted by a low-rank update. At each step we name the matrix that does the work and the property of that matrix (rank, conditioning, transpose) that makes the trick succeed.
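As a rough sketch of that single language, the snippet below pushes one vector through two linear layers joined by an elementwise nonlinearity, then applies a low-rank update to the second layer. The dimensions, the ReLU choice, and the names `W1`, `W2`, `A`, `B` are illustrative assumptions and are not taken from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out, r = 8, 16, 4, 2   # hypothetical sizes; r is the update rank

x = rng.normal(size=d_in)                # the signal we follow
W1 = rng.normal(size=(d_hidden, d_in))   # first linear layer
W2 = rng.normal(size=(d_out, d_hidden))  # second linear layer

def relu(z):
    return np.maximum(z, 0.0)            # elementwise nonlinearity

# Forward pass: matrix multiply, nonlinearity, matrix multiply.
h = relu(W1 @ x)
y = W2 @ h

# Backward pass through the last layer uses the transpose of the same matrix.
g_y = np.ones(d_out)                     # stand-in upstream gradient
g_h = W2.T @ g_y

# Low-rank adaptation: W2 is left untouched and only B @ A is added.
B = rng.normal(size=(d_out, r))
A = rng.normal(size=(r, d_hidden))
delta = B @ A
rank = np.linalg.matrix_rank(delta)      # at most r by construction

y_adapted = (W2 + delta) @ h             # dense form of the adapted layer
y_factored = W2 @ h + B @ (A @ h)        # same result without forming delta

print("rank of update:", rank)
print("dense and factored forms agree:", np.allclose(y_adapted, y_factored))
```

The factored form stores `r * (d_out + d_hidden)` numbers for the update instead of `d_out * d_hidden`, which is why the rank of the matrix is the property that matters for this kind of adaptation.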

PDE and ML (1): Physics-Informed Neural Networks

Wed, 01 May 2024 09:00:00 +0000

Series chapter 1 — about a 35-minute read. This is the foundation of the entire series. Neural operators, variational principles, score matching — every later chapter is, at heart, the same idea: how to encode physical or mathematical constraints directly into the neural network’s optimization objective. Master PINNs, and the rest is just swapping one constraint for another.
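One common way to write such an objective, given here as a sketch in assumed notation and not as the chapter's own formula: let $u_\theta$ be the network, $\mathcal{N}[u] = 0$ the governing equation, $(x_i, t_i, u_i)$ the $N_d$ observed data points, $(x_j, t_j)$ the $N_r$ collocation points, and $\lambda$ a weighting hyperparameter.

$$
\mathcal{L}(\theta) = \frac{1}{N_d}\sum_{i=1}^{N_d} \bigl\lvert u_\theta(x_i, t_i) - u_i \bigr\rvert^2 + \lambda\,\frac{1}{N_r}\sum_{j=1}^{N_r} \bigl\lvert \mathcal{N}[u_\theta](x_j, t_j) \bigr\rvert^2
$$

The first term fits the measurements, and the second penalizes any violation of the equation at the collocation points. Swapping one constraint for another amounts to changing the operator $\mathcal{N}$.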

Prologue: a metal rod

Suppose you want the temperature distribution $u(x,t)$ along a metal rod. Half a century of numerical analysis offers two standard answers: