<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Neural Networks on Chen Kai Blog</title><link>https://www.chenk.top/en/tags/neural-networks/</link><description>Recent content in Neural Networks on Chen Kai Blog</description><generator>Hugo</generator><language>en</language><lastBuildDate>Sat, 07 Feb 2026 09:00:00 +0000</lastBuildDate><atom:link href="https://www.chenk.top/en/tags/neural-networks/index.xml" rel="self" type="application/rss+xml"/><item><title>ML Math Derivations (19): Neural Networks and Backpropagation</title><link>https://www.chenk.top/en/ml-math-derivations/19-neural-networks-and-backpropagation/</link><pubDate>Sat, 07 Feb 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/ml-math-derivations/19-neural-networks-and-backpropagation/</guid><description>&lt;p>&lt;figure class="article-figure">
 &lt;img src="https://blog-pic-ck.oss-cn-beijing.aliyuncs.com/posts/en/ml-math-derivations/19-Neural-Networks-and-Backpropagation/illustration_1.png" alt="ML Math Derivations (19): Neural Networks and Backpropagation — Chapter overview" loading="lazy" decoding="async" class="content-image">
 
&lt;/figure>
&lt;/p>
&lt;hr>
&lt;blockquote>
&lt;p>&lt;strong>Hook.&lt;/strong> In 1969 Minsky and Papert proved that a single perceptron could not learn XOR, and connectionist research went into a fifteen-year freeze. The thaw came when Rumelhart, Hinton and Williams realised that &lt;em>stacking&lt;/em> perceptrons makes the problem disappear, and that the same chain rule everyone learns in calculus, applied carefully, computes every gradient in a multilayer network for roughly the cost of one extra forward pass. That algorithm is backpropagation. Every Transformer, every diffusion model, every GPT trained today still computes its gradients with it.&lt;/p></description></item><item><title>Recommendation Systems (3): Deep Learning Foundations</title><link>https://www.chenk.top/en/recommendation-systems/03-deep-learning-basics/</link><pubDate>Sun, 07 Dec 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/recommendation-systems/03-deep-learning-basics/</guid><description>&lt;p>In June 2016, Google published a short paper that quietly redrew the map of recommendation systems. The paper described &lt;strong>Wide &amp;amp; Deep Learning&lt;/strong>, the model then powering app recommendations inside Google Play, a billion-user product. Within a year, every major tech company had a deep model in production. By 2019, the industry standard had shifted: matrix factorization was a baseline, not a system.&lt;/p>
&lt;p>What changed? Multi-layer neural networks brought four capabilities classical methods could not deliver:&lt;/p></description></item><item><title>Essence of Linear Algebra (16): Linear Algebra in Deep Learning</title><link>https://www.chenk.top/en/linear-algebra/16-linear-algebra-in-deep-learning/</link><pubDate>Wed, 16 Apr 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/linear-algebra/16-linear-algebra-in-deep-learning/</guid><description>&lt;p>Strip away the marketing and a deep network is one thing: a long pipeline of matrix multiplications glued together by elementwise nonlinearities. Forward pass, backward pass, convolution, attention, normalization, fine-tuning — every &amp;ldquo;trick&amp;rdquo; is a small twist on the same algebraic theme. Once you see the matrices, the field stops looking like a bag of recipes and starts looking like a single language.&lt;/p>
&lt;p>This chapter rebuilds the modern stack from that single language. We follow one signal — a vector &lt;span class="math-inline">$\mathbf{x}$&lt;/span>
 — as it flows through linear layers, gets convolved, gets attended to, gets normalized, and gets adapted by a low-rank update. At each step we name the matrix that does the work and the property of that matrix (rank, conditioning, transpose) that makes the trick succeed.&lt;/p></description></item><item><title>PDE and ML (1): Physics-Informed Neural Networks</title><link>https://www.chenk.top/en/pde-ml/01-physics-informed-neural-networks/</link><pubDate>Wed, 01 May 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/pde-ml/01-physics-informed-neural-networks/</guid><description>&lt;blockquote>
&lt;p>&lt;strong>Series chapter 1 — about a 35-minute read.&lt;/strong> This is the foundation of the entire series. Neural operators, variational principles, score matching — every later chapter is, at heart, &lt;em>the same idea&lt;/em>: how to encode physical or mathematical constraints directly into the neural network&amp;rsquo;s optimization objective. Master PINNs, and the rest is just swapping one constraint for another.&lt;/p>
&lt;/blockquote>
&lt;hr>
&lt;p>&lt;figure class="article-figure">
 &lt;img src="https://blog-pic-ck.oss-cn-beijing.aliyuncs.com/posts/en/pde-ml/01-Physics-Informed-Neural-Networks/illustration_1.png" alt="PDE and ML (1): Physics-Informed Neural Networks — Chapter overview" loading="lazy" decoding="async" class="content-image">
 
&lt;/figure>
&lt;/p>
&lt;h2 id="prologue-a-metal-rod" class="heading-anchor">Prologue: a metal rod&lt;a href="#prologue-a-metal-rod" class="heading-link" aria-label="Permalink to this section" title="Copy link to this section">#&lt;/a>
&lt;/h2>&lt;p>Suppose you want the temperature distribution &lt;span class="math-inline">$u(x,t)$&lt;/span>
 along a metal rod. Half a century of numerical analysis offers two standard answers:&lt;/p></description></item></channel></rss>