<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Regularization on Chen Kai Blog</title><link>https://www.chenk.top/en/tags/regularization/</link><description>Recent content in Regularization on Chen Kai Blog</description><generator>Hugo</generator><language>en</language><lastBuildDate>Sun, 08 Feb 2026 09:00:00 +0000</lastBuildDate><atom:link href="https://www.chenk.top/en/tags/regularization/index.xml" rel="self" type="application/rss+xml"/><item><title>ML Math Derivations (20): Regularization and Model Selection</title><link>https://www.chenk.top/en/ml-math-derivations/20-regularization-and-model-selection/</link><pubDate>Sun, 08 Feb 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/ml-math-derivations/20-regularization-and-model-selection/</guid><description>&lt;p>&lt;figure class="article-figure">
 &lt;img src="https://blog-pic-ck.oss-cn-beijing.aliyuncs.com/posts/en/ml-math-derivations/20-Regularization-and-Model-Selection/illustration_1.png" alt="ML Math Derivations (20): Regularization and Model Selection — Chapter overview" loading="lazy" decoding="async" class="content-image">
 
&lt;/figure>
&lt;/p>
&lt;hr>
&lt;h2 id="what-you-will-learn" class="heading-anchor">What You Will Learn&lt;a href="#what-you-will-learn" class="heading-link" aria-label="Permalink to this section" title="Copy link to this section">#&lt;/a>
&lt;/h2>&lt;p>A 100-million-parameter network trained on 50,000 images &lt;em>should&lt;/em> overfit catastrophically. Modern deep networks generalise anyway. &lt;strong>Why?&lt;/strong> Two ingredients: &lt;em>regularisation&lt;/em> (techniques that constrain capacity) and &lt;em>generalisation theory&lt;/em> (mathematics that says when learning works at all). This article is the closing chapter of the series, and we use it to gather every tool we have built (least squares, MAP estimation, optimisation, EM, neural networks) and turn them on the deepest open question in the field: &lt;em>why does learning generalise?&lt;/em>&lt;/p></description></item><item><title>ML Math Derivations (12): XGBoost and LightGBM</title><link>https://www.chenk.top/en/ml-math-derivations/12-xgboost-and-lightgbm/</link><pubDate>Sat, 31 Jan 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/ml-math-derivations/12-xgboost-and-lightgbm/</guid><description>&lt;p>XGBoost and LightGBM are the two libraries that quietly win most tabular-data battles: on Kaggle leaderboards, in fraud-detection pipelines, in ad ranking, in churn models. They share the same backbone (gradient-boosted trees, &lt;a href="https://www.chenk.top/en/ml-math-derivations/11-ensemble-learning/">Part 11&lt;/a>) but make very different engineering bets:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>XGBoost&lt;/strong> sharpens the &lt;em>math&lt;/em>: it brings the second derivative of the loss into the objective, regularises the tree itself, and turns split selection into a closed-form score.&lt;/li>
&lt;li>&lt;strong>LightGBM&lt;/strong> sharpens the &lt;em>systems&lt;/em>: it bins each feature's values into a small number of histogram buckets, grows trees leaf-by-leaf, down-samples low-gradient instances (GOSS) and bundles mutually exclusive sparse features (EFB).&lt;/li>
&lt;/ul>
&lt;p>The result is two tools that look interchangeable from the API but behave very differently when &lt;span class="math-inline">$N$&lt;/span>
 or &lt;span class="math-inline">$d$&lt;/span>
 becomes large. This post derives every formula behind those choices so you can read a tuning guide and know &lt;em>why&lt;/em> each knob exists.&lt;/p></description></item></channel></rss>