ML Math Derivations (20): Regularization and Model Selection

Sun, 08 Feb 2026 09:00:00 +0000

What You Will Learn

A 100-million-parameter network trained on 50,000 images should overfit catastrophically. Modern deep networks generalise anyway. Why? Two ingredients: regularisation (techniques that constrain capacity) and generalisation theory (mathematics that says when learning works at all). This article is the closing chapter of the series, and we use it to gather every tool we have built — least squares, MAP estimation, optimisation, EM, neural networks — and turn them on the deepest open question in the field: why does learning generalise?

ML Math Derivations (12): XGBoost and LightGBM

Sat, 31 Jan 2026 09:00:00 +0000

XGBoost and LightGBM are the two libraries that quietly win most tabular-data battles: on Kaggle leaderboards, in fraud-detection pipelines, in ad ranking, in churn models. They share the same backbone (gradient-boosted trees, Part 11) but make very different engineering bets:

XGBoost sharpens the math: it brings the second derivative of the loss into the objective, regularises the tree itself, and turns split selection into a closed-form score.
LightGBM sharpens the systems: it bins features into a small histogram, grows trees leaf-by-leaf, throws away uninformative samples (GOSS) and bundles mutually exclusive sparse features (EFB).

The result is two tools that look interchangeable from the API but behave very differently when $N$ or $d$ becomes large. This post derives every formula behind those choices so you can read a tuning guide and know why each knob exists.
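
To make the "closed-form score" concrete, here is a minimal NumPy sketch of the second-order split gain as it is usually written for XGBoost. It is an illustration under stated assumptions, not the library's code: the function names (`leaf_weight`, `split_gain`, `best_split`) are hypothetical, `lam` and `gamma` stand for the L2 leaf penalty and the per-leaf complexity penalty, and `g`, `h` are the per-sample first and second derivatives of the loss.

```python
import numpy as np

def leaf_weight(G, H, lam=1.0):
    """Optimal leaf value for gradient sum G and Hessian sum H: w* = -G / (H + lam)."""
    return -G / (H + lam)

def split_gain(G_L, H_L, G_R, H_R, lam=1.0, gamma=0.0):
    """Gain = 1/2 * [G_L^2/(H_L+lam) + G_R^2/(H_R+lam) - G^2/(H+lam)] - gamma."""
    def score(G, H):
        return G * G / (H + lam)
    parent = score(G_L + G_R, H_L + H_R)
    return 0.5 * (score(G_L, H_L) + score(G_R, H_R) - parent) - gamma

def best_split(x, g, h, lam=1.0, gamma=0.0):
    """Exact greedy scan over one feature (assumes at least two distinct values).

    Returns (threshold, gain) for the best split of the form x < threshold.
    """
    order = np.argsort(x)
    x, g, h = x[order], g[order], h[order]
    G, H = g.sum(), h.sum()
    # Prefix sums give the left-child statistics for every candidate cut.
    G_L = np.cumsum(g)[:-1]
    H_L = np.cumsum(h)[:-1]
    gains = split_gain(G_L, H_L, G - G_L, H - H_L, lam, gamma)
    # A cut is only meaningful between two distinct feature values.
    gains = np.where(x[:-1] < x[1:], gains, -np.inf)
    i = int(np.argmax(gains))
    return 0.5 * (x[i] + x[i + 1]), float(gains[i])

# Squared loss with an initial prediction of 0: g = pred - y, h = 1.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 0.0, 10.0, 10.0])
g, h = 0.0 - y, np.ones_like(y)
threshold, gain = best_split(x, g, h, lam=0.0)
```

The scan touches every sorted feature value, which is the cost that a histogram-based approach reduces by evaluating the same gain only at bin boundaries.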

Regularization on Chen Kai Blog
