ML Math Derivations (20): Regularization and Model Selection

Sun, 08 Feb 2026 09:00:00 +0000

What You Will Learn

A 100-million-parameter network trained on 50,000 images should overfit catastrophically. Modern deep networks generalise anyway. Why? Two ingredients help explain it: regularisation (techniques that constrain capacity) and generalisation theory (mathematics that says when learning works at all). This article is the closing chapter of the series, and we use it to gather every tool we have built (least squares, MAP estimation, optimisation, EM, neural networks) and turn them on the deepest open question in the field: why does learning generalise?

ML Math Derivations (1): Introduction and Mathematical Foundations

Tue, 20 Jan 2026 09:00:00 +0000

What this chapter does

In 2005 Google Research showed, on a public benchmark, that a statistical translation model trained on raw bilingual text could outperform decades of carefully engineered linguistic rules. The conclusion was uncomfortable for the experts of the day, but mathematically liberating: a system that has never been told the rules of a language can still recover them, given enough examples. Why?

VC Dimension on Chen Kai Blog
