ML Math Derivations (11): Ensemble Learning

Fri, 30 Jan 2026 09:00:00 +0000

Why does a committee of mediocre classifiers often outperform a single brilliant one? Averaging reduces variance, sequential reweighting reduces bias, and a bit of randomization breaks the correlation between models that would otherwise cancel the benefit of averaging. This post works through the math behind those three claims: the bias-variance decomposition, bootstrap aggregating (bagging), AdaBoost as forward stagewise minimization of the exponential loss, and gradient boosting as gradient descent in function space.

By the end, you should be able to look at any ensemble method and say what it reduces, why it works, and when it fails.
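
To make the variance claim concrete before the derivations, here is a minimal simulation sketch. It assumes $M$ predictors whose errors share a common variance $\sigma^2$ and a common pairwise correlation $\rho$; under that assumption the variance of their average is $\rho\sigma^2 + (1-\rho)\sigma^2/M$. The function names and the shared-plus-private noise construction are illustrative choices, not anything taken from a specific library.

```python
import numpy as np

def variance_of_average(m, rho, sigma=1.0, n_trials=100_000, seed=0):
    """Empirical variance of the mean of m equicorrelated predictor errors."""
    rng = np.random.default_rng(seed)
    # A shared noise term induces pairwise correlation rho between predictors;
    # the private term is independent for each predictor.
    shared = rng.normal(size=(n_trials, 1))
    private = rng.normal(size=(n_trials, m))
    errors = sigma * (np.sqrt(rho) * shared + np.sqrt(1.0 - rho) * private)
    return errors.mean(axis=1).var()

def predicted_variance(m, rho, sigma=1.0):
    """Closed form: rho * sigma^2 + (1 - rho) * sigma^2 / m."""
    return rho * sigma**2 + (1.0 - rho) * sigma**2 / m

if __name__ == "__main__":
    for rho in (0.0, 0.5):
        for m in (1, 10, 50):
            emp = variance_of_average(m, rho)
            thy = predicted_variance(m, rho)
            print(f"rho={rho:.1f}  M={m:3d}  empirical={emp:.4f}  predicted={thy:.4f}")
```

With $\rho = 0$ the variance shrinks like $1/M$; with $\rho > 0$ it cannot fall below $\rho\sigma^2$ however many models are averaged, which is why randomization that lowers $\rho$ matters.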

Ensemble Learning on Chen Kai Blog
