<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Ensemble Learning on Chen Kai Blog</title><link>https://www.chenk.top/en/tags/ensemble-learning/</link><description>Recent content in Ensemble Learning on Chen Kai Blog</description><generator>Hugo</generator><language>en</language><lastBuildDate>Fri, 30 Jan 2026 09:00:00 +0000</lastBuildDate><atom:link href="https://www.chenk.top/en/tags/ensemble-learning/index.xml" rel="self" type="application/rss+xml"/><item><title>ML Math Derivations (11): Ensemble Learning</title><link>https://www.chenk.top/en/ml-math-derivations/11-ensemble-learning/</link><pubDate>Fri, 30 Jan 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/ml-math-derivations/11-ensemble-learning/</guid><description>&lt;p>Why can a committee of mediocre classifiers outperform a single brilliant one? The short answer: averaging reduces variance, sequential reweighting reduces bias, and a bit of randomization breaks the correlation between models that would otherwise undo the benefit of averaging. This post works through the math behind each of these claims: bias-variance decomposition, bootstrap aggregating (bagging), AdaBoost as forward stagewise minimization of exponential loss, and gradient boosting as gradient descent in function space.&lt;/p>
&lt;p>By the end, you should be able to look at any ensemble method and say what it reduces, why it works, and when it fails.&lt;/p></description></item></channel></rss>