ML Math Derivations (3): Probability Theory and Statistical Inference

Thu, 22 Jan 2026 09:00:00 +0000

What You Will Learn

In 1912, Ronald Fisher introduced maximum likelihood estimation in a short paper that quietly redefined statistics. His insight was almost embarrassingly simple: among candidate parameter settings, prefer the one that makes the observed data most likely. Much of modern machine learning, from logistic regression to large language models, descends from this idea.
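To make the likelihood idea concrete, here is a minimal sketch for a coin with unknown heads probability. The flip data, the grid resolution, and the function name `bernoulli_log_likelihood` are illustrative assumptions, not taken from the post. It scores every candidate parameter by the log-likelihood of the data and keeps the best one, then compares that with the closed-form answer (the fraction of heads).

```python
import math

def bernoulli_log_likelihood(p, flips):
    """Log-probability of the observed flips if P(heads) = p."""
    heads = sum(flips)
    tails = len(flips) - heads
    return heads * math.log(p) + tails * math.log(1.0 - p)

# Made-up data for illustration: 1 = heads, 0 = tails (7 heads in 10 flips).
flips = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]

# Candidate parameters on a coarse grid, excluding 0 and 1 so log() is defined.
grid = [i / 100 for i in range(1, 100)]

# Maximum likelihood: keep the candidate under which the data are most likely.
p_hat = max(grid, key=lambda p: bernoulli_log_likelihood(p, flips))

# For a Bernoulli model the maximizer is known analytically: the sample mean.
closed_form = sum(flips) / len(flips)

print(p_hat, closed_form)
```

A grid search is used only because it mirrors the verbal statement directly; in practice one would use the closed form or a numerical optimizer.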

Kernel Methods (6): Gaussian Processes — When Kernels Meet Bayesian Inference

Sun, 19 Dec 2021 09:00:00 +0000

Kernel ridge regression gives you a number. You feed it $x_*$, it returns $\hat{y}_* = 23.7$. End of story. But you wanted to act on that prediction (maybe schedule a delivery, dose a patient, place a bet) and the single number is not enough. Tomorrow’s temperature being “25°C” is useful; “very likely 25°C, 95% chance between 22 and 28” is actionable. Every decision under uncertainty needs the second one. Gaussian Processes are the cleanest way to upgrade a kernel method from “point predictor” to “distribution predictor”, and they do it without abandoning a single line of the kernel math from the previous five parts.
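The difference between a point predictor and a distribution predictor can be sketched in a few lines. The example below is a dependency-free illustration under stated assumptions: a zero-mean prior, an RBF kernel with unit length scale, a noise variance of 0.1, and made-up training data. The helper names (`rbf`, `cholesky`, `gp_predict`) are hypothetical and not from the post. Under these assumptions the posterior mean coincides with the kernel ridge regression prediction when the ridge penalty equals the noise variance, and the posterior variance is the extra piece a Gaussian Process provides.

```python
import math

def rbf(a, b, length_scale=1.0):
    """RBF (squared-exponential) kernel on scalars; assumed for illustration."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def cholesky(A):
    """Lower-triangular L with L L^T = A, for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L z = b."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z

def solve_upper_t(L, z):
    """Back substitution: solve L^T x = z."""
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_predict(xs, ys, x_star, noise_var=0.1):
    """Posterior mean and variance of the latent function at x_star."""
    n = len(xs)
    K = [[rbf(xs[i], xs[j]) + (noise_var if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, ys))   # (K + noise_var * I)^{-1} y
    k_star = [rbf(x, x_star) for x in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = rbf(x_star, x_star) - sum(t * t for t in v)
    return mean, var

# Made-up training data for illustration.
xs = [0.0, 1.0, 2.0]
ys = [1.0, 2.0, 1.5]

mean_near, var_near = gp_predict(xs, ys, 1.0)    # inside the data
mean_far, var_far = gp_predict(xs, ys, 10.0)     # far from any data

# Approximate 95% interval for the latent function near the data.
half_width = 1.96 * math.sqrt(var_near)
print(mean_near - half_width, mean_near + half_width)
print(var_near, var_far)
```

Far from the training points the variance returns to the prior value and the mean returns to zero, which is the model reporting that it has no information there. To get an interval for a noisy observation rather than the latent function, add `noise_var` to the variance before taking the square root.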

Bayesian on Chen Kai Blog
