<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Bayesian on Chen Kai Blog</title><link>https://www.chenk.top/en/tags/bayesian/</link><description>Recent content in Bayesian on Chen Kai Blog</description><generator>Hugo</generator><language>en</language><lastBuildDate>Thu, 22 Jan 2026 09:00:00 +0000</lastBuildDate><atom:link href="https://www.chenk.top/en/tags/bayesian/index.xml" rel="self" type="application/rss+xml"/><item><title>ML Math Derivations (3): Probability Theory and Statistical Inference</title><link>https://www.chenk.top/en/ml-math-derivations/03-probability-theory-and-statistical-inference/</link><pubDate>Thu, 22 Jan 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/ml-math-derivations/03-probability-theory-and-statistical-inference/</guid><description>&lt;p>&lt;figure class="article-figure">
 &lt;img src="https://blog-pic-ck.oss-cn-beijing.aliyuncs.com/posts/en/ml-math-derivations/03-Probability-Theory-and-Statistical-Inference/illustration_1.png" alt="ML Math Derivations (3): Probability Theory and Statistical Inference — Chapter overview" loading="lazy" decoding="async" class="content-image">
 
&lt;/figure>
&lt;/p>
&lt;hr>
&lt;h2 id="what-you-will-learn" class="heading-anchor">What You Will Learn&lt;a href="#what-you-will-learn" class="heading-link" aria-label="Permalink to this section" title="Copy link to this section">#&lt;/a>
&lt;/h2>&lt;p>In 1912, Ronald Fisher introduced &lt;strong>maximum likelihood estimation&lt;/strong> in a short paper that quietly redefined statistics. His insight was almost embarrassingly simple: &lt;em>choose the parameter setting under which the observed data are most probable&lt;/em>. A remarkable share of modern learning algorithms, from logistic regression to large language models, descends from this idea: their training losses (log loss, cross-entropy) are negative log-likelihoods.&lt;/p></description></item>
<item><title>Kernel Methods (6): Gaussian Processes — When Kernels Meet Bayesian Inference</title><link>https://www.chenk.top/en/kernel-methods/06-gaussian-processes/</link><pubDate>Sun, 19 Dec 2021 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/kernel-methods/06-gaussian-processes/</guid><description>&lt;p>Kernel ridge regression gives you a number. You feed it &lt;span class="math-inline">$x_*$&lt;/span> and it returns &lt;span class="math-inline">$\hat{y}_* = 23.7$&lt;/span>. End of story. But you wanted to &lt;em>act&lt;/em> on that prediction (schedule a delivery, dose a patient, place a bet), and a single number is not enough. &amp;ldquo;Tomorrow will be 25°C&amp;rdquo; is useful; &amp;ldquo;most likely 25°C, with a 95% chance of falling between 22°C and 28°C&amp;rdquo; is &lt;em>actionable&lt;/em>. Every decision under uncertainty needs the second kind of answer. Gaussian Processes are the cleanest way to upgrade a kernel method from &amp;ldquo;point predictor&amp;rdquo; to &amp;ldquo;distribution predictor&amp;rdquo;. They do it without abandoning a single line of the kernel math from the previous five parts: the GP posterior mean is exactly the kernel ridge regression prediction, with the noise variance playing the role of the ridge penalty.&lt;/p></description></item></channel></rss>