<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Probability-Statistics on Chen Kai Blog</title><link>https://www.chenk.top/en/series/probability-statistics/</link><description>Recent content in Probability-Statistics on Chen Kai Blog</description><generator>Hugo</generator><language>en</language><lastBuildDate>Fri, 30 Aug 2024 09:00:00 +0000</lastBuildDate><atom:link href="https://www.chenk.top/en/series/probability-statistics/index.xml" rel="self" type="application/rss+xml"/><item><title>Probability and Statistics (8): Bayesian Statistics — Priors, Posteriors, and Why Frequentists Argue</title><link>https://www.chenk.top/en/probability-statistics/08-bayesian-thinking/</link><pubDate>Fri, 30 Aug 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/probability-statistics/08-bayesian-thinking/</guid><description>&lt;p>Two statisticians walk into a bar. One says: &amp;ldquo;The probability of rain tomorrow is 30%.&amp;rdquo; The other replies: &amp;ldquo;Probability is a long-run frequency. Since tomorrow only happens once, that statement is meaningless.&amp;rdquo; The first one says: &amp;ldquo;It quantifies my uncertainty about a unique event.&amp;rdquo; They proceed to argue for the rest of the evening.&lt;/p>
&lt;p>This, roughly, is the Bayesian-frequentist debate. It&amp;rsquo;s not about who&amp;rsquo;s right — both frameworks are mathematically consistent. It&amp;rsquo;s about what &amp;ldquo;probability&amp;rdquo; means and how that interpretation shapes the tools you use. Having worked through six articles of largely frequentist reasoning, we now develop the Bayesian perspective: parameters are random, data update our beliefs, and uncertainty is quantified through distributions rather than confidence intervals.&lt;/p></description></item><item><title>Probability and Statistics (7): Hypothesis Testing — p-Values, Confidence Intervals, and All Their Pitfalls</title><link>https://www.chenk.top/en/probability-statistics/07-hypothesis-testing/</link><pubDate>Wed, 28 Aug 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/probability-statistics/07-hypothesis-testing/</guid><description>&lt;p>You&amp;rsquo;ve estimated a parameter. You&amp;rsquo;ve quantified the bias-variance tradeoff. Now comes the question that drives most applied statistics: &amp;ldquo;Is this effect real, or just noise?&amp;rdquo;&lt;/p>
&lt;p>Hypothesis testing is the formal framework for answering this question. It&amp;rsquo;s also the most widely misunderstood part of statistics. Entire papers have been written about how researchers misinterpret p-values, how significance thresholds are arbitrary, and how the multiple testing problem inflates false discoveries. Understanding both the theory and the pitfalls is essential for anyone who works with data.&lt;/p></description></item><item><title>Probability and Statistics (6): Estimation — MLE, MAP, and the Bias-Variance Story</title><link>https://www.chenk.top/en/probability-statistics/06-estimation-theory/</link><pubDate>Mon, 26 Aug 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/probability-statistics/06-estimation-theory/</guid><description>&lt;p>Everything we&amp;rsquo;ve built so far — distributions, expectations, limit theorems — assumed we knew the parameters. The Gaussian has mean &lt;span class="math-inline">$\mu$&lt;/span>
 and variance &lt;span class="math-inline">$\sigma^2$&lt;/span>. The Binomial has &lt;span class="math-inline">$n$&lt;/span> trials with success probability &lt;span class="math-inline">$p$&lt;/span>. But in practice, you don&amp;rsquo;t know &lt;span class="math-inline">$\mu$&lt;/span> or &lt;span class="math-inline">$p$&lt;/span>. You observe data and try to figure them out.&lt;/p>
&lt;p>This is &lt;strong>estimation theory&lt;/strong>: the bridge between probability (where parameters are given) and statistics (where parameters are inferred). It&amp;rsquo;s also where the foundations of machine learning live. Every time you train a model, you are estimating parameters from data. The quality of that estimation determines whether your model generalizes or overfits.&lt;/p></description></item><item><title>Probability and Statistics (5): Law of Large Numbers and the Central Limit Theorem</title><link>https://www.chenk.top/en/probability-statistics/05-limit-theorems/</link><pubDate>Sat, 24 Aug 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/probability-statistics/05-limit-theorems/</guid><description>&lt;p>If you had to choose just two theorems from all of probability theory, you&amp;rsquo;d choose these: the Law of Large Numbers (LLN) and the Central Limit Theorem (CLT). Together, they answer two fundamental questions. The LLN says: &amp;ldquo;Yes, your sample average will converge to the true mean.&amp;rdquo; The CLT says: &amp;ldquo;And here&amp;rsquo;s exactly what the fluctuations look like.&amp;rdquo; Without these theorems, there&amp;rsquo;s no justification for opinion polls, no reason to trust clinical trials, and no explanation for why stochastic gradient descent converges.&lt;/p></description></item><item><title>Probability and Statistics (4): Joint Distributions, Marginalization, and Independence</title><link>https://www.chenk.top/en/probability-statistics/04-joint-distributions/</link><pubDate>Fri, 23 Aug 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/probability-statistics/04-joint-distributions/</guid><description>&lt;p>Until now, every distribution we&amp;rsquo;ve studied described a single quantity: one die roll, one waiting time, one measurement. But interesting problems involve relationships between variables. Does studying more hours predict a higher exam score? 
Are stock returns correlated across sectors? How does the sum of two random variables behave?&lt;/p>
&lt;p>Answering these questions requires &lt;strong>joint distributions&lt;/strong> — the mathematical framework for describing multiple random variables simultaneously. This is where probability theory starts connecting directly to regression, multivariate statistics, and the high-dimensional spaces of machine learning.&lt;/p></description></item><item><title>Probability and Statistics (3): Expectation, Variance, and the Moment-Generating Trick</title><link>https://www.chenk.top/en/probability-statistics/03-expectation-and-moments/</link><pubDate>Wed, 21 Aug 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/probability-statistics/03-expectation-and-moments/</guid><description>&lt;p>A probability distribution is a complete description of a random variable — it tells you the probability of every possible outcome. But complete descriptions are unwieldy. When someone asks &amp;ldquo;how tall are people in this city?&amp;rdquo;, you don&amp;rsquo;t hand them a density function; you say &amp;ldquo;about 170 cm on average, give or take 10 cm.&amp;rdquo; The average and the spread capture most of what matters in practice.&lt;/p>
&lt;p>This article develops the mathematical framework for summarizing distributions. We start with expectation (the center), build up to variance (the spread), and then introduce moment-generating functions: a single formula that, when it is finite in a neighborhood of zero, encodes every moment of a distribution and uniquely determines the distribution itself.&lt;/p></description></item><item><title>Probability and Statistics (2): Random Variables and the Distributions That Matter</title><link>https://www.chenk.top/en/probability-statistics/02-random-variables/</link><pubDate>Tue, 20 Aug 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/probability-statistics/02-random-variables/</guid><description>&lt;p>After building the axiomatic foundation in the previous article, you might feel like we spent a lot of time talking about sets and subsets. That&amp;rsquo;s because we did. The machinery of events and sigma-algebras is necessary but austere: it doesn&amp;rsquo;t give us a natural way to compute averages, measure spread, or fit models to data.&lt;/p>
&lt;p>The bridge between abstract probability and applied statistics is the &lt;strong>random variable&lt;/strong>. Once we assign numerical values to outcomes, the entire toolkit of calculus — derivatives, integrals, series — becomes available. And with calculus comes the ability to characterize randomness through a small set of named distributions, each encoding specific assumptions about how the world generates data.&lt;/p></description></item><item><title>Probability and Statistics (1): Probability Spaces — Why We Need Axioms (But Won't Overdo It)</title><link>https://www.chenk.top/en/probability-statistics/01-probability-foundations/</link><pubDate>Sun, 18 Aug 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/probability-statistics/01-probability-foundations/</guid><description>&lt;p>Every time you check the weather forecast, run an A/B test, or train a neural network, you are standing on a foundation laid in 1933 by a Russian mathematician named Andrey Kolmogorov. Before him, probability was a grab bag of tricks for gamblers and actuaries. After him, it became a branch of mathematics as rigorous as calculus or algebra.&lt;/p>
&lt;p>The good news: you don&amp;rsquo;t need to become a measure theorist to understand modern probability. The axioms are simple. What takes work is building the right intuitions around them — and learning to recognize when those intuitions fail.&lt;/p></description></item></channel></rss>