Probability-Statistics on Chen Kai Blog

Probability and Statistics (8): Bayesian Statistics — Priors, Posteriors, and Why Frequentists Argue

Fri, 30 Aug 2024 09:00:00 +0000

Two statisticians walk into a bar. One says: “The probability of rain tomorrow is 30%.” The other replies: “Probability is a long-run frequency. Since tomorrow only happens once, that statement is meaningless.” The first one says: “It quantifies my uncertainty about a unique event.” They proceed to argue for the rest of the evening.

This, roughly, is the Bayesian-frequentist debate. It’s not about who’s right: both frameworks are mathematically consistent. It’s about what “probability” means and how that interpretation shapes the tools you use. Having worked through six articles of largely frequentist reasoning, we now develop the Bayesian perspective: parameters are random, data update our beliefs, and uncertainty is quantified through posterior distributions (summarized, for example, by credible intervals) rather than confidence intervals.
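One concrete way to see “data update our beliefs” is the conjugate Beta-Binomial model for a coin’s success probability $p$. This is a minimal sketch: the uniform prior Beta(1, 1) and the data (7 successes in 10 trials) are illustrative assumptions, not numbers from the article.

```python
def beta_binomial_update(alpha, beta, successes, failures):
    """Posterior Beta parameters after observing Bernoulli/Binomial data.

    With prior p ~ Beta(alpha, beta) and a Binomial likelihood,
    the posterior is Beta(alpha + successes, beta + failures).
    """
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    # Mean of a Beta(alpha, beta) distribution
    return alpha / (alpha + beta)


# Illustrative numbers: uniform prior Beta(1, 1), then 7 successes in 10 trials
prior = (1, 1)
posterior = beta_binomial_update(*prior, successes=7, failures=3)
print(posterior)             # (8, 4)
print(beta_mean(*posterior)) # 8/12, about 0.667
```

The posterior mean (about 0.667) lands between the prior mean (0.5) and the observed proportion (0.7); with more data, the observed proportion dominates.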

Probability and Statistics (7): Hypothesis Testing — p-Values, Confidence Intervals, and All Their Pitfalls

Wed, 28 Aug 2024 09:00:00 +0000

You’ve estimated a parameter. You’ve quantified the bias-variance tradeoff. Now comes the question that drives most applied statistics: “Is this effect real, or just noise?”

Hypothesis testing is the formal framework for answering this question. It’s also the most widely misunderstood part of statistics. Entire papers have been written about how researchers misinterpret p-values, how significance thresholds are arbitrary, and how the multiple testing problem inflates false discoveries. Understanding both the theory and the pitfalls is essential for anyone who works with data.
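To put a number on how multiple testing inflates false discoveries: if $m$ independent tests are each run at level $\alpha$ and every null hypothesis is true, the probability of at least one false positive is $1 - (1 - \alpha)^m$. A small sketch, assuming independent tests; the Bonferroni line (test each hypothesis at $\alpha / m$) is shown only as one common remedy.

```python
def familywise_error_rate(alpha, m):
    # P(at least one false positive) across m independent tests,
    # assuming every null hypothesis is true
    return 1 - (1 - alpha) ** m


alpha, m = 0.05, 20
print(round(familywise_error_rate(alpha, m), 3))      # 0.642
# Bonferroni correction: run each test at alpha / m instead
print(round(familywise_error_rate(alpha / m, m), 3))  # 0.049
```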

Probability and Statistics (6): Estimation — MLE, MAP, and the Bias-Variance Story

Mon, 26 Aug 2024 09:00:00 +0000

Everything we’ve built so far (distributions, expectations, limit theorems) assumed we knew the parameters. The Gaussian has mean $\mu$ and variance $\sigma^2$. The Binomial has $n$ trials with success probability $p$. But in practice, you don’t know $\mu$ or $p$. You observe data and try to figure them out.

This is estimation theory: the bridge between probability (where parameters are given) and statistics (where parameters are inferred). It’s also where the foundations of machine learning live. Every time you train a model, you are estimating parameters from data. The quality of that estimation determines whether your model generalizes or overfits.
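For the Binomial case with $n$ known, the maximum-likelihood estimate of $p$ after observing $k$ successes is the sample proportion $k/n$. A brute-force sketch (a grid search over $p$, chosen for transparency rather than efficiency; the counts are made up) confirms this numerically.

```python
import math


def binomial_log_likelihood(p, n, k):
    # log P(K = k | n, p) for a single Binomial(n, p) observation
    return math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log(1 - p)


def grid_mle(n, k, steps=1000):
    # Evaluate interior grid points p = i/steps for i = 1..steps-1 (avoids log(0))
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda p: binomial_log_likelihood(p, n, k))


n, k = 20, 6  # hypothetical data: 6 successes in 20 trials
print(grid_mle(n, k))  # 0.3
print(k / n)           # 0.3
```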

Probability and Statistics (5): Law of Large Numbers and the Central Limit Theorem

Sat, 24 Aug 2024 09:00:00 +0000

If you had to choose just two theorems from all of probability theory, you’d choose these: the Law of Large Numbers (LLN) and the Central Limit Theorem (CLT). Together, they answer two fundamental questions. The LLN says: “Yes, your sample average will converge to the true mean.” The CLT says: “And for large samples, here’s approximately what the fluctuations look like.” Without these theorems, there’s no justification for opinion polls, no reason to trust clinical trials, and no explanation for why averaging noisy minibatch gradients, as stochastic gradient descent does, approximates the full gradient.
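A quick simulation makes both statements concrete. This sketch assumes Uniform(0, 1) draws (mean 0.5, standard deviation $\sqrt{1/12}$) and arbitrary sample sizes: the spread of the sample means shrinks like $\sigma/\sqrt{n}$ as $n$ grows (the LLN at work), and roughly 95% of them fall within $\pm 1.96\,\sigma/\sqrt{n}$ of 0.5, which is the Gaussian shape the CLT predicts.

```python
import random
import statistics


def sample_means(n, trials, rng):
    # Each trial: the mean of n independent Uniform(0, 1) draws
    return [statistics.fmean(rng.random() for _ in range(n)) for _ in range(trials)]


SIGMA = (1 / 12) ** 0.5  # standard deviation of Uniform(0, 1)
rng = random.Random(0)   # fixed seed so runs are reproducible

for n in (10, 100, 1000):
    means = sample_means(n, trials=1000, rng=rng)
    se = SIGMA / n ** 0.5  # predicted standard error of the sample mean
    # Share of sample means within +/- 1.96 standard errors of the true mean 0.5
    within = sum(abs(m - 0.5) <= 1.96 * se for m in means) / len(means)
    print(n, round(statistics.stdev(means), 4), round(se, 4), round(within, 3))
```

Each printed row shows the observed spread, the predicted standard error, and the fraction inside the 95% band; the first two should agree closely and the last should hover near 0.95.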

Probability and Statistics (4): Joint Distributions, Marginalization, and Independence

Fri, 23 Aug 2024 09:00:00 +0000

Until now, every distribution we’ve studied described a single quantity: one die roll, one waiting time, one measurement. But interesting problems involve relationships between variables. Does studying more hours predict a higher exam score? Are stock returns correlated across sectors? How does the sum of two random variables behave?

Answering these questions requires joint distributions — the mathematical framework for describing multiple random variables simultaneously. This is where probability theory starts connecting directly to regression, multivariate statistics, and the high-dimensional spaces of machine learning.
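Marginalization and independence are easiest to see on a small discrete example. A sketch with a made-up joint PMF for two binary variables $X$ and $Y$ (the probabilities are hypothetical): each marginal comes from summing the joint over the other variable, and independence requires $P(X=x, Y=y) = P(X=x)\,P(Y=y)$ for every pair.

```python
from fractions import Fraction

# Hypothetical joint PMF of two binary variables, keyed by (x, y)
joint = {
    (0, 0): Fraction(3, 10), (0, 1): Fraction(2, 10),
    (1, 0): Fraction(1, 10), (1, 1): Fraction(4, 10),
}


def marginal(joint, axis):
    # Sum the joint PMF over the variable not selected by `axis`
    out = {}
    for key, p in joint.items():
        out[key[axis]] = out.get(key[axis], 0) + p
    return out


px = marginal(joint, 0)  # P(X=0) = 1/2, P(X=1) = 1/2
py = marginal(joint, 1)  # P(Y=0) = 2/5, P(Y=1) = 3/5

# Independent iff the joint factors into the product of marginals everywhere
independent = all(joint[(x, y)] == px[x] * py[y] for (x, y) in joint)
print(independent)  # False, since 3/10 != 1/2 * 2/5
```

Exact fractions are used so the independence check compares probabilities without floating-point rounding.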

Probability and Statistics (3): Expectation, Variance, and the Moment-Generating Trick

Wed, 21 Aug 2024 09:00:00 +0000

A probability distribution is a complete description of a random variable — it tells you the probability of every possible outcome. But complete descriptions are unwieldy. When someone asks “how tall are people in this city?”, you don’t hand them a density function; you say “about 170 cm on average, give or take 10 cm.” The average and the spread capture most of what matters in practice.

This article develops the mathematical framework for summarizing distributions. We start with expectation (the center), build up to variance (the spread), and then introduce moment-generating functions: a single formula that encodes every moment of a distribution and, remarkably, uniquely determines the distribution itself whenever it is finite on an open interval around zero.
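“Encodes every moment” has a precise meaning: the $k$-th derivative of $M(t) = \mathbb{E}[e^{tX}]$ at $t = 0$ equals $\mathbb{E}[X^k]$. A sketch, assuming an Exponential distribution with rate $\lambda = 2$, whose MGF is $\lambda/(\lambda - t)$ for $t < \lambda$, and approximating the derivatives by central finite differences rather than symbolically.

```python
def exp_mgf(t, lam):
    # MGF of Exponential(rate=lam): M(t) = lam / (lam - t), valid for t < lam
    return lam / (lam - t)


def moments_from_mgf(mgf, h=1e-4):
    # First and second derivatives of the MGF at t = 0 via central differences
    m1 = (mgf(h) - mgf(-h)) / (2 * h)
    m2 = (mgf(h) - 2 * mgf(0.0) + mgf(-h)) / h ** 2
    return m1, m2


lam = 2.0
m1, m2 = moments_from_mgf(lambda t: exp_mgf(t, lam))
print(m1, 1 / lam)       # E[X]: both about 0.5
print(m2, 2 / lam ** 2)  # E[X^2]: both about 0.5
print(m2 - m1 ** 2)      # Var(X): about 1 / lam^2 = 0.25
```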

Probability and Statistics (2): Random Variables and the Distributions That Matter

Tue, 20 Aug 2024 09:00:00 +0000

After building the axiomatic foundation in the previous article, you might feel like we spent a lot of time talking about sets and subsets. That’s because we did. The machinery of events and sigma-algebras is necessary but austere — it doesn’t give us a natural way to compute averages, measure spread, or fit models to data.

The bridge between abstract probability and applied statistics is the random variable. Once we assign numerical values to outcomes, the entire toolkit of calculus — derivatives, integrals, series — becomes available. And with calculus comes the ability to characterize randomness through a small set of named distributions, each encoding specific assumptions about how the world generates data.

Probability and Statistics (1): Probability Spaces — Why We Need Axioms (But Won't Overdo It)

Sun, 18 Aug 2024 09:00:00 +0000

Every time you check the weather forecast, run an A/B test, or train a neural network, you are standing on a foundation laid in 1933 by a Russian mathematician named Andrey Kolmogorov. Before him, probability was a grab bag of tricks for gamblers and actuaries. After him, it became a branch of mathematics as rigorous as calculus or algebra.

The good news: you don’t need to become a measure theorist to understand modern probability. The axioms are simple. What takes work is building the right intuitions around them — and learning to recognize when those intuitions fail.