ML Math Derivations (6): Logistic Regression and Classification

Sun, 25 Jan 2026 09:00:00 +0000

Hook. Linear regression maps inputs to any real number, but what if the output has to be a probability between 0 and 1? Logistic regression solves this with one elegant trick: a sigmoid squashing function. Despite its name, logistic regression is a classification algorithm, and a single neuron with a sigmoid activation is exactly a logistic regression model, so its math carries over directly to neural networks.

What You Will Learn

Why sigmoid is the natural way to turn a real-valued score into a probability, and why its derivative is so clean.
How cross-entropy loss falls out of maximum likelihood estimation in two lines.
Why cross-entropy beats MSE for classification — a vanishing-gradient argument made visible.
The full gradient and Hessian for both binary and multi-class (softmax) cases, and why the loss is convex.
L1, L2 and elastic-net regularization, and the Bayesian priors hiding behind them.
Decision-boundary geometry and the threshold-free metrics (ROC / PR / AUC) that you actually need under class imbalance.
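As a quick numerical preview of the first and third items, here is a minimal sketch in plain Python. The function names (`sigmoid`, `sigmoid_grad`, `ce_grad_wrt_score`, `mse_grad_wrt_score`) are illustrative choices, not taken from the article, and the MSE variant assumes the loss $\tfrac12(\sigma(z) - y)^2$. It checks the identity $\sigma'(z) = \sigma(z)(1 - \sigma(z))$ against a finite difference, then compares the gradient of each loss with respect to the score $z$ for a confidently wrong prediction.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def sigmoid_grad(z):
    """Closed-form derivative: sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

def ce_grad_wrt_score(z, y):
    """d/dz of -[y log s + (1 - y) log(1 - s)], with s = sigmoid(z)."""
    return sigmoid(z) - y

def mse_grad_wrt_score(z, y):
    """d/dz of 0.5 * (s - y)^2, with s = sigmoid(z) (assumed MSE form)."""
    s = sigmoid(z)
    return (s - y) * s * (1.0 - s)

# Central finite difference as an independent check on the derivative.
z, h = 0.7, 1e-6
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
print("derivative matches:", abs(numeric - sigmoid_grad(z)) < 1e-6)

# A confidently wrong prediction: true label 1, score far on the negative side.
print("cross-entropy gradient:", ce_grad_wrt_score(-10.0, 1))
print("MSE gradient:          ", mse_grad_wrt_score(-10.0, 1))
```

For the wrong prediction, the cross-entropy gradient stays close to -1 while the MSE gradient is scaled by the extra factor $\sigma(z)(1 - \sigma(z))$ and nearly vanishes, which is the vanishing-gradient effect the list refers to.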

Prerequisites

Calculus: chain rule, partial derivatives.
Linear algebra: matrix multiplication, transpose.
Probability: Bernoulli and categorical distributions, likelihood.
Familiarity with Part 5: Linear Regression.

From Linear Models to Probabilistic Classification

The Problem with Raw Linear Output

Linear regression gives us $\hat y = \mathbf{w}^\top \mathbf{x}$, which is unbounded. For classification, two things go wrong:

ML Math Derivations (3): Probability Theory and Statistical Inference

Thu, 22 Jan 2026 09:00:00 +0000

What You Will Learn

In 1912, Ronald Fisher introduced maximum likelihood estimation in a short paper that quietly redefined statistics. His insight was almost embarrassingly simple: if a parameter setting makes the observed data extremely likely, it is probably correct. Almost every modern learning algorithm — from logistic regression to large language models — descends from this idea.

Maximum Likelihood Estimation on Chen Kai Blog
