ML Math Derivations

Deriving the algorithms — no hand-waving.

20 articles

01
ML Math Derivations (1): Introduction and Mathematical Foundations
Why can machines learn from data at all? This first chapter builds the mathematical theory of learning from first …
2026-01-20 42 min
02
ML Math Derivations (2): Linear Algebra and Matrix Theory
The language of machine learning is linear algebra. This article derives vector spaces, eigendecomposition, SVD, and …
2026-01-21 30 min
03
ML Math Derivations (3): Probability Theory and Statistical Inference
Machine learning is uncertainty modeling. This article derives probability spaces, common distributions, MLE, Bayesian …
2026-01-22 28 min
04
ML Math Derivations (4): Convex Optimization Theory
Nearly every ML algorithm is an optimization problem. This article derives convex sets, convex functions, gradient …
2026-01-23 44 min
05
ML Math Derivations (5): Linear Regression
A complete derivation of linear regression from three perspectives -- algebra (the normal equation), geometry …
2026-01-24 32 min
06
ML Math Derivations (6): Logistic Regression and Classification
Complete derivation of logistic regression from sigmoid to softmax, cross-entropy loss, gradient computation, …
2026-01-25 32 min
07
ML Math Derivations (7): Decision Trees
From information entropy to the Gini index, from ID3 to CART — a complete derivation of decision-tree mathematics: split …
2026-01-26 38 min
08
ML Math Derivations (8): Support Vector Machines
Complete SVM derivation from maximum margin to Lagrangian duality, KKT conditions, soft margin, kernel trick, and SMO …
2026-01-27 28 min
09
ML Math Derivations (9): Naive Bayes
Rigorous derivation of Naive Bayes from Bayes' theorem through conditional independence, parameter estimation, Laplace …
2026-01-28 34 min
10
ML Math Derivations (10): Semi-Naive Bayes and Bayesian Networks
From SPODE, TAN and AODE to full Bayesian networks: how relaxing the conditional-independence assumption -- through …
2026-01-29 28 min
11
ML Math Derivations (11): Ensemble Learning
Derive why combining weak learners produces strong ones. Covers bias-variance decomposition, Bagging/Random Forest …
2026-01-30 36 min
12
ML Math Derivations (12): XGBoost and LightGBM
Derive XGBoost's second-order Taylor expansion, regularized objective and split-gain formula, then explore LightGBM's …
2026-01-31 28 min
13
ML Math Derivations (13): EM Algorithm and GMM
Derive the EM algorithm from Jensen's inequality and the ELBO, prove its monotone-ascent guarantee, and apply it to …
2026-02-01 24 min
14
ML Math Derivations (14): Variational Inference and Variational EM
A first-principles derivation of variational inference. From the ELBO identity and the mean-field assumption to the CAVI …
2026-02-02 28 min
15
ML Math Derivations (15): Hidden Markov Models
Derive the three classical HMM algorithms from one principle (factorizing the joint, then sharing sub-computations …
2026-02-03 24 min
16
ML Math Derivations (16): Conditional Random Fields
Why do CRFs outperform HMMs on sequence labeling? This article derives the linear-chain CRF from the ground up -- potential …
2026-02-04 26 min
17
ML Math Derivations (17): Dimensionality Reduction and PCA
High-dimensional spaces are hostile to distance-based learning. This article derives PCA from two equivalent angles (max …
2026-02-05 26 min
18
ML Math Derivations (18): Clustering Algorithms
How do you find groups in unlabeled data? This article derives K-means (Lloyd + K-means++), hierarchical, DBSCAN, …
2026-02-06 32 min
19
ML Math Derivations (19): Neural Networks and Backpropagation
How does a neural network learn? This article derives forward propagation, the chain rule mechanics of backpropagation, …
2026-02-07 32 min
20
ML Math Derivations (20): Regularization and Model Selection
The series finale: from the bias-variance decomposition to L1/L2 geometry, dropout as a sub-network sampler, k-fold CV, …
2026-02-08 28 min
