
ML Math Derivations
Deriving the algorithms — no hand-waving.
01 ML Math Derivations (1): Introduction and Mathematical Foundations
Why can machines learn from data at all? This first chapter builds the mathematical theory of learning from first …
02 ML Math Derivations (2): Linear Algebra and Matrix Theory
The language of machine learning is linear algebra. This article derives vector spaces, eigendecomposition, SVD, and …
03 ML Math Derivations (3): Probability Theory and Statistical Inference
Machine learning is uncertainty modeling. This article derives probability spaces, common distributions, MLE, Bayesian …
04 ML Math Derivations (4): Convex Optimization Theory
Nearly every ML algorithm is an optimization problem. This article derives convex sets, convex functions, gradient …
05 ML Math Derivations (5): Linear Regression
A complete derivation of linear regression from three perspectives -- algebra (the normal equation), geometry …
06 ML Math Derivations (6): Logistic Regression and Classification
Complete derivation of logistic regression from sigmoid to softmax, cross-entropy loss, gradient computation, …
07 ML Math Derivations (7): Decision Trees
From information entropy to the Gini index, from ID3 to CART — a complete derivation of decision-tree mathematics: split …
08 ML Math Derivations (8): Support Vector Machines
Complete SVM derivation from maximum margin to Lagrangian duality, KKT conditions, soft margin, kernel trick, and SMO …
09 ML Math Derivations (9): Naive Bayes
Rigorous derivation of Naive Bayes from Bayes theorem through conditional independence, parameter estimation, Laplace …
10 ML Math Derivations (10): Semi-Naive Bayes and Bayesian Networks
From SPODE, TAN and AODE to full Bayesian networks: how relaxing the conditional-independence assumption -- through …
11 ML Math Derivations (11): Ensemble Learning
Derive why combining weak learners produces strong ones. Covers bias-variance decomposition, Bagging/Random Forest …
12 ML Math Derivations (12): XGBoost and LightGBM
Derive XGBoost's second-order Taylor expansion, regularised objective and split-gain formula, then explore LightGBM's …
13 ML Math Derivations (13): EM Algorithm and GMM
Derive the EM algorithm from Jensen's inequality and the ELBO, prove its monotone-ascent guarantee, and apply it to …
14 ML Math Derivations (14): Variational Inference and Variational EM
A first-principles derivation of variational inference. From the ELBO identity and the mean-field assumption to the CAVI …
15 ML Math Derivations (15): Hidden Markov Models
Derive the three classical HMM algorithms from one principle (factorising the joint, then sharing sub-computations …
16 ML Math Derivations (16): Conditional Random Fields
Why do CRFs outperform HMMs on sequence labeling? This article derives linear-chain CRF from the ground up -- potential …
17 ML Math Derivations (17): Dimensionality Reduction and PCA
High-dimensional spaces are hostile to distance-based learning. This article derives PCA from two equivalent angles (max …
18 ML Math Derivations (18): Clustering Algorithms
How do you find groups in unlabeled data? This article derives K-means (Lloyd + K-means++), hierarchical, DBSCAN, …
19 ML Math Derivations (19): Neural Networks and Backpropagation
How does a neural network learn? This article derives forward propagation, the chain rule mechanics of backpropagation, …
20 ML Math Derivations (20): Regularization and Model Selection
The series finale: from the bias-variance decomposition to L1/L2 geometry, dropout as a sub-network sampler, k-fold CV, …