ML Math Derivations
ML Math Derivations (20): Regularization and Model Selection
The series finale: from the bias-variance decomposition to L1/L2 geometry, dropout as a sub-network sampler, k-fold CV, AIC/BIC, VC bounds, and the modern double-descent phenomenon that broke classical theory.
ML Math Derivations (19): Neural Networks and Backpropagation
How does a neural network learn? This article derives forward propagation, the chain rule mechanics of backpropagation, vanishing/exploding gradients, and initialization strategies (Xavier, He).
ML Math Derivations (18): Clustering Algorithms
How do you find groups in unlabeled data? This article derives K-means (Lloyd + K-means++), hierarchical, DBSCAN, spectral, and GMM clustering from their mathematical foundations, with seven figures that show why each …
ML Math Derivations (17): Dimensionality Reduction and PCA
High-dimensional spaces are hostile to distance-based learning. This article derives PCA from two equivalent angles (max variance and min reconstruction error), and extends to kernel PCA, LDA, t-SNE, and ICA -- with …
ML Math Derivations (16): Conditional Random Fields
Why do CRFs outperform HMMs on sequence labeling? This article derives linear-chain CRF from the ground up -- potential functions, the forward-backward algorithm, gradient computation, and Viterbi decoding.
Machine Learning Mathematical Derivations (15): Hidden Markov Models
Derive the three classical HMM algorithms from one principle (factorizing the joint, then sharing sub-computations across time): Forward-Backward for evaluation and smoothing, Viterbi for MAP decoding, and Baum-Welch …
Machine Learning Mathematical Derivations (14): Variational Inference and Variational EM
A first-principles derivation of variational inference. From the ELBO identity and the mean-field assumption to the CAVI updates, variational EM, and the reparameterization trick that powers VAEs.
Machine Learning Mathematical Derivations (13): EM Algorithm and GMM
Derive the EM algorithm from Jensen's inequality and the ELBO, prove its monotone-ascent guarantee, and apply it to Gaussian Mixture Models with full E-step / M-step formulas, model selection via BIC/AIC, and the K-means …
Machine Learning Mathematical Derivations (12): XGBoost and LightGBM
Derive XGBoost's second-order Taylor expansion, regularized objective, and split-gain formula, then explore LightGBM's histogram algorithm, GOSS sampling, and EFB bundling for industrial-scale gradient boosting.
Machine Learning Mathematical Derivations (11): Ensemble Learning
Derive why combining weak learners produces strong ones. Covers bias-variance decomposition, Bagging/Random Forest variance reduction, AdaBoost exponential loss, and GBDT gradient optimization in function space.
Machine Learning Mathematical Derivations (10): Semi-Naive Bayes and Bayesian Networks
From SPODE, TAN and AODE to full Bayesian networks: how relaxing the conditional-independence assumption -- through one-dependence trees, ensembles of super-parents and graphical structure learning -- closes the gap …
Machine Learning Mathematical Derivations (9): Naive Bayes
Rigorous derivation of Naive Bayes from Bayes' theorem through conditional independence, parameter estimation, Laplace smoothing, three model variants, and why it works despite its violated assumptions.
Machine Learning Mathematical Derivations (8): Support Vector Machines
Complete SVM derivation from maximum margin to Lagrangian duality, KKT conditions, soft margin, kernel trick, and SMO algorithm with step-by-step proofs and Python code.
Machine Learning Mathematical Derivations (7): Decision Trees
From information entropy to the Gini index, from ID3 to CART -- a complete derivation of decision-tree mathematics: split criteria, continuous and missing values, pruning, and feature importance, with sklearn-verified …
Machine Learning Mathematical Derivations (6): Logistic Regression and Classification
Complete derivation of logistic regression from sigmoid to softmax, cross-entropy loss, gradient computation, regularization, and multi-class extension with Python verification.
Machine Learning Mathematical Derivations (5): Linear Regression
A complete derivation of linear regression from three perspectives -- algebra (the normal equation), geometry (orthogonal projection), and probability (maximum likelihood) -- followed by Ridge, Lasso, gradient methods, …
ML Math Derivations (4): Convex Optimization Theory
Nearly every ML algorithm is an optimization problem. This article derives convex sets, convex functions, gradient descent, Newton's method, KKT conditions, and ADMM -- the optimization toolkit for machine learning.
ML Math Derivations (3): Probability Theory and Statistical Inference
Machine learning is uncertainty modeling. This article derives probability spaces, common distributions, MLE, Bayesian estimation, limit theorems and information theory -- the statistical engine behind every ML model.
ML Math Derivations (2): Linear Algebra and Matrix Theory
The language of machine learning is linear algebra. This article derives vector spaces, eigendecomposition, SVD, and matrix calculus from first principles -- every tool you need for ML optimization.
ML Math Derivations (1): Introduction and Mathematical Foundations
Why can machines learn from data at all? This first chapter builds the mathematical theory of learning from first principles -- problem formalization, loss surrogates, PAC learning, VC dimension, the bias-variance …