ML Math Derivations
ML Math Derivations (20): Regularization and Model Selection
The series finale: from the bias-variance decomposition to L1/L2 geometry, dropout as a sub-network sampler, k-fold CV, AIC/BIC, VC bounds, and the modern double-descent phenomenon that broke classical theory.
ML Math Derivations (19): Neural Networks and Backpropagation
How does a neural network learn? This article derives forward propagation, the chain rule mechanics of backpropagation, vanishing/exploding gradients, and initialization strategies (Xavier, He).
ML Math Derivations (18): Clustering Algorithms
How do you find groups in unlabeled data? This article derives K-means (Lloyd + K-means++), hierarchical, DBSCAN, spectral, and GMM clustering from their mathematical foundations, with seven figures that show why each …
ML Math Derivations (17): Dimensionality Reduction and PCA
High-dimensional spaces are hostile to distance-based learning. This article derives PCA from two equivalent angles (max variance and min reconstruction error), and extends to kernel PCA, LDA, t-SNE, and ICA -- with …
ML Math Derivations (16): Conditional Random Fields
Why do CRFs outperform HMMs on sequence labeling? This article derives linear-chain CRF from the ground up -- potential functions, the forward-backward algorithm, gradient computation, and Viterbi decoding.
Machine Learning Mathematical Derivations (15): Hidden Markov Models
Derive the three classical HMM algorithms from one principle (factorizing the joint, then sharing sub-computations across time): Forward-Backward for evaluation and smoothing, Viterbi for MAP decoding, and Baum-Welch …
Machine Learning Mathematical Derivations (14): Variational Inference and Variational EM
A first-principles derivation of variational inference. From the ELBO identity and the mean-field assumption to the CAVI updates, variational EM, and the reparameterization trick that powers VAEs.
Machine Learning Mathematical Derivations (13): EM Algorithm and GMM
Derive the EM algorithm from Jensen's inequality and the ELBO, prove its monotone-ascent guarantee, and apply it to Gaussian Mixture Models with full E-step / M-step formulas, model selection via BIC/AIC, and the K-means …
Machine Learning Mathematical Derivations (12): XGBoost and LightGBM
Derive XGBoost's second-order Taylor expansion, regularized objective, and split-gain formula, then explore LightGBM's histogram algorithm, GOSS sampling, and EFB bundling for industrial-scale gradient boosting.
Machine Learning Mathematical Derivations (11): Ensemble Learning
Derive why combining weak learners produces strong ones. Covers bias-variance decomposition, Bagging/Random Forest variance reduction, AdaBoost exponential loss, and GBDT gradient optimization in function space.
Machine Learning Mathematical Derivations (10): Semi-Naive Bayes and Bayesian Networks
From SPODE, TAN and AODE to full Bayesian networks: how relaxing the conditional-independence assumption -- through one-dependence trees, ensembles of super-parents and graphical structure learning -- closes the gap …
Machine Learning Mathematical Derivations (9): Naive Bayes
Rigorous derivation of Naive Bayes from Bayes' theorem through conditional independence, parameter estimation, Laplace smoothing, three model variants, and why it works despite its violated assumptions.
Machine Learning Mathematical Derivations (8): Support Vector Machines
Complete SVM derivation from maximum margin to Lagrangian duality, KKT conditions, soft margin, kernel trick, and SMO algorithm with step-by-step proofs and Python code.
Machine Learning Mathematical Derivations (7): Decision Trees
From information entropy to the Gini index, from ID3 to CART -- a complete derivation of decision-tree mathematics: split criteria, continuous and missing values, pruning, and feature importance, with sklearn-verified …
Machine Learning Mathematical Derivations (6): Logistic Regression and Classification
Complete derivation of logistic regression from sigmoid to softmax, cross-entropy loss, gradient computation, regularization, and multi-class extension with Python verification.
Machine Learning Mathematical Derivations (5): Linear Regression
A complete derivation of linear regression from three perspectives -- algebra (the normal equation), geometry (orthogonal projection), and probability (maximum likelihood) -- followed by Ridge, Lasso, gradient methods, …
ML Math Derivations (4): Convex Optimization Theory
Nearly every ML algorithm is an optimization problem. This article derives convex sets, convex functions, gradient descent, Newton's method, KKT conditions, and ADMM -- the optimization toolkit for machine learning.
ML Math Derivations (3): Probability Theory and Statistical Inference
Machine learning is uncertainty modeling. This article derives probability spaces, common distributions, MLE, Bayesian estimation, limit theorems and information theory -- the statistical engine behind every ML model.
ML Math Derivations (2): Linear Algebra and Matrix Theory
The language of machine learning is linear algebra. This article derives vector spaces, eigendecomposition, SVD, and matrix calculus from first principles -- every tool you need for ML optimization.
ML Math Derivations (1): Introduction and Mathematical Foundations
Why can machines learn from data at all? This first chapter builds the mathematical theory of learning from first principles -- problem formalization, loss surrogates, PAC learning, VC dimension, the bias-variance …