Mathematical Derivations
ML Math Derivations (20): Regularization and Model Selection
The series finale: from the bias-variance decomposition to L1/L2 geometry, dropout as a sub-network sampler, k-fold CV, AIC/BIC, VC bounds, and the modern double-descent phenomenon that broke classical theory.
ML Math Derivations (19): Neural Networks and Backpropagation
How does a neural network learn? This article derives forward propagation, the chain rule mechanics of backpropagation, vanishing/exploding gradients, and initialization strategies (Xavier, He).
ML Math Derivations (18): Clustering Algorithms
How do you find groups in unlabeled data? This article derives K-means (Lloyd + K-means++), hierarchical, DBSCAN, spectral, and GMM clustering from their mathematical foundations, with seven figures that show why each …
ML Math Derivations (17): Dimensionality Reduction and PCA
High-dimensional spaces are hostile to distance-based learning. This article derives PCA from two equivalent angles (max variance and min reconstruction error), and extends to kernel PCA, LDA, t-SNE, and ICA -- with …
ML Math Derivations (16): Conditional Random Fields
Why do CRFs outperform HMMs on sequence labeling? This article derives the linear-chain CRF from the ground up -- potential functions, the forward-backward algorithm, gradient computation, and Viterbi decoding.
ML Math Derivations (7): Decision Trees
From information entropy to the Gini index, from ID3 to CART -- a complete derivation of decision-tree mathematics: split criteria, continuous and missing values, pruning, and feature importance, with sklearn-verified …
ML Math Derivations (1): Introduction and Mathematical Foundations
Why can machines learn from data at all? This first chapter builds the mathematical theory of learning from first principles -- problem formalization, loss surrogates, PAC learning, VC dimension, the bias-variance …