Tagged

Mathematical Derivations

Feb 8, 2026 ML Math Derivations 13 min read

ML Math Derivations (20): Regularization and Model Selection

The series finale: from the bias-variance decomposition to L1/L2 geometry, dropout as a sub-network sampler, k-fold CV, AIC/BIC, VC bounds, and the modern double-descent phenomenon that broke classical theory.

Feb 7, 2026 ML Math Derivations 12 min read

ML Math Derivations (19): Neural Networks and Backpropagation

How does a neural network learn? This article derives forward propagation, the chain-rule mechanics of backpropagation, vanishing/exploding gradients, and initialization strategies (Xavier, He).

Feb 6, 2026 ML Math Derivations 13 min read

ML Math Derivations (18): Clustering Algorithms

How do you find groups in unlabeled data? This article derives K-means (Lloyd + K-means++), hierarchical, DBSCAN, spectral, and GMM clustering from their mathematical foundations, with seven figures that show why each …

Feb 5, 2026 ML Math Derivations 15 min read

ML Math Derivations (17): Dimensionality Reduction and PCA

High-dimensional spaces are hostile to distance-based learning. This article derives PCA from two equivalent angles (max variance and min reconstruction error), and extends to kernel PCA, LDA, t-SNE, and ICA -- with …

Feb 4, 2026 ML Math Derivations 14 min read

ML Math Derivations (16): Conditional Random Fields

Why do CRFs outperform HMMs on sequence labeling? This article derives the linear-chain CRF from the ground up -- potential functions, the forward-backward algorithm, gradient computation, and Viterbi decoding.

Jan 26, 2026 ML Math Derivations 21 min read

Machine Learning Mathematical Derivations (7): Decision Trees

From information entropy to the Gini index, from ID3 to CART — a complete derivation of decision-tree mathematics: split criteria, continuous and missing values, pruning, and feature importance, with sklearn-verified …

Jan 20, 2026 ML Math Derivations 20 min read

ML Math Derivations (1): Introduction and Mathematical Foundations

Why can machines learn from data at all? This first chapter builds the mathematical theory of learning from first principles -- problem formalization, loss surrogates, PAC learning, VC dimension, the bias-variance …