Kernel Methods
ML Math Derivations (8): Support Vector Machines
Complete SVM derivation from maximum margin to Lagrangian duality, KKT conditions, soft margin, kernel trick, and SMO algorithm with step-by-step proofs and Python code.
Kernel Methods (8): Deep Kernel Learning vs Deep Learning — A Practitioner's Guide
Deep kernel learning combines neural feature extractors with kernel methods. When to pick kernels over deep nets, hyperparameter tuning playbook, common failure modes, and a final 5-step kernel decision flowchart.
Kernel Methods (7): Large-Scale Kernels — Nystrom Approximation and Random Fourier Features
Exact kernel methods scale as O(n^3) in the number of training points n. Nystrom approximation and Random Fourier Features pull them back to time linear in n while keeping most of the kernel trick's expressive power.
Kernel Methods (6): Gaussian Processes — When Kernels Meet Bayesian Inference
Gaussian Processes turn kernels into a Bayesian model — posterior with uncertainty, marginal likelihood for hyperparameters, and the kernel as a prior over functions.
Kernel Methods (5): Kernel SVM, Kernel PCA, and Kernel Ridge Regression
The classic algorithms, kernelized — SVM's dual form, Kernel PCA's eigendecomposition in feature space, and Kernel Ridge's closed-form solution. With sklearn code and worked examples.
Kernel Methods (4): Common Kernel Families — RBF, Matern, Polynomial, Periodic, and More
A tour of the kernels you'll actually use: RBF (Gaussian), polynomial, linear, Matern, periodic, sigmoid. When to pick which, hyperparameter intuition, and how kernels combine.
Kernel Methods (3): RKHS — The Theoretical Soul of Kernel Methods
Reproducing Kernel Hilbert Space — the function space where kernel methods live. The reproducing property, the representer theorem, and why finite-data optimization works in infinite dimensions.
Kernel Methods (2): Mathematical Foundations — Positive-Definite Kernels and Mercer's Theorem
What makes a function a valid kernel? Positive-definiteness, the Gram matrix test, and Mercer's theorem — the spectral decomposition that justifies the kernel trick.
Kernel Methods (1): Why We Need Them — Hitting the Ceiling of Linear Algorithms
Linear algorithms can't capture non-linear patterns. The kernel trick lets you keep the linear algorithm's elegance AND model non-linear relationships — without writing the high-dimensional feature map. Part 1 of an …