Kernel Methods (7): Large-Scale Kernels — Nystrom Approximation and Random Fourier Features

Fri, 24 Dec 2021 09:00:00 +0000

You want to train an RBF SVM on a million-image classification set. The Gram matrix has $10^6 \times 10^6$ entries; stored as 8-byte doubles, that is $10^{12} \times 8$ bytes, or 8 TB. That number alone, eight terabytes of RAM just to store the kernel, is why many practitioners who learned kernel methods in a statistics class quietly never reach for them on production workloads. The kernel trick gives you an infinite-dimensional feature space for the cost of one kernel evaluation per pair; the bill arrives when you have $n^2$ pairs.
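To make the arithmetic concrete, here is a minimal sketch of the storage cost of a dense Gram matrix as $n$ grows. The helper name `gram_matrix_bytes` is hypothetical, and the sketch assumes 8-byte double-precision entries and decimal terabytes ($10^{12}$ bytes).

```python
def gram_matrix_bytes(n: int, bytes_per_entry: int = 8) -> int:
    """Bytes needed to store a dense n x n kernel matrix.

    Assumes every entry is stored explicitly (no use of symmetry)
    and that each entry takes bytes_per_entry bytes (8 for float64).
    """
    return n * n * bytes_per_entry


if __name__ == "__main__":
    # Storage grows quadratically: 10x more samples means 100x more memory.
    for n in (10**4, 10**5, 10**6):
        terabytes = gram_matrix_bytes(n) / 1e12  # decimal TB
        print(f"n = {n:>9,d}: {terabytes:g} TB")
```

Storing only the upper triangle of the symmetric matrix would roughly halve these figures, which still leaves about 4 TB at $n = 10^6$.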

