<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Scalability on Chen Kai Blog</title><link>https://www.chenk.top/en/tags/scalability/</link><description>Recent content in Scalability on Chen Kai Blog</description><generator>Hugo</generator><language>en</language><lastBuildDate>Fri, 24 Dec 2021 09:00:00 +0000</lastBuildDate><atom:link href="https://www.chenk.top/en/tags/scalability/index.xml" rel="self" type="application/rss+xml"/><item><title>Kernel Methods (7): Large-Scale Kernels — Nystrom Approximation and Random Fourier Features</title><link>https://www.chenk.top/en/kernel-methods/07-large-scale-kernels/</link><pubDate>Fri, 24 Dec 2021 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/kernel-methods/07-large-scale-kernels/</guid><description>&lt;p>You want to train an RBF SVM on a million-image classification set. The Gram matrix is &lt;span class="math-inline">$10^6 \times 10^6$&lt;/span>
 doubles, which is &lt;strong>8 TB&lt;/strong>. That number alone, eight terabytes of RAM just to &lt;em>store&lt;/em> the kernel, is why most working data scientists who learned kernel methods in a stats class quietly never reach for them on production workloads. The kernel trick gives you an infinite-dimensional feature space for the cost of one kernel evaluation per pair; the bill arrives when you have &lt;span class="math-inline">$n^2$&lt;/span>
 pairs.&lt;/p></description></item></channel></rss>