PCA on Chen Kai Blog

ML Math Derivations (17): Dimensionality Reduction and PCA

Thu, 05 Feb 2026 09:00:00 +0000

What You Will Learn

Feed a clustering algorithm $10{,}000$-dimensional data and it will most likely fail, not because the algorithm is broken, but because high-dimensional space is a hostile environment for distance-based learning. Volume concentrates in thin shells near the boundary, the ratio of nearest- to farthest-neighbour distances tends to $1$, and “closeness” stops carrying information. Dimensionality reduction is the response: project the data into a lower-dimensional space while keeping the structure that actually matters.
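The claim about nearest- and farthest-neighbour distances can be checked numerically. The following is a minimal sketch, not taken from the post: it assumes points drawn uniformly from the unit cube, and the helper name `distance_contrast` is hypothetical. It measures the nearest/farthest distance ratio from one query point as the dimension grows.

```python
import math
import random


def distance_contrast(dim, n_points=200, seed=0):
    """Return nearest/farthest distance ratio from one query point to
    n_points uniform random points in the unit cube [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance
    return min(dists) / max(dists)


if __name__ == "__main__":
    # The ratio should creep towards 1 as the dimension increases.
    for dim in (2, 10, 100, 1000):
        print(f"dim={dim:5d}  nearest/farthest = {distance_contrast(dim):.3f}")
```

In low dimensions the ratio is close to zero because some point always lands near the query. In high dimensions every point sits at roughly the same distance, which is the effect that undermines distance-based clustering.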

Essence of Linear Algebra (15): Linear Algebra in Machine Learning

Wed, 09 Apr 2025 09:00:00 +0000

Ask any senior ML engineer “what math do you actually use day to day?” and the answer is almost always linear algebra. Calculus shows up in derivations; probability shows up in modeling; but the runtime of a real ML system is dominated by matrix-vector multiplies, decompositions, and projections. PyTorch’s Linear, scikit-learn’s PCA, Spark MLlib’s ALS, and a Transformer’s attention head are all the same primitive in different costumes.

This chapter covers the algorithms used in production ML systems — PCA, LDA, SVM with kernels, matrix factorization for recommenders, regularized linear regression, neural network layers, and attention — and explains the linear algebra behind each. We focus on intuition first, then geometry, and finally formulas.

Essence of Linear Algebra (9): Singular Value Decomposition — The Crown Jewel of Linear Algebra

Wed, 26 Feb 2025 09:00:00 +0000

Why SVD Earns the Crown

The spectral theorem of Chapter 8 gave us $A = Q\Lambda Q^T$ — a beautifully clean factorisation, but only for symmetric matrices. Most matrices that show up in practice are not symmetric, and many are not even square:

a photograph stored as a $1920 \times 1080$ pixel matrix,
a Netflix-style user–movie rating matrix (millions of rows, thousands of columns),
a document–term matrix in NLP (documents by vocabulary),
a gene-expression matrix in bioinformatics.

Every one of these matrices, whatever its shape, admits a singular value decomposition:

$$A = U\,\Sigma\,V^{\!\top}.$$

This is the most powerful, most universally applicable decomposition in all of linear algebra.
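As a quick check that the factorisation $A = U\,\Sigma\,V^{\!\top}$ applies to a matrix that is neither square nor symmetric, here is a small sketch using NumPy's `np.linalg.svd`. The $6 \times 4$ random matrix and the variable names are illustrative assumptions, not data from the post.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))  # rectangular, so certainly not symmetric

# Thin SVD: U is 6x4, s holds the 4 singular values, Vt is 4x4.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rebuild A from its three factors.
A_rebuilt = U @ np.diag(s) @ Vt

print("reconstruction matches:", np.allclose(A, A_rebuilt))
print("columns of U orthonormal:", np.allclose(U.T @ U, np.eye(4)))
print("rows of Vt orthonormal:", np.allclose(Vt @ Vt.T, np.eye(4)))
print("singular values sorted, non-negative:",
      bool(np.all(s[:-1] >= s[1:]) and np.all(s >= 0)))
```

Passing `full_matrices=False` requests the thin form, which is usually what is wanted when one dimension is much larger than the other, as in the rating and document-term examples above.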

Kernel Methods (5): Kernel SVM, Kernel PCA, and Kernel Ridge Regression

Tue, 14 Dec 2021 09:00:00 +0000

Your features are two-dimensional, your data is clearly a circle inside a circle, and LinearSVC is at 50% accuracy with the wide-eyed look of an algorithm that genuinely believes a straight line is the answer. You stare at the scatter plot, you stare at the model, and somewhere in the back of your head the words kernel SVM surface. You type kernel='rbf', the accuracy jumps to 98%, and the rest of the afternoon you wonder what exactly just happened, and why the same trick also gives you a Kernel PCA that unfolds a Swiss roll and a Kernel Ridge regressor that fits a sine wave with three lines of code.
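The circle-inside-a-circle scenario can be reproduced in a few lines with scikit-learn. This is a sketch under assumptions: the dataset comes from `make_circles` with illustrative `factor` and `noise` values, and the exact accuracies depend on those choices and on the random split, so they will not necessarily match the figures quoted above.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, LinearSVC

# Two concentric rings: not separable by any straight line.
X, y = make_circles(n_samples=500, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Linear decision boundary versus an RBF-kernel boundary.
linear_acc = LinearSVC().fit(X_train, y_train).score(X_test, y_test)
rbf_acc = SVC(kernel="rbf").fit(X_train, y_train).score(X_test, y_test)

print(f"LinearSVC accuracy:       {linear_acc:.2f}")
print(f"SVC(kernel='rbf') accuracy: {rbf_acc:.2f}")
```

The linear model should hover near chance, while the RBF kernel lets the classifier draw a closed boundary around the inner ring.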