<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Linear Algebra on Chen Kai Blog</title><link>https://www.chenk.top/en/linear-algebra/</link><description>Recent content in Linear Algebra on Chen Kai Blog</description><generator>Hugo</generator><language>en</language><lastBuildDate>Wed, 30 Apr 2025 09:00:00 +0000</lastBuildDate><atom:link href="https://www.chenk.top/en/linear-algebra/index.xml" rel="self" type="application/rss+xml"/><item><title>Essence of Linear Algebra (18): Frontiers and Summary</title><link>https://www.chenk.top/en/linear-algebra/18-frontiers-and-summary/</link><pubDate>Wed, 30 Apr 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/linear-algebra/18-frontiers-and-summary/</guid><description>&lt;p>We have walked the long road of linear algebra together. We started with arrows in the plane and ended at the gates of quantum computers, the inner workings of large language models, and the topology of data clouds. The remarkable thing &amp;ndash; the thing this series has tried to make visible &amp;ndash; is that the same handful of ideas keeps coming back. A vector is a state. A matrix is a transformation. A decomposition is the structure hiding inside the transformation. A norm tells you when you can trust your computation. Once you internalise that loop, every &amp;ldquo;frontier&amp;rdquo; looks less like a foreign country and more like another dialect of a language you already speak.&lt;/p></description></item><item><title>Essence of Linear Algebra (17): Linear Algebra in Computer Vision</title><link>https://www.chenk.top/en/linear-algebra/17-linear-algebra-in-computer-vision/</link><pubDate>Wed, 23 Apr 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/linear-algebra/17-linear-algebra-in-computer-vision/</guid><description>&lt;p>Computer vision is the science of teaching machines to see. 
What is striking is how thoroughly the whole field reduces to linear algebra: an image is a matrix, a geometric transformation is a matrix product, a camera is a $3 \times 4$ projection matrix, two-view geometry is the equation $\mathbf{x}_2^\top \mathbf{F}\, \mathbf{x}_1 = 0$, and 3D reconstruction is a sparse linear least-squares problem. Once you see the field through that lens, what once looked like a zoo of algorithms turns out to be a small set of linear-algebraic ideas applied repeatedly.&lt;/p></description></item><item><title>Essence of Linear Algebra (16): Linear Algebra in Deep Learning</title><link>https://www.chenk.top/en/linear-algebra/16-linear-algebra-in-deep-learning/</link><pubDate>Wed, 16 Apr 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/linear-algebra/16-linear-algebra-in-deep-learning/</guid><description>&lt;p>Strip away the marketing and a deep network is one thing: a long pipeline of matrix multiplications glued together by elementwise nonlinearities. Forward pass, backward pass, convolution, attention, normalization, fine-tuning &amp;ndash; every &amp;ldquo;trick&amp;rdquo; is a small twist on the same algebraic theme. Once you see the matrices, the field stops looking like a bag of recipes and starts looking like a single language.&lt;/p>
&lt;p>This chapter rebuilds the modern stack from that single language. We follow one signal &amp;ndash; a vector $\mathbf{x}$ &amp;ndash; as it flows through linear layers, gets convolved, gets attended to, gets normalized, and gets adapted by a low-rank update. At each step we name the matrix that does the work and the property of that matrix (rank, conditioning, transpose) that makes the trick succeed.&lt;/p></description></item><item><title>Essence of Linear Algebra (15): Linear Algebra in Machine Learning</title><link>https://www.chenk.top/en/linear-algebra/15-linear-algebra-in-machine-learning/</link><pubDate>Wed, 09 Apr 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/linear-algebra/15-linear-algebra-in-machine-learning/</guid><description>&lt;p>Ask any senior ML engineer &amp;ldquo;what math do you actually use day to day?&amp;rdquo; and the answer is almost always &lt;strong>linear algebra&lt;/strong>. Calculus shows up in derivations; probability shows up in modeling; but the runtime of a real ML system is dominated by matrix-vector multiplies, decompositions, and projections. PyTorch&amp;rsquo;s &lt;code>Linear&lt;/code>, scikit-learn&amp;rsquo;s &lt;code>PCA&lt;/code>, Spark MLlib&amp;rsquo;s &lt;code>ALS&lt;/code>, and a Transformer&amp;rsquo;s attention head are all the same primitive in different costumes.&lt;/p>
&lt;p>This chapter walks through the algorithms that production ML systems actually run &amp;ndash; PCA, LDA, SVM with kernels, matrix factorization for recommenders, regularized linear regression, neural network layers, attention &amp;ndash; and shows the linear algebra that makes each of them tick. We focus on intuition first, geometry second, formulas third.&lt;/p></description></item><item><title>Essence of Linear Algebra (14): Random Matrix Theory</title><link>https://www.chenk.top/en/linear-algebra/14-random-matrix-theory/</link><pubDate>Wed, 02 Apr 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/linear-algebra/14-random-matrix-theory/</guid><description>&lt;p>A million i.i.d. coin flips, arranged into a thousand-by-thousand symmetric matrix, somehow produce eigenvalues that fill a perfect semicircle. A noisy sample covariance matrix that should be the identity instead spreads its eigenvalues across an interval whose width you can predict before seeing a single number. The largest eigenvalue of a Wigner matrix has a tail distribution that turns up everywhere &amp;ndash; in growing crystals, in the longest increasing subsequence of a random permutation, in the energy levels of heavy nuclei. &lt;strong>Random matrix theory&lt;/strong> (RMT) is the study of why these regularities appear, and how to use them.&lt;/p></description></item><item><title>Essence of Linear Algebra (13): Tensors and Multilinear Algebra</title><link>https://www.chenk.top/en/linear-algebra/13-tensors-and-multilinear-algebra/</link><pubDate>Wed, 26 Mar 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/linear-algebra/13-tensors-and-multilinear-algebra/</guid><description>&lt;p>If you&amp;rsquo;ve used PyTorch or TensorFlow, you&amp;rsquo;ve met the word &amp;ldquo;tensor&amp;rdquo; hundreds of times. PyTorch calls every array &lt;code>torch.Tensor&lt;/code>; TensorFlow puts it in the product name. 
But what &lt;em>is&lt;/em> a tensor, and why did frameworks borrow this physics-flavored word for what looks like a multi-dimensional array?&lt;/p>
&lt;p>The short answer of this chapter:&lt;/p>
&lt;blockquote>
&lt;p>A tensor is the natural generalization of a scalar, vector, and matrix to an &lt;strong>arbitrary number of axes&lt;/strong>. Everything you know about matrices either lifts cleanly to tensors or breaks in instructive ways.&lt;/p>&lt;/blockquote></description></item><item><title>Sparse Matrices and Compressed Sensing -- Less Is More</title><link>https://www.chenk.top/en/linear-algebra/12-sparse-matrices-and-compressed-sensing/</link><pubDate>Wed, 19 Mar 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/linear-algebra/12-sparse-matrices-and-compressed-sensing/</guid><description>&lt;h2 id="the-less-is-more-miracle">The &amp;ldquo;Less Is More&amp;rdquo; Miracle&lt;/h2>
&lt;p>A raw 24-megapixel photograph weighs in at roughly 70 MB. JPEG compresses it to a few hundred kilobytes &amp;ndash; a 100$\times$ reduction &amp;ndash; and you cannot tell the difference. A traditional MRI scan takes thirty minutes; a modern compressed-sensing MRI gets the same image in five.&lt;/p>
&lt;p>Both miracles run on the same engine: &lt;strong>sparsity&lt;/strong>. Most natural signals, written in the right basis, have only a handful of meaningful coefficients. Everything else is essentially zero.&lt;/p></description></item><item><title>Matrix Calculus and Optimization -- The Engine Behind Machine Learning</title><link>https://www.chenk.top/en/linear-algebra/11-matrix-calculus-and-optimization/</link><pubDate>Wed, 12 Mar 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/linear-algebra/11-matrix-calculus-and-optimization/</guid><description>&lt;h2 id="from-shower-knobs-to-neural-networks">From Shower Knobs to Neural Networks&lt;/h2>
&lt;p>Every morning you train a tiny neural network. The water comes out too cold, so you nudge the knob &amp;ndash; a &lt;em>parameter&lt;/em> &amp;ndash; in some direction. A second later you observe a new temperature &amp;ndash; the &lt;em>error signal&lt;/em> &amp;ndash; and nudge again. After three or four iterations you have converged.&lt;/p>
&lt;p>Modern deep learning is the same loop, scaled up by seven orders of magnitude. The &amp;ldquo;knob&amp;rdquo; is a matrix $W$ with hundreds of millions of entries. The &amp;ldquo;error&amp;rdquo; is a scalar loss $L$. And the question is the same: &lt;strong>for each parameter, in which direction should I push, and by how much?&lt;/strong> The answer lives in a single object: the gradient $\partial L / \partial W$.&lt;/p></description></item><item><title>Matrix Norms and Condition Numbers -- Is Your Linear System Healthy?</title><link>https://www.chenk.top/en/linear-algebra/10-matrix-norms-and-condition-numbers/</link><pubDate>Wed, 05 Mar 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/linear-algebra/10-matrix-norms-and-condition-numbers/</guid><description>&lt;h2 id="the-question-that-haunts-engineers">The Question That Haunts Engineers&lt;/h2>
&lt;p>The equations are right. The algorithm is right. So why is the computed answer completely wrong?&lt;/p>
&lt;p>The culprit is usually a single number called the &lt;strong>condition number&lt;/strong>. It measures how &lt;em>sensitive&lt;/em> a linear system is — whether a tiny wobble in the input gets amplified into a catastrophic error in the output. To talk about condition numbers we first need a way to measure the &amp;ldquo;size&amp;rdquo; of vectors and matrices. That is what norms do.&lt;/p></description></item><item><title>Singular Value Decomposition -- The Crown Jewel of Linear Algebra</title><link>https://www.chenk.top/en/linear-algebra/09-singular-value-decomposition/</link><pubDate>Wed, 26 Feb 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/linear-algebra/09-singular-value-decomposition/</guid><description>&lt;h2 id="why-svd-earns-the-crown">Why SVD Earns the Crown&lt;/h2>
&lt;p>The spectral theorem of &lt;a href="https://www.chenk.top/en/linear-algebra/08-symmetric-matrices-and-quadratic-forms/">Chapter 8&lt;/a> gave us $A = Q\Lambda Q^T$ &amp;ndash; a beautifully clean factorisation, but &lt;strong>only for symmetric matrices&lt;/strong>. Most matrices that show up in practice are not symmetric, and many are not even square:&lt;/p>
&lt;ul>
&lt;li>a photograph stored as a $1920 \times 1080$ pixel matrix,&lt;/li>
&lt;li>a Netflix-style user&amp;ndash;movie rating matrix (millions of rows, thousands of columns),&lt;/li>
&lt;li>a document&amp;ndash;term matrix in NLP (documents by vocabulary),&lt;/li>
&lt;li>a gene-expression matrix in bioinformatics.&lt;/li>
&lt;/ul>
&lt;p>The &lt;strong>singular value decomposition&lt;/strong> (SVD) covers them all: &lt;em>any&lt;/em> $m \times n$ matrix factors as&lt;/p>
$$A = U\,\Sigma\,V^{\!\top}.$$&lt;p>
This is the most powerful, most universally applicable decomposition in all of linear algebra.&lt;/p></description></item><item><title>Symmetric Matrices and Quadratic Forms -- The Best Matrices in Town</title><link>https://www.chenk.top/en/linear-algebra/08-symmetric-matrices-and-quadratic-forms/</link><pubDate>Wed, 19 Feb 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/linear-algebra/08-symmetric-matrices-and-quadratic-forms/</guid><description>&lt;h2 id="why-symmetric-matrices-are-the-best">Why Symmetric Matrices Are the &amp;ldquo;Best&amp;rdquo;&lt;/h2>
&lt;p>Of all the matrices you will ever meet, &lt;strong>symmetric matrices&lt;/strong> are the most well-behaved. They have:&lt;/p>
&lt;ul>
&lt;li>only &lt;strong>real&lt;/strong> eigenvalues,&lt;/li>
&lt;li>a complete set of &lt;strong>orthogonal&lt;/strong> eigenvectors,&lt;/li>
&lt;li>and a &lt;strong>perfect diagonalization&lt;/strong> $A = Q\Lambda Q^T$ whose orthogonal factor inverts for free ($Q^{-1} = Q^T$).&lt;/li>
&lt;/ul>
&lt;p>This is not a curiosity. Almost every important matrix you actually compute with in physics, optimization, statistics, or machine learning is symmetric:&lt;/p>
&lt;ul>
&lt;li>A &lt;strong>covariance matrix&lt;/strong> $\Sigma = \tfrac{1}{n}X^TX$ (for mean-centered data $X$) records how features vary together. It is symmetric by construction.&lt;/li>
&lt;li>A &lt;strong>Hessian matrix&lt;/strong> $H_{ij} = \partial^2 f / \partial x_i \partial x_j$ records second derivatives. By Clairaut&amp;rsquo;s theorem, mixed partials commute, so $H$ is symmetric.&lt;/li>
&lt;li>A &lt;strong>stiffness matrix&lt;/strong> $K$ encodes how connected springs push on each other. Newton&amp;rsquo;s third law forces $K = K^T$.&lt;/li>
&lt;li>A &lt;strong>kernel&lt;/strong> or &lt;strong>Gram matrix&lt;/strong> $G_{ij} = \langle x_i, x_j \rangle$ measures pairwise similarity. Inner products are symmetric, so $G$ is too.&lt;/li>
&lt;/ul>
&lt;p>This chapter explains why symmetry buys you so much, and how the geometry of &lt;strong>quadratic forms&lt;/strong> lets you read off the behaviour of a symmetric matrix at a glance.&lt;/p></description></item><item><title>Orthogonality and Projections -- When Vectors Mind Their Own Business</title><link>https://www.chenk.top/en/linear-algebra/07-orthogonality-and-projections/</link><pubDate>Wed, 12 Feb 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/linear-algebra/07-orthogonality-and-projections/</guid><description>&lt;h2 id="why-orthogonality-matters">Why Orthogonality Matters&lt;/h2>
&lt;p>Two vectors are &lt;strong>orthogonal&lt;/strong> when they &amp;ldquo;do not interfere&amp;rdquo; with one another. That single idea &amp;ndash; one direction tells you nothing about the other &amp;ndash; powers GPS positioning, noise-canceling headphones, JPEG compression, recommendation systems, and most of numerical linear algebra.&lt;/p>
&lt;p>Orthogonality is the single biggest computational shortcut in linear algebra. With a generic basis, finding coordinates is solving a linear system. With an &lt;strong>orthogonal&lt;/strong> basis, finding coordinates is one dot product per axis. Hard problem, easy problem, same problem &amp;ndash; just a better basis.&lt;/p></description></item><item><title>Eigenvalues and Eigenvectors</title><link>https://www.chenk.top/en/linear-algebra/06-eigenvalues-and-eigenvectors/</link><pubDate>Wed, 05 Feb 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/linear-algebra/06-eigenvalues-and-eigenvectors/</guid><description>&lt;h2 id="the-big-question">The Big Question&lt;/h2>
&lt;p>Apply a matrix to a vector and almost anything can happen. Most vectors get rotated &lt;em>and&lt;/em> stretched, landing in a brand new direction. But scattered among them are a few special vectors that refuse to leave their span. They come out of the transformation pointing exactly the way they went in &amp;ndash; only longer, shorter, or flipped.&lt;/p>
&lt;p>These survivors are &lt;strong>eigenvectors&lt;/strong>. The factor by which they get scaled is the &lt;strong>eigenvalue&lt;/strong>.&lt;/p></description></item><item><title>Linear Systems and Column Space</title><link>https://www.chenk.top/en/linear-algebra/05-linear-systems-and-column-space/</link><pubDate>Wed, 29 Jan 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/linear-algebra/05-linear-systems-and-column-space/</guid><description>&lt;h2 id="the-central-question">The Central Question&lt;/h2>
&lt;p>Almost everything in applied mathematics eventually lands on the same question:&lt;/p>
&lt;blockquote>
&lt;p>Given a matrix $A$ and a vector $\vec{b}$, does the equation $A\vec{x} = \vec{b}$ have a solution? If so, how many?&lt;/p>
&lt;/blockquote>
&lt;p>The mechanical answer is &amp;ldquo;row-reduce and look.&amp;rdquo; The &lt;em>structural&lt;/em> answer is far more interesting &amp;ndash; and it is the goal of this chapter. Three geometric objects tell you everything:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Column space&lt;/strong> $C(A)$ &amp;ndash; the set of vectors $A$ can reach. It decides &lt;strong>whether&lt;/strong> a solution exists.&lt;/li>
&lt;li>&lt;strong>Null space&lt;/strong> $N(A)$ &amp;ndash; the set of vectors $A$ crushes to zero. It decides &lt;strong>how many&lt;/strong> solutions exist.&lt;/li>
&lt;li>&lt;strong>Rank&lt;/strong> $r$ &amp;ndash; the dimension of the column space. It quantifies how much information $A$ preserves.&lt;/li>
&lt;/ul>
&lt;p>Once these three are clear, every linear-systems result &amp;ndash; existence, uniqueness, least squares, the four fundamental subspaces &amp;ndash; becomes the same story told from different angles.&lt;/p></description></item><item><title>The Secrets of Determinants</title><link>https://www.chenk.top/en/linear-algebra/04-the-secrets-of-determinants/</link><pubDate>Wed, 22 Jan 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/linear-algebra/04-the-secrets-of-determinants/</guid><description>&lt;h2 id="beyond-the-formula">Beyond the Formula&lt;/h2>
&lt;p>In most classrooms, determinants are introduced as a formula to memorize:&lt;/p>
$$\det\begin{pmatrix}a &amp; b\\ c &amp; d\end{pmatrix} = ad - bc$$&lt;p>You plug in numbers, compute, and move on. That misses the point entirely.&lt;/p>
&lt;p>Here is the real meaning, in one sentence:&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>The determinant of $A$ is the factor by which $A$ scales area (in 2D) or volume (in 3D).&lt;/strong>&lt;/p>
&lt;/blockquote>
&lt;p>Once you internalize this, every property of determinants stops being a rule to memorize and starts being something you can &lt;em>see&lt;/em>. The product rule $\det(AB) = \det(A)\det(B)$ becomes obvious &amp;ndash; two scalings compose multiplicatively. $\det(A) = 0$ means space gets crushed flat. $\det(A^{-1}) = 1/\det(A)$ says the inverse must undo the scaling. The sign of the determinant tells you whether orientation was preserved or flipped.&lt;/p></description></item><item><title>Matrices as Linear Transformations</title><link>https://www.chenk.top/en/linear-algebra/03-matrices-as-linear-transformations/</link><pubDate>Wed, 15 Jan 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/linear-algebra/03-matrices-as-linear-transformations/</guid><description>&lt;h2 id="the-big-idea">The Big Idea&lt;/h2>
&lt;p>Open a traditional textbook and matrices show up as &amp;ldquo;rectangular arrays of numbers.&amp;rdquo; You learn rules for adding and multiplying them, but no one explains &lt;em>why&lt;/em> the multiplication rule looks the way it does, or why $AB \neq BA$ in general.&lt;/p>
&lt;p>Here is the secret the symbol-pushing version hides: &lt;strong>a matrix is a function that transforms space.&lt;/strong> Every $m \times n$ matrix is a machine that eats an $n$-dimensional vector and spits out an $m$-dimensional one. Once you can &lt;em>see&lt;/em> that, the strange rules stop being strange. They are simply the bookkeeping for what happens to the basis vectors.&lt;/p></description></item><item><title>Linear Combinations and Vector Spaces</title><link>https://www.chenk.top/en/linear-algebra/02-linear-combinations-and-vector-spaces/</link><pubDate>Wed, 08 Jan 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/linear-algebra/02-linear-combinations-and-vector-spaces/</guid><description>&lt;h2 id="why-this-chapter-matters">Why This Chapter Matters&lt;/h2>
&lt;p>Open a box of crayons that contains only &lt;strong>red, green, and blue&lt;/strong>. How many colors can you draw? The honest answer is &lt;strong>infinitely many&lt;/strong> — every shade you have ever seen on a screen is just a different mix of those three. Three &amp;ldquo;ingredients&amp;rdquo; produce an entire universe.&lt;/p>
&lt;p>That recipe — &lt;em>take a few vectors, scale them, add them up&lt;/em> — is called a &lt;strong>linear combination&lt;/strong>. The whole of linear algebra is built on this one move. Once you understand it deeply, you also understand:&lt;/p></description></item><item><title>The Essence of Vectors -- More Than Just Arrows</title><link>https://www.chenk.top/en/linear-algebra/01-the-essence-of-vectors/</link><pubDate>Wed, 01 Jan 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/linear-algebra/01-the-essence-of-vectors/</guid><description>&lt;h2 id="why-vectors-and-why-care">Why Vectors, and Why Care?&lt;/h2>
&lt;p>A physicist talks about a &lt;em>force&lt;/em>. A data scientist talks about a &lt;em>feature&lt;/em>. A game programmer talks about a &lt;em>velocity&lt;/em>. A quantum theorist talks about a &lt;em>state&lt;/em>. Different worlds, different languages &amp;ndash; but the same underlying object: &lt;strong>a vector&lt;/strong>.&lt;/p>
&lt;p>That is not a coincidence. A vector is the smallest piece of mathematics flexible enough to describe &lt;strong>anything you can add together and scale&lt;/strong>. Once you spot that pattern, you spot it everywhere.&lt;/p></description></item></channel></rss>