Series · Linear Algebra · Chapter 2

Essence of Linear Algebra (2): Linear Combinations and Vector Spaces

If vectors are building blocks, linear combinations are the blueprint. This chapter develops the five concepts that the rest of linear algebra is built on: span, linear independence, basis, dimension, and subspaces.

Essence of Linear Algebra (2): Linear Combinations and Vector Spaces — Chapter overview


Why This Chapter Matters#

Open a box of crayons that contains only red, green, and blue. How many colors can you draw? The honest answer is infinitely many — every shade you have ever seen on a screen is just a different mix of those three. Three “ingredients” produce an entire universe.

That recipe — take a few vectors, scale them, add them up — is called a linear combination. The whole of linear algebra is built on this one move. Once you understand it deeply, you also understand:

  • span — every place a set of vectors can reach,
  • linear independence — when none of the ingredients are wasted,
  • basis — the smallest complete set of ingredients,
  • dimension — how many independent ingredients a space requires,
  • subspaces — smaller worlds living inside bigger ones.

These five words are the working vocabulary of linear algebra. Every later chapter — matrices, determinants, eigenvalues, SVD — is a sentence written using them.

Prerequisites#

  • Chapter 1 : vectors, addition, scalar multiplication, and the geometric picture of $\mathbb{R}^2$ and $\mathbb{R}^3$ .

What Is a Linear Combination?#

The recipe#

$$c_1 \vec{v}_1 + c_2 \vec{v}_2 + \cdots + c_k \vec{v}_k.$$

Two operations, nothing more: scale each vector, then add. The word linear means no squares, no products of components, no nonlinear functions — just the two basic operations of a vector space.

Three everyday pictures#

Mixing cocktails. Two base spirits sit on the shelf:

  • Spirit $\vec{a}=(0.40,\,10)$ : 40 % alcohol, 10 g/L sugar
  • Spirit $\vec{b}=(0.20,\,30)$ : 20 % alcohol, 30 g/L sugar

You want a drink with profile $\vec{t}=(0.30,\,20)$ . Solving $x\vec{a}+y\vec{b}=\vec{t}$ gives $x=y=0.5$ . The target is the linear combination $0.5\vec{a}+0.5\vec{b}$ .

Walking directions. “300 m east, then 400 m north.” Your displacement is the linear combination $300\,\vec{e}_\text{east}+400\,\vec{e}_\text{north}$ .

$$\text{color}=r\!\begin{pmatrix}255\\0\\0\end{pmatrix}+g\!\begin{pmatrix}0\\255\\0\end{pmatrix}+b\!\begin{pmatrix}0\\0\\255\end{pmatrix}.$$

Three primary colors, infinitely many results.

Why the word “linear”?#

Take a single nonzero $\vec{v}\in\mathbb{R}^2$ . As $c$ sweeps over $\mathbb{R}$ , the multiples $c\vec{v}$ trace out a line through the origin. That straight line — the geometric shadow of scalar multiplication — is where the word linear comes from.

Add a second non-parallel vector $\vec{w}$ and the picture explodes from a line into the whole plane: every point in $\mathbb{R}^2$ can be written as $a\vec{v}+b\vec{w}$ for exactly one pair $(a,b)$ .

Linear combination of two vectors and the lattice they generate

The left panel shows one specific combination $1.5\vec{v}+1.2\vec{w}$ built by the parallelogram rule. The right panel shows what happens when $a$ and $b$ are allowed to roam: the dots tile the entire plane.


Span — Everywhere the Vectors Can Reach#

Definition#

$$\operatorname{span}(\vec{v}_1,\ldots,\vec{v}_k)=\{c_1\vec{v}_1+\cdots+c_k\vec{v}_k\mid c_i\in\mathbb{R}\}.$$

Imagine each vector as a dial on a remote control. Turn the dials however you like; the set of all positions you can reach is the span.

A catalogue of shapes#

VectorsSpan
One nonzero vector in $\mathbb{R}^2$ or $\mathbb{R}^3$A line through the origin
Two parallel vectorsStill just that line — the second one adds no direction
Two non-parallel vectors in $\mathbb{R}^2$All of $\mathbb{R}^2$
Two non-parallel vectors in $\mathbb{R}^3$A plane through the origin
Three coplanar vectors in $\mathbb{R}^3$Still just that plane
Three non-coplanar vectors in $\mathbb{R}^3$All of $\mathbb{R}^3$

Three structural facts hold no matter what:

  • The span always passes through the origin — set every $c_i=0$ .
  • The span is closed: combine any two of its points and you stay inside.
  • Adding a vector that is already reachable never enlarges the span.

Span: line, repeated line, plane

Left: one vector spans a line. Middle: a parallel partner contributes no new direction, the span is the same line. Right: two truly different directions sweep out the whole plane.

Practical question — can I mix it?#

A lab has three solutions:

  • $\vec{A}=(5\%,\,10\%)$ acid/salt
  • $\vec{B}=(10\%,\,5\%)$
  • $\vec{C}=(2\%,\,2\%)$

Can you produce the target $(15\%,\,12\%)$ ? Since $\vec{A}$ and $\vec{B}$ are not parallel, $\operatorname{span}(\vec{A},\vec{B})=\mathbb{R}^2$ already. The target is therefore reachable — and adding $\vec{C}$ does not help us do anything new, it just makes the recipe non-unique.


Linear Independence — No Wasted Vectors#

The core idea#

Sometimes adding a vector buys you nothing because it was already in the span. Linear independence says: every vector pulls its own weight.

Definition#

$$c_1\vec{v}_1+\cdots+c_k\vec{v}_k=\vec{0}\;\;\Longrightarrow\;\; c_1=\cdots=c_k=0.$$

If some non-trivial combination produces $\vec{0}$ , the set is linearly dependent — at least one vector is a combination of the others, so it is redundant.

Geometric translation#

  • In $\mathbb{R}^2$ : two vectors are independent $\iff$ they are not parallel.
  • In $\mathbb{R}^3$ : three vectors are independent $\iff$ they are not coplanar.
  • In $\mathbb{R}^n$ : more than $n$ vectors must be dependent — there are only $n$ truly different directions to go.

Independent vs dependent vectors

On the right, $\vec{v}_3=0.8\vec{v}_1+0.6\vec{v}_2$ . The dashed parallelogram exhibits the dependence — $\vec{v}_3$ adds no new direction, so the three vectors are dependent.

Three ways to test it#

  1. Definition. Solve $c_1\vec{v}_1+\cdots+c_k\vec{v}_k=\vec{0}$ . If the only solution is the trivial one, you have independence.
  2. Determinant (when you have $n$ vectors in $\mathbb{R}^n$ ). Stack them as columns of a square matrix and compute $\det$ . Nonzero $\Rightarrow$ independent.
  3. Rank. Stack them as columns and compute the rank. Equal to the number of vectors $\Rightarrow$ independent.
1
2
3
4
5
6
7
8
import numpy as np

v1 = np.array([1, 2, 3])
v2 = np.array([4, 5, 6])
v3 = np.array([7, 8, 9])      # = 2*v2 - v1, so dependent

A = np.column_stack([v1, v2, v3])
print(np.linalg.matrix_rank(A))   # 2, not 3 -> dependent

Why independence is non-negotiable#

If $\{\vec{v}_1,\ldots,\vec{v}_k\}$ is independent, then every vector in their span has exactly one representation as a combination of them. That uniqueness is what makes coordinates well-defined: the pair $(3,5)$ wouldn’t mean anything if there were two different ways to build the same point.


Basis — The Smallest Complete Toolbox#

Essence of Linear Algebra (2): Linear Combinations and Vector Spaces — Chapter summary

Definition#

A basis of a vector space $V$ is a set of vectors that is

  1. linearly independent (no redundancy), and
  2. spans $V$ (covers every vector in the space).

Remove anything and you lose coverage. Add anything and you gain redundancy. A basis is the minimal spanning set, simultaneously the maximal independent set.

The standard basis of $\mathbb{R}^n$ #

$$ \vec{e}_1=\!\begin{pmatrix}1\\0\\\vdots\\0\end{pmatrix},\; \vec{e}_2=\!\begin{pmatrix}0\\1\\\vdots\\0\end{pmatrix},\;\ldots,\; \vec{e}_n=\!\begin{pmatrix}0\\\vdots\\0\\1\end{pmatrix}. $$

When you write $\vec{v}=(3,5)$ , what you really mean is $\vec{v}=3\vec{e}_1+5\vec{e}_2$ . The standard basis is so familiar that we forget it is a choice.

Standard basis vs a rotated basis

The same point $\vec{u}=(3,2)$ has coordinates $(3,2)$ in the standard basis on the left, and different coordinates in the rotated basis on the right. The arrow itself never moved; the grid under it changed.

Bases are not unique#

$$ \left\{\!\begin{pmatrix}1\\0\end{pmatrix},\!\begin{pmatrix}0\\1\end{pmatrix}\!\right\},\quad \left\{\!\begin{pmatrix}1\\1\end{pmatrix},\!\begin{pmatrix}1\\-1\end{pmatrix}\!\right\},\quad \left\{\!\begin{pmatrix}2\\0\end{pmatrix},\!\begin{pmatrix}0\\3\end{pmatrix}\!\right\}. $$

Different bases give different coordinates for the same vector — but the vector (the geometric arrow) is the same in all of them.

Coordinates depend on the basis#

$$4\!\begin{pmatrix}1\\1\end{pmatrix}+(-1)\!\begin{pmatrix}1\\-1\end{pmatrix}=\begin{pmatrix}3\\5\end{pmatrix}.$$

A “vector” is the geometric object. A “coordinate tuple” is what the vector looks like after you commit to a basis. This distinction is one of the most freeing ideas in linear algebra — and it’s exactly what change of basis is about.

The same point, two coordinate grids


Dimension — Counting the Degrees of Freedom#

Definition#

The dimension of $V$ , written $\dim(V)$ , is the number of vectors in any basis of $V$ . A theorem (which we’ll take on faith for now) guarantees that every basis of $V$ has the same size, so the definition makes sense.

Three equivalent intuitions#

Dimension counts:

  • The number of independent parameters needed to pinpoint a vector,
  • The number of independent directions of motion,
  • The maximum number of linearly independent vectors that fit in the space.
SpaceDimensionWhy
$\{\vec{0}\}$0Nowhere to go
A line through origin1Forward / backward
A plane through origin2Forward/back + left/right
$\mathbb{R}^3$3Add up/down
$\mathbb{R}^n$$n$$n$ independent directions

The dimension theorem#

In an $n$ -dimensional space:

  • More than $n$ vectors are always dependent.
  • Exactly $n$ independent vectors form a basis.
  • Fewer than $n$ vectors cannot span the whole space.

This is why dimension feels like the capacity of a space — it is the upper bound on how many independent things can coexist inside it.


Subspaces — Spaces Inside Spaces#

Definition#

A subspace $W$ of $V$ is a non-empty subset that is itself a vector space. Concretely, $W\subseteq V$ is a subspace iff:

  1. $\vec{0}\in W$ ,
  2. $\vec{u},\vec{v}\in W \implies \vec{u}+\vec{v}\in W$ (closed under addition),
  3. $\vec{v}\in W,\,c\in\mathbb{R} \implies c\vec{v}\in W$ (closed under scaling).

Conditions 2 and 3 say: you cannot escape the subspace by adding or scaling. Condition 1 is automatic if 2 and 3 hold and $W$ is non-empty (set $c=0$ ), but stating it explicitly avoids subtle edge cases.

The complete list in $\mathbb{R}^3$ #

There are only four kinds of subspaces of $\mathbb{R}^3$ :

  • $\{\vec{0}\}$ — dimension 0,
  • any line through the origin — dimension 1,
  • any plane through the origin — dimension 2,
  • $\mathbb{R}^3$ itself — dimension 3.

A line or plane that does not pass through the origin is not a subspace — it fails condition 1, and it is also not closed under addition.

Subspaces in 3D: line through origin, plane through origin

Span always produces a subspace#

For any set of vectors $S=\{\vec{v}_1,\ldots,\vec{v}_k\}$ , $\operatorname{span}(S)$ is automatically a subspace. This gives the simplest possible recipe: pick some vectors, take their span, you have a subspace. Almost every subspace you ever meet is constructed this way.

Linear vs affine — why the origin matters#

A line that misses the origin is an affine set, not a linear subspace. The picture below makes the distinction crisp: on the left, $\vec{u}+\vec{w}$ stays on the line; on the right, $\vec{p}_1+\vec{p}_2$ jumps off the line entirely.

Subspaces must contain the origin: linear vs affine

This is why affine geometry (translations) is not the same as linear algebra (rotations and scalings). Linear maps fix the origin; affine maps do not. Whenever you read “subspace,” read “passes through the origin and is closed under +/×.”

Dimension formula for sums#

$$\dim(U+W)=\dim(U)+\dim(W)-\dim(U\cap W).$$

It is the inclusion–exclusion principle, ported to vector spaces. We will see this formula again in Chapter 5 when we count solutions of linear systems.


Case Study — RGB as a Vector Space#

The RGB color model is the cleanest real-world illustration of everything in this chapter:

  • Each color is a 3D vector $\vec{c}=(r,g,b)$ .
  • The basis is $\{\vec{r},\vec{g},\vec{b}\}$ , the three primaries.
  • $\dim(\text{RGB})=3$ — three independent channels, three dials.
1
2
3
4
5
6
7
import numpy as np

red, green, blue = np.eye(3) * 255

yellow = red + green          # [255, 255,   0]
purple = red + blue           # [255,   0, 255]
white  = red + green + blue   # [255, 255, 255]

Grayscale is a 1D subspace: $\vec{c}=k(1,1,1)$ for $k\in[0,255]$ — one dial, one degree of freedom, lying along the diagonal of the RGB cube.

Color blindness (some types) projects RGB onto a 2D subspace: the missing dimension is the one a person cannot distinguish.

Color-space conversion (RGB → HSV, LAB, …) is a change of basis: same colors, new coordinates.


Common Pitfalls#

"$\vec{v}_1=(1,2)$ and $\vec{v}_2=(2,4)$ span $\mathbb{R}^2$ ." No. $\vec{v}_2=2\vec{v}_1$ , so they span only the line $y=2x$ .

“Three vectors always span more than two.” Only if the third vector is outside the span of the first two.

“Independent vectors must be perpendicular.” No. $(1,0)$ and $(1,1)$ are independent but not orthogonal. Orthogonality is a stronger condition than independence.

“A space has a unique basis.” Every space has infinitely many bases. What is unique is the dimension — the number of vectors in any basis.

“Any subset is a subspace.” Subspaces must contain $\vec{0}$ and be closed under $+$ and scalar multiplication. Most subsets fail.


Code Lab#

Is a set linearly independent?#

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
import numpy as np

def is_independent(vectors):
    """Return True iff the given vectors are linearly independent."""
    A = np.column_stack(vectors)
    return np.linalg.matrix_rank(A) == len(vectors)

print(is_independent([np.array([1, 2, 3]),
                      np.array([4, 5, 6]),
                      np.array([7, 8, 9])]))   # False
print(is_independent(list(np.eye(3))))         # True

Is a target vector inside a span?#

1
2
3
4
5
6
7
8
9
def in_span(target, vectors, tol=1e-8):
    """Check whether `target` lies in span(vectors)."""
    A = np.column_stack(vectors)
    coeffs, *_ = np.linalg.lstsq(A, target, rcond=None)
    return np.allclose(A @ coeffs, target, atol=tol)

v1, v2 = np.array([1, 1]), np.array([2, 2])     # parallel
print(in_span(np.array([1, 0]), [v1, v2]))      # False
print(in_span(np.array([3, 3]), [v1, v2]))      # True

Extract a basis from a redundant set#

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
def extract_basis(vectors):
    """Greedy: keep a vector iff it raises the rank."""
    chosen, M = [], np.zeros((len(vectors[0]), 0))
    for v in vectors:
        Mtest = np.column_stack([M, v])
        if np.linalg.matrix_rank(Mtest) > M.shape[1]:
            chosen.append(v)
            M = Mtest
    return chosen

vs = [np.array([1, 0, 0]),
      np.array([0, 1, 0]),
      np.array([1, 1, 0]),    # = v1 + v2  -> redundant
      np.array([0, 0, 1])]
print(len(extract_basis(vs)))   # 3 -> {v1, v2, v4}

Summary#

ConceptDefinitionPicture
Linear combination$c_1\vec{v}_1+\cdots+c_k\vec{v}_k$Weighted sum of vectors
SpanAll linear combinationsEvery reachable point
Linear independenceZero combination $\Rightarrow$ all coefficients zeroNo redundant arrows
BasisIndependent and spans the spaceSmallest complete toolbox
DimensionSize of any basisDegrees of freedom
SubspaceClosed under $+$ and $\cdot$ , contains $\vec{0}$Space inside a space

These six ideas thread through everything that follows:

  • Ch 3 — A matrix’s column space is the span of its columns.
  • Ch 4 — The determinant tests linear independence in a single number.
  • Ch 5 — Solution sets of $A\vec{x}=\vec{0}$ are subspaces (the null space).
  • Ch 6Eigenvectors are special bases that diagonalize a matrix.
  • Ch 9SVD delivers an “optimal” pair of orthonormal bases.

When the Basis Assumption Breaks: Near-Dependent Columns#

The clean dichotomy “vectors are either independent or dependent” is a lie that floating-point arithmetic does not respect. In practice, columns are near-dependent, and the symptom is a basis matrix whose determinant is $10^{-15}$ instead of $0$ .

Consider three columns of length 100, where the third is the sum of the first two plus tiny noise:

1
2
3
4
5
6
7
8
9
import numpy as np
rng = np.random.default_rng(0)
v1 = rng.standard_normal(100)
v2 = rng.standard_normal(100)
v3 = v1 + v2 + 1e-10 * rng.standard_normal(100)
A = np.column_stack([v1, v2, v3])
print(np.linalg.matrix_rank(A))           # 3  -- "independent"
print(np.linalg.matrix_rank(A, tol=1e-8)) # 2  -- the truth
print(np.linalg.svd(A, compute_uv=False)) # [~14, ~10, ~1e-9]

matrix_rank defaults to a tolerance based on the largest singular value. With strict numerics it reports rank 3, but the third singular value is $10^{-9}$ — the column is effectively dependent. If you build a model that inverts $A^TA$ , the inversion blows up: the condition number is $\sim 10^{10}$ .

The right tool is the SVD (Chapter 9 ). The singular values $\sigma_1 \ge \sigma_2 \ge \cdots$ are a continuous measure of “how independent” the columns are. The ratio $\sigma_1/\sigma_n$ is the condition number; a basis where this ratio exceeds $10^{8}$ is no basis at all for double-precision work. Numerical rank is then defined as the count of singular values above some threshold, typically $\max(m,n)\cdot \varepsilon \cdot \sigma_1$ .

This matters constantly. Polynomial regression on $1, x, x^2, \ldots, x^{15}$ over $x \in [0,1]$ has condition number $> 10^{20}$ — the monomial basis is theoretically independent but practically useless. Switch to Chebyshev or Legendre polynomials and the condition number drops to $\sim 10^{2}$ . Same span, vastly different basis.

Connection to ML: Feature Redundancy and the Rank of the Data Matrix#

In a tabular ML pipeline, your design matrix $X \in \mathbb{R}^{n\times d}$ has one row per sample and one column per feature. The column space of $X$ is the set of label functions a linear model can possibly represent. Three things that this chapter’s vocabulary lets you diagnose precisely:

  1. One-hot collinearity. If you one-hot encode a categorical with $k$ levels into $k$ columns, those columns sum to the all-ones column. They are linearly dependent: the rank drops by 1. Linear regression’s normal equations $X^TX \beta = X^T y$ become singular. Fix: drop one level (the “dummy variable trap”), or add an L2 penalty.

  2. Redundant engineered features. If you add height_cm, height_m, and height_inches, you have added one feature, twice. The span has not grown. Tree models do not care; linear models silently give you nonsense coefficients because the inverse is ill-defined.

  3. PCA = picking the best low-dimensional subspace. PCA finds the $k$ -dimensional subspace of $\mathbb{R}^d$ — the span of the top $k$ principal directions — that captures maximum variance. It is exactly the question “what is the best $k$ -vector basis for my cloud of data?”, answered by SVD. Reducing 1000 features to 50 means trading a basis you wrote down for one the data chose.

So when this chapter says basis and dimension, the ML reading is: a basis is a parameterisation of your hypothesis class, and dimension is its capacity. Choosing a basis is a modelling decision, not a bookkeeping one.


What’s Next#

Chapter 3 — Matrices as Linear Transformations. A matrix is not a passive table of numbers; it is an agent of transformation. We will see that:

  • multiplying $A\vec{x}$ is geometrically the action of $A$ on $\vec{x}$ ,
  • rotation, scaling, shearing, and projection are all matrices,
  • matrix multiplication is exactly composition of transformations,
  • the column space of $A$ is precisely the span we just defined.
In this series

Linear Algebra 18 parts

  1. 01 Essence of Linear Algebra (1): The Essence of Vectors — More Than Just Arrows
  2. 02 Essence of Linear Algebra (2): Linear Combinations and Vector Spaces you are here
  3. 03 Essence of Linear Algebra (3): Matrices as Linear Transformations
  4. 04 Essence of Linear Algebra (4): The Secrets of Determinants
  5. 05 Essence of Linear Algebra (5): Linear Systems and Column Space
  6. 06 Essence of Linear Algebra (6): Eigenvalues and Eigenvectors
  7. 07 Essence of Linear Algebra (7): Orthogonality and Projections — When Vectors Mind Their Own Business
  8. 08 Essence of Linear Algebra (8): Symmetric Matrices and Quadratic Forms — The Best Matrices in Town
  9. 09 Essence of Linear Algebra (9): Singular Value Decomposition — The Crown Jewel of Linear Algebra
  10. 10 Essence of Linear Algebra (10): Matrix Norms and Condition Numbers — Is Your Linear System Healthy?
  11. 11 Essence of Linear Algebra (11): Matrix Calculus and Optimization — The Engine Behind Machine Learning
  12. 12 Essence of Linear Algebra (12): Sparse Matrices and Compressed Sensing — Less Is More
  13. 13 Essence of Linear Algebra (13): Tensors and Multilinear Algebra
  14. 14 Essence of Linear Algebra (14): Random Matrix Theory
  15. 15 Essence of Linear Algebra (15): Linear Algebra in Machine Learning
  16. 16 Essence of Linear Algebra (16): Linear Algebra in Deep Learning
  17. 17 Essence of Linear Algebra (17): Linear Algebra in Computer Vision
  18. 18 Essence of Linear Algebra (18): Frontiers and Summary

Liked this piece?

Follow on GitHub for the next one — usually one a week.

GitHub