Series · Linear Algebra · Chapter 1

The Essence of Vectors -- More Than Just Arrows

Vectors are everywhere -- from GPS navigation to Netflix recommendations. This chapter builds your intuition from arrows in space to abstract vector spaces, covering addition, scalar multiplication, inner products, norms, and why linearity matters.

Why Vectors, and Why Care?

A physicist talks about a force. A data scientist talks about a feature. A game programmer talks about a velocity. A quantum theorist talks about a state. Different worlds, different languages – but the same underlying object: a vector.

That is not a coincidence. A vector is the smallest piece of mathematics flexible enough to describe anything you can add together and scale. Once you spot that pattern, you spot it everywhere.

Most introductory courses give you two answers to “what is a vector?”:

  • “An arrow in space, with a length and a direction.”
  • “An ordered list of numbers.”

Both are correct. Both are also incomplete. The full story is that a vector is whatever lives in a vector space – a set of objects that play nicely with addition and scaling. Arrows and number lists are the two famous examples; functions, signals, polynomials and quantum states are equally valid.

This chapter walks that ladder, from very concrete to genuinely abstract:

  1. Geometric – arrows you can draw.
  2. Numerical – columns of numbers you can compute with.
  3. Structural – the inner product, which fuses the two.
  4. Axiomatic – the rules that make the whole thing work for any such object.

What You Will Learn

  • How to think about vectors geometrically (arrows) and numerically (lists)
  • The three operations: addition, scalar multiplication, subtraction
  • The dot product – algebra and geometry agreeing on what “alignment” means
  • Several ways to measure size (norms), and why we need more than one
  • The axiomatic vector-space definition – why functions can be vectors too

Prerequisites

  • High-school algebra
  • Coordinate geometry on the$x$–$y$plane
  • Pythagoras’ theorem

1. The Geometric Picture: Vectors as Arrows

1.1 From Walking Directions to a Vector

Stand at the centre of a park (call it the origin). A friend says: “Walk 4 steps east, then 3 steps north.”

That instruction is a vector. We write it as

$$\vec{v} = \begin{pmatrix} 4 \\ 3 \end{pmatrix},$$

with$4$pointing east (the$x$component) and$3$pointing north (the$y$component).

Two facts fall out immediately. The magnitude – how far you actually end up from the start – is just Pythagoras:

$$\|\vec{v}\| = \sqrt{4^2 + 3^2} = 5.$$

And the direction is the angle north of east:

$$\theta = \arctan\!\left(\frac{3}{4}\right) \approx 37^\circ.$$

So a vector packages two pieces of geometric information – length and direction – into a single object.

A 2D vector as a directed arrow with length, angle, and components

1.2 Translation Invariance: A Vector Has No Home

Here is the first idea that surprises beginners: a vector does not care where it starts. Whether you draw “4 east, 3 north” from the park centre or from the northeast corner, it is the same vector. Direction and length are the only invariants.

Velocity is the cleanest example. A ship sailing east at 20 knots has the same velocity vector whether it is in the middle of the Pacific or hugging the coast of Spain. The position changes from minute to minute; the velocity vector does not.

This is why, when we draw vectors, we are free to anchor them at the origin – a convenience, not a constraint.

Position vector vs free vector: same arrow, three locations

1.3 Vector Addition: Three Pictures of the Same Thing

Given two vectors$\vec{a}$and$\vec{b}$, the sum$\vec{a} + \vec{b}$can be visualised three ways. They are all equivalent.

Head-to-tail. Walk along$\vec{a}$, then walk along$\vec{b}$. The arrow from your starting point to your ending point is the sum. This is how we naturally think about successive displacements.

Parallelogram. Draw both vectors from the same point. Complete the parallelogram. The diagonal from that shared point is the sum. This is how physicists combine two forces acting at one location.

Component-wise. Just add matching coordinates:

$$\begin{pmatrix} 3 \\ 1 \end{pmatrix} + \begin{pmatrix} 1 \\ 2.5 \end{pmatrix} = \begin{pmatrix} 4 \\ 3.5 \end{pmatrix}.$$

Use the geometric pictures to understand; use components to compute. They are two faces of one coin.

Vector addition: head-to-tail and parallelogram both yield the same sum

1.4 Scalar Multiplication: Stretch, Shrink, Reverse

Multiplying a vector$\vec{v}$by a number (a scalar)$c$rescales it along its own line. The behaviour splits into four regimes:

-$c > 1$: the arrow stretches in the same direction. -$0 < c < 1$: it shrinks in the same direction. -$c = 0$: it collapses to the zero vector. -$c < 0$: it reverses direction and rescales by$|c|$.

The set of all scalar multiples of a single non-zero vector traces out a line through the origin – this is your first taste of a span, which we will meet properly in Chapter 2.

Scalar multiplication and the line of all multiples (the span)

1
2
3
4
5
6
import numpy as np

v = np.array([3, 4])
print( 2.0 * v)   # [ 6.   8. ]   stretched
print( 0.5 * v)   # [ 1.5  2. ]   shrunk
print(-1.0 * v)   # [-3.  -4. ]   reversed

A driving analogy makes this stick. If$\vec{v}$is your current velocity, then$2\vec{v}$is doubling your speed,$0.5\vec{v}$is cruising at half-speed, and$-\vec{v}$is a U-turn at the same speed.

1.5 Vector Subtraction: “From Here to There”

If$\vec{a}$and$\vec{b}$are position vectors of two points$A$and$B$(arrows from the origin), then

$$\vec{b} - \vec{a} \;=\; \text{the vector that takes you from } A \text{ to } B.$$

Game engines use this constantly: the displacement from a player to a target, the direction a bullet should travel, the offset between two waypoints – all are subtractions of position vectors.


2. The Numerical Picture: Vectors as Data

2.1 Beyond 2D and 3D

Drawing arrows works fine in two or three dimensions. The real power of the vector concept is that the algebra keeps working in any number of dimensions:

$$\vec{v} = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} \in \mathbb{R}^n.$$

You cannot picture a 100-dimensional arrow, but every operation – addition, scaling, dot products, norms – carries over unchanged. This is the bridge from “geometry you can see” to “geometry you can only compute.”

2.2 The Same Object, Three Viewpoints

Here is the same five numbers,$\{25.3,\,65.0,\,1013,\,15.2,\,45\}$, viewed from three different traditions. They are all the same vector.

One vector, three viewpoints: arrow, feature vector, column

Physics. An arrow in physical space, e.g. a velocity or a force.

Computer science. A feature vector: a row of measurements characterising one example – here, weather conditions at one moment (temperature, humidity, pressure, wind speed, cloud cover).

Mathematics. A column of real numbers, an element of$\mathbb{R}^5$.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
import numpy as np

# A 28x28 grayscale image becomes a 784-dimensional vector.
image = np.array([
    [  0, 128, 255],
    [ 64, 192,  32],
    [100,  50, 200],
])
image_vector = image.flatten()
print(image_vector.shape)   # (9,)

Once you accept the three viewpoints as one object, very different things become the same calculation. Comparing two users’ movie tastes, or comparing two images for similarity, or comparing two molecular fingerprints – all reduce to the same dot-product computation we are about to define.

1
2
3
4
5
6
7
8
import numpy as np

alice = np.array([5, 3, 0, 1, 4])
bob   = np.array([4, 0, 5, 2, 4])
carol = np.array([5, 4, 1, 1, 5])

print(f"alice . bob   = {np.dot(alice, bob)}")     # 28
print(f"alice . carol = {np.dot(alice, carol)}")   # 57  -- much closer!

A famous NLP example takes this even further: words become 300-dimensional vectors trained so that$\text{king} - \text{man} + \text{woman} \approx \text{queen}$. Vector arithmetic captures meaning.


3. The Inner Product: Where Geometry Meets Algebra

The inner product (or dot product in$\mathbb{R}^n$) is the deepest single operation in this chapter. It hides a small miracle: two definitions that look completely unrelated turn out to give the same number.

3.1 Two Definitions, One Operation

Algebraic:

$$\vec{a} \cdot \vec{b} \;=\; \sum_{i=1}^{n} a_i b_i \;=\; a_1 b_1 + a_2 b_2 + \cdots + a_n b_n.$$

Geometric (in$\mathbb{R}^2$or$\mathbb{R}^3$):

$$\vec{a} \cdot \vec{b} \;=\; \|\vec{a}\|\,\|\vec{b}\|\,\cos\theta,$$

where$\theta$is the angle between$\vec{a}$and$\vec{b}$.

The algebraic form tells you how to compute. The geometric form tells you what it means: the dot product measures how much two vectors agree in direction. Same direction$\Rightarrow$large positive number. Perpendicular$\Rightarrow$exactly zero. Opposite directions$\Rightarrow$large negative number.

3.2 Projection: The Best Approximation

A picture turns the dot product from formula into geometry. Drop a perpendicular from the tip of$\vec{a}$onto the line through$\vec{b}$. The shadow you create is the projection of$\vec{a}$onto$\vec{b}$:

$$\operatorname{proj}_{\vec{b}}\vec{a} \;=\; \frac{\vec{a}\cdot\vec{b}}{\vec{b}\cdot\vec{b}}\,\vec{b}.$$

This is the closest point on the line through$\vec{b}$to the tip of$\vec{a}$. That is not a coincidence – it is the same idea behind least-squares regression, principal components, signal filtering, and a dozen other “best linear approximation” problems we will meet again and again.

Dot product as projection: positive when angle is acute, negative when obtuse

1
2
3
4
5
6
import numpy as np

a = np.array([3, 4])
b = np.array([1, 0])
proj = (np.dot(a, b) / np.dot(b, b)) * b
print(proj)   # [3. 0.]

3.3 Orthogonality: Independence, Made Geometric

When$\vec{a} \cdot \vec{b} = 0$, the two vectors are orthogonal (perpendicular), written$\vec{a} \perp \vec{b}$. Geometrically that means a 90-degree angle. Why is that worth a name?

  • Geometry: orthogonal directions don’t interfere – moving along one does not change the projection onto the other.
  • Statistics: uncorrelated random variables correspond to orthogonal vectors (covariance$= 0$).
  • Computation: an orthogonal basis decouples a problem into independent one-dimensional pieces. Almost every “this is fast” trick in numerical linear algebra rides on orthogonality.

3.4 The Cauchy–Schwarz Inequality

For any two vectors,

$$|\vec{a}\cdot\vec{b}| \;\leq\; \|\vec{a}\|\,\|\vec{b}\|,$$

with equality only when one is a scalar multiple of the other. In words: the dot product can never exceed the product of the lengths. It is the algebraic shadow of$|\cos\theta| \leq 1$, and it is the single most-used inequality in all of analysis.


4. Norms: Several Ways to Measure Size

The familiar “length”$\sqrt{x_1^2+\cdots+x_n^2}$is one valid notion of size for a vector. It is the $L^2$norm. But it is not the only one, and which one you choose changes the geometry of your problem.

4.1 The Three Most Common Norms

NormFormulaIntuition
$L^2$(Euclidean)$\sqrt{\sum_i x_i^2}$Straight-line distance
$L^1$(Manhattan)$\sum_i \lvert x_i \rvert$City-block distance, walking only along streets
$L^\infty$(max)$\max_i \lvert x_i \rvert$Worst-case component

Why three? Because different problems care about different things:

-$L^2$is smooth, differentiable everywhere away from the origin, and rotation-invariant. It is the safe default and the natural partner of the dot product. -$L^1$has corners. That non-smoothness is a feature: it pushes optimisers towards solutions where many coordinates are exactly zero. This is the secret behind LASSO regression and compressed sensing. -$L^\infty$is what you reach for when you care about the worst component, not the average one – robust control, error bounds, max-error guarantees.

4.2 Unit Balls: The Geometric Fingerprint of a Norm

A clean way to see a norm is to draw its unit ball – the set of all vectors of length 1 in that norm. Three norms, three very different shapes:

Unit balls of L1, L2, L-infinity: diamond, circle, square

The diamond’s sharp corners on the axes are exactly why$L^1$produces sparse solutions: when you minimise an objective constrained to that diamond, the optimum tends to land on a corner, which means several coordinates are zero.

A reassuring theorem – norm equivalence – says that in finite dimensions, all of these norms are interchangeable in a qualitative sense: a sequence converges in one if and only if it converges in all of them. The numerical values differ; the topology does not.


5. The Abstract Picture: Vector Spaces from Axioms

So far, “vector” has meant “an arrow” or “a column of numbers”. But mathematicians noticed something striking: many wildly different objects – arrows, polynomials, continuous functions, random variables, quantum states – all obey the same rules. So they distilled those rules into a definition.

5.1 The Definition

A vector space$V$over a field$\mathbb{F}$(usually$\mathbb{R}$or$\mathbb{C}$) is a set with two operations,

  • vector addition,$\vec{u} + \vec{v} \in V$, and
  • scalar multiplication,$c\,\vec{v} \in V$,

satisfying ten axioms (commutativity and associativity of addition, existence of a zero vector and of additive inverses, distributivity, associativity of scalar multiplication, and a multiplicative identity). The axioms are the small print; the spirit is one sentence:

Anything you can add together and scale, while obeying the usual algebraic rules, is a vector.

5.2 Surprising Vector Spaces

Continuous functions. All continuous functions on$[0,1]$form a vector space: add them pointwise,$(f+g)(x) = f(x) + g(x)$; scale them pointwise,$(cf)(x) = c\,f(x)$; the zero vector is the constant function$0$. This space is infinite-dimensional – you can think of a function as a vector with one coordinate for every$x \in [0,1]$.

Polynomials. Polynomials of degree at most$n$form an$(n+1)$-dimensional vector space, with the natural basis$\{1, x, x^2, \ldots, x^n\}$.

Matrices. The set of$m \times n$matrices forms an$mn$-dimensional vector space; you add matrices entrywise and scale them entrywise.

Quantum states. A quantum state is a unit vector in a complex Hilbert space. The famous “superposition” of states is just vector addition wearing a tuxedo.

5.3 Why the Abstraction Pays Off

Once you prove a theorem at the level of axioms, it applies everywhere the axioms hold:

  • Cauchy–Schwarz works for column vectors, function spaces, and random variables (where it becomes the covariance inequality$|\operatorname{Cov}(X,Y)| \le \sigma_X \sigma_Y$). One theorem, three subjects.
  • Fourier analysis is “decompose a function as a linear combination of orthogonal basis vectors” – exactly what we will do with column vectors in Chapter 7.
  • Quantum mechanics is “physics, but in an infinite-dimensional inner-product space.”

This is the deepest pay-off of linear algebra: learn the structure once, apply it forever.


6. Vectors in the Wild

6.1 Game Physics in Five Lines

1
2
3
4
5
6
7
8
9
import numpy as np

position     = np.array([100.0, 200.0])
velocity     = np.array([  5.0,  -2.0])
acceleration = np.array([  0.0,  -9.8])   # gravity
dt = 1 / 60                                # one frame at 60 fps

velocity = velocity + acceleration * dt
position = position + velocity * dt

That is Newtonian mechanics, discretised. Every physics engine you have ever played in a game is built on these two lines, repeated 60 times per second.

6.2 Colour Mixing Is Vector Addition

1
2
3
4
5
6
7
import numpy as np

red   = np.array([255,   0,   0])
blue  = np.array([  0,   0, 255])

purple   = (red + blue) // 2     # midpoint -> [127,   0, 127]
dark_red = red // 2              # half-bright -> [127,   0,   0]

RGB colours live in$\mathbb{R}^3$. Mixing them is literally vector arithmetic.

6.3 GPS, Reduced to Vector Equations

GPS positioning is trilateration: your receiver measures its distance$d_i$to each of several satellites at known positions$\vec{s}_i$, giving equations

$$\|\vec{x} - \vec{s}_i\| \;=\; d_i.$$

In the plane, two such equations leave you with two intersection points; a third pins the answer down. In 3D, you need four satellites (a fourth for clock-bias correction). That is it – billion-dollar infrastructure, expressed as vector equations.


7. Common Misconceptions

“A vector must start at the origin.” No. Vectors are translation-invariant; we draw them from the origin only for convenience.

“A vector is just a list of numbers.” Half-true. A list of numbers is one representation of a vector. Continuous functions, polynomials, and quantum states are vectors that cannot be written as finite columns of numbers.

“Dot product and cross product are similar.” They are not. The dot product takes two vectors and returns a scalar; the cross product (which only exists in$\mathbb{R}^3$, with a partial cousin in$\mathbb{R}^7$) takes two vectors and returns a vector. Different inputs in the same place, totally different outputs.

“The zero vector points in some direction.” No. The zero vector is the unique vector with no well-defined direction – which is exactly why formulas like$\vec{v}/\|\vec{v}\|$need a special case.


8. Chapter Summary

ConceptKey Idea
VectorAn object with magnitude and direction – or, more generally, an element of a vector space
AdditionHead-to-tail, parallelogram, or component-wise – all equivalent
Scalar multiplicationStretch, shrink, or reverse along the same line
Inner productMeasures alignment; algebra and geometry agree
ProjectionThe best one-dimensional approximation – prototype of least-squares
NormMeasures size; different norms reflect different priorities
Vector spaceAny set where addition and scaling obey the axioms – including spaces of functions

The Big Takeaway

A vector is not an arrow, and it is not a column of numbers. It is a pattern – a small, sharp set of rules for “things that add and scale”. Once you start looking, you will find that pattern in images, signals, quantum states, financial portfolios, neural-network parameters, and probability distributions. Linear algebra is the study of that pattern.


9. What Comes Next

In Chapter 2: Linear Combinations and Vector Spaces, we ask the next natural questions:

  • How can we build an entire space from a small set of vectors? (span)
  • When are some of those vectors redundant? (linear independence)
  • What is the smallest possible “complete toolbox” for a space? (basis and dimension)

These three ideas turn the static notion of “a vector” into the dynamic notion of “a coordinate system.”


Series Navigation

Liked this piece?

Follow on GitHub for the next one — usually one a week.

GitHub