Series · Linear Algebra · Chapter 4

Essence of Linear Algebra (4): The Secrets of Determinants

Determinants are not just tedious calculations -- they measure how much a transformation stretches or compresses space. This chapter gives you the geometric intuition behind determinants, their key properties, and practical applications.



Beyond the Formula#

Most students first meet the determinant as a formula to memorize:

$$\det\begin{pmatrix}a & b\\ c & d\end{pmatrix} = ad - bc$$

You plug in numbers, compute, and move on. That misses the point entirely.

Here is the real meaning, in one sentence:

The determinant of $A$ is the factor by which $A$ scales area (in 2D) or volume (in 3D).

Once you internalize this, every property of determinants stops being a rule to memorize and starts being something you can see. The product rule $\det(AB) = \det(A)\det(B)$ becomes obvious — two scalings compose multiplicatively. $\det(A) = 0$ means space gets crushed flat. $\det(A^{-1}) = 1/\det(A)$ says the inverse must undo the scaling. The sign of the determinant tells you whether orientation was preserved or flipped.

What you will learn#

  • The geometric meaning of determinants in 2D and 3D
  • What the sign of the determinant tells you (orientation)
  • What $\det = 0$ means (singularity, information loss)
  • Key properties and why each one is geometrically obvious
  • Three ways to actually compute a determinant
  • Applications: Cramer’s Rule, area/volume formulas, the Jacobian

Prerequisites#


2D Determinants: An Area Scaling Factor#

Starting from the unit square#

In the plane, the unit square is the square with corners at $(0,0)$ , $(1,0)$ , $(1,1)$ , $(0,1)$ . It is built from the standard basis vectors $\vec{e}_1 = (1, 0)$ and $\vec{e}_2 = (0, 1)$ , and its area is exactly $1$ .

A $2 \times 2$ matrix $A = \begin{pmatrix}a & b\\ c & d\end{pmatrix}$ sends the basis vectors to the columns of $A$ :

  • $\vec{e}_1 \;\mapsto\; (a,\,c)$ — the first column
  • $\vec{e}_2 \;\mapsto\; (b,\,d)$ — the second column

The unit square therefore becomes the parallelogram spanned by those two columns, whose area is

$$\text{area} = |ad - bc| = |\det(A)|.$$

That is the whole content of the 2D determinant.

Determinant as area scaling factor: the unit square becomes a parallelogram whose area equals $|\det A|$.

A worked example#

$$A = \begin{pmatrix}3 & 1\\ 0 & 2\end{pmatrix}, \qquad \det(A) = 3\cdot 2 - 1\cdot 0 = 6.$$

The unit square (area $1$ ) becomes a parallelogram of area $6$ . Every shape in the plane is rescaled by the same factor $6$ — a circle of area $\pi$ becomes an ellipse of area $6\pi$ , a triangle of area $0.5$ becomes a triangle of area $3$ , and so on. The matrix does not care about the shape, only about the local area element.
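
To check the claim that every shape is rescaled by the same factor, here is a minimal sketch that pushes a triangle of area $0.5$ through the matrix above and measures the result with the shoelace formula. The helper name `polygon_area` is an assumption introduced for this illustration.

```python
import numpy as np

def polygon_area(points):
    """Unsigned area of a polygon (vertices as rows) via the shoelace formula."""
    x, y = points[:, 0], points[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

A = np.array([[3.0, 1.0], [0.0, 2.0]])
triangle = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # area 0.5

# Vertices are stored as rows, so applying A to each one is a product with A.T.
image = triangle @ A.T

area_before = polygon_area(triangle)
area_after = polygon_area(image)
print(area_before, area_after, area_after / area_before)
```

The ratio of the two areas should match $\det(A) = 6$, whatever polygon you start from.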

The photocopier analogy#

A photocopier set to 200% doubles both the width and the height of the page. As a matrix:

$$A = \begin{pmatrix}2 & 0\\ 0 & 2\end{pmatrix}, \qquad \det(A) = 4.$$

Width doubles, height doubles, but area quadruples (not doubles). The determinant gives the area scaling directly, and that “$4$ ” is exactly the surprise built into linear maps.

Three transformations, three determinants#

To build intuition, look at three different $A$ ’s acting on the unit square:

Same input shape, three different determinants. Shear preserves area, stretch doubles it, compression halves it.

  • Shear, $\det = 1$ : the parallelogram leans, but its area is unchanged. (Imagine pushing the top of a stack of books sideways — the volume of the stack does not change.)
  • Stretch, $\det = 2$ : one direction is doubled; area doubles.
  • Compression, $\det = 0.5$ : one direction is halved; area is halved.

The determinant boils each of these transformations down to a single number: the factor by which area changed.
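
The three cases can be reproduced numerically. The matrix entries below are assumptions chosen to match the stated determinants ($1$, $2$, and $0.5$); the original figure does not list them.

```python
import numpy as np

# Illustrative matrices (assumed; picked to give det = 1, 2 and 0.5).
examples = {
    "shear": np.array([[1.0, 1.0], [0.0, 1.0]]),
    "stretch": np.array([[2.0, 0.0], [0.0, 1.0]]),
    "compression": np.array([[1.0, 0.0], [0.0, 0.5]]),
}

dets = {name: float(np.linalg.det(M)) for name, M in examples.items()}
for name, d in dets.items():
    print(f"{name:12s} det = {d:.2f}")
```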


The Sign of the Determinant: Orientation#

The absolute value $|\det(A)|$ tells you about size. The sign tells you about orientation.

  • $\det(A) > 0$ : the transformation preserves orientation. A counter-clockwise loop stays counter-clockwise.
  • $\det(A) < 0$ : the transformation flips orientation. A counter-clockwise loop comes out clockwise — exactly what a mirror does.

Example: reflection across the $y$ -axis#

$$A = \begin{pmatrix}-1 & 0\\ \phantom{-}0 & 1\end{pmatrix}, \qquad \det(A) = -1.$$
  • $|\det| = 1$ : area is unchanged.
  • The negative sign records the flip: write a word on a transparent sheet, hold it up to a mirror, and you see exactly what $A$ does.

Reflection sends the right-handed basis to a left-handed one; the determinant becomes $-1$.
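
One way to see the orientation flip in numbers is to compute a signed area: the shoelace formula without the absolute value is positive for a counter-clockwise loop and negative for a clockwise one. This is a small sketch; `signed_area` is a hypothetical helper name.

```python
import numpy as np

def signed_area(points):
    """Shoelace formula without abs: positive for counter-clockwise loops."""
    x, y = points[:, 0], points[:, 1]
    return 0.5 * (np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

# Unit square, vertices listed counter-clockwise.
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
reflect = np.array([[-1.0, 0.0], [0.0, 1.0]])  # reflection across the y-axis

before = signed_area(square)
after = signed_area(square @ reflect.T)
print(before, after)
```

The magnitude stays at $1$ while the sign changes, which is what $\det(A) = -1$ records.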

The glove analogy#

Take a right-hand glove. Rotate it, stretch it, squash it — it stays a right-hand glove. But turn it inside out, and it becomes a left-hand glove. That “inside-out” operation is exactly the kind of transformation a negative determinant performs in our model. Rotations and stretches keep $\det > 0$ ; reflections flip the sign.


Determinant Zero: Space Gets Crushed#

If the area scaling factor is $0$ , then area becomes $0$ . In 2D, that can only mean one thing: the entire plane is squashed onto a line (or, in degenerate cases, onto the origin).

Example#

$$A = \begin{pmatrix}1 & 2\\ 2 & 4\end{pmatrix}, \qquad \det(A) = 1\cdot 4 - 2\cdot 2 = 0.$$

The second column $(2, 4)$ is exactly twice the first column $(1, 2)$ . Both basis images lie on the same line through the origin (the line spanned by $(1,2)$ ). Every point of the plane gets sent to that line — the 2D world is collapsed into 1D.

When $\det = 0$, the entire plane is crushed onto a one-dimensional subspace. Every distinct point in the input is squashed onto the same line.

Why this means non-invertible#

Take a 2D photo and squash it into a line — can you reconstruct the photo? No: countless input points now occupy the same output point, so the map cannot be undone. Information has been destroyed, so $A^{-1}$ does not exist.

$$\det(A) = 0 \;\Longleftrightarrow\; A\text{ is singular} \;\Longleftrightarrow\; \text{the columns of }A\text{ are linearly dependent}.$$

It also gives a fast practical test for linear dependence: just compute the determinant.
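
The information loss is easy to demonstrate with the singular matrix from the example: two different inputs that differ by the direction $(2, -1)$ land on the same output. The specific points `p` and `q` are arbitrary choices for illustration.

```python
import numpy as np

A = np.array([[1.0, 2.0], [2.0, 4.0]])

# A sends (2, -1) to the zero vector, so adding it to an input changes nothing.
p = np.array([1.0, 1.0])
q = p + np.array([2.0, -1.0])

print(np.linalg.det(A))   # zero, up to rounding
print(A @ p, A @ q)       # the same point on the line spanned by (1, 2)
```

Since `p` and `q` are indistinguishable after the map, no inverse can recover which one you started with.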


3D Determinants: A Volume Scaling Factor#


Everything we said in 2D lifts cleanly to 3D. The unit cube is built from $\vec{e}_1, \vec{e}_2, \vec{e}_3$ , and a $3 \times 3$ matrix sends it to a slanted box — a parallelepiped. The determinant gives the (signed) volume of that box.

In 3D, a $3\times 3$ matrix takes the unit cube to a parallelepiped; $|\det A|$ is its volume.

The formula#

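$$\det\begin{pmatrix}a & b & c\\ d & e & f\\ g & h & i\end{pmatrix} = a(ei - fh) - b(di - fg) + c(dh - eg).$$

Equivalently, if $\vec{v}_1, \vec{v}_2, \vec{v}_3$ are the columns of $A$, the determinant is the scalar triple product

$$\det(A) = \vec{v}_1 \cdot (\vec{v}_2 \times \vec{v}_3),$$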

which is one of the standard formulas for the (signed) volume of a parallelepiped.
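
A quick numerical check of the triple-product identity, using an arbitrary example matrix (the entries are an assumption, not taken from the text):

```python
import numpy as np

A = np.array([[2.0, 0.0, 1.0],
              [1.0, 3.0, 0.0],
              [0.0, 1.0, 4.0]])  # arbitrary example matrix

v1, v2, v3 = A[:, 0], A[:, 1], A[:, 2]  # columns of A

triple = float(np.dot(v1, np.cross(v2, v3)))
det = float(np.linalg.det(A))
print(triple, det)
```

Both numbers agree (up to rounding), and their common value is the signed volume of the parallelepiped spanned by the columns.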

Sign in 3D#

A negative 3D determinant means the right-handed coordinate system has been turned into a left-handed one (e.g. by reflecting one axis). Reflections, point reflections, and odd numbers of mirror flips all give $\det < 0$ .


Properties of Determinants — All Geometric#

Once you see determinants as scaling factors, the algebraic properties stop looking like a list of rules and start looking like statements about scaling.

Multiplicative: $\det(AB) = \det(A)\det(B)$ #

$B$ scales volume by $\det(B)$ ; then $A$ scales the result by $\det(A)$ . Total scaling = product. It is like running a page through a copier that enlarges area by $1.5\times$ and then through one that enlarges area by $3\times$ : the total area scaling is $4.5\times$ .

Two transformations applied in sequence: each multiplies the area by its own determinant, so the composite multiplies them.

Transpose: $\det(A^T) = \det(A)$ #

Swapping rows for columns leaves the volume scaling unchanged. (Geometrically the parallelepipeds are different, but they have the same volume — a non-trivial fact that is one of the small miracles of the theory.)

Inverse: $\det(A^{-1}) = 1/\det(A)$ #

If $A$ multiplies volume by $k$ , then $A^{-1}$ must divide volume by $k$ . Algebraically: $\det(A)\det(A^{-1}) = \det(I) = 1$ .

Row swap changes sign#

Swapping two rows multiplies the determinant by $-1$ . Swapping basis vectors flips the handedness of the coordinate system, so the sign flips.

Row scaling scales the determinant#

Multiplying one row by $k$ multiplies the determinant by $k$ — you stretched one basis vector $k$ times, so the parallelogram is $k$ times as big.

Corollary. $\det(kA) = k^n \det(A)$ for an $n\times n$ matrix: $k$ acts on each of the $n$ rows.

Row addition leaves the determinant alone#

Adding a multiple of one row to another does not change the determinant.

This is a shear: the parallelogram changes shape, but its area does not. Picture a stack of cards; pushing the top sideways changes the silhouette but not the volume.

This single fact is why Gaussian elimination preserves determinants up to easy bookkeeping — it is the entire reason the elimination method works for computing $\det$ .
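
None of these properties needs to be taken on faith. The sketch below checks each of them on a pair of random $3\times 3$ matrices; the seed, the size, and the factor `k` are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
det = np.linalg.det
k = 2.5

swapped = A[[1, 0, 2], :]      # rows 0 and 1 exchanged

scaled = A.copy()
scaled[0] *= k                 # row 0 multiplied by k

sheared = A.copy()
sheared[2] += k * A[0]         # k times row 0 added to row 2

checks = {
    "product": np.isclose(det(A @ B), det(A) * det(B)),
    "transpose": np.isclose(det(A.T), det(A)),
    "inverse": np.isclose(det(np.linalg.inv(A)), 1 / det(A)),
    "row swap": np.isclose(det(swapped), -det(A)),
    "row scaling": np.isclose(det(scaled), k * det(A)),
    "scalar multiple": np.isclose(det(k * A), k**3 * det(A)),
    "row addition": np.isclose(det(sheared), det(A)),
}
print(checks)
```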

Special matrices#

| Matrix type | Determinant |
|---|---|
| Identity $I$ | $1$ |
| Diagonal | product of diagonal entries |
| Triangular (any kind) | product of diagonal entries |

The triangular case is the workhorse: any matrix can be reduced to triangular form by elimination, and once it is triangular the determinant is one multiplication.


Computing Determinants#

$2 \times 2$ : just the formula#

$$\det\begin{pmatrix}a & b\\ c & d\end{pmatrix} = ad - bc.$$

$3 \times 3$ : Sarrus’s rule#

$$\det\begin{pmatrix}1 & 2 & 3\\ 4 & 5 & 6\\ 7 & 8 & 9\end{pmatrix} = (1\cdot 5\cdot 9 + 2\cdot 6\cdot 7 + 3\cdot 4\cdot 8) - (3\cdot 5\cdot 7 + 2\cdot 4\cdot 9 + 1\cdot 6\cdot 8) = 0.$$

(The result is $0$ because the rows are linearly dependent: each row is the previous one plus $(3, 3, 3)$ , so $\text{row}_1 - 2\,\text{row}_2 + \text{row}_3 = \vec{0}$ .)

Warning. Sarrus’s rule works only for $3 \times 3$ matrices. Do not try to extend the diagonal pattern to $4 \times 4$ — you will get a wrong answer.

General: cofactor (Laplace) expansion#

Fix any row $i$ . Then

$$\det(A) = \sum_{j=1}^{n} (-1)^{i+j} a_{ij}\, M_{ij},$$

where $M_{ij}$ is the minor — the determinant of the $(n-1)\times(n-1)$ submatrix obtained by deleting row $i$ and column $j$ . The sign pattern $(-1)^{i+j}$ alternates like a checkerboard; for a $3\times 3$ the first row gets signs $+,-,+$ .

Cofactor expansion in pictures: pick a row, multiply each entry by the determinant of the submatrix you get by deleting its row and column, and alternate signs.

Practical tip. Expand along the row or column with the most zeros — those terms vanish and you do less work.
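
The recursion translates almost word for word into code. This is a teaching sketch that always expands along the first row and skips zero entries; `det_cofactor` is a hypothetical name, and the function is far too slow for anything but small matrices.

```python
import numpy as np

def det_cofactor(M):
    """Determinant by Laplace expansion along the first row (O(n!) work)."""
    M = np.asarray(M, dtype=float)
    n = M.shape[0]
    if n == 1:
        return M[0, 0]
    total = 0.0
    for j in range(n):
        if M[0, j] == 0:
            continue  # zero entries contribute nothing to the sum
        # Minor: delete row 0 and column j.
        minor = np.delete(np.delete(M, 0, axis=0), j, axis=1)
        # With 0-based indices the checkerboard sign for row 0 is (-1)**j.
        total += (-1) ** j * M[0, j] * det_cofactor(minor)
    return total

print(det_cofactor([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
print(det_cofactor([[2, 0, 1], [1, 3, 0], [0, 1, 4]]))
```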

For real computation: Gaussian elimination#

Cofactor expansion has $O(n!)$ work, which is hopeless past $n = 10$ or so. In practice you reduce $A$ to upper triangular form by elementary row operations (which only multiply the determinant by predictable factors), then multiply the diagonal. That is $O(n^3)$ — this is what numpy.linalg.det actually does internally.

import numpy as np

A = np.array([[3, 1], [2, 4]])
print(f"det(A) = {np.linalg.det(A):.1f}")   # 10.0

B = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"det(B) = {np.linalg.det(B):.1f}")   # 0.0 or -0.0 (exactly 0 in theory; floating point leaves tiny noise)
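
To make the elimination idea concrete, here is a minimal sketch of a determinant routine built from row operations. It is an illustration under simple assumptions (dense square input, partial pivoting), not a reproduction of what NumPy or LAPACK do internally; `det_elimination` is a hypothetical name.

```python
import numpy as np

def det_elimination(M):
    """Determinant via row reduction with partial pivoting (O(n^3) work)."""
    U = np.array(M, dtype=float)
    n = U.shape[0]
    sign = 1.0
    for col in range(n):
        # Pick the largest entry in this column as the pivot, for stability.
        pivot = col + int(np.argmax(np.abs(U[col:, col])))
        if U[pivot, col] == 0.0:
            return 0.0  # no usable pivot: the matrix is singular
        if pivot != col:
            U[[col, pivot]] = U[[pivot, col]]  # a row swap flips the sign
            sign = -sign
        # Row additions below the pivot leave the determinant unchanged.
        factors = U[col + 1:, col] / U[col, col]
        U[col + 1:] -= np.outer(factors, U[col])
    # U is now upper triangular: multiply the diagonal.
    return sign * float(np.prod(np.diag(U)))

print(det_elimination([[3, 1], [2, 4]]))
```

Only two of the three row operations appear here (swap and row addition), so the bookkeeping reduces to tracking a sign.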

Cramer’s Rule#

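For a square system $A\vec{x} = \vec{b}$ with $\det(A) \neq 0$ , each unknown is a ratio of two determinants:

$$x_i = \frac{\det(A_i)}{\det(A)},$$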

where $A_i$ is $A$ with its $i$ -th column replaced by $\vec{b}$ .

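For example, take the system

$$\begin{cases} 2x + y = 5 \\ 3x + 4y = 11 \end{cases}$$

Here $A = \begin{pmatrix}2 & 1\\ 3 & 4\end{pmatrix}$ and $\vec{b} = (5, 11)$ , so

$$\det(A) = 8 - 3 = 5, \quad \det(A_1) = 20 - 11 = 9, \quad \det(A_2) = 22 - 15 = 7,$$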

so $x = 9/5,\; y = 7/5$ .

Caveat. Cramer’s rule is theoretically beautiful but practically slow ($O(n^4)$ at best vs. $O(n^3)$ for elimination). It is the right tool for proving things and for $2\times 2$ or $3\times 3$ symbolic problems, not for actually solving big systems.
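
For small systems the rule is short enough to write out directly. The sketch below solves the $2\times 2$ example above; `cramer_solve` is a hypothetical helper, and the `np.isclose` test against zero is a crude singularity check chosen for brevity.

```python
import numpy as np

def cramer_solve(A, b):
    """Solve A x = b by Cramer's rule; intended for small systems only."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    det_A = np.linalg.det(A)
    if np.isclose(det_A, 0.0):
        raise ValueError("det(A) is zero: Cramer's rule does not apply")
    x = np.empty(len(b))
    for i in range(len(b)):
        A_i = A.copy()
        A_i[:, i] = b                      # replace column i with b
        x[i] = np.linalg.det(A_i) / det_A
    return x

x = cramer_solve([[2, 1], [3, 4]], [5, 11])
print(x)   # compare with 9/5 and 7/5
```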


Applications#

Area of a triangle#

For a triangle with vertices $(x_1, y_1)$ , $(x_2, y_2)$ , $(x_3, y_3)$ :

$$\text{Area} = \tfrac{1}{2}\left|\det\begin{pmatrix} x_2 - x_1 & x_3 - x_1 \\ y_2 - y_1 & y_3 - y_1 \end{pmatrix}\right|.$$

You are taking half the area of the parallelogram spanned by two edges.
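
As a small sketch, the formula becomes a three-line function. The name `triangle_area` and the 3-4-5 right triangle used to try it out are choices made for this example.

```python
import numpy as np

def triangle_area(p1, p2, p3):
    """Area of triangle p1 p2 p3 from the 2x2 determinant of two edge vectors."""
    p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p1, p2, p3))
    edges = np.column_stack([p2 - p1, p3 - p1])  # edge vectors as columns
    return 0.5 * abs(np.linalg.det(edges))

area = triangle_area((0, 0), (4, 0), (0, 3))
print(area)   # legs 4 and 3, so the area is 6
```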

Cross product as a determinant#

$$\vec{a} \times \vec{b} = \det\begin{pmatrix} \vec{i} & \vec{j} & \vec{k} \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{pmatrix}.$$

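Its magnitude $\|\vec{a}\times\vec{b}\|$ is exactly the area of the parallelogram spanned by $\vec{a}$ and $\vec{b}$ . When both vectors lie in the $xy$ -plane, only the $\vec{k}$ component survives, and it equals $a_1 b_2 - a_2 b_1$ : a $2 \times 2$ determinant in disguise.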
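
For two vectors lying in the $xy$-plane, the link between the cross product and the $2\times 2$ determinant can be checked directly. The vectors below are arbitrary examples.

```python
import numpy as np

a = np.array([3.0, 1.0, 0.0])
b = np.array([1.0, 2.0, 0.0])   # both vectors lie in the xy-plane

cross = np.cross(a, b)
area_from_cross = float(np.linalg.norm(cross))

# 2x2 determinant of the in-plane components, taken as columns.
area_from_det = abs(float(np.linalg.det(np.column_stack([a[:2], b[:2]]))))

print(cross, area_from_cross, area_from_det)
```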

The Jacobian determinant#

Under a change of variables $(u, v) \mapsto (x, y)$ , a double integral transforms as

$$\iint f(x, y)\, dx\, dy = \iint f\bigl(x(u, v),\, y(u, v)\bigr) \left|\det \frac{\partial(x, y)}{\partial(u, v)}\right| du\, dv.$$

The Jacobian $\left|\det\frac{\partial(x,y)}{\partial(u,v)}\right|$ is the local area scaling factor — the determinant of the linear approximation to the change of variables at each point. Geometrically, you are using our 2D area-scaling theorem at every infinitesimal patch.

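For polar coordinates, $x = r\cos\theta$ and $y = r\sin\theta$ , so

$$\left|\det \frac{\partial(x, y)}{\partial(r, \theta)}\right| = \det\begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & \phantom{-}r\cos\theta \end{pmatrix} = r\cos^2\theta + r\sin^2\theta = r.$$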

That is the famous “$r$ ” in $dx\,dy = r\,dr\,d\theta$ . Calculus students often memorize it; now you can derive it.
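
If you would rather see the factor $r$ come out of a computation than out of algebra, the sketch below estimates the Jacobian of the polar map by central differences and takes its determinant. The step size `h` and the sample point are arbitrary, and `jacobian_det` is a hypothetical helper.

```python
import numpy as np

def polar_to_cartesian(r, theta):
    """The change of variables (r, theta) -> (x, y)."""
    return np.array([r * np.cos(theta), r * np.sin(theta)])

def jacobian_det(f, u, v, h=1e-6):
    """Determinant of the 2x2 Jacobian of f at (u, v), by central differences."""
    col_u = (f(u + h, v) - f(u - h, v)) / (2 * h)   # partial derivatives in u
    col_v = (f(u, v + h) - f(u, v - h)) / (2 * h)   # partial derivatives in v
    return float(np.linalg.det(np.column_stack([col_u, col_v])))

r, theta = 1.7, 0.6
estimate = jacobian_det(polar_to_cartesian, r, theta)
print(estimate)   # close to r
```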

Determinants and linear systems#

For $A\vec{x} = \vec{b}$ with $A$ square:

| Condition | What happens |
|---|---|
| $\det(A) \neq 0$ | unique solution exists |
| $\det(A) = 0$ , system homogeneous | non-trivial solutions exist |
| $\det(A) = 0$ , $\vec{b} \neq \vec{0}$ | either no solution or infinitely many |
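
The table can be turned into a small classifier. Note the assumptions: telling the two $\det(A) = 0$ cases apart uses `np.linalg.matrix_rank`, a tool this chapter has not introduced, and `np.isclose` is only a rough test for a zero determinant. `classify_system` is a hypothetical name.

```python
import numpy as np

def classify_system(A, b):
    """Classify a square system A x = b as 'unique', 'none' or 'infinite'."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    if not np.isclose(np.linalg.det(A), 0.0):
        return "unique"
    # det = 0: a solution exists only if b lies in the span of A's columns,
    # which holds exactly when appending b does not raise the rank.
    rank_A = np.linalg.matrix_rank(A)
    rank_aug = np.linalg.matrix_rank(np.column_stack([A, b]))
    return "infinite" if rank_A == rank_aug else "none"

S = [[1, 2], [2, 4]]   # the singular matrix from earlier in the chapter
print(classify_system([[2, 1], [3, 4]], [5, 11]))
print(classify_system(S, [0, 0]))
print(classify_system(S, [1, 2]))
print(classify_system(S, [1, 0]))
```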

Python: Visualizing the Determinant#

import numpy as np
import matplotlib.pyplot as plt

def show_determinant(A):
    """Show how A transforms the unit square, with the area change."""
    square = np.array([[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]).T
    transformed = A @ square
    det = np.linalg.det(A)

    fig, axes = plt.subplots(1, 2, figsize=(12, 5))

    axes[0].fill(square[0], square[1], alpha=0.3, color="#2563eb")
    axes[0].set_title("Unit square (area = 1)")
    axes[0].set_xlim(-3, 3); axes[0].set_ylim(-3, 3)
    axes[0].set_aspect("equal"); axes[0].grid(True, alpha=0.3)

    color = "#10b981" if det > 0 else ("#f59e0b" if det < 0 else "#94a3b8")
    axes[1].fill(transformed[0], transformed[1], alpha=0.3, color=color)
    axes[1].set_title(f"Transformed (area = {abs(det):.2f}, det = {det:.2f})")
    axes[1].set_xlim(-3, 3); axes[1].set_ylim(-3, 3)
    axes[1].set_aspect("equal"); axes[1].grid(True, alpha=0.3)

    plt.tight_layout()
    plt.show()

show_determinant(np.array([[2, 0], [0, 1.5]]))   # stretch, det = 3
show_determinant(np.array([[1, 0.5], [0, 1]]))    # shear,   det = 1
show_determinant(np.array([[-1, 0], [0, 1]]))     # reflection, det = -1

Try a few more matrices on your own — in particular, try one with $\det = 0$ and watch the parallelogram collapse to a line.


Summary#

The mental model#

When you see a determinant, do not think “I need to compute a number.” Think:

“How does this transformation change the size and orientation of space?”

  • $|\det(A)|$ — how much area or volume is scaled
  • $\det > 0$ — orientation preserved
  • $\det < 0$ — orientation flipped (mirror image)
  • $\det = 0$ — space crushed flat, information lost, matrix not invertible

Key properties at a glance#

| Property | Formula | Intuition |
|---|---|---|
| Multiplicative | $\det(AB) = \det(A)\det(B)$ | scalings multiply |
| Transpose | $\det(A^T) = \det(A)$ | rows and columns equally valid |
| Inverse | $\det(A^{-1}) = 1/\det(A)$ | undo the scaling |
| Scalar | $\det(kA) = k^n \det(A)$ | $k$ scales each of $n$ directions |

Why Nobody Computes the Determinant by Cofactor Expansion#

The cofactor formula is beautiful, recursive, and catastrophically slow. Let $T(n)$ be the number of multiplications to expand an $n\times n$ determinant by minors. The recursion $T(n) = n\cdot T(n-1)$ gives $T(n) = n!$ . For $n = 20$ that is $2.4 \times 10^{18}$ multiplications — decades on a single core. For $n = 50$ it is more multiplications than there are atoms on Earth.

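What numerical libraries do instead is factor $A$ with row swaps as $PA = LU$ , where $P$ is a permutation matrix, $L$ is lower triangular with ones on its diagonal, and $U$ is upper triangular. Since $\det L = 1$ and each row swap contributes a factor of $-1$ ,

$$\det A = (-1)^{\text{swaps}} \prod_{i=1}^n U_{ii}.$$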

The cost of LU is about $\tfrac{2}{3}n^3$ FLOPs, and the cost of multiplying the $n$ diagonal entries is linear. So computing the determinant is $\Theta(n^3)$ , not $\Theta(n!)$ . For $n=20$ that is a few thousand operations instead of $2.4\times 10^{18}$ , a saving on the order of $10^{14}$ .

Two practical notes:

  • The same factorisation that gives you $\det A$ also gives you $A^{-1}$ and the solution to $Ax=b$ for any right-hand side. If you are computing both the determinant and the solution, do not call np.linalg.det and np.linalg.solve separately — factor once with scipy.linalg.lu_factor and reuse.
  • For symmetric positive-definite matrices, Cholesky is twice as fast: $A = LL^T$ in $\tfrac{1}{3}n^3$ FLOPs, and $\det A = (\prod L_{ii})^2$ .
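
A minimal sketch of both notes, assuming SciPy is installed: factor once with `scipy.linalg.lu_factor`, read the determinant off the factors, and reuse them with `scipy.linalg.lu_solve`; then do the Cholesky version with `np.linalg.cholesky`. The random test matrices and the way the symmetric positive-definite matrix `S` is built are arbitrary choices.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
b = rng.standard_normal(5)

# Factor once (pivoted LU), then reuse the factors for det and for solving.
lu, piv = lu_factor(A)
# piv[i] is the row that row i was exchanged with; piv[i] != i marks a swap.
swaps = int(np.sum(piv != np.arange(len(piv))))
det_lu = (-1.0) ** swaps * float(np.prod(np.diag(lu)))   # diag(lu) is diag(U)
x = lu_solve((lu, piv), b)

# Symmetric positive-definite case: S = L L^T, so det S = (prod of diag(L))^2.
S = A @ A.T + 5.0 * np.eye(5)
L = np.linalg.cholesky(S)
det_chol = float(np.prod(np.diag(L))) ** 2

print(det_lu, np.linalg.det(A))
print(det_chol, np.linalg.det(S))
```

The same `(lu, piv)` pair can be passed to `lu_solve` again for any further right-hand side without refactoring.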

So the cofactor formula stays in the textbook because it is the cleanest statement of what the determinant is. For computing it, we exploit the multiplicative property $\det(LU) = \det L \cdot \det U$ and the trivial fact that the determinant of a triangular matrix is the product of its diagonal.

The slogdet Trick: When the Determinant Itself Underflows#

Here is a problem you hit the first time you implement maximum-likelihood estimation for a Gaussian. The log-likelihood involves $\log \det \Sigma$ where $\Sigma$ is a covariance matrix. For a $200 \times 200$ covariance with eigenvalues around $0.01$ , the determinant is roughly $0.01^{200} = 10^{-400}$ — which is exactly $0$ in double precision (smallest positive double is $\approx 5\times 10^{-324}$ ). So np.log(np.linalg.det(Sigma)) returns -inf and your training crashes.

The fix is to never form the determinant. numpy provides np.linalg.slogdet, which returns the sign and the log magnitude separately:

import numpy as np
Sigma = 0.01 * np.eye(200)
print(np.linalg.det(Sigma))            # 0.0  -- underflow
sign, logabsdet = np.linalg.slogdet(Sigma)
print(sign, logabsdet)                 # 1.0  -921.034...

The idea is to sum the logs of the LU diagonal instead of multiplying the entries:

$$\log|\det A| = \sum_{i=1}^n \log|U_{ii}|.$$

The product of $n$ small numbers underflows; the sum of $n$ logs does not. The same trick reappears everywhere: log-probabilities in HMMs, the log-partition function in energy-based models, log-Jacobians in normalising flows. Whenever you see a product over many factors that each could be tiny or huge, work in log space.

A related habit: when comparing two determinants, compare their signs and their log-magnitudes rather than the raw values. When you really need the value, compute $\det A = \mathrm{sign}\cdot e^{\log|\det A|}$ at the very end, ideally never.
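
Returning to the Gaussian example, here is a hedged sketch of a log-density that never forms the determinant. It uses the standard formula $\log p(x) = -\tfrac{1}{2}\bigl(d\log 2\pi + \log\det\Sigma + (x-\mu)^T\Sigma^{-1}(x-\mu)\bigr)$ ; the function name `gaussian_logpdf` is an assumption made for this example.

```python
import numpy as np

def gaussian_logpdf(x, mean, cov):
    """Log-density of N(mean, cov) at x, using slogdet instead of det."""
    d = len(mean)
    diff = np.asarray(x, dtype=float) - mean
    sign, logabsdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    # (x - mean)^T cov^{-1} (x - mean), computed without forming the inverse.
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2 * np.pi) + logabsdet + maha)

d = 200
Sigma = 0.01 * np.eye(d)
value = gaussian_logpdf(np.zeros(d), np.zeros(d), Sigma)
print(value)   # finite, even though det(Sigma) underflows to 0.0
```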


What’s Next#

Chapter 5 : Linear Systems and Column Space. We bring together everything so far — matrices, transformations, and determinants — to understand when $A\vec{x} = \vec{b}$ has solutions, how many, and what their structure looks like. The key concepts are the column space (“what can $A$ reach?”), the null space (“what gets crushed?”), and the rank (“how many effective dimensions remain?”). Determinants will play a starring role in the square case; for non-square or rank-deficient $A$ we will need a more refined toolkit.
