Linear Algebra

Determinant

One number: how much the map scales volume, and whether it flips space

01 · First principles: Forget the formula; ask what one number could summarise a map

The cofactor-expansion formula is where intuition goes to die, so start elsewhere. A matrix A moves every region of space somewhere. Here is a remarkable fact: every region, whatever its shape, has its volume multiplied by the same constant factor. A map that doubles the unit square's area doubles every area. That universal factor is the absolute value of the determinant; the sign, covered below, records orientation.

vol( A(S) ) = |det A| · vol(S)   for every region S
one scalar summarises the map's effect on all volumes

Why is the factor universal? Because linearity has no favourites: any region can be tiled by tiny cubes, A sends every tiny cube to the same-shaped tiny parallelepiped, so every tile is scaled identically, and the sum inherits the factor. The unit square is just the convenient test particle.
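The universality claim can be checked numerically. The following is a minimal NumPy sketch, assuming an arbitrary example matrix and two arbitrary polygons (none of these values come from the text); it measures areas with the shoelace formula before and after applying A.

```python
import numpy as np

def polygon_area(vertices):
    """Unsigned area of a simple polygon via the shoelace formula."""
    x, y = vertices[:, 0], vertices[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

# Arbitrary example matrix (an assumption for illustration); det A = 2*3 - 1*0.5 = 5.5
A = np.array([[2.0, 1.0],
              [0.5, 3.0]])

# Two unrelated regions, one vertex per row.
triangle = np.array([[0.0, 0.0], [4.0, 0.0], [1.0, 3.0]])
pentagon = np.array([[0.0, 0.0], [2.0, -1.0], [3.0, 1.0], [1.5, 3.0], [-1.0, 1.5]])

for name, region in [("triangle", triangle), ("pentagon", pentagon)]:
    image = region @ A.T  # apply A to every vertex
    ratio = polygon_area(image) / polygon_area(region)
    print(name, ratio, abs(np.linalg.det(A)))
```

Both ratios should agree with |det A| up to floating-point rounding, even though the two regions have nothing in common.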

02 · The picture: The unit square becomes the column parallelogram

Apply a 2×2 matrix to the unit square. The corners (1,0) and (0,1) land on the columns of A, so the square becomes the parallelogram spanned by the columns — and det A is its signed area.

[Figure: the unit square (area 1, spanned by e₁ and e₂) is mapped by A to the column parallelogram (spanned by a₁ = col 1 and a₂ = col 2, area = |det A|).]

e₁ and e₂ land on the columns of A; the square they spanned becomes the parallelogram the columns span. Its signed area is the determinant.

The picture immediately explains det = 0: if the columns are linearly dependent, the parallelogram is a flat segment with area zero. The map collapses a dimension, and everything about singular matrices follows from that: distinct inputs land on the same output, so no inverse can exist. It also explains the determinant in the denominator of the 2×2 inverse: to undo the map you divide volumes back out, and you cannot divide by zero.
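As an illustration of the determinant in the denominator, here is a small sketch of the 2×2 inverse written with the adjugate formula. The function name `inverse_2x2` and both matrices are hypothetical, chosen only for this example.

```python
import numpy as np

def inverse_2x2(M):
    """Inverse of a 2x2 matrix via the adjugate formula; refuses when det = 0."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if np.isclose(det, 0.0):
        raise ValueError("singular matrix: the map flattens the plane")
    # Swap the diagonal, negate the off-diagonal, divide the volume factor back out.
    return np.array([[d, -b], [-c, a]]) / det

A = np.array([[2.0, 1.0], [0.5, 3.0]])     # invertible example
flat = np.array([[1.0, 2.0], [2.0, 4.0]])  # second column = 2 * first column

print(inverse_2x2(A) @ A)    # approximately the identity
print(np.linalg.det(flat))   # zero up to rounding

try:
    inverse_2x2(flat)
except ValueError as err:
    print("no inverse:", err)
```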

03 · The algebra: Sign, products, and eigenvalues

The sign is orientation. det < 0 means the map flips space's handedness — a reflection is involved; the parallelogram came out "face down". |det| carries the volume, the sign carries the flip.

Composition multiplies. Apply B then A and volumes scale by both factors in turn, so the algebra is forced by the geometry:

det(AB) = det(A) · det(B),    det(A⁻¹) = 1 / det(A)
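A quick numerical check of the sign and product rules, as a sketch with hand-picked matrices (assumptions for illustration only):

```python
import numpy as np

det = np.linalg.det

# Example matrices chosen so the determinants are easy to verify by hand.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 3.0],
              [1.0, 0.0, 1.0]])   # det A = 5
B = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [2.0, 0.0, 1.0]])   # det B = 5

print(det(A @ B), det(A) * det(B))          # composition multiplies
print(det(np.linalg.inv(A)), 1.0 / det(A))  # undoing the map divides

reflection = np.diag([1.0, -1.0])               # flips the y-axis
rotation = np.array([[0.0, -1.0], [1.0, 0.0]])  # quarter turn, no flip
print(det(reflection), det(rotation))           # negative vs positive sign
```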

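Eigenvalues factor it. Along each eigendirection the map is a pure stretch by λᵢ, and a box aligned with those directions has its volume scaled by all the stretches at once. The identity outlives the picture: count eigenvalues with multiplicity and allow complex ones (which come in conjugate pairs for a real matrix), and the product is still the determinant: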

det A = λ₁ λ₂ ⋯ λₙ
total volume change = product of per-direction stretches

One zero eigenvalue zeroes the product — the algebraic echo of one crushed direction killing the whole volume. A triangular matrix shows the same logic nakedly: its eigenvalues sit on the diagonal, so its determinant is the product of diagonal entries. Hold this fact; it is about to pay for itself.
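Both facts can be sketched in a few lines of NumPy. The matrix below is an arbitrary example (an assumption, not from the text); it happens to have a complex-conjugate pair of eigenvalues, yet the product still comes out real and equal to the determinant.

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 3.0],
              [1.0, 0.0, 1.0]])

eigenvalues = np.linalg.eigvals(A)   # complex dtype when a conjugate pair appears
product = np.prod(eigenvalues)
print(product, np.linalg.det(A))     # imaginary part of the product is ~0

# Triangular case: eigenvalues sit on the diagonal, so det is the diagonal product.
T = np.triu(A)                       # keep only the upper-triangular part of A
print(np.prod(np.diag(T)), np.linalg.det(T))
```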

04 · Why ML cares: Normalizing flows live on log|det J|

The named connection: change of variables. Let an invertible map f send data x to a latent z = f(x). Probability mass is conserved, but the volume it occupies changes by the local volume factor of f, which is the absolute determinant of its Jacobian. Densities must compensate:

log p_x(x) = log p_z(f(x)) + log |det J_f(x)|
where f stretches volume (so the sampling direction f⁻¹ compresses it), density in x piles up
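The identity can be tested on a case with a known answer. This sketch assumes an affine map f(x) = A x + b from data to a standard-normal latent, so the Jacobian is simply A and the data density is itself Gaussian in closed form. The helper `gaussian_logpdf` and all numeric values are hypothetical choices for illustration.

```python
import numpy as np

def gaussian_logpdf(x, mean, cov):
    """Log density of a multivariate normal N(mean, cov) at the point x."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + d @ np.linalg.solve(cov, d))

A = np.array([[2.0, 1.0], [0.5, 3.0]])
b = np.array([1.0, -2.0])

def f(x):
    """Data-to-latent map; its Jacobian is the constant matrix A."""
    return A @ x + b

x = np.array([0.3, -0.7])

# Change of variables: latent log density plus log |det J_f|.
log_px_flow = gaussian_logpdf(f(x), np.zeros(2), np.eye(2)) + np.log(abs(np.linalg.det(A)))

# Closed form: x = A^{-1}(z - b) is Gaussian with mean -A^{-1} b and covariance (A^T A)^{-1}.
mean_x = -np.linalg.solve(A, b)
cov_x = np.linalg.inv(A.T @ A)
log_px_exact = gaussian_logpdf(x, mean_x, cov_x)

print(log_px_flow, log_px_exact)  # the two values should agree
```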

A normalizing flow is a neural network trained directly on this identity: an invertible f pulling data back to a Gaussian, with the log-determinant term as the price of honesty in the likelihood. (Diffusion's probability-flow ODE computes the same correction continuously, as a running trace.)

The catch is cost. A general n×n determinant is O(n³), and worse, it is needed per data point per training step, with gradients. So flow architectures are an exercise in determinant-dodging: design layers whose Jacobian is triangular by construction (coupling layers, autoregressive flows), so log|det J| is just Σᵢ log|Jᵢᵢ| over the diagonal, an O(n) sum that is read off rather than computed. The entire architecture of RealNVP-style models is the shape of this one accounting trick.
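To make the trick concrete, here is a toy affine coupling layer in the RealNVP style, written as a sketch rather than a faithful reproduction of any published model. The "networks" `scale` and `shift` are hypothetical stand-ins (a single random linear map each), and the Jacobian is estimated by finite differences purely to confirm that the cheap log-determinant matches the expensive one.

```python
import numpy as np

rng = np.random.default_rng(0)
D, d = 6, 3  # total dimension, size of the pass-through half
W_s = 0.5 * rng.standard_normal((D - d, d))  # toy weights, assumptions for illustration
W_t = rng.standard_normal((D - d, d))

def scale(x1):
    return np.tanh(W_s @ x1)

def shift(x1):
    return W_t @ x1

def coupling_forward(x):
    """y1 = x1; y2 = x2 * exp(s(x1)) + t(x1). Returns y and log|det J|."""
    x1, x2 = x[:d], x[d:]
    s = scale(x1)
    y = np.concatenate([x1, x2 * np.exp(s) + shift(x1)])
    return y, s.sum()  # diagonal of J is (1, ..., 1, exp(s)), so log|det J| = sum(s)

def numerical_jacobian(fn, x, eps=1e-6):
    """Central-difference Jacobian, one column per input coordinate."""
    cols = []
    for i in range(len(x)):
        step = np.zeros_like(x)
        step[i] = eps
        cols.append((fn(x + step) - fn(x - step)) / (2 * eps))
    return np.stack(cols, axis=1)

x = rng.standard_normal(D)
y, log_det_cheap = coupling_forward(x)
J = numerical_jacobian(lambda v: coupling_forward(v)[0], x)
_, log_det_full = np.linalg.slogdet(J)

print(np.allclose(J, np.tril(J)))   # Jacobian is lower triangular
print(log_det_cheap, log_det_full)  # O(n) read-off vs O(n^3) computation
```

The design choice is visible in `coupling_forward`: because y1 copies x1 and y2 depends on x2 only elementwise, no entry above the diagonal can be nonzero, whatever `scale` and `shift` compute.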

Elsewhere in ML: log det Σ in the Gaussian log-likelihood (computed via Cholesky, see PSD matrices), volume terms in variational bounds, and determinantal point processes for diverse sampling. Wherever volume meets probability, a log-determinant is the exchange rate.
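For the Gaussian case, the Cholesky route reuses the triangular shortcut: if Σ = L Lᵀ with L lower triangular, then log det Σ = 2 Σᵢ log Lᵢᵢ. A minimal sketch, assuming a randomly generated positive-definite covariance (not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
Sigma = M @ M.T + 5.0 * np.eye(5)  # symmetric positive definite by construction

L = np.linalg.cholesky(Sigma)      # Sigma = L @ L.T, L lower triangular
logdet_chol = 2.0 * np.log(np.diag(L)).sum()

_, logdet_ref = np.linalg.slogdet(Sigma)  # reference value via LU factorization
print(logdet_chol, logdet_ref)
```

Working with logs of the diagonal also avoids the overflow or underflow that forming det Σ directly can cause in high dimensions.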
Mental Model