Linear Algebra

Rank and Span

Every point you can reach, and the true information content of a map

01 · First principlesSpan: every point you can reach

Give yourself a set of vectors and two moves: scale any of them, and add results together. The span is everything reachable with those moves — every linear combination c₁v₁ + … + c_kv_k. One nonzero vector spans a line; two non-parallel vectors span a plane; and adding a vector that is already a combination of the others (see linear independence) extends the span by nothing at all.

Spans are always flats through the origin: lines, planes, hyperplanes. The question span answers is reachability. The question it cannot answer alone is how big the reachable set is — for that we count, and the count is rank.

02 · RankThe true information content of the map

View a matrix A as a machine: x goes in, Ax comes out. Since Ax is a combination of A's columns weighted by the entries of x, the set of all possible outputs — the image — is exactly the span of the columns. The rank is the dimension of that image:

rank(A) = dim(image of A) = number of independent columns = number of independent rows

column rankrow rank (always equal — not obvious, always true)

Rank is the honest size of the map, as opposed to its advertised size. A 1000 × 1000 matrix of rank 3 looks like a million numbers, but as a transformation it flattens all of ℝ¹⁰⁰⁰ onto a 3-dimensional flat: only three numbers' worth of output direction survive. Geometrically, rank is the dimension the map preserves; everything else gets crushed.

Rank counts what the matrix can actually say. An m × n matrix of rank r is, information-wise, an r-dimensional channel wearing an m × n costume. (The SVD makes this literal: r nonzero singular values.)

03 · ConservationRank–nullity: dimension is conserved

Where do the missing dimensions go? Nowhere mysterious — they are crushed to zero. For A with n input dimensions:

rank(A) + dim(null space of A) = n

dimensions that survivedimensions destroyed

Read it as a conservation law: every input dimension either makes it through to the image or dies in the null space. A 5-dimensional input mapped with rank 3 must be killing exactly a 2-dimensional family of inputs. This single identity settles most existence and uniqueness questions about Ax = b before any computation; the companion note works it from the null-space side.

04 · Reading rankWhat a given rank tells you

Situation (A is m × n)	Means	Consequence
rank = n (full column rank)	No input information lost	Ax = b has at most one solution; least squares is well-posed
rank = m (full row rank)	Every output reachable	Ax = b solvable for every b
rank = m = n (square, full)	A bijection of space	Invertible, det ≠ 0
rank < min(m, n)	Map collapses dimensions	Singular behaviour: lost information, no clean undo

Numerically, rank is read off the singular values: count how many exceed a tolerance. "Rank 50 but forty of the singular values are tiny" behaves like rank 10 plus noise — this effective rank is what matters in practice.

05 · Why ML caresLow rank is the structure ML lives on

Real-world matrices are rarely full rank in spirit: their singular values decay fast, so a low-rank approximation keeps almost everything while paying almost nothing. ML exploits this constantly.

LoRA. Fine-tuning updates a weight matrix W by ΔW = BA with B ∈ ℝ^d×r, A ∈ ℝ^r×d, r ≪ d. The bet: the change a task demands has low rank — a few new directions of behaviour, not d² new numbers. Storage and compute drop from d² to 2dr.
Embeddings and matrix factorisation. A user × item ratings matrix factored as UVᵀ with k columns is precisely a rank-k model: tastes live in a k-dimensional span.
Compressed attention. Linformer-style methods project keys and values to a low dimension, betting the attention matrix is approximately low rank; multi-query and grouped-query attention shrink the KV cache on the same instinct.
PCA. The best rank-k approximation of centred data (Eckart–Young: truncate the SVD) is PCA — choose the k-dimensional span that loses the least variance. The directions themselves come from eigenvectors of the covariance.

The recurring bet: the signal spans far fewer directions than the ambient space offers. Rank is how you state that bet precisely, and the SVD is how you cash it.

Mental Model

Span = every point reachable by scaling and adding; always a flat through the origin.
Rank = dimension of the image = the honest information content of the map, whatever its shape claims.
rank + nullity = n: every input dimension either survives into the image or is crushed to zero.
Effective rank (singular values above noise) is what numerics and ML actually respond to.
LoRA, embeddings, PCA, compressed attention: the same bet that signal is low-rank.