Linear Algebra

Rank and Span

Every point you can reach, and the true information content of a map

01 · First principlesSpan: every point you can reach

Give yourself a set of vectors and two moves: scale any of them, and add results together. The span is everything reachable with those moves — every linear combination c₁v₁ + … + ckvk. One nonzero vector spans a line; two non-parallel vectors span a plane; and adding a vector that is already a combination of the others (see linear independence) extends the span by nothing at all.

Spans are always flats through the origin: lines, planes, hyperplanes. The question span answers is reachability. The question it cannot answer alone is how big the reachable set is — for that we count, and the count is rank.

02 · RankThe true information content of the map

View a matrix A as a machine: x goes in, Ax comes out. Since Ax is a combination of A's columns weighted by the entries of x, the set of all possible outputs — the image — is exactly the span of the columns. The rank is the dimension of that image:

rank(A)  =  dim(image of A)  =  number of independent columns  =  number of independent rows
column rankrow rank (always equal — not obvious, always true)

Rank is the honest size of the map, as opposed to its advertised size. A 1000 × 1000 matrix of rank 3 looks like a million numbers, but as a transformation it flattens all of ℝ¹⁰⁰⁰ onto a 3-dimensional flat: only three numbers' worth of output direction survive. Geometrically, rank is the dimension the map preserves; everything else gets crushed.

Rank counts what the matrix can actually say. An m × n matrix of rank r is, information-wise, an r-dimensional channel wearing an m × n costume. (The SVD makes this literal: r nonzero singular values.)

03 · ConservationRank–nullity: dimension is conserved

Where do the missing dimensions go? Nowhere mysterious — they are crushed to zero. For A with n input dimensions:

rank(A) + dim(null space of A)  =  n
dimensions that survivedimensions destroyed

Read it as a conservation law: every input dimension either makes it through to the image or dies in the null space. A 5-dimensional input mapped with rank 3 must be killing exactly a 2-dimensional family of inputs. This single identity settles most existence and uniqueness questions about Ax = b before any computation; the companion note works it from the null-space side.

04 · Reading rankWhat a given rank tells you

Situation (A is m × n)MeansConsequence
rank = n (full column rank)No input information lostAx = b has at most one solution; least squares is well-posed
rank = m (full row rank)Every output reachableAx = b solvable for every b
rank = m = n (square, full)A bijection of spaceInvertible, det ≠ 0
rank < min(m, n)Map collapses dimensionsSingular behaviour: lost information, no clean undo

Numerically, rank is read off the singular values: count how many exceed a tolerance. "Rank 50 but forty of the singular values are tiny" behaves like rank 10 plus noise — this effective rank is what matters in practice.

05 · Why ML caresLow rank is the structure ML lives on

Real-world matrices are rarely full rank in spirit: their singular values decay fast, so a low-rank approximation keeps almost everything while paying almost nothing. ML exploits this constantly.

  1. LoRA. Fine-tuning updates a weight matrix W by ΔW = BA with B ∈ ℝd×r, A ∈ ℝr×d, r ≪ d. The bet: the change a task demands has low rank — a few new directions of behaviour, not d² new numbers. Storage and compute drop from d² to 2dr.
  2. Embeddings and matrix factorisation. A user × item ratings matrix factored as UVᵀ with k columns is precisely a rank-k model: tastes live in a k-dimensional span.
  3. Compressed attention. Linformer-style methods project keys and values to a low dimension, betting the attention matrix is approximately low rank; multi-query and grouped-query attention shrink the KV cache on the same instinct.
  4. PCA. The best rank-k approximation of centred data (Eckart–Young: truncate the SVD) is PCA — choose the k-dimensional span that loses the least variance. The directions themselves come from eigenvectors of the covariance.
The recurring bet: the signal spans far fewer directions than the ambient space offers. Rank is how you state that bet precisely, and the SVD is how you cash it.
Mental Model