Zero overlap, no shared information
The dot product measures agreement, and orthogonality is the special case of none: a·b = 0 means the shadow of b along a has zero length. Neither vector carries any information about the other; whatever you learn by measuring along a tells you exactly nothing about the component along b.
That sounds like a negative property — an absence of relationship. The surprise of this note is that the absence is the most valuable structure in computational linear algebra. When directions do not interfere, every question about a vector decomposes into independent one-dimensional questions, and problems that otherwise require solving systems collapse into reading off dot products.
To express x in an arbitrary basis {b₁, …, bₙ} you must solve a linear system: the coefficients are entangled, because each basis vector leaks into the others' directions. Watch what happens with an orthonormal basis {q₁, …, qₙ} (mutually orthogonal, unit length). Write x = c₁q₁ + ⋯ + cₙqₙ and take the dot product of both sides with qᵢ:
x·qᵢ = c₁(q₁·qᵢ) + ⋯ + cₙ(qₙ·qᵢ) = cᵢ, since qⱼ·qᵢ = 0 for j ≠ i and qᵢ·qᵢ = 1.
So cᵢ = x·qᵢ, full stop. Each coordinate is one dot product, computed independently of all the others; the n-dimensional problem fell apart into n one-dimensional projections. This identity — x = Σ (x·qᵢ) qᵢ — is the engine inside Fourier series, PCA coordinates, and every "project onto components" argument you have ever seen.
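The identity can be checked numerically with a few lines of plain Python. This is only a sketch: the basis (the standard basis rotated by 30 degrees), the vector x, and the helper `dot` are illustrative choices, not anything prescribed by the argument.

```python
import math

def dot(u, v):
    """Euclidean dot product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))

# Hypothetical orthonormal basis of the plane: the standard basis rotated by 30 degrees.
theta = math.radians(30)
q1 = (math.cos(theta), math.sin(theta))
q2 = (-math.sin(theta), math.cos(theta))

x = (3.0, 1.0)  # arbitrary test vector

# Each coordinate is a single dot product; no linear system is solved.
c1 = dot(x, q1)
c2 = dot(x, q2)

# Rebuild x from its two independent projections: x = c1*q1 + c2*q2.
rebuilt = tuple(c1 * a + c2 * b for a, b in zip(q1, q2))

print(rebuilt)  # agrees with x up to rounding error
```

Neither c1 nor c2 needs the other to be computed, which is the whole point: the two projections could be evaluated in either order, or in parallel.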
In an orthonormal frame, x is rebuilt from its two shadows independently. With a skewed basis, the shadows would double-count and a system would have to untangle them.
Pack an orthonormal basis into the columns of Q. The orthonormality conditions qᵢ·qⱼ = δᵢⱼ are precisely the statement QᵀQ = I, which hands us the inverse for free:
Q⁻¹ = Qᵀ.
Geometrically, Q is a rigid motion — a rotation or reflection. It preserves every length and every angle, because it preserves the dot product itself: (Qx)·(Qy) = xᵀQᵀQy = x·y. Space is moved, never distorted: the unit circle stays a unit circle. Contrast this with a general invertible matrix, which shears and stretches, and whose inverse must be earned by elimination. Numerically, multiplying by Q is perfectly conditioned (κ(Q) = 1): it amplifies neither the signal nor the rounding error, which is why stable algorithms are built almost entirely out of orthogonal transformations.
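A small numerical check of these claims, assuming a 2×2 rotation as the orthogonal matrix. The angle, the test vectors, and the helpers `dot`, `matvec`, and `transpose` are hypothetical names chosen for this sketch.

```python
import math

def dot(u, v):
    """Euclidean dot product."""
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [dot(row, v) for row in M]

def transpose(M):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

theta = 0.7  # arbitrary rotation angle in radians
Q = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]

x = [2.0, -1.0]
y = [0.5, 3.0]

Qx, Qy = matvec(Q, x), matvec(Q, y)

dot_before = dot(x, y)             # x·y
dot_after = dot(Qx, Qy)            # (Qx)·(Qy), equal to x·y up to rounding
len_before = math.sqrt(dot(x, x))
len_after = math.sqrt(dot(Qx, Qx)) # lengths are preserved as well

# Undo the rotation with the transpose alone; no elimination is needed.
x_back = matvec(transpose(Q), Qx)
```

The same check with a sheared matrix such as [[1, 1], [0, 1]] in place of Q would fail: both the dot product and the lengths change, and the transpose no longer undoes the map.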
When Ax = b has no solution (b lies outside the image of A), the best we can do is the x that makes Ax closest to b. "Closest" means the error b − Ax is orthogonal to the image: it is the perpendicular dropped from b to the reachable subspace. That orthogonality condition is the normal equations: Aᵀ(b − Ax) = 0, that is, AᵀAx = Aᵀb.
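To see the orthogonality condition at work, here is a sketch on a tiny, made-up line-fitting problem with three points and two unknowns. Exact rational arithmetic from the standard library is used so that the residual comes out exactly orthogonal; the data and the use of Cramer's rule for the 2×2 system are assumptions made purely for illustration.

```python
from fractions import Fraction as F

def dot(u, v):
    """Euclidean dot product."""
    return sum(a * c for a, c in zip(u, v))

# Hypothetical data: fit y = x0 + x1*t to the points (0,1), (1,2), (2,2).
A = [[F(1), F(0)],
     [F(1), F(1)],
     [F(1), F(2)]]
b = [F(1), F(2), F(2)]

cols = [list(c) for c in zip(*A)]  # the two columns of A

# Entries of AᵀA and Aᵀb for the normal equations (AᵀA) x = Aᵀb.
g11, g12, g22 = dot(cols[0], cols[0]), dot(cols[0], cols[1]), dot(cols[1], cols[1])
r1, r2 = dot(cols[0], b), dot(cols[1], b)

# Solve the 2x2 system by Cramer's rule (fine at this size only).
det = g11 * g22 - g12 * g12
x = [(r1 * g22 - g12 * r2) / det, (g11 * r2 - g12 * r1) / det]

residual = [bi - dot(row, x) for row, bi in zip(A, b)]   # b − Ax
orthogonality = [dot(col, residual) for col in cols]     # Aᵀ(b − Ax)

print(x)              # [Fraction(7, 6), Fraction(1, 2)]
print(orthogonality)  # [Fraction(0, 1), Fraction(0, 1)]
```

The residual is not zero (the three points are not collinear), but it has no component along either column of A, which is exactly what "closest" demands.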
In practice one does not solve the normal equations directly (forming AᵀA squares the condition number). Instead, assuming A has linearly independent columns, factor A = QR with Q having orthonormal columns and R upper triangular: Gram–Schmidt, made industrial. Substituting into the normal equations and cancelling the invertible factor Rᵀ leaves Rx = Qᵀb. So projection onto the image is QQᵀb, and one triangular solve finishes the job stably. Orthogonalise first, and the hard geometry becomes bookkeeping.
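The QR route can be sketched in plain Python as follows. This uses modified Gram–Schmidt because it is the shortest thing to write down; library routines typically rely on Householder reflections instead, so treat this as an illustration of the idea rather than a production method. The function names `qr_gram_schmidt` and `back_substitute` and the line-fitting data are hypothetical.

```python
import math

def dot(u, v):
    """Euclidean dot product."""
    return sum(a * c for a, c in zip(u, v))

def qr_gram_schmidt(columns):
    """Modified Gram-Schmidt on a list of linearly independent columns.

    Returns (Q_columns, R): Q_columns are orthonormal, R is upper triangular,
    and input column j equals sum over i of R[i][j] * Q_columns[i].
    """
    n = len(columns)
    Q = []
    R = [[0.0] * n for _ in range(n)]
    for j, a in enumerate(columns):
        v = list(a)
        for i, q in enumerate(Q):
            R[i][j] = dot(q, v)  # what is left of column j along q_i
            v = [vk - R[i][j] * qk for vk, qk in zip(v, q)]
        R[j][j] = math.sqrt(dot(v, v))
        Q.append([vk / R[j][j] for vk in v])
    return Q, R

def back_substitute(R, y):
    """Solve the upper-triangular system R x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(R[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (y[i] - s) / R[i][i]
    return x

# Hypothetical data: fit y = x0 + x1*t to the points (0,1), (1,2), (2,2).
A_cols = [[1.0, 1.0, 1.0], [0.0, 1.0, 2.0]]  # A stored column by column
b = [1.0, 2.0, 2.0]

Q, R = qr_gram_schmidt(A_cols)
Qt_b = [dot(q, b) for q in Q]   # Qᵀb: one dot product per column of Q
x = back_substitute(R, Qt_b)    # solve R x = Qᵀb

# Projection of b onto the image of A, computed as Q (Qᵀb).
projection = [sum(c * q[k] for c, q in zip(Qt_b, Q)) for k in range(len(b))]
```

For this data the solution is x ≈ (7/6, 1/2) and the projection is approximately (7/6, 5/3, 13/6). AᵀA is never formed: the only operations are dot products against orthonormal columns and one triangular solve.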