One number answering: how much do two directions agree
Take two vectors a and b. Without the dot product, we cannot answer the most basic geometric questions: do these two directions point the same way? At all? How long is a? What angle sits between them? Every notion of length, angle, and similarity in linear algebra is built from this single operation.
The dot product compresses two vectors into one scalar that measures agreement: large and positive when they pull together, zero when they share nothing, negative when they oppose.
There are two definitions, and the entire usefulness of the dot product is that they coincide.
Why are they the same thing? The coordinate sum is bilinear: it distributes over addition in each slot. Apply it to ‖a − b‖² = (a − b)·(a − b) and expand: ‖a‖² + ‖b‖² − 2 a·b. The law of cosines says the same squared distance equals ‖a‖² + ‖b‖² − 2‖a‖‖b‖cos θ. Equate the two and a·b = ‖a‖‖b‖cos θ falls out. The coordinate formula is the geometry, just paid for in arithmetic.
Two immediate corollaries: ‖a‖ = √(a·a), so length is a dot product with yourself; and cos θ = a·b / (‖a‖‖b‖), which is exactly cosine similarity.
Drop a perpendicular from the tip of b onto the line through a. The shadow that lands there is the projection of b onto a, and the dot product is its (signed) length times ‖a‖.
The dot product only sees the shadow. Everything in b perpendicular to a is invisible to a·b.
This is the definition worth carrying around: a·b asks "if I only cared about the direction of a, how much of b would I keep?" The perpendicular remainder b − proja(b) is the part a cannot see — the seed of orthogonality.
| Value of a·b | Geometry | Read as |
|---|---|---|
| Large positive | θ near 0°, shadows long | Strong agreement, similar directions |
| Zero | θ = 90°, shadow vanishes | No shared information — orthogonal |
| Negative | θ past 90° | Active disagreement, opposing directions |
One caution: raw magnitude conflates angle with length. Two long vectors at 80° can out-score two short vectors at 5°. When only direction matters, normalise first (cosine similarity); when magnitude is meaningful (confidence, intensity), keep it.
Strip away the framework code and almost every primitive in ML is a dot product asking "how much do these agree?"