Dependency Length Minimization #
@cite{futrell-gibson-2020} @cite{gibson-2000} @cite{zaslavsky-hu-levy-2020}
Formalizes the core claim of @cite{futrell-gibson-2020}: natural languages minimize total dependency length beyond what independent constraints predict. Across 53+ languages, observed word orders have shorter total dependency lengths than random baselines controlling for projectivity and head direction.
Definitions #
depLength: linear distance |headIdx - depIdx| for a single dependencytotalDepLength: sum of all dependency lengths in a treemeanDepLengthScaled: total × 100 / n (avoids Float, stays decidable)subtreeSize: count of descendants via transitive closure
Behaghel's Laws #
- Oberstes Gesetz (1932): "what belongs together is placed close together" — every dependency length ≤ threshold
- Gesetz der wachsenden Glieder (1909): dependents on the same side of their head are ordered by subtree size (shorter closer to head)
Key Theorem #
short_before_long_minimizes: placing the smaller subtree closer to the head
minimizes total dependency length. This derives short-before-long ordering from
DLM rather than stipulating it.
Bridges #
- →
NonProjective.lean: non-projective orderings have higher dep length - →
Core/ProcessingModel.lean: dep length maps to locality dimension - →
Formal/HeadCriteria.lean: consistent head direction reduces dep length
Linear distance between head and dependent: |headIdx - depIdx|.
This is the fundamental unit of @cite{futrell-gibson-2020}'s dependency length minimization. Corresponds to the number of intervening words + 1.
Instances For
Total dependency length of a tree: sum of all individual dependency lengths.
The key quantity minimized in DLM (@cite{futrell-gibson-2020}, eq. 1).
Equations
- DepGrammar.DependencyLength.totalDepLength t = List.foldl (fun (acc : ℕ) (d : DepGrammar.Dependency) => acc + DepGrammar.DependencyLength.depLength d) 0 t.deps
Instances For
Mean dependency length × 100 (scaled to avoid Float, stay in Nat).
For a tree with n words, this is (totalDepLength × 100) / n. Used for crosslinguistic comparison.
Equations
Instances For
Count descendants of a node via projection from Core/Basic.lean.
subtreeSize t idx counts all words transitively dominated by word at idx
(excluding the node itself). Used by Gesetz der wachsenden Glieder.
Equations
- DepGrammar.DependencyLength.subtreeSize t idx = (DepGrammar.projection t.deps idx).length - 1
Instances For
A dependency is head-final if the dependent precedes the head.
Connects to Dir.left in Core/Basic.lean: head-final ↔ dependent is to the
left of its head.
Instances For
A dependency is head-initial if the dependent follows the head.
Instances For
Count of head-final dependencies in a tree.
Equations
Instances For
Count of head-initial dependencies in a tree.
Equations
Instances For
Head direction profile for crosslinguistic comparison.
propHeadFinal is × 1000 (permille) to avoid Float.
meanDepLengths are × 100 for different sentence lengths.
Instances For
Equations
- One or more equations did not get rendered due to their size.
Instances For
Equations
- One or more equations did not get rendered due to their size.
Instances For
Equations
- One or more equations did not get rendered due to their size.
- DepGrammar.DependencyLength.instBEqHeadDirectionProfile.beq x✝¹ x✝ = false
Instances For
Behaghel's Oberstes Gesetz ("supreme law", 1932): "What belongs together mentally is placed close together syntactically."
Formalized: every dependency in the tree has length ≤ threshold.
Equations
- DepGrammar.DependencyLength.oberstesGesetz t threshold = t.deps.all fun (d : DepGrammar.Dependency) => decide (DepGrammar.DependencyLength.depLength d ≤ threshold)
Instances For
@cite{behaghel-1909}'s Gesetz der wachsenden Glieder: "Dependents on the same side of their head are ordered by increasing subtree size (shorter constituents closer to the head)."
For a given head, checks that among dependents on each side, subtree sizes increase with distance from head.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Core DLM theorem: placing a smaller subtree closer to the head produces shorter total dependency length than placing the larger subtree closer.
For two dependents on the same side of their head at distances d₁ < d₂, with subtree sizes s₁ and s₂ where s₁ ≤ s₂:
- Optimal: place s₁ at d₁, s₂ at d₂ — cost = d₁ + d₂ + d₁·s₁ + d₂·s₂ (approx)
- Suboptimal: swap them — cost increases by (d₂ - d₁)(s₂ - s₁)
Simplified to the essential arithmetic: for two subtrees on the same side, the contribution to total dep length from their internal nodes is minimized when the smaller is closer. Here we prove the base case: placing a subtree of size s₁ at distance 1 and s₂ at distance (s₁ + 2) costs less than the reverse when s₁ ≤ s₂.
"John threw out the old trash" — SVO with particle adjacent to verb. Words: John(0) threw(1) out(2) the(3) old(4) trash(5) Dependencies:
- nsubj: threw(1) ← John(0) length = 1
- compound:prt: threw(1) → out(2) length = 1
- det: trash(5) ← the(3) length = 2
- amod: trash(5) ← old(4) length = 1
- obj: threw(1) → trash(5) length = 4 -- but we use simplified dep set Actually from paper: total dep length = 6
Equations
- One or more equations did not get rendered due to their size.
Instances For
"John threw the old trash out" — particle shifted to end. Words: John(0) threw(1) the(2) old(3) trash(4) out(5) Dependencies:
- nsubj: threw(1) ← John(0) length = 1
- det: trash(4) ← the(2) length = 2
- amod: trash(4) ← old(3) length = 1
- obj: threw(1) → trash(4) length = 3
- compound:prt: threw(1) → out(5) length = 4
Equations
- One or more equations did not get rendered due to their size.
Instances For
Heavy NP shift: "John threw out the trash sitting in the kitchen" Words: John(0) threw(1) out(2) the(3) trash(4) sitting(5) in(6) the2(7) kitchen(8) Dependencies:
- nsubj: threw(1) ← John(0) length = 1
- compound:prt: threw(1) → out(2) length = 1
- det: trash(4) ← the(3) length = 1
- obj: threw(1) → trash(4) length = 3
- acl: trash(4) → sitting(5) length = 1
- case: kitchen(8) ← in(6) length = 2
- det: kitchen(8) ← the2(7) length = 1
- obl: sitting(5) → kitchen(8) length = 3
Equations
- One or more equations did not get rendered due to their size.
Instances For
Heavy NP shift suboptimal: "John threw the trash sitting in the kitchen out" Words: John(0) threw(1) the(2) trash(3) sitting(4) in(5) the2(6) kitchen(7) out(8)
Equations
- One or more equations did not get rendered due to their size.
Instances For
The non-projective example tree from NonProjective.lean. Non-projective orderings tend to have higher dependency lengths because crossing arcs span larger distances.
Equations
Instances For
A projective reordering of the same dependency structure. Same 4 words, same dependency types, but no crossing arcs.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Map a dependency's length to the locality dimension of ProcessingProfile.
This connects DLM directly to @cite{gibson-2000} DLT: dependency length IS the locality cost. The other dimensions (boundaries, referentialLoad, ease) are set to 0 since DLM abstracts away from them.
Equations
- DepGrammar.DependencyLength.depToProcessingLocality d = { locality := DepGrammar.DependencyLength.depLength d, boundaries := 0, referentialLoad := 0, ease := 0 }
Instances For
Dep length directly equals the locality dimension of the processing profile.
For a head with a single dependent, head-initial and head-final have equal dependency length — direction doesn't matter.
For a head with two dependents on the same side, consistent direction (both left or both right) allows short-before-long optimization.
Example: head at position 3, two dependents. Consistent (both left): deps at 1, 2 → lengths 2, 1 → total 3 Alternating: deps at 2, 4 → lengths 1, 1 → total 2
But we show the key point: when subtrees differ in size, consistent direction enables the short-before-long optimization.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Equations
- One or more equations did not get rendered due to their size.
Instances For
Adjacent dependency (head and dependent next to each other) has length 1.
Self-loop has length 0 (ill-formed but the definition handles it).
Total dep length of an empty tree is 0.