Catenae: A Novel Unit of Syntactic Analysis #
@cite{osborne-gross-2012}
Formalizes the catena (Osborne, Putnam & Groß 2012, Syntax 15:4, 354–396).
A catena (Latin: "chain") is a connected subgraph of a dependency tree — any word or combination of words that is continuous with respect to the dominance relation. Catenae strictly generalize constituents: every constituent is a catena, but not every catena is a constituent.
Mathlib Integration #
The dependency tree is converted to a mathlib SimpleGraph (Fin n) via
depsToSimpleGraph, bridging linglib's DepTree/Dependency types to
mathlib's graph theory infrastructure. The Prop-level IsCatena is defined
using SimpleGraph.Preconnected on induced subgraphs. Computable Bool
functions (isCatena, isConstituent) enable native_decide proofs.
Key Results #
constituent_is_catena: every constituent is a catena (p. 360)- For n words: n constituents ≤ catenae ≤ 2^n - 1 total combinations
- Flatter trees have more catenae than chain-shaped trees (p. 371)
- Catena ratio varies systematically with tree shape
Bridges #
- →
Core/Basic.lean: usesDepTree,DepGraph,Dependencytypes - → mathlib
SimpleGraph:depsToSimpleGraphconverts dependency edges - →
DependencyLength.lean:catenaTotalDepLengthmeasures catena spread
The undirected simple graph underlying dependency edges over n nodes.
Forgets edge direction and labels: i ~ j iff some dependency connects them.
Uses mathlib's SimpleGraph (Fin n) — the fundamental bridge from
linglib's DepTree/Dependency types to mathlib's graph theory.
Equations
Instances For
Convert a DepTree to a mathlib SimpleGraph on its node set.
Equations
Instances For
A catena is a non-empty subset S of tree nodes where the induced subgraph on S is preconnected. Equivalently: a word or combination of words that is continuous with respect to dominance.
Uses mathlib's SimpleGraph.induce and SimpleGraph.Preconnected.
Equations
- DepGrammar.Catena.IsCatena G S = (S.Nonempty ∧ (SimpleGraph.induce (↑S) G).Preconnected)
Instances For
Check if a node set is connected within the dependency graph. Uses BFS from the first node and checks all others are reached.
Equations
- DepGrammar.Catena.isConnected deps [] = true
- DepGrammar.Catena.isConnected deps (start :: tail) = (start :: tail).all (DepGrammar.Catena.bfsReachable✝ deps (start :: tail) start).contains
Instances For
Computable catena check: non-empty and connected in the tree.
Equations
- DepGrammar.Catena.isCatena deps nodes = (!nodes.isEmpty && DepGrammar.Catena.isConnected deps nodes)
Instances For
Bidirectional reachability within a restricted node set.
BidirReachable deps allowed u v holds when there is a path from u to v
using dependency edges (in either direction) where all nodes are in allowed.
- here {deps : List Dependency} {allowed : List ℕ} (v : ℕ) : v ∈ allowed → BidirReachable deps allowed v v
- step {deps : List Dependency} {allowed : List ℕ} (u v w : ℕ) : u ∈ allowed → v ∈ allowed → (∃ d ∈ deps, d.headIdx = u ∧ d.depIdx = v ∨ d.depIdx = u ∧ d.headIdx = v) → BidirReachable deps allowed v w → BidirReachable deps allowed u w
Instances For
Append a step to the end of a bidirectional path.
Bidirectional reachability is symmetric (reverse the path, flip edges).
Bidirectional reachability is transitive.
BFS completeness: every bidirectionally reachable node appears in the
bfsReachable output.
Proved by showing the output contains start and is closed under
edges within allowed, then applying induction on BidirReachable.
Any singleton is a catena: non-empty and trivially connected.
Convenience: check catena on a DepTree directly.
Equations
- DepGrammar.Catena.DepTree.isCatena' t nodes = DepGrammar.Catena.isCatena t.deps nodes
Instances For
Check if a node set equals the complete subtree (projection) rooted at
some node. Uses projection from Core/Basic.lean.
Equations
- One or more equations did not get rendered due to their size.
Instances For
All non-empty subsets of {0,..., n-1}.
Equations
- DepGrammar.Catena.allNonEmptySubsets n = List.filter (fun (x : List ℕ) => !x.isEmpty) (DepGrammar.Catena.allNonEmptySubsets.powerset (List.range n))
Instances For
Count catenae for a tree with n nodes and given dependency edges.
Equations
Instances For
Count constituents for a tree with n nodes.
Equations
Instances For
Total non-empty subsets of n elements: 2^n - 1.
Equations
- DepGrammar.Catena.totalCombinations n = 2 ^ n - 1
Instances For
Catena ratio as (catenae, non-catenae). Flatter trees → higher ratio.
Equations
Instances For
Tree (9), p. 359: 4 abstract nodes.
a(0)
/
b(1) c(2)
|
d(3)
10 catenae, 5 non-catenae, 4 constituents out of 15 total. Catenae: {a},{b},{c},{d},{a,b},{a,c},{b,d},{a,b,c},{a,b,d},{a,b,c,d} Constituents: {d},{c},{b,d},{a,b,c,d}
Equations
- One or more equations did not get rendered due to their size.
Instances For
Tree (22), p. 371: 3-node flat tree.
a(0)
/
b(1) c(2)
6 catenae, 1 non-catena, 3 constituents out of 7 total.
Equations
- DepGrammar.Catena.tree22 = [{ headIdx := 0, depIdx := 1, depType := UD.DepRel.dep }, { headIdx := 0, depIdx := 2, depType := UD.DepRel.dep }]
Instances For
4-node chain: a(0) → b(1) → c(2) → d(3). 10 catenae (only contiguous intervals are connected).
Equations
- One or more equations did not get rendered due to their size.
Instances For
4-node star: a(0) → {b(1), c(2), d(3)}. 11 catenae (every root-containing subset is connected).
Equations
- One or more equations did not get rendered due to their size.
Instances For
3-node chain: a(0) → b(1) → c(2).
Equations
- DepGrammar.Catena.chain3 = [{ headIdx := 0, depIdx := 1, depType := UD.DepRel.dep }, { headIdx := 1, depIdx := 2, depType := UD.DepRel.dep }]
Instances For
"pulled some strings" — the idiom {pulled, strings} forms a catena but not a constituent.
Words: pulled(0) some(1) strings(2) UD: pulled → strings (obj), strings → some (det).
Equations
- One or more equations did not get rendered due to their size.
Instances For
Flatter trees have strictly more catenae than chain-shaped trees. (@cite{osborne-gross-2012}, p. 371: the catena ratio increases with flatness)
Every constituent is a catena — verified exhaustively for tree (9). (@cite{osborne-gross-2012}, p. 360: "every 'constituent' is also a catena")
n constituents ≤ catenae count ≤ 2^n - 1 total combinations.
The idiom "pulled strings" is a catena (connected via obj edge)...
...but NOT a constituent (subtree of "pulled" includes "some").
The full phrase "pulled some strings" IS both a constituent and a catena.
Every singleton is a catena in any SimpleGraph (mathlib Prop-level). Proof: the induced subgraph on {v} has a single vertex, so it's trivially preconnected.
The computable isCatena agrees with the Prop-level IsCatena.
Forward (isCatena = true → IsCatena): BFS from the start node
reaches all nodes in the list. BFS soundness gives BidirReachable
from start to every node; symmetry + transitivity gives connectivity
between any pair; the bridge converts to SimpleGraph.Reachable.
Backward (IsCatena → isCatena = true): Preconnected gives
Reachable start v for every v in the set. The bridge converts each
Reachable path to BidirReachable, and BFS completeness ensures
every such node appears in the output.
Total dependency length restricted to edges within a catena. Measures the linear spread of the catena.
Equations
- One or more equations did not get rendered due to their size.
Instances For
The idiom catena {pulled, strings} has dep length 2.
The full constituent {pulled, some, strings} has dep length 3.