Linglib.Theories.Syntax.DependencyGrammar.Formal.Catena

Catenae: A Novel Unit of Syntactic Analysis

@cite{osborne-gross-2012}

Formalizes the catena (Osborne, Putnam & Groß 2012, Syntax 15:4, 354–396).

A catena (Latin: "chain") is a connected subgraph of a dependency tree — any word or combination of words that is continuous with respect to the dominance relation. Catenae strictly generalize constituents: every constituent is a catena, but not every catena is a constituent.

Mathlib Integration

The dependency tree is converted to a mathlib SimpleGraph (Fin n) via depsToSimpleGraph, bridging linglib's DepTree/Dependency types to mathlib's graph theory infrastructure. The Prop-level IsCatena is defined using SimpleGraph.Preconnected on induced subgraphs. Computable Bool functions (isCatena, isConstituent) enable native_decide proofs.

Key Results

constituent_is_catena: every constituent is a catena (p. 360)
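For a tree with n words: n constituents ≤ number of catenae ≤ 2^n - 1 non-empty word combinations
Flatter trees have more catenae than chain-shaped trees (p. 371): with 4 nodes, star4 has 11 catenae and chain4 has 10
The catena ratio (catenae, non-catenae) depends on tree shape, e.g. (10, 5) for tree9 and (6, 1) for tree22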

Bridges

→ Core/Basic.lean: uses DepTree, DepGraph, Dependency types
→ mathlib SimpleGraph: depsToSimpleGraph converts dependency edges
→ DependencyLength.lean: catenaTotalDepLength measures catena spread

def DepGrammar.Catena.depsToSimpleGraph (n : ℕ) (deps : List Dependency) :

SimpleGraph (Fin n)

The undirected simple graph underlying dependency edges over n nodes. Forgets edge direction and labels: i ~ j iff some dependency connects them. Uses mathlib's SimpleGraph (Fin n) — the fundamental bridge from linglib's DepTree/Dependency types to mathlib's graph theory.

Equations

DepGrammar.Catena.depsToSimpleGraph n deps = { Adj := fun (i j : Fin n) => i ≠ j ∧ ∃ d ∈ deps, d.headIdx = ↑i ∧ d.depIdx = ↑j ∨ d.headIdx = ↑j ∧ d.depIdx = ↑i, symm := ⋯, loopless := ⋯ }

def DepGrammar.Catena.DepTree.asSimpleGraph (t : DepTree) :

SimpleGraph (Fin t.words.length)

Convert a DepTree to a mathlib SimpleGraph on its node set.

Equations

DepGrammar.Catena.DepTree.asSimpleGraph t = DepGrammar.Catena.depsToSimpleGraph t.words.length t.deps

def DepGrammar.Catena.IsCatena {n : ℕ} (G : SimpleGraph (Fin n)) (S : Finset (Fin n)) :

A catena is a non-empty subset S of tree nodes where the induced subgraph on S is preconnected. Equivalently: a word or combination of words that is continuous with respect to dominance.

Uses mathlib's SimpleGraph.induce and SimpleGraph.Preconnected.

Equations

DepGrammar.Catena.IsCatena G S = (S.Nonempty ∧ (SimpleGraph.induce (↑S) G).Preconnected)

def DepGrammar.Catena.isConnected (deps : List Dependency) (nodes : List ℕ) :

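Check whether a node set is connected within the dependency graph. Runs BFS from the first node, traversing only nodes that belong to the set, and checks that every node in the set is reached. The empty list counts as connected.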

Equations

DepGrammar.Catena.isConnected deps [] = true
DepGrammar.Catena.isConnected deps (start :: tail) = (start :: tail).all (DepGrammar.Catena.bfsReachable✝ deps (start :: tail) start).contains

def DepGrammar.Catena.isCatena (deps : List Dependency) (nodes : List ℕ) :

Computable catena check: non-empty and connected in the tree.

Equations

DepGrammar.Catena.isCatena deps nodes = (!nodes.isEmpty && DepGrammar.Catena.isConnected deps nodes)
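
The BFS-based check described above can be sketched in plain Lean 4 without mathlib. This is a minimal illustration, not linglib's implementation: `Dep` is a hypothetical stand-in for `Dependency` with labels dropped, and `neighbors`, `expand` and `reach` are assumed helper names playing the role of the private `bfsReachable`. Fuel-bounded rounds replace a real queue so that termination is structural.

```lean
namespace CatenaSketch

/-- Hypothetical stand-in for linglib's `Dependency` (labels omitted). -/
structure Dep where
  headIdx : Nat
  depIdx : Nat

/-- Nodes adjacent to `u`, following dependency edges in either direction. -/
def neighbors (deps : List Dep) (u : Nat) : List Nat :=
  deps.filterMap fun d =>
    if d.headIdx == u then some d.depIdx
    else if d.depIdx == u then some d.headIdx
    else none

/-- One round: keep allowed nodes that are visited or adjacent to a visited node. -/
def expand (deps : List Dep) (allowed visited : List Nat) : List Nat :=
  allowed.filter fun v =>
    visited.contains v || visited.any fun u => (neighbors deps u).contains v

/-- Apply `expand` `fuel` times; `allowed.length` rounds are enough to saturate. -/
def reach (deps : List Dep) (allowed : List Nat) : Nat → List Nat → List Nat
  | 0, visited => visited
  | fuel + 1, visited => reach deps allowed fuel (expand deps allowed visited)

/-- Connected: every node is reachable from the first one inside the node set. -/
def isConnected (deps : List Dep) : List Nat → Bool
  | [] => true
  | start :: rest =>
    let nodes := start :: rest
    nodes.all (reach deps nodes nodes.length [start]).contains

/-- Catena: non-empty and connected. -/
def isCatena (deps : List Dep) (nodes : List Nat) : Bool :=
  !nodes.isEmpty && isConnected deps nodes

/-- Tree (9): a(0) → b(1), a(0) → c(2), b(1) → d(3). -/
def tree9 : List Dep := [⟨0, 1⟩, ⟨0, 2⟩, ⟨1, 3⟩]

#eval isCatena tree9 [0, 1, 3]  -- true
#eval isCatena tree9 [0, 3]     -- false
#eval isCatena tree9 []         -- false

end CatenaSketch
```

Because the check is an ordinary `Bool` function, the same facts could also be stated as theorems, for example `example : isCatena tree9 [0, 3] = false := by native_decide`.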

inductive DepGrammar.Catena.BidirReachable (deps : List Dependency) (allowed : List ℕ) :

ℕ → ℕ → Prop

Bidirectional reachability within a restricted node set. BidirReachable deps allowed u v holds when there is a path from u to v using dependency edges (in either direction) where all nodes are in allowed.

here {deps : List Dependency} {allowed : List ℕ} (v : ℕ) : v ∈ allowed → BidirReachable deps allowed v v
step {deps : List Dependency} {allowed : List ℕ} (u v w : ℕ) : u ∈ allowed → v ∈ allowed → (∃ d ∈ deps, d.headIdx = u ∧ d.depIdx = v ∨ d.depIdx = u ∧ d.headIdx = v) → BidirReachable deps allowed v w → BidirReachable deps allowed u w

theorem DepGrammar.Catena.bidir_step_append {deps : List Dependency} {allowed : List ℕ} {u v w : ℕ} (h : BidirReachable deps allowed u v) (hv : v ∈ allowed) (hw : w ∈ allowed) (hedge : ∃ d ∈ deps, d.headIdx = v ∧ d.depIdx = w ∨ d.depIdx = v ∧ d.headIdx = w) :

BidirReachable deps allowed u w

Append a step to the end of a bidirectional path.

theorem DepGrammar.Catena.bidir_symm {deps : List Dependency} {allowed : List ℕ} {u v : ℕ} (h : BidirReachable deps allowed u v) :

BidirReachable deps allowed v u

Bidirectional reachability is symmetric (reverse the path, flip edges).

theorem DepGrammar.Catena.bidir_trans {deps : List Dependency} {allowed : List ℕ} {u v w : ℕ} (h1 : BidirReachable deps allowed u v) (h2 : BidirReachable deps allowed v w) :

BidirReachable deps allowed u w

Bidirectional reachability is transitive.

theorem DepGrammar.Catena.bfsReachable_complete (deps : List Dependency) (allowed : List ℕ) (start target : ℕ) (h : BidirReachable deps allowed start target) :

target ∈ DepGrammar.Catena.bfsReachable✝ deps allowed start

BFS completeness: every bidirectionally reachable node appears in the bfsReachable output.

Proved by showing the output contains start and is closed under edges within allowed, then applying induction on BidirReachable.

theorem DepGrammar.Catena.singleton_isCatena (deps : List Dependency) (v : ℕ) :

isCatena deps [v] = true

Any singleton is a catena: non-empty and trivially connected.

def DepGrammar.Catena.DepTree.isCatena' (t : DepTree) (nodes : List ℕ) :

Convenience: check catena on a DepTree directly.

Equations

DepGrammar.Catena.DepTree.isCatena' t nodes = DepGrammar.Catena.isCatena t.deps nodes

def DepGrammar.Catena.isConstituent (deps : List Dependency) (n : ℕ) (nodes : List ℕ) :

Check if a node set equals the complete subtree (projection) rooted at some node. Uses projection from Core/Basic.lean.

Equations

One or more equations did not get rendered due to their size.
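
Since the equations are not rendered here, the following is only a guess at the shape of the check, written in plain Lean 4. `Dep` is a hypothetical stand-in for `Dependency`, and `children`, `projection` and `sameSet` are assumed names; the real `projection` in Core/Basic.lean may be defined differently. The sketch assumes `deps` forms a tree and `nodes` has no duplicates.

```lean
namespace ConstituentSketch

/-- Hypothetical stand-in for linglib's `Dependency` (labels omitted). -/
structure Dep where
  headIdx : Nat
  depIdx : Nat

/-- Direct dependents of `h`. -/
def children (deps : List Dep) (h : Nat) : List Nat :=
  deps.filterMap fun d => if d.headIdx == h then some d.depIdx else none

/-- `root` plus everything it dominates, descending at most `fuel` levels. -/
def projection (deps : List Dep) : Nat → Nat → List Nat
  | 0, root => [root]
  | fuel + 1, root =>
    (children deps root).foldl (fun acc c => acc ++ projection deps fuel c) [root]

/-- Order-insensitive equality, valid for duplicate-free lists. -/
def sameSet (a b : List Nat) : Bool :=
  a.length == b.length && a.all b.contains

/-- Constituent: the node set equals the projection of some node `r < n`. -/
def isConstituent (deps : List Dep) (n : Nat) (nodes : List Nat) : Bool :=
  (List.range n).any fun r => sameSet nodes (projection deps n r)

/-- pulled(0) → strings(2), strings(2) → some(1); here just the edge list. -/
def pulledSomeStrings : List Dep := [⟨0, 2⟩, ⟨2, 1⟩]

#eval isConstituent pulledSomeStrings 3 [0, 2]     -- false
#eval isConstituent pulledSomeStrings 3 [0, 1, 2]  -- true
#eval isConstituent pulledSomeStrings 3 [1, 2]     -- true

end ConstituentSketch
```

Using `n` as fuel is sufficient because a tree with `n` nodes has depth at most `n - 1`.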

def DepGrammar.Catena.allNonEmptySubsets (n : ℕ) :

List (List ℕ)

All non-empty subsets of {0,..., n-1}.

Equations

DepGrammar.Catena.allNonEmptySubsets n = List.filter (fun (x : List ℕ) => !x.isEmpty) (DepGrammar.Catena.allNonEmptySubsets.powerset (List.range n))

def DepGrammar.Catena.allNonEmptySubsets.powerset (remaining : List ℕ) :

List (List ℕ)

Equations

One or more equations did not get rendered due to their size.
DepGrammar.Catena.allNonEmptySubsets.powerset [] = [[]]
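
Only the base case of `powerset` is rendered above. A plausible recursive case, shown here as an assumption rather than as the library's definition, either skips or includes the head element. The sketch uses plain Lean 4 and reuses the documented names inside its own namespace.

```lean
namespace SubsetSketch

/-- All sublists of a list, the empty sublist included. -/
def powerset : List Nat → List (List Nat)
  | [] => [[]]
  | x :: rest => powerset rest ++ (powerset rest).map (x :: ·)

/-- All non-empty subsets of {0, ..., n-1}, as lists. -/
def allNonEmptySubsets (n : Nat) : List (List Nat) :=
  (powerset (List.range n)).filter fun s => !s.isEmpty

#eval allNonEmptySubsets 2           -- [[1], [0], [0, 1]]
#eval (allNonEmptySubsets 4).length  -- 15

end SubsetSketch
```

The length for `n = 4` matches `totalCombinations 4 = 2 ^ 4 - 1 = 15`; the order of the subsets depends on this particular recursion and may differ from linglib's.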

def DepGrammar.Catena.catenaeCount (n : ℕ) (deps : List Dependency) :

Count catenae for a tree with n nodes and given dependency edges.

Equations

DepGrammar.Catena.catenaeCount n deps = (List.filter (DepGrammar.Catena.isCatena deps) (DepGrammar.Catena.allNonEmptySubsets n)).length

def DepGrammar.Catena.constituentCount (n : ℕ) (deps : List Dependency) :

Count constituents for a tree with n nodes.

Equations

DepGrammar.Catena.constituentCount n deps = (List.filter (DepGrammar.Catena.isConstituent deps n) (DepGrammar.Catena.allNonEmptySubsets n)).length

def DepGrammar.Catena.totalCombinations (n : ℕ) :

Total non-empty subsets of n elements: 2^n - 1.

Equations

DepGrammar.Catena.totalCombinations n = 2 ^ n - 1

def DepGrammar.Catena.catenaRatio (n : ℕ) (deps : List Dependency) :

Catena ratio as a pair (number of catenae, number of non-catenae). Among the 4-node examples, flatter trees have proportionally more catenae.

Equations

DepGrammar.Catena.catenaRatio n deps = (DepGrammar.Catena.catenaeCount n deps, DepGrammar.Catena.totalCombinations n - DepGrammar.Catena.catenaeCount n deps)

def DepGrammar.Catena.tree9 :

List Dependency

Tree (9), p. 359: 4 abstract nodes with edges a(0) → b(1), a(0) → c(2), b(1) → d(3).

10 catenae, 5 non-catenae, 4 constituents out of 15 total.
Catenae: {a}, {b}, {c}, {d}, {a,b}, {a,c}, {b,d}, {a,b,c}, {a,b,d}, {a,b,c,d}
Constituents: {d}, {c}, {b,d}, {a,b,c,d}

Equations

One or more equations did not get rendered due to their size.

def DepGrammar.Catena.tree22 :

List Dependency

Tree (22), p. 371: 3-node flat tree with edges a(0) → b(1), a(0) → c(2).

6 catenae, 1 non-catena, 3 constituents out of 7 total.

Equations

DepGrammar.Catena.tree22 = [{ headIdx := 0, depIdx := 1, depType := UD.DepRel.dep }, { headIdx := 0, depIdx := 2, depType := UD.DepRel.dep }]

def DepGrammar.Catena.chain4 :

List Dependency

4-node chain: a(0) → b(1) → c(2) → d(3). 10 catenae (only contiguous intervals are connected).

Equations

One or more equations did not get rendered due to their size.

def DepGrammar.Catena.star4 :

List Dependency

4-node star: a(0) → {b(1), c(2), d(3)}. 11 catenae (the 8 subsets containing the root, plus the 3 leaf singletons).

Equations

One or more equations did not get rendered due to their size.

def DepGrammar.Catena.chain3 :

List Dependency

3-node chain: a(0) → b(1) → c(2).

Equations

DepGrammar.Catena.chain3 = [{ headIdx := 0, depIdx := 1, depType := UD.DepRel.dep }, { headIdx := 1, depIdx := 2, depType := UD.DepRel.dep }]

def DepGrammar.Catena.pulledSomeStrings :

"pulled some strings" — the idiom {pulled, strings} forms a catena but not a constituent.

Words: pulled(0) some(1) strings(2).
UD: pulled → strings (obj), strings → some (det).

Equations

One or more equations did not get rendered due to their size.

theorem DepGrammar.Catena.total_3 :

totalCombinations 3 = 7

theorem DepGrammar.Catena.total_4 :

totalCombinations 4 = 15

theorem DepGrammar.Catena.tree9_catenae :

catenaeCount 4 tree9 = 10

theorem DepGrammar.Catena.tree9_constituents :

constituentCount 4 tree9 = 4

theorem DepGrammar.Catena.tree9_ratio :

catenaRatio 4 tree9 = (10, 5)

theorem DepGrammar.Catena.tree22_catenae :

catenaeCount 3 tree22 = 6

theorem DepGrammar.Catena.tree22_constituents :

constituentCount 3 tree22 = 3

theorem DepGrammar.Catena.tree22_ratio :

catenaRatio 3 tree22 = (6, 1)

theorem DepGrammar.Catena.chain4_catenae :

catenaeCount 4 chain4 = 10

theorem DepGrammar.Catena.chain4_constituents :

constituentCount 4 chain4 = 4

theorem DepGrammar.Catena.star4_catenae :

catenaeCount 4 star4 = 11

theorem DepGrammar.Catena.star4_constituents :

constituentCount 4 star4 = 4

theorem DepGrammar.Catena.three_nodes_shape_invariant :

catenaeCount 3 chain3 = catenaeCount 3 tree22

theorem DepGrammar.Catena.flatter_more_catenae :

catenaeCount 4 star4 > catenaeCount 4 chain4

For 4 nodes, the flat star4 has strictly more catenae than the chain-shaped chain4 (11 vs. 10). (@cite{osborne-gross-2012}, p. 371: the catena ratio increases with flatness) With 3 nodes the counts coincide (three_nodes_shape_invariant).
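
The counts behind these theorems can be reproduced with a short plain-Lean 4 sketch. It deliberately swaps in a different technique from the documented BFS: in a tree, a non-empty node set is connected exactly when it spans one edge fewer than it has nodes. `Edge`, `innerEdges` and `isCatenaInTree` are hypothetical names, edges are bare (head, dependent) pairs, and the criterion is only valid when the edge list really is a tree.

```lean
namespace CountSketch

/-- Hypothetical stand-in for `Dependency`: a (head, dependent) index pair. -/
abbrev Edge := Nat × Nat

/-- All sublists of a list, the empty sublist included. -/
def powerset : List Nat → List (List Nat)
  | [] => [[]]
  | x :: rest => powerset rest ++ (powerset rest).map (x :: ·)

/-- Number of edges with both endpoints inside `s`. -/
def innerEdges (edges : List Edge) (s : List Nat) : Nat :=
  (edges.filter fun e => s.contains e.1 && s.contains e.2).length

/-- Tree-only criterion: non-empty and spanning exactly `|s| - 1` edges. -/
def isCatenaInTree (edges : List Edge) (s : List Nat) : Bool :=
  !s.isEmpty && innerEdges edges s + 1 == s.length

/-- Count the catenae among all subsets of {0, ..., n-1}. -/
def catenaeCount (n : Nat) (edges : List Edge) : Nat :=
  ((powerset (List.range n)).filter (isCatenaInTree edges)).length

def chain4 : List Edge := [(0, 1), (1, 2), (2, 3)]
def star4 : List Edge := [(0, 1), (0, 2), (0, 3)]
def tree9 : List Edge := [(0, 1), (0, 2), (1, 3)]

#eval catenaeCount 4 chain4  -- 10
#eval catenaeCount 4 star4   -- 11
-- Catena ratio for tree (9): (catenae, non-catenae)
#eval (catenaeCount 4 tree9, 2 ^ 4 - 1 - catenaeCount 4 tree9)  -- (10, 5)

end CountSketch
```

The edge-count shortcut avoids a traversal but would miscount on graphs with cycles or duplicate edges, which is one reason a BFS-based check is the more general choice.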

theorem DepGrammar.Catena.constituent_is_catena_tree9 :

((allNonEmptySubsets 4).all fun (nodes : List ℕ) => if isConstituent tree9 4 nodes = true then isCatena tree9 nodes else true) = true

Every constituent is a catena — verified exhaustively for tree (9). (@cite{osborne-gross-2012}, p. 360: "every 'constituent' is also a catena")

theorem DepGrammar.Catena.constituent_is_catena_star4 :

((allNonEmptySubsets 4).all fun (nodes : List ℕ) => if isConstituent star4 4 nodes = true then isCatena star4 nodes else true) = true

Every constituent is a catena — verified for star4.

theorem DepGrammar.Catena.constituent_is_catena_chain4 :

((allNonEmptySubsets 4).all fun (nodes : List ℕ) => if isConstituent chain4 4 nodes = true then isCatena chain4 nodes else true) = true

Every constituent is a catena — verified for chain4.

theorem DepGrammar.Catena.counting_hierarchy_tree9 :

constituentCount 4 tree9 ≤ catenaeCount 4 tree9 ∧ catenaeCount 4 tree9 ≤ totalCombinations 4

n constituents ≤ catenae count ≤ 2^n - 1 total combinations. For tree9 this reads 4 ≤ 10 ≤ 15.

theorem DepGrammar.Catena.counting_hierarchy_star4 :

constituentCount 4 star4 ≤ catenaeCount 4 star4 ∧ catenaeCount 4 star4 ≤ totalCombinations 4

theorem DepGrammar.Catena.singleton_catena_0 :

isCatena tree9 [0] = true

Every singleton is a catena.

theorem DepGrammar.Catena.singleton_catena_1 :

isCatena tree9 [1] = true

theorem DepGrammar.Catena.singleton_catena_2 :

isCatena tree9 [2] = true

theorem DepGrammar.Catena.singleton_catena_3 :

isCatena tree9 [3] = true

theorem DepGrammar.Catena.not_catena_ad :

isCatena tree9 [0, 3] = false

{a, d} is NOT a catena — a and d aren't connected without b.

theorem DepGrammar.Catena.not_catena_bc :

isCatena tree9 [1, 2] = false

{b, c} is NOT a catena — b and c aren't connected without a.

theorem DepGrammar.Catena.idiom_is_catena :

isCatena pulledSomeStrings.deps [0, 2] = true

The idiom "pulled strings" is a catena (connected via obj edge)...

theorem DepGrammar.Catena.idiom_not_constituent :

isConstituent pulledSomeStrings.deps 3 [0, 2] = false

...but NOT a constituent (subtree of "pulled" includes "some").

theorem DepGrammar.Catena.phrase_is_constituent :

isConstituent pulledSomeStrings.deps 3 [0, 1, 2] = true

The full phrase "pulled some strings" IS both a constituent and a catena.

theorem DepGrammar.Catena.phrase_is_catena :

isCatena pulledSomeStrings.deps [0, 1, 2] = true

theorem DepGrammar.Catena.IsCatena_singleton {n : ℕ} (G : SimpleGraph (Fin n)) (v : Fin n) :

Every singleton is a catena in any SimpleGraph (mathlib Prop-level). Proof: the induced subgraph on {v} has a single vertex, so it's trivially preconnected.

theorem DepGrammar.Catena.isCatena_iff_IsCatena {n : ℕ} (deps : List Dependency) (nodes : List ℕ) (hbounds : ∀ i ∈ nodes, i < n) (hnodup : nodes.Nodup) :

isCatena deps nodes = true ↔ IsCatena (depsToSimpleGraph n deps) (List.filterMap (fun (i : ℕ) => if h : i < n then some ⟨i, h⟩ else none) nodes).toFinset

The computable isCatena agrees with the Prop-level IsCatena.

Forward (isCatena = true → IsCatena): BFS from the start node reaches all nodes in the list. BFS soundness gives BidirReachable from start to every node; symmetry + transitivity gives connectivity between any pair; the bridge converts to SimpleGraph.Reachable.

Backward (IsCatena → isCatena = true): Preconnected gives Reachable start v for every v in the set. The bridge converts each Reachable path to BidirReachable, and BFS completeness ensures every such node appears in the output.

def DepGrammar.Catena.catenaTotalDepLength (deps : List Dependency) (nodes : List ℕ) :

Total dependency length restricted to edges within a catena. Measures the linear spread of the catena.

Equations

One or more equations did not get rendered due to their size.
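
The definition is not rendered here, so the following plain-Lean 4 sketch is an assumption about its shape: it sums the linear distance |headIdx - depIdx| over the dependencies whose two endpoints both lie in the node set. `Dep`, `depLength` and `inside` are hypothetical names. The sketch agrees with the two values stated for "pulled some strings".

```lean
namespace DepLengthSketch

/-- Hypothetical stand-in for linglib's `Dependency` (labels omitted). -/
structure Dep where
  headIdx : Nat
  depIdx : Nat

/-- Linear distance between head and dependent. -/
def depLength (d : Dep) : Nat :=
  max d.headIdx d.depIdx - min d.headIdx d.depIdx

/-- Does the edge lie entirely within the node set? -/
def inside (nodes : List Nat) (d : Dep) : Bool :=
  nodes.contains d.headIdx && nodes.contains d.depIdx

/-- Sum of dependency lengths over edges internal to `nodes`. -/
def catenaTotalDepLength (deps : List Dep) (nodes : List Nat) : Nat :=
  ((deps.filter (inside nodes)).map depLength).foldl (· + ·) 0

/-- pulled(0) → strings(2), strings(2) → some(1); here just the edge list. -/
def pulledSomeStrings : List Dep := [⟨0, 2⟩, ⟨2, 1⟩]

#eval catenaTotalDepLength pulledSomeStrings [0, 2]     -- 2
#eval catenaTotalDepLength pulledSomeStrings [0, 1, 2]  -- 3

end DepLengthSketch
```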

theorem DepGrammar.Catena.idiom_catena_dep_length :

catenaTotalDepLength pulledSomeStrings.deps [0, 2] = 2

The idiom catena {pulled, strings} has dep length 2.

theorem DepGrammar.Catena.constituent_dep_length :

catenaTotalDepLength pulledSomeStrings.deps [0, 1, 2] = 3

The full constituent {pulled, some, strings} has dep length 3.