Documentation

Linglib.Theories.Syntax.DependencyGrammar.Formal.Catena

Catenae: A Novel Unit of Syntactic Analysis #

@cite{osborne-gross-2012}

Formalizes the catena (Osborne, Putnam & Groß 2012, Syntax 15:4, 354–396).

A catena (Latin: "chain") is a connected subgraph of a dependency tree — any word or combination of words that is continuous with respect to the dominance relation. Catenae strictly generalize constituents: every constituent is a catena, but not every catena is a constituent.

Mathlib Integration #

The dependency tree is converted to a mathlib SimpleGraph (Fin n) via depsToSimpleGraph, bridging linglib's DepTree/Dependency types to mathlib's graph theory infrastructure. The Prop-level IsCatena is defined using SimpleGraph.Preconnected on induced subgraphs. Computable Bool functions (isCatena, isConstituent) enable native_decide proofs.

Key Results #

Bridges #

The undirected simple graph underlying dependency edges over n nodes. Forgets edge direction and labels: i ~ j iff some dependency connects them. Uses mathlib's SimpleGraph (Fin n) — the fundamental bridge from linglib's DepTree/Dependency types to mathlib's graph theory.

Equations
Instances For

    Convert a DepTree to a mathlib SimpleGraph on its node set.

    Equations
    Instances For

      A catena is a non-empty subset S of tree nodes where the induced subgraph on S is preconnected. Equivalently: a word or combination of words that is continuous with respect to dominance.

      Uses mathlib's SimpleGraph.induce and SimpleGraph.Preconnected.

      Equations
      Instances For

        Check if a node set is connected within the dependency graph. Uses BFS from the first node and checks all others are reached.

        Equations
        Instances For

          Computable catena check: non-empty and connected in the tree.

          Equations
          Instances For
            inductive DepGrammar.Catena.BidirReachable (deps : List Dependency) (allowed : List ) :
            Prop

            Bidirectional reachability within a restricted node set. BidirReachable deps allowed u v holds when there is a path from u to v using dependency edges (in either direction) where all nodes are in allowed.

            Instances For
              theorem DepGrammar.Catena.bidir_step_append {deps : List Dependency} {allowed : List } {u v w : } (h : BidirReachable deps allowed u v) (hv : v allowed) (hw : w allowed) (hedge : ddeps, d.headIdx = v d.depIdx = w d.depIdx = v d.headIdx = w) :
              BidirReachable deps allowed u w

              Append a step to the end of a bidirectional path.

              theorem DepGrammar.Catena.bidir_symm {deps : List Dependency} {allowed : List } {u v : } (h : BidirReachable deps allowed u v) :
              BidirReachable deps allowed v u

              Bidirectional reachability is symmetric (reverse the path, flip edges).

              theorem DepGrammar.Catena.bidir_trans {deps : List Dependency} {allowed : List } {u v w : } (h1 : BidirReachable deps allowed u v) (h2 : BidirReachable deps allowed v w) :
              BidirReachable deps allowed u w

              Bidirectional reachability is transitive.

              theorem DepGrammar.Catena.bfsReachable_complete (deps : List Dependency) (allowed : List ) (start target : ) (h : BidirReachable deps allowed start target) :
              target DepGrammar.Catena.bfsReachable✝ deps allowed start

              BFS completeness: every bidirectionally reachable node appears in the bfsReachable output.

              Proved by showing the output contains start and is closed under edges within allowed, then applying induction on BidirReachable.

              Any singleton is a catena: non-empty and trivially connected.

              Convenience: check catena on a DepTree directly.

              Equations
              Instances For

                Check if a node set equals the complete subtree (projection) rooted at some node. Uses projection from Core/Basic.lean.

                Equations
                • One or more equations did not get rendered due to their size.
                Instances For
                  Equations
                  Instances For

                    Count catenae for a tree with n nodes and given dependency edges.

                    Equations
                    Instances For

                      Total non-empty subsets of n elements: 2^n - 1.

                      Equations
                      Instances For

                        Catena ratio as (catenae, non-catenae). Flatter trees → higher ratio.

                        Equations
                        Instances For

                          Tree (9), p. 359: 4 abstract nodes. a(0) /
                          b(1) c(2) | d(3)

                          10 catenae, 5 non-catenae, 4 constituents out of 15 total. Catenae: {a},{b},{c},{d},{a,b},{a,c},{b,d},{a,b,c},{a,b,d},{a,b,c,d} Constituents: {d},{c},{b,d},{a,b,c,d}

                          Equations
                          • One or more equations did not get rendered due to their size.
                          Instances For

                            Tree (22), p. 371: 3-node flat tree. a(0) /
                            b(1) c(2)

                            6 catenae, 1 non-catena, 3 constituents out of 7 total.

                            Equations
                            Instances For

                              4-node chain: a(0) → b(1) → c(2) → d(3). 10 catenae (only contiguous intervals are connected).

                              Equations
                              • One or more equations did not get rendered due to their size.
                              Instances For

                                4-node star: a(0) → {b(1), c(2), d(3)}. 11 catenae (every root-containing subset is connected).

                                Equations
                                • One or more equations did not get rendered due to their size.
                                Instances For

                                  3-node chain: a(0) → b(1) → c(2).

                                  Equations
                                  Instances For

                                    "pulled some strings" — the idiom {pulled, strings} forms a catena but not a constituent.

                                    Words: pulled(0) some(1) strings(2) UD: pulled → strings (obj), strings → some (det).

                                    Equations
                                    • One or more equations did not get rendered due to their size.
                                    Instances For

                                      Flatter trees have strictly more catenae than chain-shaped trees. (@cite{osborne-gross-2012}, p. 371: the catena ratio increases with flatness)

                                      Every constituent is a catena — verified exhaustively for tree (9). (@cite{osborne-gross-2012}, p. 360: "every 'constituent' is also a catena")

                                      Every constituent is a catena — verified for star4.

                                      Every constituent is a catena — verified for chain4.

                                      n constituents ≤ catenae count ≤ 2^n - 1 total combinations.

                                      Every singleton is a catena.

                                      {a, d} is NOT a catena — a and d aren't connected without b.

                                      {b, c} is NOT a catena — b and c aren't connected without a.

                                      The idiom "pulled strings" is a catena (connected via obj edge)...

                                      ...but NOT a constituent (subtree of "pulled" includes "some").

                                      The full phrase "pulled some strings" IS both a constituent and a catena.

                                      Every singleton is a catena in any SimpleGraph (mathlib Prop-level). Proof: the induced subgraph on {v} has a single vertex, so it's trivially preconnected.

                                      theorem DepGrammar.Catena.isCatena_iff_IsCatena {n : } (deps : List Dependency) (nodes : List ) (hbounds : inodes, i < n) (hnodup : nodes.Nodup) :
                                      isCatena deps nodes = true IsCatena (depsToSimpleGraph n deps) (List.filterMap (fun (i : ) => if h : i < n then some i, h else none) nodes).toFinset

                                      The computable isCatena agrees with the Prop-level IsCatena.

                                      Forward (isCatena = true → IsCatena): BFS from the start node reaches all nodes in the list. BFS soundness gives BidirReachable from start to every node; symmetry + transitivity gives connectivity between any pair; the bridge converts to SimpleGraph.Reachable.

                                      Backward (IsCatenaisCatena = true): Preconnected gives Reachable start v for every v in the set. The bridge converts each Reachable path to BidirReachable, and BFS completeness ensures every such node appears in the output.

                                      Total dependency length restricted to edges within a catena. Measures the linear spread of the catena.

                                      Equations
                                      • One or more equations did not get rendered due to their size.
                                      Instances For

                                        The idiom catena {pulled, strings} has dep length 2.

                                        The full constituent {pulled, some, strings} has dep length 3.