VP Divergence: DG vs PSG Constituency #
@cite{osborne-2019} @cite{osborne-gross-2012}
Formalizes the central empirical disagreement between Dependency Grammar and Phrase Structure Grammar regarding the finite VP (@cite{osborne-2019}, Ch. 2–4; Osborne, Putnam & Groß 2012, Syntax 15:4).
The core claim: In DG, the finite VP (verb + complements, excluding subject) is a catena but not a constituent. In PSG, the finite VP is a constituent. Constituency tests (topicalization, clefting, pseudoclefting, proform substitution, answer fragments) systematically fail to identify the finite VP as a constituent — supporting DG's prediction.
Key Results #
- strict_containment_*: For any non-trivial tree, constituents ⊊ catenae
- exists_catena_not_constituent: Universal witness, the singleton of an internal node
- vp_is_catena_* / vp_not_constituent_*: The finite VP divergence
- dg_predictions_match_observed: DG matches ≥4/5 constituency tests
- psg_predictions_mismatch: PSG matches ≤2/5 constituency tests
Bridges #
- → Catena.lean: reuses isCatena, isConstituent, catenaeCount, constituentCount
- → Core/Basic.lean: uses DepTree, Dependency, Word
- → DependencyLength.lean: VP catena dep length
Check that constituent count is strictly less than catena count.
Equations
- DepGrammar.VPDivergence.isStrictSubset n deps = decide (DepGrammar.Catena.constituentCount n deps < DepGrammar.Catena.catenaeCount n deps)
Enumerate all catenae that are NOT constituents.
Strict containment for tree (9): constituents < catenae. (4 constituents < 10 catenae, @cite{osborne-gross-2012}, p. 359)
Strict containment for chain4: constituents < catenae. (4 constituents < 10 catenae)
Strict containment for star4: constituents < catenae. (4 constituents < 11 catenae)
Tree (9) has exactly 6 non-constituent catenae.
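The catena counts quoted above can be reproduced by brute force. What follows is a minimal standalone sketch, not the Catena.lean implementation: `Dep`, `sublists`, `adjacent`, and `grow` are hypothetical helpers introduced here, and the edge lists assume that chain4 is a 4-node path and star4 is a 4-node star. Only the argument order of `isCatena` and `catenaeCount` follows the signatures shown on this page.

```lean
/-- A dependency edge (hypothetical stand-in for the library's `Dependency`). -/
structure Dep where
  head : Nat
  dep  : Nat

/-- All order-preserving sublists of a list, including the empty one. -/
def sublists : List Nat → List (List Nat)
  | [] => [[]]
  | x :: xs =>
    let rest := sublists xs
    rest ++ rest.map (fun s => x :: s)

/-- Are `a` and `b` linked by a dependency in either direction? -/
def adjacent (deps : List Dep) (a b : Nat) : Bool :=
  deps.any fun d => (d.head == a && d.dep == b) || (d.head == b && d.dep == a)

/-- One growth round: keep the nodes of `s` already reached or adjacent to a reached node. -/
def grow (deps : List Dep) (s reached : List Nat) : List Nat :=
  s.filter fun x => reached.contains x || reached.any (fun r => adjacent deps r x)

/-- A non-empty node set is a catena iff it is connected in the dependency graph. -/
def isCatena (deps : List Dep) (s : List Nat) : Bool :=
  match s with
  | [] => false
  | v :: _ =>
    -- Grow from the first node; `s.length` rounds are enough to reach every node.
    let reached := (List.range s.length).foldl (fun acc _ => grow deps s acc) [v]
    reached.length == s.length

/-- Number of catenae among all subsets of the nodes `0 .. n-1`. -/
def catenaeCount (n : Nat) (deps : List Dep) : Nat :=
  ((sublists (List.range n)).filter (isCatena deps)).length

-- Assumed shapes: a path 0 → 1 → 2 → 3 and a star rooted at 0.
def chain4 : List Dep := [⟨0, 1⟩, ⟨1, 2⟩, ⟨2, 3⟩]
def star4  : List Dep := [⟨0, 1⟩, ⟨0, 2⟩, ⟨0, 3⟩]

#eval catenaeCount 4 chain4               -- 10
#eval catenaeCount 4 star4                -- 11
#eval decide (4 < catenaeCount 4 chain4)  -- true (4 constituents < 10 catenae)
```

Enumerating all 2^n subsets is only practical for small example trees like these; it is meant to make the counts checkable, not to scale.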
Universal witness for strict containment:
For any tree with ≥2 nodes and an edge (v, w), the singleton {v} is a catena (trivially connected: any singleton is connected in the dep graph) but NOT a constituent ({v} ≠ projection(v) because projection(v) includes w as a descendant).
Uses the computable isCatena (from Catena.lean) rather than the Prop-level IsCatena, which takes a SimpleGraph parameter unrelated to the dependency list. The two key facts:
- isCatena deps [v] = true: any singleton is a catena
- projection deps v ≠ [v]: v has a child w, so the projection is strictly larger
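The second fact can be illustrated on the smallest possible witness. The sketch below is an assumption-laden illustration, not the library's definition of `projection`: `Dep`, `step`, and `twoNode` are hypothetical names, and the projection is computed by a simple closure over the edge list.

```lean
/-- A dependency edge (hypothetical stand-in for the library's `Dependency`). -/
structure Dep where
  head : Nat
  dep  : Nat

/-- One closure round: add each dependent whose head is already collected. -/
def step (deps : List Dep) (acc : List Nat) : List Nat :=
  deps.foldl
    (fun a d => if a.contains d.head && !(a.contains d.dep) then a ++ [d.dep] else a)
    acc

/-- `v` plus everything it dominates; `deps.length` rounds reach any depth. -/
def projection (deps : List Dep) (v : Nat) : List Nat :=
  (List.range deps.length).foldl (fun acc _ => step deps acc) [v]

/-- Smallest witness (hypothetical): a single edge from v = 0 to w = 1. -/
def twoNode : List Dep := [⟨0, 1⟩]

#eval projection twoNode 0          -- [0, 1]
#eval projection twoNode 0 != [0]   -- true: {v} is not a constituent
#eval projection twoNode 1          -- [1]: the leaf's projection is itself
```

The first fact needs no computation: a one-element set has nothing to disconnect, so it is connected under any edge list.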
A minimal phrase structure tree for structural comparisons with DG.
NOT intended to replace Minimalism's SyntacticObject/XBarPhrase
(those carry theory-specific features). This is purely for the
DG-vs-PSG constituency comparison.
- leaf (word : String) (cat : UD.UPOS) : PSTree
- node (label : String) (children : List PSTree) : PSTree
Yield of a PSTree: the leaf words in left-to-right order.
All constituents of a PSTree: every subtree's yield is a constituent.
Count of distinct constituents in a PSTree.
Check whether a given word sequence is a constituent of this PSTree.
Equations
- t.hasConstituent ws = t.constituents.any fun (x : List String) => x == ws
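The PSTree operations described above can be sketched in a few lines. This is a simplified illustration under stated assumptions rather than the module's code: the POS tag is a plain `String` instead of `UD.UPOS`, the helper names `treeYield`, `yieldAll`, and `constituentsAll` are hypothetical, and the example tree is the PSG analysis of "Bill plays chess" given later on this page.

```lean
/-- Simplified phrase structure tree; `cat` is a `String` here, not `UD.UPOS`. -/
inductive PSTree where
  | leaf (word : String) (cat : String) : PSTree
  | node (label : String) (children : List PSTree) : PSTree

mutual
  /-- Leaf words in left-to-right order. -/
  def treeYield : PSTree → List String
    | .leaf w _ => [w]
    | .node _ cs => yieldAll cs
  /-- Concatenated yields of a list of subtrees. -/
  def yieldAll : List PSTree → List String
    | [] => []
    | t :: ts => treeYield t ++ yieldAll ts
end

mutual
  /-- The yield of every subtree, including the tree itself. -/
  def constituents : PSTree → List (List String)
    | .leaf w _ => [[w]]
    | .node _ cs => yieldAll cs :: constituentsAll cs
  /-- Constituents of each subtree in a list. -/
  def constituentsAll : List PSTree → List (List String)
    | [] => []
    | t :: ts => constituents t ++ constituentsAll ts
end

/-- Is the word sequence `ws` the yield of some subtree of `t`? -/
def hasConstituent (t : PSTree) (ws : List String) : Bool :=
  (constituents t).any fun x => x == ws

/-- [S Bill [VP plays chess]] -/
def billPS : PSTree :=
  .node "S" [.leaf "Bill" "PROPN",
             .node "VP" [.leaf "plays" "VERB", .leaf "chess" "NOUN"]]

#eval hasConstituent billPS ["plays", "chess"]  -- true
#eval hasConstituent billPS ["Bill", "plays"]   -- false
#eval (constituents billPS).eraseDups.length    -- 5
```

The mutual recursion through `List PSTree` is one way to handle the nested inductive type; the actual module may structure these definitions differently.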
"Bill plays chess" (@cite{osborne-2019}, p. 92, example 24) #
DG analysis:
     plays(0)
     /      \
Bill(1)    chess(2)
- 3 DG constituents: {Bill}, {chess}, {Bill, plays, chess}
- 6 catenae: {Bill}, {plays}, {chess}, {Bill,plays}, {plays,chess}, {Bill,plays,chess}
- The finite VP {plays, chess} is a catena but not a constituent
PSG analysis:
     S
    / \
Bill   VP
      /  \
  plays   chess
- 5 PS constituents: {Bill}, {plays}, {chess}, {plays, chess}, {Bill, plays, chess}
- The finite VP {plays, chess} IS a constituent
DG tree for "Bill plays chess": plays(0) → Bill(1), plays(0) → chess(2).
PSG tree for "Bill plays chess".
"She reads everything" (@cite{osborne-2019}, p. 46, example 12) #
DG: reads(0) → she(1), reads(0) → everything(2). {reads, everything} is a catena but not a constituent.
PSG: [S she [VP reads everything]]. {reads, everything} IS a constituent.
DG tree for "She reads everything".
PSG tree for "She reads everything".
"They will get the teacher a present" (@cite{osborne-2019}, p. 95–97, ex. 30–34) #
DG analysis (UD-style, as encoded in this file): the lexical verb get is the root, and the auxiliary will attaches to it as an aux dependent. A flat tree rooted in will, with the finite auxiliary as head, is the alternative analysis; it is not the one formalized here.
get(0) → They(1), get(0) → will(2), get(0) → teacher(3), get(0) → present(4), teacher(3) → the(5), present(4) → a(6)
{get, teacher, present} is a catena but not a constituent.
PSG analysis — deeply layered:
      S
     / \
 They   VP
       /  \
   will    VP
          /  \
       get    VP
             /  \
  the teacher    NP
                /  \
               a    present
PSG recognizes multiple constituents here that DG does not.
DG tree for "They will get the teacher a present" (UD-style).
PSG tree for "They will get the teacher a present".
The finite VP {plays, chess} is a catena in DG.
The finite VP {plays, chess} is NOT a constituent in DG. (subtree of plays = {plays, Bill, chess} = whole sentence)
The finite VP {plays, chess} IS a constituent in PSG.
DG has 3 constituents, PSG has 5 — DG has fewer.
"Bill plays chess": exactly 3 DG constituents.
"Bill plays chess": exactly 6 catenae.
"Bill plays chess": exactly 5 PSG constituents.
The finite VP {reads, everything} is a catena in DG.
The finite VP {reads, everything} is NOT a constituent in DG.
The finite VP {reads, everything} IS a constituent in PSG.
{get, teacher, present} is a catena in the DG tree.
{get, teacher, present} is NOT a constituent in DG.
{get, the, teacher, a, present} is a catena in DG.
{get, the, teacher, a, present} is NOT a constituent in DG (subtree of get = whole sentence).
The five standard constituency tests (@cite{osborne-2019}, p. 92, ex. 25).
- topicalization : ConstituencyTest
- clefting : ConstituencyTest
- pseudoclefting : ConstituencyTest
- proformSub : ConstituencyTest
- answerFragment : ConstituencyTest
Equations
- DepGrammar.VPDivergence.instBEqConstituencyTest.beq x✝ y✝ = (x✝.ctorIdx == y✝.ctorIdx)
A constituency test result recording DG vs PSG predictions vs observation.
- test : ConstituencyTest
- dgPredicts : Bool
- psgPredicts : Bool
- observed : Bool
Constituency test results for the finite VP "plays chess" (@cite{osborne-2019}, p. 92, example 25):
- Topicalization: *"...and plays chess Bill" → FAIL
- Clefting: *"It is plays chess that Bill does" → FAIL
- Pseudoclefting: ?"What Bill does is plays chess" → FAIL (infinitival preferred)
- Proform sub (do so): "Bill does so" → PASS (but do-so matches non-constituents too, §3.5)
- Answer fragment: *"?Plays chess" → FAIL (bare infinitive "Play chess" preferred)
DG predicts: FAIL on all 5 (finite VP is not a constituent)
PSG predicts: PASS on all 5 (finite VP is a constituent)
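The match counts follow mechanically from the judgments listed above. The sketch below mirrors the `ConstituencyTest` constructors and the four result fields shown on this page; the structure name `TestResult`, the list name `finiteVPResults`, and the two counting helpers are hypothetical names chosen for this illustration.

```lean
inductive ConstituencyTest where
  | topicalization | clefting | pseudoclefting | proformSub | answerFragment
  deriving Repr, BEq

/-- One test with both theories' predictions and the observed outcome. -/
structure TestResult where
  test        : ConstituencyTest
  dgPredicts  : Bool
  psgPredicts : Bool
  observed    : Bool

/-- Finite VP "plays chess": DG predicts failure everywhere, PSG success everywhere. -/
def finiteVPResults : List TestResult := [
  ⟨.topicalization, false, true, false⟩,
  ⟨.clefting,       false, true, false⟩,
  ⟨.pseudoclefting, false, true, false⟩,
  ⟨.proformSub,     false, true, true⟩,   -- the only test that passes
  ⟨.answerFragment, false, true, false⟩
]

/-- Number of tests where the DG prediction equals the observation. -/
def dgMatches (rs : List TestResult) : Nat :=
  (rs.filter fun r => r.dgPredicts == r.observed).length

/-- Number of tests where the PSG prediction equals the observation. -/
def psgMatches (rs : List TestResult) : Nat :=
  (rs.filter fun r => r.psgPredicts == r.observed).length

#eval dgMatches finiteVPResults   -- 4
#eval psgMatches finiteVPResults  -- 1
```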
DG predictions match observed results on 4 of 5 tests (only proform substitution is a mismatch — DG predicts fail, observed pass; but proform sub is known to be unreliable for finite VP, see §3.5).
PSG predictions match observed results on only 1 of 5 tests. PSG predicts all 5 pass (it's a constituent), but only proform sub passes.
PSG matches exactly 1 out of 5 tests.
DG matches exactly 4 out of 5 tests.
DG always has exactly n constituents for an n-word tree (one per node's complete subtree). Verified for "Bill plays chess" (3 words, 3 constituents).
"She reads everything": also 3 constituents for 3 words.
Ratio of constituents to non-constituents for "Bill plays chess": DG 3:4 vs PSG 5:2. Of the 2^3 − 1 = 7 non-empty subsets of the 3 words, DG treats 3 as constituents and PSG treats 5.
DG produces strictly fewer constituents than PSG for every example sentence.
The VP catena {plays, chess} has dependency length 2 (bridge to DependencyLength.lean).
The full sentence constituent has dependency length 3.
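The two values above are consistent with measuring each dependency by the absolute difference of its node indices and summing over the edges inside the unit. That reading is an assumption made for this sketch; DependencyLength.lean may define the measure differently, and `Dep`, `depLength`, and `totalDepLength` are hypothetical names.

```lean
/-- A dependency edge (hypothetical stand-in for the library's `Dependency`). -/
structure Dep where
  head : Nat
  dep  : Nat

/-- Assumed measure: absolute difference between head index and dependent index. -/
def depLength (d : Dep) : Nat :=
  if d.head ≤ d.dep then d.dep - d.head else d.head - d.dep

/-- Sum of the lengths of all edges in the list. -/
def totalDepLength (deps : List Dep) : Nat :=
  deps.foldl (fun acc d => acc + depLength d) 0

-- "Bill plays chess" with the indices used on this page: plays(0), Bill(1), chess(2).
def billDeps : List Dep := [⟨0, 1⟩, ⟨0, 2⟩]

-- The VP catena {plays, chess} contains only the edge plays(0) → chess(2).
def vpDeps : List Dep := [⟨0, 2⟩]

#eval totalDepLength vpDeps    -- 2
#eval totalDepLength billDeps  -- 3
```

Note that the indices here are the node labels from the tree definition (root first), not left-to-right word positions, so the numbers depend on that labeling.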
Every constituent is a catena — verified exhaustively for "Bill plays chess". (Bridge to Catena.lean's constituent_is_catena theorems)
The finite VP divergence is robust: it holds for the isomorphic "She reads everything" tree as well. Same structure → same divergence.