Linglib.Phenomena.Case.Studies.Haspelmath2021

@cite{haspelmath-2021}: Explaining Argument-Coding Splits

Explaining argument-coding splits with role-reference associations. Linguistics 59(5): 1231–1270.

Overview

Haspelmath proposes a single meta-universal — the Role-Reference Association Universal (Universal 1) — that subsumes differential object marking, split ergativity, ditransitive splits, and person-scenario splits under one generalization: deviations from the usual associations of role rank and referential prominence tend to be coded by longer grammatical forms.

The paper further argues that this universal itself reduces to a form-frequency correspondence (§10.2): the "usual" associations are the frequent ones, and frequent expressions get shorter forms (Zipf).

The Paper's 14 Universals

The paper states 14 numbered universals organized hierarchically (Figure 1, §11.1):

Meta-universals (§2)

Single-argument coding splits (§3–5)

Scenario and voice splits (§6–9)

What We Formalize

This file captures Universals 1–9 using existing linglib infrastructure. Universals 4 and 6 are already proven in Aissen2003.lean and DeHoopMalchukov2008.lean respectively — we re-export those results. Universals 10–14 (ditransitive scenarios, inverse, passive, dative alternation) are noted but require additional infrastructure.

Universal 1 (@cite{haspelmath-2021}, §2, statement (5)):

"Deviations from the usual associations of role rank and referential
prominence tend to be coded by longer grammatical forms (if the coding
is asymmetric)."

This is the paper's central claim — a single meta-universal that
subsumes all 13 other universals. It is formalized here as the
conjunction of two properties: (a) the direction of differential
marking matches the role's default prominence zone, and (b) the
form-frequency correspondence assigns higher frequency to the
usual (default) zone. 

Universal 1: The Role-Reference Association Universal holds for all monotransitive and ditransitive roles. For each role, the direction of differential marking matches the deviation from the usual association.

  • P/T: prominent end is unusual → differential marking targets prominent
  • A/R: non-prominent end is unusual → differential marking targets non-prominent

Universal 1 is grounded by the form-frequency correspondence (§10.2): the frequency proxy assigns higher frequency to the "usual" (default) zone for every role. Usual = frequent = short coding.

Universal 2 (@cite{haspelmath-2021}, §2, statement (6)):

"Arguments with higher-ranked roles tend to be more referentially
prominent, and vice versa."

This defines the *baseline* for Universal 1: A/R arguments (higher role
rank) tend to be human, definite, topical. P/T arguments (lower role
rank) tend to be inanimate, indefinite, new-information.

Formalized as: high-rank roles have high default prominence, low-rank
roles have low default prominence. 

Universal 2: Role rank determines default prominence direction. A and R are high-default (expect prominent referents). P and T are low-default (expect non-prominent referents).
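As a rough illustration of how role rank can fix both the default zone and the zone where Universal 1 predicts longer coding, here is a minimal self-contained sketch. The names `Role`, `Zone`, `isHighRank`, `defaultZone`, and `markedZone` are hypothetical stand-ins chosen for this example; they are not the definitions in `Core.Prominence`.

```lean
namespace Sketch.U2

/-- Hypothetical stand-in for the four argument roles. -/
inductive Role where
  | A | P | R | T
  deriving DecidableEq, Repr

/-- Hypothetical two-way split of a prominence scale. -/
inductive Zone where
  | prominent | nonProminent
  deriving DecidableEq, Repr

/-- Assumed encoding of role rank: A and R outrank P and T. -/
def Role.isHighRank : Role → Bool
  | .A | .R => true
  | .P | .T => false

/-- Universal 2: high-rank roles default to prominent referents. -/
def Role.defaultZone : Role → Zone
  | .A | .R => .prominent
  | .P | .T => .nonProminent

/-- Universal 1: the longer form is predicted in the unusual zone. -/
def Role.markedZone : Role → Zone
  | .A | .R => .nonProminent
  | .P | .T => .prominent

/-- The default zone is a function of role rank alone. -/
theorem default_tracks_rank (r : Role) :
    r.defaultZone = (if r.isHighRank then Zone.prominent else Zone.nonProminent) := by
  cases r <;> decide

/-- The marked zone is never the default zone. -/
theorem marked_ne_default (r : Role) : r.markedZone ≠ r.defaultZone := by
  cases r <;> decide

end Sketch.U2
```

Both theorems are closed by case analysis on the four roles, which is feasible only because the toy types are finite enumerations.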

Universal 3 (@cite{haspelmath-2021}, §3, statement (13)):

"The single-argument flagging universal: If a language has an asymmetric
single-argument flagging split depending on some prominence scale, then
the coding is longer for prominent P/T-arguments or for non-prominent
A/R-arguments."

This is the general form from which Universals 4, 6, 7, 8 follow as
specific cases for each argument role. It applies to both flagging and
indexing (statement (15)). 

Universal 3: The monotonicity direction is determined by role type. For P/T (low-default roles), differential marking targets the prominent end (upper set / monotone). For A/R (high-default roles), it targets the non-prominent end (lower set / anti-monotone).

This subsumes both flagging and indexing channels.
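One way to picture the role-dependent monotonicity check is with a list of Booleans ordered from the most to the least prominent scale level. The sketch below is an assumption-laden toy: `isPrefixMarked` and `isMonotoneFor` are hypothetical names, and the real `DifferentialMarkingProfile.isMonotone` may be defined quite differently.

```lean
namespace Sketch.U3

/-- Hypothetical stand-in for the four argument roles. -/
inductive Role where
  | A | P | R | T
  deriving DecidableEq, Repr

/-- `marks` lists, from most to least prominent level, whether that level
is flagged. The marked levels form an upper set when, after the first
unmarked level, nothing further down is marked. -/
def isPrefixMarked : List Bool → Bool
  | [] => true
  | true :: rest => isPrefixMarked rest
  | false :: rest => rest.all (fun b => !b)

/-- P/T: marking must start at the prominent end (upper set).
A/R: marking must start at the non-prominent end (lower set),
checked here by reversing the scale. -/
def isMonotoneFor (r : Role) (marks : List Bool) : Bool :=
  match r with
  | .P | .T => isPrefixMarked marks
  | .A | .R => isPrefixMarked marks.reverse

-- A DOM-like pattern: the two most prominent levels are flagged.
example : isMonotoneFor .P [true, true, false] = true := by decide
-- A gap in the middle of the scale is not monotone.
example : isMonotoneFor .P [false, true, false] = false := by decide
-- A DSM-like pattern: only the least prominent level is flagged.
example : isMonotoneFor .A [false, false, true] = true := by decide

/-- R uses the same direction as A, and T the same direction as P. -/
theorem R_like_A (m : List Bool) : isMonotoneFor .R m = isMonotoneFor .A m := rfl
theorem T_like_P (m : List Bool) : isMonotoneFor .T m = isMonotoneFor .P m := rfl

end Sketch.U3
```

Because the direction is chosen by matching on the role, the R/A and T/P equalities hold by definitional unfolding.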

Universal 4 (@cite{haspelmath-2021}, §4.1, statement (14)):

"Split P flagging (Differential Object Marking): If a language has an
asymmetric split in P flagging depending on some prominence scale, then
the special flag is used on the prominent P-argument."

This is @cite{aissen-2003}'s core result: the OT factorial typology generates
only monotone DOM patterns — marking always starts from the prominent
end of the animacy/definiteness scale. 

Universal 4: All OT-generated animacy DOM patterns are monotone (prominent Ps marked before non-prominent Ps). Re-exported from @cite{aissen-2003}.

Universal 5:

The scenario coding universal: if a language has an asymmetric scenario
split, then the coding is longest for upstream scenarios (P more
prominent than A), shortest for downstream scenarios (A more prominent
than P), and intermediate for balanced scenarios.

This is the second major branch under Universal 1 (alongside Universal
3). While Universal 3 conditions coding on the prominence of a single
argument, Universal 5 conditions it on the *combination* of A-person
and P-person. 

Universal 5: Downstream scenarios have higher A-person rank than upstream. Higher A-person rank = more "usual" = predicted shorter coding.

Universal 5: The downstream/upstream/balanced trichotomy is exhaustive for all 9 person-pair scenarios.
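The exhaustiveness claim can be illustrated with a small self-contained model. This sketch assumes a two-level person scale (speech-act participants outrank third person) and hypothetical names `Person`, `Direction`, and `classify`; the library's `Scenario` type may rank persons differently.

```lean
namespace Sketch.U5

/-- First, second, and third person. -/
inductive Person where
  | p1 | p2 | p3
  deriving DecidableEq, Repr

/-- Assumed scale: SAP (1st/2nd person) outranks 3rd person. -/
def Person.rank : Person → Nat
  | .p1 | .p2 => 1
  | .p3 => 0

inductive Direction where
  | downstream | balanced | upstream
  deriving DecidableEq, Repr

/-- Classify an A-person × P-person pair by comparing ranks. -/
def classify (a p : Person) : Direction :=
  if p.rank < a.rank then .downstream
  else if a.rank < p.rank then .upstream
  else .balanced

/-- Every one of the 9 person pairs falls into exactly one class;
this states the covering half, checked by enumeration. -/
theorem trichotomy (a p : Person) :
    classify a p = .downstream ∨ classify a p = .balanced ∨
      classify a p = .upstream := by
  cases a <;> cases p <;> decide

/-- In a downstream scenario the A-person outranks the P-person. -/
theorem downstream_A_outranks :
    ∀ a p : Person, classify a p = .downstream → p.rank < a.rank := by
  intro a p
  cases a <;> cases p <;> decide

end Sketch.U5
```

Under this encoding, 3→3 counts as balanced alongside the local SAP↔SAP scenarios; that grouping is an assumption of the sketch, not a claim about the paper.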

Universal 6 (@cite{haspelmath-2021}, §4.2, statement (21)):

"Split A flagging (Differential Subject Marking): If a language has an
asymmetric split in A flagging depending on some prominence scale, then
the special flag is used on the non-prominent A-argument."

The mirror image of Universal 4. Verified via the Distinguish constraint of @cite{de-hoop-malchukov-2008}: weak (non-prominent) subjects get overt
ergative marking.

Universal 7 (@cite{haspelmath-2021}, §5, statement (26)):

"Split R flagging: If a language has an asymmetric split in R flagging
depending on some prominence scale, then the special flag is used on
the non-prominent R-argument."

R behaves like A: both are high-rank roles whose differential marking
targets the non-prominent end.

Universal 8 (@cite{haspelmath-2021}, §5, statement (27)):

"Split T flagging: If a language has an asymmetric split in T flagging
depending on some prominence scale, then the special flag is used on
the prominent T-argument."

T behaves like P: both are low-rank roles whose differential marking
targets the prominent end. 
theorem Phenomena.Case.Studies.Haspelmath2021.R_monotone_like_A (profile : Core.Prominence.DifferentialMarkingProfile) :
{ name := profile.name, role := Core.Prominence.ArgumentRole.R, channel := profile.channel, marks := profile.marks }.isMonotone = { name := profile.name, role := Core.Prominence.ArgumentRole.A, channel := profile.channel, marks := profile.marks }.isMonotone

R monotonicity uses the same direction as A monotonicity.

theorem Phenomena.Case.Studies.Haspelmath2021.T_monotone_like_P (profile : Core.Prominence.DifferentialMarkingProfile) :
{ name := profile.name, role := Core.Prominence.ArgumentRole.T, channel := profile.channel, marks := profile.marks }.isMonotone = { name := profile.name, role := Core.Prominence.ArgumentRole.P, channel := profile.channel, marks := profile.marks }.isMonotone

T monotonicity uses the same direction as P monotonicity.

Universal 9: Monotransitive scenario splits.

When argument coding depends on the person combination (A-person ×
P-person), coding is longest for upstream scenarios (3→SAP) and
shortest for downstream (SAP→3). Local scenarios (SAP↔SAP) tend
to get short coding as both arguments are highly prominent. 

Universal 9: Local scenarios have SAP in both slots — the most prominent combination.

The parallel between monotransitive and ditransitive alignment is a structural consequence of the role-rank hierarchy (Universal 2):

  • Indirective (R marked, T = P) parallels accusative (P marked, A = S)
  • Secundative (T marked, R = P) parallels ergative (A marked, P = S)

This follows from Universals 7–8: R behaves like A, T behaves like P. 

The correlation between DOM and accusative alignment, and between DSM and ergative alignment, is independently derived in @cite{de-hoop-malchukov-2008} via the PaIP (Primary Actant Immunity Principle). @cite{haspelmath-2021} discusses this as background but does not number it as one of his 14 universals.

This theorem re-exports the De Hoop & Malchukov result for reference. 

Differential marking patterns (@cite{de-hoop-malchukov-2008}, not a numbered Haspelmath universal — included for cross-reference). Under weak BiOT, all rankings produce differential marking; the asymmetry DOM ↔ nom-acc / DSM ↔ ergative requires voice alternation (passive/antipassive) to resolve PaIP-I/D conflicts.

Universal 10:

"If a language has a person-role constraint in its ditransitive
construction, it is restricted to T=SAP, R=3."

Reuse `Scenario` for R×T pairs (aPerson = R, pPerson = T). When T is
SAP and R is 3rd (upstream ditransitive), coding is longer — predicted
by the role-reference association (R is high-rank, so R=3rd is unusual;
T is low-rank, so T=SAP is unusual). 

Universal 11:

"If a language has a relative scenario split, then the coding of
upstream scenarios is longer (or at least not shorter) than the
coding of downstream scenarios."

Frequency class decreases monotonically: downstream > balanced > upstream.
By the form-frequency correspondence, coding length increases in the
same order. 

Universal 12:

"Inverse verb forms (marking upstream scenarios) tend to be
morphologically more complex than direct verb forms."

Upstream = unusual = lower frequency class = predicted longer by FFC. 

Universal 13:

"Passive voice is preferred when A is non-given (unusual for A) and/or
P is non-new (unusual for P)."

Active voice is the "usual" frame where A=given, P=new. Passive is
preferred when discourse statuses deviate from these defaults. 

Passive is preferred when A's or P's discourse status deviates from the usual association.
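Since the rendered equations for this definition are not shown, the following is only a guess at its general shape: a Boolean predicate that fires when either argument departs from its usual discourse status. The names `Status`, `usualA`, `usualP`, and `passivePreferred` are hypothetical.

```lean
namespace Sketch.U13

/-- Hypothetical two-way discourse status. -/
inductive Status where
  | given | new
  deriving DecidableEq, Repr

/-- Usual associations as described in the text: A is given, P is new. -/
def usualA : Status := .given
def usualP : Status := .new

/-- Passive is predicted to be preferred when at least one argument
deviates from its usual discourse status. -/
def passivePreferred (a p : Status) : Bool :=
  a != usualA || p != usualP

-- The usual frame (A given, P new) does not trigger the preference.
example : passivePreferred .given .new = false := by decide
-- A non-given A triggers it.
example : passivePreferred .new .new = true := by decide
-- A non-new P triggers it.
example : passivePreferred .given .given = true := by decide

/-- The preference is absent exactly in the usual frame. -/
theorem no_preference_iff_usual (a p : Status) :
    passivePreferred a p = false ↔ (a = .given ∧ p = .new) := by
  cases a <;> cases p <;> decide

end Sketch.U13
```

The same shape would carry over to Universal 14 by reading `a` as R and `p` as T, with the PP-dative in place of the passive.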


Universal 14:

"The prepositional dative (longer form) is preferred when R is non-given
and/or T is non-new."

The double-object construction is the shorter form, preferred when
discourse statuses match the usual associations (R=given, T=new).

PP-dative (longer) is preferred when R's or T's discourse status deviates from the usual association.


Form-frequency unification (@cite{haspelmath-2021}, §10.2, statement (68)):

All 14 universals reduce to Zipf's form-frequency correspondence.
The scenario-level check: for every pair of scenarios where one is
downstream and the other upstream, the downstream one has a strictly
higher frequency class.
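The scenario-level check can be sketched as a quantified statement over all pairs of scenarios, discharged by enumeration. Everything here is an illustrative assumption: the two-level person scale, the numeric frequency classes (downstream 2, balanced 1, upstream 0), and the names `Scenario.isDownstream`, `Scenario.isUpstream`, and `Scenario.freqClass` are not taken from the library.

```lean
namespace Sketch.FFC

/-- First, second, and third person. -/
inductive Person where
  | p1 | p2 | p3
  deriving DecidableEq, Repr

/-- Assumed scale: SAP (1st/2nd person) outranks 3rd person. -/
def Person.rank : Person → Nat
  | .p1 | .p2 => 1
  | .p3 => 0

/-- A person-pair scenario: who acts on whom. -/
structure Scenario where
  aPerson : Person
  pPerson : Person
  deriving DecidableEq, Repr

def Scenario.isDownstream (s : Scenario) : Bool :=
  decide (s.pPerson.rank < s.aPerson.rank)

def Scenario.isUpstream (s : Scenario) : Bool :=
  decide (s.aPerson.rank < s.pPerson.rank)

/-- Assumed frequency classes: downstream 2, balanced 1, upstream 0. -/
def Scenario.freqClass (s : Scenario) : Nat :=
  if s.isDownstream then 2 else if s.isUpstream then 0 else 1

/-- For every downstream/upstream pair, the downstream scenario has a
strictly higher frequency class. Checked over all 81 pairs. -/
theorem downstream_more_frequent :
    ∀ a₁ p₁ a₂ p₂ : Person,
      (Scenario.mk a₁ p₁).isDownstream = true →
      (Scenario.mk a₂ p₂).isUpstream = true →
      (Scenario.mk a₂ p₂).freqClass < (Scenario.mk a₁ p₁).freqClass := by
  intro a₁ p₁ a₂ p₂
  cases a₁ <;> cases p₁ <;> cases a₂ <;> cases p₂ <;> decide

end Sketch.FFC
```

By the form-frequency correspondence, a higher frequency class corresponds to shorter predicted coding, so this inequality is the frequency-side counterpart of the length ordering in Universal 11.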