@cite{kehler-rohde-2013}
@cite{hobbs-1979} @cite{kehler-2002}
Kehler, A. & Rohde, H. (2013). A Probabilistic Reconciliation of Coherence-Driven and Centering-Driven Theories of Pronoun Interpretation. Theoretical Linguistics 39(1-2), 1–37.
Core Argument
Two theories make seemingly irreconcilable claims about pronoun interpretation. On the coherence-driven view (@cite{hobbs-1979}), pronoun interpretation is a by-product of coherence establishment, and grammatical form is irrelevant. On the Centering view (Grosz, Joshi & Weinstein 1995), it is driven by information structure and grammatical roles, and world knowledge is irrelevant.
The reconciliation is a Bayesian decomposition (eq. 13):
P(referent | pronoun) ∝ P(pronoun | referent) × P(referent)
The two terms have different conditioning:
- P(referent): coherence-driven next-mention bias, computed via eq. (9):
  P(referent) = Σ_CR P(CR) × P(referent | CR)
- P(pronoun | referent): production/form bias, driven by topichood (centering's contribution)
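For reference, the proportionality in eq. (13) can be written in normalized form, with the denominator summing over the candidate referents. The primed index is notation introduced here for illustration, not taken from the paper.

```latex
P(\text{referent} \mid \text{pronoun})
  = \frac{P(\text{pronoun} \mid \text{referent})\, P(\text{referent})}
         {\sum_{\text{referent}'} P(\text{pronoun} \mid \text{referent}')\, P(\text{referent}')}
```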
Five experiments with transfer-of-possession verbs and IC verbs confirm that these two components are empirically dissociable.
Key Findings
| # | Finding | Section |
|---|---|---|
| 1 | Imperfective → more Source interpretations than perfective | §3 |
| 2 | Coherence relations strongly condition next-mention bias | §4 |
| 3 | Shifting P(CR) via instructions shifts interpretation | §5 |
| 4 | P(referent \| CR) stable across conditions | §6 |
| 5 | Pronoun prompt shifts CR distribution bidirectionally | §7 |
| 6 | Voice affects next-mention but not non-subject pronominalization | §8 |
| 7 | Passive subject → more pronominalization than active subject | §8 |
| 8 | Bayesian predictions match actual interpretation biases | §8 |
| 9 | Contiguity class splits: Occasion → Goal, Elaboration → Source | §9 |
Independence Hypothesis
P(pronoun | referent) is conditioned by topichood/subjecthood, while P(referent) is conditioned by coherence relations. These two components are independent: coherence-driven semantic biases affect next-mention but NOT pronominalization rate.
Prompt type in passage completion experiments.
- pronoun : PromptType
- noPronoun : PromptType
Instruction condition (transfer-of-possession exps).
- whatNext : InstructionCond
- why : InstructionCond
Eq. (9): coherence-marginalized next-mention bias.
P(referent) = Σ_CR P(CR) × P(referent | CR)
The prior probability of a referent being mentioned next is a mixture of CR-specific biases weighted by the prior over coherence relations. This is the coherence-driven "top-down" component.
P(CR): prior probability of coherence relation (%)
- pSourceGivenCR : Core.Discourse.CoherenceRelation.CoherenceRelation → Nat
P(referent = Source | CR): Source bias given CR (%)
Topichood level, determined by grammatical construction.
Passive subjects signal stronger topichood than active subjects: using a marked construction to place an entity in subject position is a stronger indicator that the speaker treats it as the sentence topic (Davison 1984). This is the centering-driven "bottom-up" component of the model.
The P(pronoun | referent) term in eq. (13) tracks this level, not grammatical role per se.
- strong : TopichoodLevel
- default_ : TopichoodLevel
- low : TopichoodLevel
Compute topichood from voice and surface position.
Equations
- Phenomena.Reference.Studies.KehlerRohde2013.topichood voice false = Phenomena.Reference.Studies.KehlerRohde2013.TopichoodLevel.low
- Phenomena.Reference.Studies.KehlerRohde2013.topichood UD.Voice.Pass true = Phenomena.Reference.Studies.KehlerRohde2013.TopichoodLevel.strong
- Phenomena.Reference.Studies.KehlerRohde2013.topichood voice true = Phenomena.Reference.Studies.KehlerRohde2013.TopichoodLevel.default_
Table 1: Source interpretation rate by aspect. Imperfective focuses on ongoing event (Source still central); perfective focuses on end state (Goal = endpoint of transfer).
Imperfective yields more Source interpretations than perfective.
Coherence relation frequency and bias data from Table 2 (perfective condition, transfer-of-possession verbs).
"Violated Expectation" in the paper = CoherenceRelation.contrast.
Equations
- Phenomena.Reference.Studies.KehlerRohde2013.cr_occasion = { cr := Core.Discourse.CoherenceRelation.CoherenceRelation.occasion, freqPct := 38, sourceGivenCR := 18 }
Equations
- Phenomena.Reference.Studies.KehlerRohde2013.cr_elaboration = { cr := Core.Discourse.CoherenceRelation.CoherenceRelation.elaboration, freqPct := 28, sourceGivenCR := 98 }
Equations
- Phenomena.Reference.Studies.KehlerRohde2013.cr_explanation = { cr := Core.Discourse.CoherenceRelation.CoherenceRelation.explanation, freqPct := 18, sourceGivenCR := 80 }
Equations
- Phenomena.Reference.Studies.KehlerRohde2013.cr_violatedExp = { cr := Core.Discourse.CoherenceRelation.CoherenceRelation.contrast, freqPct := 8, sourceGivenCR := 76 }
Equations
- Phenomena.Reference.Studies.KehlerRohde2013.cr_result = { cr := Core.Discourse.CoherenceRelation.CoherenceRelation.result, freqPct := 6, sourceGivenCR := 8 }
Occasion and Result are Goal-biased (Source < 50%).
Elaboration, Explanation, and Violated Expectation are Source-biased.
The overall ~57/43 Source/Goal split masks strong CR-conditioned biases. Occasion is most common (.38) and Goal-biased (.18 Source); Elaboration is second (.28) and strongly Source-biased (.98).
Instantiate the perfective-condition next-mention model with Table 2 data. Downstream study files can reference these CR biases.
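As a rough check on the Table 2 figures quoted above, the following Lean sketch recomputes the eq. (9) mixture from the five listed relations. The names `Row`, `table2`, `mixture`, and `totalFreq` are illustrative and are not the module's own definitions. The listed frequencies sum to 98 rather than 100, so the sketch renormalizes over them; with these rounded inputs the Source share comes out at about 56%, close to the ~57% quoted.

```lean
/-- One row of Table 2 as quoted in this section: (frequency %, Source bias %).
    `Row` and the names below are illustrative, not the module's definitions. -/
abbrev Row := Nat × Nat

def table2 : List Row :=
  [(38, 18),  -- Occasion
   (28, 98),  -- Elaboration
   (18, 80),  -- Explanation
   (8, 76),   -- Violated Expectation
   (6, 8)]    -- Result

/-- Σ_CR P(CR) × P(Source | CR), in percent × percent units. -/
def mixture (rows : List Row) : Nat :=
  rows.foldl (fun acc r => acc + r.1 * r.2) 0

/-- Total frequency mass covered by the listed relations. -/
def totalFreq (rows : List Row) : Nat :=
  rows.foldl (fun acc r => acc + r.1) 0

example : mixture table2 = 5524 := by decide
example : totalFreq table2 = 98 := by decide
-- Renormalizing over the listed relations (floor division) gives 56.
example : mixture table2 / totalFreq table2 = 56 := by decide
```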
Table 3: "What happened next?" → Occasion-dominated; "Why?" → Explanation-dominated. Instructions shift P(CR) without changing the stimuli.
Table 5: Source interpretation by instruction condition (perfective). Shifting P(CR) shifts P(referent), as predicted by eq. (9).
The instruction effect is 48 percentage points (82% vs 34% Source) on identical stimuli. Because the stimuli do not change, no heuristic based solely on morphosyntactic form can account for it.
Table 4: P(Source | CR) is stable across the original experiment and the instruction manipulation, supporting the structural claim that CR-conditioned biases are properties of the coherence relation itself, not the experimental context.
Equations
- Phenomena.Reference.Studies.KehlerRohde2013.stab_elaboration = { cr := Core.Discourse.CoherenceRelation.CoherenceRelation.elaboration, originalPct := 98, instructionPct := 100 }
Equations
- Phenomena.Reference.Studies.KehlerRohde2013.stab_explanation = { cr := Core.Discourse.CoherenceRelation.CoherenceRelation.explanation, originalPct := 80, instructionPct := 82 }
Equations
- Phenomena.Reference.Studies.KehlerRohde2013.stab_violatedExp = { cr := Core.Discourse.CoherenceRelation.CoherenceRelation.contrast, originalPct := 76, instructionPct := 74 }
Equations
- Phenomena.Reference.Studies.KehlerRohde2013.stab_occasion = { cr := Core.Discourse.CoherenceRelation.CoherenceRelation.occasion, originalPct := 18, instructionPct := 27 }
Equations
- Phenomena.Reference.Studies.KehlerRohde2013.stab_result = { cr := Core.Discourse.CoherenceRelation.CoherenceRelation.result, originalPct := 8, instructionPct := 9 }
Bias direction (above/below 50%) is preserved for all five CRs across conditions. P(CR) can shift independently of P(ref|CR).
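The direction-preservation claim can be checked mechanically. The sketch below is a minimal, self-contained version that copies the percentages from the `stab_*` entries; `stability` and `sameSide` are hypothetical names chosen for illustration and need not match how the module states its theorem.

```lean
/-- (original %, instruction-manipulation %) of Source continuations per relation,
    copied from the `stab_*` entries. -/
def stability : List (Nat × Nat) :=
  [(98, 100),  -- Elaboration
   (80, 82),   -- Explanation
   (76, 74),   -- Violated Expectation
   (18, 27),   -- Occasion
   (8, 9)]     -- Result

/-- Two percentages agree in direction when they fall on the same side of 50. -/
def sameSide (p : Nat × Nat) : Bool :=
  decide (p.1 > 50) == decide (p.2 > 50)

example : stability.all sameSide = true := by decide
```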
Table 6: CR distribution by prompt type. The mere presence of an ambiguous pronoun shifts coherence expectations toward Source-biased relations. This bidirectionality — coreference affects coherence, not just vice versa — is predicted by Bayes (eq. 12) but not by Hobbs (pronouns are inert free variables) or Centering (does not model coherence).
- prompt : PromptType
- freqPct : Nat
Pronoun prompt increases Source-biased CRs.
Pronoun prompt decreases Goal-biased CRs.
Voice affects next-mention in pronoun condition: active (.77) vs passive (.42). Passivization moves the causally-implicated referent out of subject position — same proposition, different bias.
In the no-pronoun condition the pattern reverses: passive (.76) > active (.59). By-phrases are optional in English, so their inclusion signals the referent will be re-mentioned.
Voice affects coherence in pronoun condition: active produces more Explanations than passive. Since propositions are identical, this is mediated by the shift in pronominal reference — demonstrating bidirectional coherence–coreference dependency.
Central topichood prediction: passive subjects are pronominalized more than active subjects (87% vs 62%).
This is NOT explicable by grammatical role alone — both are subjects. It reflects the stronger topichood signal of the passive: using a marked syntactic form to place an entity in subject position is a stronger indicator of topic status. This is the key evidence that P(pronoun | referent) tracks TOPICHOOD, not subjecthood.
Non-subject pronominalization is invariant across voice (24% vs 23%). At the same topichood level (low), the voice manipulation — which changes coherence expectations dramatically — has no effect on pronominalization rate. This is the Independence Hypothesis in action: P(pronoun | referent) does not depend on coherence-driven factors.
Subjects are pronominalized more than non-subjects in both voices. This subject advantage is the centering-derived component.
Topichood monotonically predicts pronominalization: strong (passive subject, 87%) > default (active subject, 62%) > low (non-subject, ~24%).
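The ordering can be made concrete with a small self-contained sketch. It re-declares stand-in types rather than importing the module: `Voice` is a hypothetical two-constructor simplification of `UD.Voice`, and `pronRate` is an illustrative name. The non-subject rate is entered as 24, although the section reports 24% for active and 23% for passive.

```lean
/-- Stand-in for `UD.Voice`, reduced to the two voices discussed here. -/
inductive Voice where
  | act
  | pass

/-- Stand-in for the module's `TopichoodLevel`. -/
inductive TopichoodLevel where
  | strong
  | default_
  | low

/-- Mirrors the three equations given for `topichood`:
    non-subjects are low, passive subjects strong, other subjects default. -/
def topichood : Voice → Bool → TopichoodLevel
  | _, false => .low
  | .pass, true => .strong
  | _, true => .default_

/-- Pronominalization rates (%) quoted in this section. -/
def pronRate : TopichoodLevel → Nat
  | .strong => 87
  | .default_ => 62
  | .low => 24

-- Passive subject > active subject > non-subject.
example : pronRate (topichood .pass true) > pronRate (topichood .act true) := by decide
example : pronRate (topichood .act true) > pronRate (topichood .act false) := by decide
```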
Bayesian predictions are directionally correct: active > passive in both predicted and actual biases.
The passive prediction is highly accurate (59% vs 60%).
Compute the coherence-marginalized Source bias from a NextMentionModel. This IS equation (9): P(Source) = Σ_CR P(CR) × P(Source | CR). Result is in basis points (×10000); divide by 100 for percentage.
Eq. (9) derivation: the "Why?" mixture exceeds the "What next?" mixture. This is DERIVED from the model, not read off Table 5. The proof computes:
- Why: 1×27 + 91×82 + 8×100 + 1×74 + 0×9 + 0×50 = 8363
- What next: 71×27 + 1×82 + 5×100 + 8×74 + 5×9 + 10×50 = 3636
and verifies 8363 > 3636. The direction follows from Explanation (Source-biased at 82%) dominating the Why mixture at 91%.
The computed mixtures are consistent with Table 5: Why → ~84% Source, What-next → ~36% Source (vs observed 82% and 34%). The small discrepancy is from integer rounding and the "Other" CR category.
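The arithmetic in this derivation can be replayed in a few lines of Lean. This is a sketch under stated assumptions: `bias`, `whyFreq`, `whatNextFreq`, and `mix` are illustrative names, and the relation order (Occasion, Explanation, Elaboration, Violated Expectation, Result, Other) is inferred from the bias values in the sums, with the 50 entry standing for the "Other" category.

```lean
/-- P(Source | CR) in %, instruction experiment, in the order: Occasion,
    Explanation, Elaboration, Violated Expectation, Result, Other. -/
def bias : List Nat := [27, 82, 100, 74, 9, 50]

/-- P(CR) in % under each instruction, in the same order as `bias`. -/
def whyFreq : List Nat := [1, 91, 8, 1, 0, 0]
def whatNextFreq : List Nat := [71, 1, 5, 8, 5, 10]

/-- Eq. (9) as a dot product; the result is in percent × percent units. -/
def mix (freq : List Nat) : Nat :=
  (freq.zip bias).foldl (fun acc p => acc + p.1 * p.2) 0

example : mix whyFreq = 8363 := by decide
example : mix whatNextFreq = 3636 := by decide
example : mix whyFreq > mix whatNextFreq := by decide
```

Dividing each result by 100 gives the percentages compared against Table 5.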
Compute P(Subject | pronoun) via Bayes' rule (eq. 13). Takes P(Subject next-mentioned) from no-pronoun data and P(pronoun | position) from pronominalization rates. Result is a percentage (0–100).
Eq. (13) derivation: active voice. From:
- P(Subject) = 59% (Table 7, no-pronoun, causal ref = subject)
- P(pronoun | Subject) = 62% (Table 9)
- P(pronoun | NonSubject) = 24% (Table 9)
Bayes' rule yields: 62×59 / (62×59 + 24×41) = 3658/4642 ≈ 78.8%, which truncates to 78% in integer arithmetic. The paper reports 81% (from unrounded data); the direction matches.
Eq. (13) derivation: passive voice. From:
- P(Subject) = 100 - 76 = 24% (Table 7: 76% mention causal ref, who is the NON-subject in passive)
- P(pronoun | Subject) = 87% (Table 9)
- P(pronoun | NonSubject) = 23% (Table 9)
Bayes' rule yields: 87×24 / (87×24 + 23×76) = 2088/3836 ≈ 54%.
Central Bayesian prediction: Bayes' rule correctly derives that active > passive for P(Subject | pronoun), even though passive subjects are more likely to be pronominalized (87% vs 62%). The prior P(Subject) is much lower in passive (24% vs 59%), and this dominates. Production bias alone would predict passive > active; the Bayesian model correctly reverses this.
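The two derivations and the reversal can be reproduced with a short sketch of eq. (13) for a two-referent context. The function name and signature are assumptions made for illustration, and natural-number floor division is used, which is why the active value appears as 78.

```lean
/-- Eq. (13) for a two-referent context, all arguments and the result in
    percent (floor division). Name and signature are illustrative. -/
def pSubjectGivenPronoun (pSubj pProSubj pProNonSubj : Nat) : Nat :=
  let num := pProSubj * pSubj
  let den := num + pProNonSubj * (100 - pSubj)
  num * 100 / den

-- Active: P(Subject) = 59, P(pronoun | Subject) = 62, P(pronoun | NonSubject) = 24
example : pSubjectGivenPronoun 59 62 24 = 78 := by decide
-- Passive: P(Subject) = 24, P(pronoun | Subject) = 87, P(pronoun | NonSubject) = 23
example : pSubjectGivenPronoun 24 87 23 = 54 := by decide
-- The likelihood alone favors passive subjects (87 > 62), yet the posterior reverses.
example : pSubjectGivenPronoun 59 62 24 > pSubjectGivenPronoun 24 87 23 := by decide
```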
The two Goal-biased CRs (Occasion, Result) both focus on what happens AFTER the prior event. For transfer verbs, the endpoint is the Goal.
Explanation is Source-biased and selects for causes (backward causal). For transfer verbs, the Source/initiator is the cause. For IC verbs, the stimulus is the cause — this is the bridge to IC bias studies.
Key insight: the contiguity class does NOT uniformly predict bias. Occasion (18% Source) and Elaboration (98% Source) are both contiguity relations but have opposite biases. Occasion focuses on the END STATE (Goal); Elaboration redescribes the SAME EVENT (Source/initiator). The bias is determined by the specific relation, not the class.