Noncooperative Communication: Unified Argumentative RSA #
@cite{barnett-griffiths-hawkins-2022} @cite{cummins-2025} @cite{cummins-franke-2021} @cite{merin-1999} @cite{sperber-2010} @cite{goodman-stuhlmuller-2013} @cite{yoon-etal-2020}
Unifies @cite{cummins-franke-2021}'s argumentative strength framework and @cite{barnett-griffiths-hawkins-2022}'s persuasive RSA into a single parameterized model, following @cite{cummins-2025}'s analysis of noncooperative communication.
Core Unification #
Both models instantiate the same weighted-utility architecture:
U(u; w, G) = U_epistemic(u; w) + β · U_goal(u; G)
| Model | U_goal | β |
|---|---|---|
| Standard RSA | — | 0 |
| Barnett et al. | ln P_L0(w*∣u) | fitted (β̂ = 2.26) |
| C&F (semantic) | argStr(u, G) | implicit (speaker maximizes argStr) |
The parameter β controls the cooperativity spectrum:
- β = 0: fully cooperative (standard RSA)
- 0 < β < ∞: partially argumentative (Barnett et al.)
- β → ∞: purely argumentative (C&F's rational speaker)
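As a sketch, the spectrum can be simulated with a softmax speaker over the weighted utility. The utterances and utility values below are invented for illustration; only the fitted β = 2.26 comes from the source.

```python
import math

def speaker_probs(utts, u_epi, u_goal, beta, alpha=1.0):
    """Softmax speaker over U(u) = U_epistemic(u) + beta * U_goal(u)."""
    scores = [math.exp(alpha * (u_epi[u] + beta * u_goal[u])) for u in utts]
    z = sum(scores)
    return dict(zip(utts, (s / z for s in scores)))

# Hypothetical utterances: u1 is more informative, u2 argues better for G.
utts = ["u1", "u2"]
u_epi = {"u1": 0.0, "u2": -1.0}    # e.g. ln P_L0(w* | u)
u_goal = {"u1": 0.0, "u2": 1.5}    # e.g. ln P_L0(G | u)

coop = speaker_probs(utts, u_epi, u_goal, beta=0.0)    # standard RSA
argm = speaker_probs(utts, u_epi, u_goal, beta=2.26)   # Barnett's fitted beta

# beta = 0 prefers the informative u1; beta = 2.26 flips to the persuasive u2.
assert coop["u1"] > coop["u2"] and argm["u2"] > argm["u1"]
```

As β grows, the speaker's choice converges to argmax U_goal, C&F's purely argumentative limit.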
Epistemic Vigilance #
Following @cite{sperber-2010}, the hearer's interpretation is a trust-weighted mixture of pragmatic and literal posteriors:
P_vigilant(w∣u) = τ · P_L1(w∣u) + (1−τ) · P_L0(w∣u)
where τ ∈ [0,1] is the hearer's trust in speaker cooperativity.
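A minimal numeric sketch of the mixture (the posterior values 0.4 and 0.8 are made up):

```python
def vigilant_posterior(tau, l1_post, l0_post):
    """Trust-weighted mixture of pragmatic (L1) and literal (L0) posteriors."""
    return tau * l1_post + (1 - tau) * l0_post

l0, l1 = 0.4, 0.8                                    # illustrative posteriors
assert vigilant_posterior(1.0, l1, l0) == l1         # full trust: pure L1
assert vigilant_posterior(0.0, l1, l0) == l0         # zero trust: pure L0
assert l0 < vigilant_posterior(0.5, l1, l0) < l1     # partial trust: in between
```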
Meaning-Level Taxonomy #
@cite{cummins-2025} identifies four levels at which falsehood can occur: assertion, implicature, presupposition, and typicality departure. Both C&F and Barnett involve misleading at the typicality/implicature level while maintaining truthful assertions — the argumentative speaker exploits pragmatic expectations without violating Quality.
Speaker orientation on the cooperativity spectrum.
- cooperative: β=0, speaker maximizes hearer's accurate belief (standard RSA)
- argumentative: β>0, speaker has a goal G and balances informativity and persuasion (@cite{cummins-franke-2021}, @cite{barnett-griffiths-hawkins-2022}, @cite{macuch-silva-etal-2024})
The distinction is continuous: β parameterizes the spectrum.
- cooperative : SpeakerOrientation
- argumentative : SpeakerOrientation
Classify speaker orientation from the goal weight λ ∈ [0,1]
goalOrientedUtility and combinedWeighted are the same function up to positive scaling: the additive weight pair (1, β) rescales to the convex pair (1/(1+β), β/(1+β)).
Both representations agree at β=0: cooperative RSA.
The ordinal bridge: an utterance with higher Bayes factor (C&F's measure) also induces a higher posterior for the goal (Barnett's measure).
P(G|u) = bf · P(G) / (bf · P(G) + P(¬G))
This function is strictly increasing in bf when P(G) ∈ (0,1).
C&F's argumentative strength is positive (bayesFactor > 1) iff the utterance shifts the posterior above the prior.
bayesFactor(u,G) > 1 iff P(G|u) > P(G)
This connects C&F's ordinal condition hasPositiveArgStr to the Bayesian posterior shift that Barnett's model operates on.
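Both directions of the bridge can be spot-checked numerically. A sketch assuming bf = P(u|G)/P(u|¬G) and an arbitrary prior:

```python
def posterior_from_bf(bf, prior):
    """P(G|u) = bf * P(G) / (bf * P(G) + P(not G))."""
    return bf * prior / (bf * prior + (1 - prior))

prior = 0.3   # illustrative P(G)

# Strictly increasing in bf when the prior is in (0, 1):
assert posterior_from_bf(2.0, prior) > posterior_from_bf(1.5, prior)

# bf > 1 iff the posterior shifts above the prior:
assert posterior_from_bf(1.5, prior) > prior
assert posterior_from_bf(0.5, prior) < prior
assert abs(posterior_from_bf(1.0, prior) - prior) < 1e-12
```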
Level of meaning at which falsehood can occur.
@cite{cummins-2025} identifies four levels, ordered by speaker blameworthiness: assertion > implicature > presupposition > typicality.
Both C&F and Barnett involve misleading at the typicality/implicature level while maintaining truthful assertions — the argumentative speaker exploits pragmatic expectations without violating Quality at the assertion level.
- assertion : MeaningLevel
- implicature : MeaningLevel
- presupposition : MeaningLevel
- typicality : MeaningLevel
Blameworthiness ordering: false assertions attract most blame, typicality departures attract least.
Equations
- RSA.NoncooperativeCommunication.blameworthinessRank RSA.NoncooperativeCommunication.MeaningLevel.assertion = 3
- RSA.NoncooperativeCommunication.blameworthinessRank RSA.NoncooperativeCommunication.MeaningLevel.implicature = 2
- RSA.NoncooperativeCommunication.blameworthinessRank RSA.NoncooperativeCommunication.MeaningLevel.presupposition = 1
- RSA.NoncooperativeCommunication.blameworthinessRank RSA.NoncooperativeCommunication.MeaningLevel.typicality = 0
Epistemic vigilance: the hearer's trust in speaker cooperativity.
Following @cite{sperber-2010} as discussed in Cummins (2025 §4):
- Hearer first interprets as if speaker is cooperative (stance of trust)
- Then weighs the pragmatic interpretation by trust level τ
- Falls back toward literal interpretation as trust decreases
The vigilant posterior is: P_V(w|u) = τ · P_L1(w|u) + (1−τ) · P_L0(w|u)
This is CombinedUtility.combined(τ, L0_posterior, L1_posterior).
- trustLevel : ℚ
τ ∈ [0,1]: probability that speaker is cooperative
Full trust: standard pragmatic interpretation
No trust: literal interpretation only
Vigilant listener posterior: trust-weighted mixture of L1 and L0.
When the hearer suspects the speaker is argumentative (low τ), they discount pragmatic enrichment and rely more on literal meaning.
Equations
- RSA.NoncooperativeCommunication.vigilantPosterior ev l1Post l0Post = ev.trustLevel * l1Post + (1 - ev.trustLevel) * l0Post
At full trust, vigilant listener = pragmatic listener L1
At zero trust, vigilant listener = literal listener L0
Vigilant posterior IS CombinedUtility.combined(τ, L0, L1). The epistemic vigilance machinery reuses the same interpolation framework as the speaker's utility tradeoff.
Vigilant posterior is a convex combination when τ ∈ [0,1]
The unified noncooperative RSA model has two parameters:
- goalWeight (speaker side): convex weight on goal utility ∈ [0,1]
- τ (hearer side): trust level (1 = full trust, 0 = literal only)
Both sides use combined (convex interpolation), making the symmetry explicit:
- Speaker: `combined goalWeight uEpi uGoal`
- Hearer: `combined τ l0Post l1Post` (via `vigilantPosterior`)
Standard RSA is the special case goalWeight=0, τ=1. @cite{barnett-griffiths-hawkins-2022} is goalWeight=226/326 (≈ 0.693, from β=2.26), τ=1. A suspicious hearer facing an argumentative speaker would have high goalWeight, low τ.
Standard cooperative RSA: no goal bias, full trust
@cite{barnett-griffiths-hawkins-2022} fitted model: goalWeight = β/(1+β) = 226/326 ≈ 0.693, pragmatic group with full trust.
Original paper parameterization: β̂ = 2.26 (additive form). Convex reparameterization: goalWeight = 2.26/3.26 = 226/326.
Standard RSA has cooperative orientation
Barnett's fitted model has argumentative orientation
In the unified model, BOTH sides use combined (convex interpolation):
- Speaker: `combined goalWeight uEpi uGoal`
- Hearer: `vigilantPosterior(τ, L1, L0) = combined τ L0 L1`
Full model: speaker chooses u to maximize combined(goalWeight, U_epi, U_goal), listener computes combined(τ, L0(w|u), L1(w|u)).
The unified model at standard parameters reduces to (U_epi, L1) — the standard RSA speaker utility and pragmatic listener.
Barnett et al.'s Eq. 6 (additive: U + β·V) is a scaled version of the
convex combined form used in fullModel:
goalOrientedUtility uEpi uGoal β = (1+β) · combined(β/(1+β), uEpi, uGoal)
Since scaling by (1+β) > 0 preserves ranking, the additive and convex parameterizations are strategically equivalent.
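The identity is easy to verify numerically. A sketch assuming combined(w, a, b) = (1 - w)·a + w·b, matching the convex form used throughout; the utility values are arbitrary:

```python
def combined(w, a, b):
    """Convex interpolation: (1 - w) * a + w * b."""
    return (1 - w) * a + w * b

def goal_oriented_utility(u_epi, u_goal, beta):
    """Barnett et al.'s additive form: U + beta * V."""
    return u_epi + beta * u_goal

beta, u_epi, u_goal = 2.26, 0.7, -0.2   # arbitrary test values
additive = goal_oriented_utility(u_epi, u_goal, beta)
convex = (1 + beta) * combined(beta / (1 + beta), u_epi, u_goal)
assert abs(additive - convex) < 1e-12

# Positive scaling preserves ranking, so both forms pick the same utterance.
a1 = goal_oriented_utility(0.9, 0.1, beta)
a2 = goal_oriented_utility(0.2, 0.8, beta)
c1 = combined(beta / (1 + beta), 0.9, 0.1)
c2 = combined(beta / (1 + beta), 0.2, 0.8)
assert (a1 > a2) == (c1 > c2)
```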
The vigilant posterior is monotone in trust: more trust pulls the posterior toward L1. When L1 is misleading, more trust = more misled.
This is the hearer-side consequence of higher_lambda_when_B_dominates:
the vigilant posterior is combined τ L0 L1, so increasing τ gives
more weight to L1 (the B component).
Symmetric: when L1 < L0, more trust DECREASES the posterior (toward L1).
Pragmatic vulnerability (@cite{cummins-2025} §4): pragmatic inference is exploitable precisely because it is rational.
When L1 diverges from L0 (l0 < l1), the fully-pragmatic listener (τ=1) is maximally exposed to the divergence. Epistemic vigilance (0 < τ < 1) strictly pulls the posterior back toward the immune L0:
- τ = 1: posterior = L1 (fully exposed) — `vigilant_at_full_trust`
- τ = 0: posterior = L0 (immune) — `vigilant_at_zero_trust`
- 0 < τ < 1: L0 < posterior < L1 (partially protected) — THIS THEOREM
The weak evidence effect (@cite{barnett-griffiths-hawkins-2022}, weak_evidence_effect_s4) is
a concrete instance: L0 correctly identifies stick 4 as evidence for "longer",
but L1 at β=2 overshoots in the wrong direction. Reducing τ from 1 would pull
the posterior back toward L0's correct assessment.
Linguistic content: An argumentative speaker (goalWeight > 0) exploits L1's reasoning about speaker intentions. L0, which interprets literally without modeling the speaker, cannot be manipulated this way. Vigilance (epistemic caution about speaker cooperativity) is the rational defense.
The converse also holds: if L1 = L0, no speaker parameterization can exploit the difference, since vigilantPosterior degenerates to L0 = L1 at any τ.
Symmetric vulnerability: when L1 undershoots L0 (l1 < l0), vigilance pulls the posterior upward from L1 toward L0.
When L1 = L0, pragmatic inference adds nothing exploitable: the vigilant
posterior equals L0 = L1 at any trust level τ. No speaker parameterization
can create a gap to exploit. This is the formal converse of
pragmatic_vulnerability.
Exact signed deviation from L0: the vigilant posterior deviates from L0 by exactly τ · (L1 - L0).
τ is literally the fraction of L1's divergence from L0 that the listener absorbs. At τ=0: deviation = 0 (immune). At τ=1: deviation = L1-L0 (fully exposed).
Exact gap between L1 and the vigilant posterior: the amount of L1's divergence that vigilance closes is (1-τ) · (L1 - L0).
At τ=0: gap = L1-L0 (fully closed). At τ=1: gap = 0 (nothing closed).
Squared exploitability scales as τ²: the squared deviation of the vigilant posterior from L0 is exactly τ² times the squared L1-L0 gap.
This gives the precise risk calculus for trust: doubling τ quadruples the squared deviation from L0.
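All three identities are one-line algebra on the mixture; a numeric spot-check with arbitrary values:

```python
def vigilant_posterior(tau, l1, l0):
    return tau * l1 + (1 - tau) * l0

tau, l0, l1 = 0.4, 0.3, 0.9   # arbitrary illustrative values
v = vigilant_posterior(tau, l1, l0)

# Signed deviation from L0 is exactly tau * (L1 - L0).
assert abs((v - l0) - tau * (l1 - l0)) < 1e-12
# The gap vigilance closes is (1 - tau) * (L1 - L0).
assert abs((l1 - v) - (1 - tau) * (l1 - l0)) < 1e-12
# Squared deviation from L0 scales as tau squared.
assert abs((v - l0) ** 2 - tau ** 2 * (l1 - l0) ** 2) < 1e-12
```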
Error decomposition: the vigilant posterior's deviation from truth decomposes as a τ-weighted sum of L1's error and L0's error.
(vigilant - truth) = τ · (L1 - truth) + (1-τ) · (L0 - truth)
This is the fundamental equation of epistemic vigilance: the listener's error is a convex combination of the two listeners' errors.
When L0 is perfectly calibrated (l0 = truth), the squared error of the vigilant posterior reduces to τ² · (L1 - truth)².
This is the strongest quantitative vulnerability result: if the literal listener is correct (as in Barnett's weak evidence domain where L0 correctly assesses stick 4), then vigilance reduces squared error by exactly a factor of τ². At τ=1 you absorb all of L1's error; at τ=0 you absorb none.
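A numeric check of the decomposition and the calibrated-L0 special case (all values arbitrary):

```python
def vigilant_posterior(tau, l1, l0):
    return tau * l1 + (1 - tau) * l0

tau, l0, l1, truth = 0.3, 0.45, 0.9, 0.5   # arbitrary illustrative values
v = vigilant_posterior(tau, l1, l0)

# Error decomposition: (v - truth) = tau*(L1 - truth) + (1 - tau)*(L0 - truth).
assert abs((v - truth) - (tau * (l1 - truth) + (1 - tau) * (l0 - truth))) < 1e-12

# Calibrated L0 (l0 = truth): squared error reduces to tau^2 * (L1 - truth)^2.
v_cal = vigilant_posterior(tau, l1, truth)
assert abs((v_cal - truth) ** 2 - tau ** 2 * (l1 - truth) ** 2) < 1e-12
```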
Zero-error vigilance: there exists a unique τ that makes the vigilant posterior exactly equal truth, namely τ* = (truth - L0) / (L1 - L0).
When truth is between L0 and L1, τ* ∈ [0,1] (optimal_vigilance_in_range),
so it corresponds to a valid trust level.
The optimal τ* has a natural interpretation: it is the relative position of truth within the [L0, L1] interval. If truth is at L0 (literal listener is perfect), τ*=0. If truth is at L1 (pragmatic listener is perfect), τ*=1. If truth is at the midpoint, τ*=1/2.
The optimal τ* is in [0,1] when truth is between L0 and L1.
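A sketch of τ* and its zero-error property; L0, L1, and truth are arbitrary illustrative values with truth inside [L0, L1]:

```python
def vigilant_posterior(tau, l1, l0):
    return tau * l1 + (1 - tau) * l0

def optimal_tau(truth, l0, l1):
    """tau* = (truth - L0) / (L1 - L0): truth's relative position in [L0, L1]."""
    return (truth - l0) / (l1 - l0)

l0, l1, truth = 0.2, 0.8, 0.5
tau_star = optimal_tau(truth, l0, l1)

assert abs(tau_star - 0.5) < 1e-12   # truth at the midpoint gives tau* = 1/2
assert 0 <= tau_star <= 1            # valid trust level
assert abs(vigilant_posterior(tau_star, l1, l0) - truth) < 1e-12  # zero error
```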
Backfire generalization conjecture: the weak evidence effect and scalar implicature are structurally identical phenomena.
Whenever a pragmatic listener expects the speaker to use the strongest available utterance, observing a non-maximal one triggers a negative inference that can reverse the literal evidence.
Instances:
- Scalar implicature: hearing "some" → infer speaker couldn't say "all" → conclude ¬all
- Weak evidence effect: seeing stick 4 → infer speaker lacked stick 5 → conclude probably not "longer"
- Polite understatement: "not terrible" → infer speaker couldn't honestly say "good" → conclude mediocre
Abstract pattern: Let U = {u₁,..., uₙ} be utterances ordered by strength, and let L0(goal | uᵢ) be monotone in i. If the speaker is goal-oriented (prefers stronger utterances when goal is true), then for some non-maximal uᵢ:
L0(goal | uᵢ) > prior AND L1(goal | uᵢ, β) < prior
The pragmatic listener's inference — "if the speaker had stronger evidence, they would have used it" — reverses the literal evidence.
Formal statement: Given L0 posteriors monotone in utterance strength with at least two values above the prior, and L1 posteriors derived from a goal-oriented speaker (β > 0), there exists a non-maximal utterance where L0 says "goal is likely" but L1 says "goal is unlikely."
Evidence: weak_evidence_effect_s4 (BarnettEtAl2022) demonstrates
this at β=2. A general proof requires formalizing L1 Bayesian inversion
over abstract strength-ordered RSA scenarios.
What's missing for a proof: The L1 computation (Bayesian inversion of the speaker model) must be formalized generically over ordered utterance sets. The key step is showing that the speaker's monotone preference concentrates probability mass on maximal utterances, making the L1 posterior decrease for non-maximal ones. This is essentially the derivation of quantity implicatures from RSA, generalized beyond scales.
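The pattern can be reproduced in a two-utterance toy model. All numbers below (the L0 posteriors, the evidence-availability probabilities, and the softmax speaker) are invented to illustrate the abstract pattern; this is not Barnett et al.'s stick domain or their fitted parameters.

```python
import math

prior = 0.5
L0 = {"weak": 0.6, "strong": 0.9}   # literal posteriors for G, both above prior

# A goal-oriented speaker can only show evidence they have; "strong" evidence
# is rarely available when G is false (hypothetical availability rates).
p_both = {"G": 0.7, "notG": 0.05}   # P(both utterances available | world)

def speaker(world, beta):
    """P(utterance | world): softmax over L0-strength on available evidence."""
    e_s = math.exp(beta * L0["strong"])
    e_w = math.exp(beta * L0["weak"])
    p_strong = p_both[world] * e_s / (e_s + e_w)
    return {"strong": p_strong, "weak": 1 - p_strong}

def L1(utt, beta):
    """Bayesian inversion of the goal-oriented speaker."""
    num = speaker("G", beta)[utt] * prior
    return num / (num + speaker("notG", beta)[utt] * (1 - prior))

beta = 10.0
# Backfire: "weak" literally supports G, but pragmatically it betrays that
# "strong" was unavailable, reversing the evidence below the prior.
assert L0["weak"] > prior and L1("weak", beta) < prior
# The maximal utterance does not backfire.
assert L1("strong", beta) > prior
```

The same inversion underlies scalar implicature: "weak" plays the role of "some", "strong" the role of "all".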
The Barnett stick domain instantiates backfire_generalization.
Sticks {s1,...,s5} ordered by length. L0(longer | sᵢ) is monotone
(l0Longer_monotone). Sticks s4 and s5 both have L0 above the prior
(s4_positive_under_l0, s5_strongest_evidence). At β=2, stick 4
backfires (weak_evidence_effect_s4).
TODO: Prove via Fin 5 ↔ Stick correspondence and the existing BarnettEtAl2022 theorems.