Signaling Bandits #
@cite{frank-goodman-2012} @cite{sumers-hawkins-2023}
Unlike Lewis signaling games, where knowing the world state is equivalent to knowing the correct action, signaling bandits separate abstract knowledge (feature values) from concrete decisions (which action to take).
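The contrast can be sketched in a few lines of self-contained Lean (the names here are illustrative, not the formalization's own):

```lean
-- Lewis game: the world state *is* the target action, so knowing the
-- state determines the correct behavior outright.
-- Signaling bandit: the state assigns values to abstract features, and
-- the best action additionally depends on which actions are available.
inductive Color | green | red | blue

structure BanditState where
  value : Color → Int

-- The best available action depends on the context, not the state alone.
def bestAction (w : BanditState) (available : List Color) : Option Color :=
  available.foldl
    (fun best a =>
      match best with
      | none => some a
      | some b => if w.value a > w.value b then some a else some b)
    none
```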
Equations
- RSA.SumersEtAl2023.instBEqFeature.beq x✝ y✝ = (x✝.ctorIdx == y✝.ctorIdx)
Feature values in the experimental range
- neg2 : FeatureValue
- neg1 : FeatureValue
- zero : FeatureValue
- pos1 : FeatureValue
- pos2 : FeatureValue
Equations
- RSA.SumersEtAl2023.instBEqFeatureValue.beq x✝ y✝ = (x✝.ctorIdx == y✝.ctorIdx)
Convert feature value to rational
All feature values
All features
World state: mapping from features to values.
In the mushroom experiment, this defines how valuable each feature is. Example: {Green -> +2, Red -> 0, Blue -> -2, Spotted -> +1, Solid -> 0, Striped -> -1}
- featureValue : Feature → FeatureValue
Get the rational value of a feature in a world
Equations
- w.getValue f = (w.featureValue f).toRat
Reward for taking an action in a world state.
R(a,w) = Sum_f [a has f] * w(f)
Linear combination of feature values for features the action has.
Equations
- RSA.SumersEtAl2023.reward a w = List.foldl (fun (acc : ℚ) (f : RSA.SumersEtAl2023.Feature) => if a.hasFeature f = true then acc + w.getValue f else acc) 0 RSA.SumersEtAl2023.allFeatures
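As a worked instance of this reward (a self-contained recomputation with assumed constructor names, not an invocation of the compiled library): a green, spotted mushroom in the canonical world earns 2 + 1 = 3.

```lean
inductive Feature
  | green | red | blue | spotted | solid | striped
deriving DecidableEq

-- Canonical world values, as documented below:
-- Green = +2, Red = 0, Blue = -2; Spotted = +1, Solid = 0, Striped = -1.
def canonicalValue : Feature → Int
  | .green => 2
  | .red => 0
  | .blue => -2
  | .spotted => 1
  | .solid => 0
  | .striped => -1

def allFeatures : List Feature :=
  [.green, .red, .blue, .spotted, .solid, .striped]

-- R(a, w) = Sum_f [a has f] * w(f)
def reward (has : Feature → Bool) (w : Feature → Int) : Int :=
  allFeatures.foldl (fun acc f => if has f then acc + w f else acc) 0

-- Green and spotted: 2 + 1 = 3.
#eval reward (fun f => f == Feature.green || f == Feature.spotted) canonicalValue  -- 3
```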
Decision context: subset of available actions
Utterance: claim about a feature's value.
Example: "Spots are +1" = { feature := .spotted, value := .pos1 }
- feature : Feature
- value : FeatureValue
All possible utterances (30 = 6 features x 5 values)
Truth of an utterance in a world state
Equations
- RSA.SumersEtAl2023.utteranceTruth u w = (w.featureValue u.feature == u.value)
Equations
- RSA.SumersEtAl2023.instBEqParams.beq x✝¹ x✝ = false
Default parameters (matches Exp 1 Unbiased MLE)
Equations
- RSA.SumersEtAl2023.defaultParams = { lam := 55 / 100 }
Truth-biased parameters (Exp 1 MLE)
Equations
- RSA.SumersEtAl2023.truthBiasedParams = { lam := 35 / 100 }
Relevance-biased parameters (Exp 1 MLE)
Equations
- RSA.SumersEtAl2023.relevanceBiasedParams = { lam := 85 / 100 }
Speaker Utilities #
Three components:
- Truthfulness (Eq. 5): epistemic preference for true utterances
- Relevance (Eq. 8): decision-theoretic preference for action-improving utterances
- Cost: production/processing effort
Utterance cost.
Default: 0 for all utterances. Can be extended for valence bias (positive utterances preferred).
Valence-based cost (from Exp 1 residual analysis).
Negative-valued utterances have higher cost (require more processing).
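The shape of such a cost can be sketched as follows (the threshold and magnitude here are assumptions, not the paper's fitted values):

```lean
-- Sketch: utterances claiming a negative feature value incur an extra
-- processing cost; non-negative claims are free.
def valenceCost (claimedValue : Int) : Nat :=
  if claimedValue < 0 then 1 else 0

#eval valenceCost (-2)  -- 1
#eval valenceCost 1     -- 0
```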
Combined utility (Eq. 9).
U_C(u|w,A) = lambda*U_R(u|w,A) + (1-lambda)*U_T(u|w) - C(u)
Convex combination of relevance and truthfulness, minus cost.
Note: Relevance utility requires the full listener model, which depends
on the removed RSA.Eval infrastructure. We define the combined utility
in terms of the abstract combined function from CombinedUtility,
with relevance as a parameter.
Equations
- RSA.SumersEtAl2023.combinedUtility lam uT uR costWeight cost = RSA.CombinedUtility.combined lam uT uR (costWeight * cost)
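A numeric sanity check of Eq. 9 (a sketch over Mathlib's ℚ with made-up utility values, not output from the library): with the default lambda = 0.55, U_R = 2, U_T = 1, and zero cost, U_C = 0.55 * 2 + 0.45 * 1 = 1.55.

```lean
import Mathlib.Tactic

-- Eq. 9 written out directly: U_C = lam * uR + (1 - lam) * uT - cost.
def combinedUtilitySketch (lam uT uR cost : ℚ) : ℚ :=
  lam * uR + (1 - lam) * uT - cost

-- Default parameters: lambda = 55/100. With U_R = 2, U_T = 1, cost = 0:
-- 11/10 + 9/20 = 31/20 = 1.55.
example : combinedUtilitySketch (55/100) 1 2 0 = 31/20 := by
  norm_num [combinedUtilitySketch]
```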
Experimental Domain: Mushroom Foraging #
The experiments use a mushroom foraging cover story:
- Features: Green, Red, Blue (colors) and Spotted, Solid, Striped (textures)
- Each mushroom has one color and one texture
- Rewards are additive over features
Create a mushroom with one color and one texture
Equations
- RSA.SumersEtAl2023.makeMushroom color texture name = { hasFeature := fun (f : RSA.SumersEtAl2023.Feature) => f == color || f == texture, name := name }
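A hypothetical usage sketch (the constructor names `.green` and `.spotted` are assumed from the feature names used elsewhere in this file; this is not compiled against the library here):

```lean
-- Assumed usage of makeMushroom: one color, one texture, a display name.
def greenSpotted :=
  RSA.SumersEtAl2023.makeMushroom .green .spotted "green-spotted"

-- By the definition above, hasFeature holds exactly for the chosen
-- color and texture:
--   greenSpotted.hasFeature .green    -- true
--   greenSpotted.hasFeature .striped  -- false
```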
Canonical world state from the experiment.
Green = +2, Red = 0, Blue = -2; Spotted = +1, Solid = 0, Striped = -1.
Example context from Figure 6B: three mushrooms
True utterance in canonical world
False but relevant utterance
True but irrelevant utterance (feature not in context)
Key Theoretical Results #
These results connect to Comparisons/RelevanceTheories.lean, which contains the deeper theorems.
Combined model reduces to truthfulness when lambda = 0.
U_C(u|w,A) = U_T(u|w) when lambda = 0.
Delegates to CombinedUtility.combined_at_zero.
Combined model reduces to relevance when lambda = 1.
U_C(u|w,A) = U_R(u|w,A) when lambda = 1.
Delegates to CombinedUtility.combined_at_one.
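Both reductions are one-line ring facts about the convex-combination form. The following is a sketch of what combined_at_zero and combined_at_one assert, specialized to zero cost (using Mathlib's ℚ, not the library's own statements):

```lean
import Mathlib.Tactic

-- lambda = 0: the combined utility collapses to truthfulness.
example (uT uR : ℚ) : (0 : ℚ) * uR + (1 - 0) * uT - 0 = uT := by ring

-- lambda = 1: the combined utility collapses to relevance.
example (uT uR : ℚ) : (1 : ℚ) * uR + (1 - 1) * uT - 0 = uR := by ring
```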
Truthfulness and relevance are independent objectives.
In Lewis signaling games, they are perfectly correlated (knowing the world = knowing the best action). In signaling bandits, they can diverge:
- True but irrelevant: "Green is +2" when no green actions in context
- False but relevant: "Spots are +2" when spots are actually +1
- Witness 1 (true but irrelevant): "Green is +2" is true in the canonical world, but no green mushrooms appear in the example context.
- Witness 2 (false but relevant): "Spots are +2" is false (spots are +1) but would steer the listener toward the spotted mushroom, the best action.
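The two witnesses can be replayed in a self-contained sketch, with relevance crudely proxied by whether the mentioned feature occurs anywhere in the context (the real model instead uses the listener's decision problem):

```lean
structure Utt where
  feature : String
  claimed : Int

def truthful (world : String → Int) (u : Utt) : Bool :=
  world u.feature == u.claimed

-- Crude relevance proxy: the mentioned feature appears in the context.
def mentionsContext (context : List String) (u : Utt) : Bool :=
  context.contains u.feature

def world : String → Int
  | "green" => 2
  | "spots" => 1
  | _ => 0

def context : List String := ["spots", "stripes"]

#eval truthful world ⟨"green", 2⟩           -- true  (but irrelevant)
#eval mentionsContext context ⟨"green", 2⟩  -- false
#eval truthful world ⟨"spots", 2⟩           -- false (but relevant)
#eval mentionsContext context ⟨"spots", 2⟩  -- true
```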
Empirical Predictions from Experiments #
The paper reports MLE parameters and response patterns.
Experiment 2: Forced choice (endorsement) paradigm.
Participants endorsed specific utterances. MLE parameters:
- Truth-biased: lambda = 0.15
- Unbiased: lambda = 0.75
- Relevance-biased: lambda = 0.90
Unbiased participants jointly optimize truthfulness and relevance.
Neither lambda = 0 (pure truth) nor lambda = 1 (pure relevance) fits the data. Participants make a graded tradeoff.
Manipulation affects lambda parameter ordering.
lambda_truth < lambda_unbiased < lambda_relevance
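The ordering itself is a concrete inequality over the Experiment 1 MLEs, checkable directly (a sketch using Mathlib's ℚ):

```lean
import Mathlib.Tactic

-- 0.35 (truth-biased) < 0.55 (unbiased) < 0.85 (relevance-biased)
example : (35 : ℚ)/100 < 55/100 ∧ (55 : ℚ)/100 < 85/100 := by
  norm_num
```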
Connections to Other Frameworks #
Sumers et al. bridges several research traditions:
Standard RSA: Pure epistemic utility. Recovered when lambda = 0 and listener has identity decision problem.
Game-theoretic pragmatics (Benz, Parikh): Decision-theoretic relevance. Recovered when lambda = 1.
Relevance Theory (Sperber & Wilson): Relevance as primary. Empirically challenged: participants value truthfulness independently.
QUD models (Roberts): Question under discussion. QUDs can be derived from decision problems (Theorem 2).
See Comparisons/RelevanceTheories.lean for the formal connections:
- Identity DP is equivalent to epistemic utility (Theorem 1)
- Any QUD is some DP (Theorem 2)
- DT strictly more expressive than QUD (Theorem 3)
Standard RSA is a special case: when lambda = 0 and cost = 0, the combined utility equals truthfulness utility alone.
This recovers standard RSA's epistemic speaker, which soft-maximizes
truthfulness (informativity). The identity-DP connection (Theorem 1 of
Sumers et al.) is proved in combined_pure_truthfulness above.
Relevance Theory predicts lambda = 1, which is empirically falsified.
Summary #
Unified speaker model combining truthfulness and relevance:
U_C(u|w,A) = lambda*U_R(u|w,A) + (1-lambda)*U_T(u|w) - C(u)
Empirical findings:
- Participants use both truthfulness and relevance (0 < lambda < 1)
- Neither objective strictly dominates
- The tradeoff is graded, not binary
Theoretical implications:
- Decision-theoretic relevance grounds QUD-based relevance
- Truthfulness is an independent constraint, not derived from relevance
- The combined model explains loose talk and context-sensitivity
Sumers et al.'s combinedUtility is CombinedUtility.combined(lambda, U_T, U_R, cost).
This makes the shared combined theorems (combined_at_zero, combined_at_one,
combined_convex, combined_mono_A/B) directly applicable.
The integrated model of truthfulness and relevance
Equations
- RSA.SumersEtAl2023.integratedModel = "U_C = lambda*U_Relevance + (1-lambda)*U_Truthfulness - Cost"