Signaling Bandits #
@cite{frank-goodman-2012} @cite{sumers-hawkins-2023}
Unlike Lewis signaling games, where knowing the world state is equivalent to knowing the correct action, signaling bandits separate abstract knowledge (feature values) from concrete decisions (which action to take).
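The contrast can be sketched in a few lines of self-contained Lean (the names here are illustrative, not the formalization's own):

```lean
-- Lewis game: the world state *is* the target action, so knowing the
-- state determines the correct behavior outright.
-- Signaling bandit: the state assigns values to abstract features, and
-- the best action additionally depends on which actions are available.
inductive Color | green | red | blue

structure BanditState where
  value : Color → Int

-- The best available action depends on the context, not the state alone.
def bestAction (w : BanditState) (available : List Color) : Option Color :=
  available.foldl
    (fun best a =>
      match best with
      | none => some a
      | some b => if w.value a > w.value b then some a else some b)
    none
```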
Equations
- RSA.SumersEtAl2023.instBEqFeature.beq x✝ y✝ = (x✝.ctorIdx == y✝.ctorIdx)
Feature values in the experimental range
- neg2 : FeatureValue
- neg1 : FeatureValue
- zero : FeatureValue
- pos1 : FeatureValue
- pos2 : FeatureValue
Equations
- RSA.SumersEtAl2023.instBEqFeatureValue.beq x✝ y✝ = (x✝.ctorIdx == y✝.ctorIdx)
Convert feature value to rational
All feature values
All features
World state: mapping from features to values.
In the mushroom experiment, this defines how valuable each feature is. Example: {Green -> +2, Red -> 0, Blue -> -2, Spotted -> +1, Solid -> 0, Striped -> -1}
- featureValue : Feature → FeatureValue
Get the rational value of a feature in a world
Equations
- w.getValue f = (w.featureValue f).toRat
Reward for taking an action in a world state.
R(a,w) = Sum_f [a has f] * w(f)
Linear combination of feature values for features the action has.
Equations
- RSA.SumersEtAl2023.reward a w = List.foldl (fun (acc : ℚ) (f : RSA.SumersEtAl2023.Feature) => if a.hasFeature f = true then acc + w.getValue f else acc) 0 RSA.SumersEtAl2023.allFeatures
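As a worked instance of this reward (a self-contained recomputation with assumed constructor names, not an invocation of the compiled library): a green, spotted mushroom in the canonical world earns 2 + 1 = 3.

```lean
inductive Feature
  | green | red | blue | spotted | solid | striped
deriving DecidableEq

-- Canonical world values, as documented below:
-- Green = +2, Red = 0, Blue = -2; Spotted = +1, Solid = 0, Striped = -1.
def canonicalValue : Feature → Int
  | .green => 2
  | .red => 0
  | .blue => -2
  | .spotted => 1
  | .solid => 0
  | .striped => -1

def allFeatures : List Feature :=
  [.green, .red, .blue, .spotted, .solid, .striped]

-- R(a, w) = Sum_f [a has f] * w(f)
def reward (has : Feature → Bool) (w : Feature → Int) : Int :=
  allFeatures.foldl (fun acc f => if has f then acc + w f else acc) 0

-- Green and spotted: 2 + 1 = 3.
#eval reward (fun f => f == Feature.green || f == Feature.spotted) canonicalValue  -- 3
```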
Decision context: subset of available actions
Utterance: claim about a feature's value.
Example: "Spots are +1" = { feature := .spotted, value := .pos1 }
- feature : Feature
- value : FeatureValue
All possible utterances (30 = 6 features x 5 values)
Truth of an utterance in a world state
Equations
- RSA.SumersEtAl2023.utteranceTruth u w = (w.featureValue u.feature == u.value)
Equations
- RSA.SumersEtAl2023.instBEqParams.beq x✝¹ x✝ = false
Default parameters (matches Exp 1 Unbiased MLE)
Equations
- RSA.SumersEtAl2023.defaultParams = { lam := 55 / 100 }
Truth-biased parameters (Exp 1 MLE)
Equations
- RSA.SumersEtAl2023.truthBiasedParams = { lam := 35 / 100 }
Relevance-biased parameters (Exp 1 MLE)
Equations
- RSA.SumersEtAl2023.relevanceBiasedParams = { lam := 85 / 100 }
Speaker Utilities #
Three components:
- Truthfulness (Eq. 5): epistemic preference for true utterances
- Relevance (Eq. 8): decision-theoretic preference for action-improving utterances
- Cost: production/processing effort
Utterance cost.
Default: 0 for all utterances. Can be extended for valence bias (positive utterances preferred).
Valence-based cost (from Exp 1 residual analysis).
Negative-valued utterances have higher cost (require more processing).
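The shape of such a cost can be sketched as follows (the threshold and magnitude here are assumptions, not the paper's fitted values):

```lean
-- Sketch: utterances claiming a negative feature value incur an extra
-- processing cost; non-negative claims are free.
def valenceCost (claimedValue : Int) : Nat :=
  if claimedValue < 0 then 1 else 0

#eval valenceCost (-2)  -- 1
#eval valenceCost 1     -- 0
```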
Combined utility (Eq. 9).
U_C(u|w,A) = lambda*U_R(u|w,A) + (1-lambda)*U_T(u|w) - C(u)
Convex combination of relevance and truthfulness, minus cost.
Note: Relevance utility requires the full listener model, which depends
on the removed RSA.Eval infrastructure. We define the combined utility
in terms of the abstract combined function from CombinedUtility,
with relevance as a parameter.
Equations
- RSA.SumersEtAl2023.combinedUtility lam uT uR costWeight cost = RSA.CombinedUtility.combined lam uT uR (costWeight * cost)
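A numeric sanity check of Eq. 9 (a sketch over Mathlib's ℚ with made-up utility values, not output from the library): with the default lambda = 0.55, U_R = 2, U_T = 1, and zero cost, U_C = 0.55 * 2 + 0.45 * 1 = 1.55.

```lean
import Mathlib.Tactic

-- Eq. 9 written out directly: U_C = lam * uR + (1 - lam) * uT - cost.
def combinedUtilitySketch (lam uT uR cost : ℚ) : ℚ :=
  lam * uR + (1 - lam) * uT - cost

-- Default parameters: lambda = 55/100. With U_R = 2, U_T = 1, cost = 0:
-- 11/10 + 9/20 = 31/20 = 1.55.
example : combinedUtilitySketch (55/100) 1 2 0 = 31/20 := by
  norm_num [combinedUtilitySketch]
```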
Experimental Domain: Mushroom Foraging #
The experiments use a mushroom foraging cover story:
- Features: Green, Red, Blue (colors) and Spotted, Solid, Striped (textures)
- Each mushroom has one color and one texture
- Rewards are additive over features
Create a mushroom with one color and one texture
Equations
- RSA.SumersEtAl2023.makeMushroom color texture name = { hasFeature := fun (f : RSA.SumersEtAl2023.Feature) => f == color || f == texture, name := name }
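A hypothetical usage sketch (the constructor names `.green` and `.spotted` are assumed from the feature names used elsewhere in this file; this is not compiled against the library here):

```lean
-- Assumed usage of makeMushroom: one color, one texture, a display name.
def greenSpotted :=
  RSA.SumersEtAl2023.makeMushroom .green .spotted "green-spotted"

-- By the definition above, hasFeature holds exactly for the chosen
-- color and texture:
--   greenSpotted.hasFeature .green    -- true
--   greenSpotted.hasFeature .striped  -- false
```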
Canonical world state from the experiment.
Green = +2, Red = 0, Blue = -2; Spotted = +1, Solid = 0, Striped = -1.
Example context from Figure 6B: three mushrooms
True utterance in canonical world
False but relevant utterance
True but irrelevant utterance (feature not in context)
Key Theoretical Results #
These results connect to Comparisons/RelevanceTheories.lean, which contains the deeper theorems.
Combined model reduces to truthfulness when lambda = 0.
U_C(u|w,A) = U_T(u|w) when lambda = 0.
Delegates to CombinedUtility.combined_at_zero.
Combined model reduces to relevance when lambda = 1.
U_C(u|w,A) = U_R(u|w,A) when lambda = 1.
Delegates to CombinedUtility.combined_at_one.
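Both reductions are one-line ring facts about the convex-combination form. The following is a sketch of what combined_at_zero and combined_at_one assert, specialized to zero cost (using Mathlib's ℚ, not the library's own statements):

```lean
import Mathlib.Tactic

-- lambda = 0: the combined utility collapses to truthfulness.
example (uT uR : ℚ) : (0 : ℚ) * uR + (1 - 0) * uT - 0 = uT := by ring

-- lambda = 1: the combined utility collapses to relevance.
example (uT uR : ℚ) : (1 : ℚ) * uR + (1 - 1) * uT - 0 = uR := by ring
```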
Truthfulness and relevance are independent objectives.
In Lewis signaling games, they are perfectly correlated (knowing the world = knowing the best action). In signaling bandits, they can diverge:
- True but irrelevant: "Green is +2" when no green actions in context
- False but relevant: "Spots are +2" when spots are actually +1
- Witness 1 (true but irrelevant): "Green is +2" is true in the canonical world, but no green mushrooms appear in the example context.
- Witness 2 (false but relevant): "Spots are +2" is false (spots are +1) but would steer the listener toward the spotted mushroom, the best action.
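The two witnesses can be replayed in a self-contained sketch, with relevance crudely proxied by whether the mentioned feature occurs anywhere in the context (the real model instead uses the listener's decision problem):

```lean
structure Utt where
  feature : String
  claimed : Int

def truthful (world : String → Int) (u : Utt) : Bool :=
  world u.feature == u.claimed

-- Crude relevance proxy: the mentioned feature appears in the context.
def mentionsContext (context : List String) (u : Utt) : Bool :=
  context.contains u.feature

def world : String → Int
  | "green" => 2
  | "spots" => 1
  | _ => 0

def context : List String := ["spots", "stripes"]

#eval truthful world ⟨"green", 2⟩           -- true  (but irrelevant)
#eval mentionsContext context ⟨"green", 2⟩  -- false
#eval truthful world ⟨"spots", 2⟩           -- false (but relevant)
#eval mentionsContext context ⟨"spots", 2⟩  -- true
```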
Empirical Predictions from Experiments #
The paper reports MLE parameters and response patterns.
Experiment 2: Forced choice (endorsement) paradigm.
Participants endorsed specific utterances. MLE parameters:
- Truth-biased: lambda = 0.15
- Unbiased: lambda = 0.75
- Relevance-biased: lambda = 0.90
Unbiased participants jointly optimize truthfulness and relevance.
Neither lambda = 0 (pure truth) nor lambda = 1 (pure relevance) fits the data. Participants make a graded tradeoff.
Manipulation affects lambda parameter ordering.
lambda_truth < lambda_unbiased < lambda_relevance
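The ordering itself is a concrete inequality over the Experiment 1 MLEs, checkable directly (a sketch using Mathlib's ℚ):

```lean
import Mathlib.Tactic

-- 0.35 (truth-biased) < 0.55 (unbiased) < 0.85 (relevance-biased)
example : (35 : ℚ)/100 < 55/100 ∧ (55 : ℚ)/100 < 85/100 := by
  norm_num
```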
Connections to Other Frameworks #
Sumers et al. bridges several research traditions:
Standard RSA: Pure epistemic utility. Recovered when lambda = 0 and listener has identity decision problem.
Game-theoretic pragmatics (Benz, Parikh): Decision-theoretic relevance. Recovered when lambda = 1.
Relevance Theory (Sperber & Wilson): Relevance as primary. Empirically challenged: participants value truthfulness independently.
QUD models (Roberts): Question under discussion. QUDs can be derived from decision problems (Theorem 2).
See Comparisons/RelevanceTheories.lean for the formal connections:
- Identity DP is equivalent to epistemic utility (Theorem 1)
- Any QUD is some DP (Theorem 2)
- DT strictly more expressive than QUD (Theorem 3)
Standard RSA is a special case: when lambda = 0 and cost = 0, the combined utility equals truthfulness utility alone.
This recovers standard RSA's epistemic speaker, which soft-maximizes
truthfulness (informativity). The identity-DP connection (Theorem 1 of
Sumers et al.) is proved in combined_pure_truthfulness above.
Relevance Theory predicts lambda = 1, which is empirically falsified.
Summary #
Unified speaker model combining truthfulness and relevance:
U_C(u|w,A) = lambda*U_R(u|w,A) + (1-lambda)*U_T(u|w) - C(u)
Empirical findings:
- Participants use both truthfulness and relevance (0 < lambda < 1)
- Neither objective strictly dominates
- The tradeoff is graded, not binary
Theoretical implications:
- Decision-theoretic relevance grounds QUD-based relevance
- Truthfulness is an independent constraint, not derived from relevance
- The combined model explains loose talk and context-sensitivity
Sumers et al.'s combinedUtility is CombinedUtility.combined(lambda, U_T, U_R, cost).
This makes the shared combined theorems (combined_at_zero, combined_at_one,
combined_convex, combined_mono_A/B) directly applicable.
The integrated model of truthfulness and relevance
Equations
- RSA.SumersEtAl2023.integratedModel = "U_C = lambda*U_Relevance + (1-lambda)*U_Truthfulness - Cost"