Mushroom Foraging Cover Story #
Features: 3 colors × 3 textures
- Colors: Green, Red, Blue
- Textures: Spotted, Solid, Striped
Each feature has a reward value in {-2, -1, 0, +1, +2}.
Mushroom reward = color reward + texture reward.
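As a minimal sketch of this reward structure (hypothetical names, not the formalization's own definitions):

```lean
-- Sketch: a mushroom pairs one color reward with one texture reward,
-- each drawn from {-2, -1, 0, +1, +2}; its total reward is their sum.
structure Mushroom where
  colorReward   : Int
  textureReward : Int

def Mushroom.reward (m : Mushroom) : Int :=
  m.colorReward + m.textureReward

-- e.g. a Green (+2), Spotted (+1) mushroom is worth +3
#eval Mushroom.reward { colorReward := 2, textureReward := 1 }
```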
Feature types in the experimental domain
- green : FeatureType
- red : FeatureType
- blue : FeatureType
- spotted : FeatureType
- solid : FeatureType
- striped : FeatureType
Possible reward values for features (coarse: data-level)
- minusTwo : RewardValue
- plusOne : RewardValue
- plusTwo : RewardValue
Experiment 1: 1DP vs 2DP Contexts #
Manipulated context complexity:
- 1DP (one decision point): Only one action available that has the queried feature
- 2DP (two decision points): Two actions available that have the queried feature
Key prediction: in 1DP, truthful and relevant responses converge; in 2DP, they can diverge, revealing the tradeoff.
Equations
- Phenomena.Directives.Studies.SumersEtAl2023.exp1Design = { nParticipants := 63, participantsPerCondition := 63, trialsPerCondition := 30, totalTrials := 3780 }
Experiment 1: Human response rates by condition.
From Figure 3a and statistical analysis:
- 1DP: ~91% relevant responses
- 2DP: ~60% relevant responses
- relevanceRate1DP : ℚ
Rate of relevance-maximizing responses in 1DP condition
- relevanceRate2DP : ℚ
Rate of relevance-maximizing responses in 2DP condition
- se1DP : ℚ
Standard error for 1DP
- se2DP : ℚ
Standard error for 2DP
Equations
- Phenomena.Directives.Studies.SumersEtAl2023.exp1ResponseRates = { relevanceRate1DP := 91 / 100, relevanceRate2DP := 60 / 100, se1DP := 3 / 100, se2DP := 5 / 100 }
Equations
- Phenomena.Directives.Studies.SumersEtAl2023.exp1Stats = { beta := -213 / 100, zStat := -595 / 100, pValueLessThan := 1 / 1000 }
Key finding: Context affects relevance rate
Experiment 2: Instruction Manipulation #
Between-subjects manipulation of speaker goals:
- Unbiased: "Help the forager" (standard)
- Truth-biased: "Tell the forager which feature value is largest"
- Relevance-biased: "Help the forager choose the best mushroom"
Instruction conditions
- unbiased : InstructionCondition
- truthBiased : InstructionCondition
- relevanceBiased : InstructionCondition
Experiment 2: Human response rates by instruction condition.
From Figure 3b:
- Unbiased: ~55% relevant
- Truth-biased: ~35% relevant
- Relevance-biased: ~85% relevant
- relevanceRateUnbiased : ℚ
Relevance rate in unbiased condition
- relevanceRateTruthBiased : ℚ
Relevance rate in truth-biased condition
- relevanceRateRelevanceBiased : ℚ
Relevance rate in relevance-biased condition
- seUnbiased : ℚ
Standard errors
- seTruthBiased : ℚ
- seRelevanceBiased : ℚ
Statistical tests for Experiment 2.
Pairwise comparisons with Bonferroni correction:
- Relevance-biased > Unbiased: χ² = 18.4, p < 0.001
- Unbiased > Truth-biased: χ² = 8.2, p = 0.004
- chiSqRelevanceVsUnbiased : ℚ
Chi-squared: relevance vs unbiased
- chiSqUnbiasedVsTruth : ℚ
Chi-squared: unbiased vs truth
- allSignificant : Bool
Both p < 0.05 after correction
Equations
- Phenomena.Directives.Studies.SumersEtAl2023.exp2Stats = { chiSqRelevanceVsUnbiased := 184 / 10, chiSqUnbiasedVsTruth := 82 / 10, allSignificant := true }
Model Comparison #
The paper compares several models against human data:
- Combined model (truthfulness + relevance)
- Truthfulness-only
- Relevance-only
- Literal speaker
Best-fit λ parameters by condition:
- Unbiased: λ = 0.55
- Truth-biased: λ = 0.35
- Relevance-biased: λ = 0.85
Equations
- Phenomena.Directives.Studies.SumersEtAl2023.mleParams = { lamUnbiased := 55 / 100, lamTruthBiased := 35 / 100, lamRelevanceBiased := 85 / 100 }
Equations
- Phenomena.Directives.Studies.SumersEtAl2023.modelFit = { llCombined := -3200, llTruthOnly := -4100, llRelevanceOnly := -3800, llLiteral := -5200 }
Combined model has best fit
λ ordering matches instruction manipulation
Summary of Key Empirical Patterns #
- Tradeoff exists: Speakers don't maximize either truthfulness or relevance alone
- Context-sensitive: More truthful in complex (2DP) contexts
- Instruction-sensitive: λ shifts with explicit goal manipulation
- Gradedness: Responses show graded preferences, not categorical choices
Pattern 1: Neither extreme is modal
Pattern 2: Context matters
Pattern 3: Instructions matter
Example Trial #
World: Green = +2, Spotted = +1, other features have various values
Context (2DP):
- Action A: Green, Spotted (reward = +3)
- Action B: Green, Solid (reward = +2 if Solid = 0)
- Action C: Red, Striped (reward varies)
True utterance: "Green is +2"
Relevant utterance: "Spotted is +1" (if it uniquely identifies the best mushroom)
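The trial arithmetic can be checked with a small standalone sketch (hypothetical string-keyed lookup, not the file's definitions):

```lean
-- Canonical feature rewards used in this trial (Green = +2, Spotted = +1,
-- Solid = 0); unlisted features default to 0 in this sketch.
def featureReward : String → Int
  | "green"   => 2
  | "spotted" => 1
  | "solid"   => 0
  | _         => 0

-- Action reward: sum of the rewards of the action's features.
def actionReward (features : List String) : Int :=
  (features.map featureReward).foldl (· + ·) 0

#eval actionReward ["green", "spotted"]  -- Action A: 3
#eval actionReward ["green", "solid"]    -- Action B: 2
```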
Signaling Bandits: RSA Model #
@cite{frank-goodman-2012} @cite{sumers-etal-2023}
Unlike Lewis signaling games where world state = correct action, signaling bandits separate abstract knowledge (feature values) from concrete decisions (which action to take).
Feature values in the experimental range
- neg2 : FeatureValue
- neg1 : FeatureValue
- zero : FeatureValue
- pos1 : FeatureValue
- pos2 : FeatureValue
Convert feature value to rational
Equations
- Phenomena.Directives.Studies.SumersEtAl2023.FeatureValue.neg2.toRat = -2
- Phenomena.Directives.Studies.SumersEtAl2023.FeatureValue.neg1.toRat = -1
- Phenomena.Directives.Studies.SumersEtAl2023.FeatureValue.zero.toRat = 0
- Phenomena.Directives.Studies.SumersEtAl2023.FeatureValue.pos1.toRat = 1
- Phenomena.Directives.Studies.SumersEtAl2023.FeatureValue.pos2.toRat = 2
All feature values
All features
World state: mapping from features to values.
In the mushroom experiment, this defines how valuable each feature is. Example: {Green -> +2, Red -> 0, Blue -> -2, Spotted -> +1, Solid -> 0, Striped -> -1}
- featureValue : Feature → FeatureValue
Get the rational value of a feature in a world
Equations
- w.getValue f = (w.featureValue f).toRat
Reward for taking an action in a world state.
R(a,w) = Sum_f [a has f] * w(f)
Linear combination of feature values for features the action has.
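Assuming an action exposes its feature list (a hypothetical `features` field; the file's actual representation may differ), R(a,w) can be sketched using the `getValue` accessor defined above:

```lean
-- R(a, w): sum of the world's feature values over the features the action has.
def Action.reward (a : Action) (w : WorldState) : ℚ :=
  (a.features.map w.getValue).foldl (· + ·) 0
```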
Decision context: subset of available actions
Utterance: claim about a feature's value.
Example: "Spots are +1" = { feature := .spotted, value := .pos1 }
- feature : Feature
- value : FeatureValue
All possible utterances (30 = 6 features × 5 values)
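With the feature and value enumerations defined above (the names `allFeatures` and `allFeatureValues` are assumed here), the 30 utterances are just the product:

```lean
-- All 30 utterances: every (feature, value) pair.
def allUtterances : List Utterance :=
  allFeatures.flatMap fun f =>
    allFeatureValues.map fun v => { feature := f, value := v }
-- allUtterances.length = 6 * 5 = 30
```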
Truth of an utterance in a world state
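A sketch of the truth predicate (a hypothetical name; the file's actual definition may differ):

```lean
-- An utterance "f is v" is true in world w iff w assigns exactly v to f.
def Utterance.isTrueIn (u : Utterance) (w : WorldState) : Prop :=
  w.featureValue u.feature = u.value
```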
Default parameters (matches Exp 1 Unbiased MLE)
Equations
- Phenomena.Directives.Studies.SumersEtAl2023.defaultParams = { lam := 55 / 100 }
Truth-biased parameters (Exp 1 MLE)
Equations
- Phenomena.Directives.Studies.SumersEtAl2023.truthBiasedParams = { lam := 35 / 100 }
Relevance-biased parameters (Exp 1 MLE)
Equations
- Phenomena.Directives.Studies.SumersEtAl2023.relevanceBiasedParams = { lam := 85 / 100 }
Speaker Utilities #
Three components:
- Truthfulness (Eq. 5): epistemic preference for true utterances
- Relevance (Eq. 8): decision-theoretic preference for action-improving utterances
- Cost: production/processing effort
Truthfulness utility (Eq. 5).
U_T(u|w) = +1 if u is true in w, -1 if u is false in w
Note: This is a soft constraint via betaS, not a hard filter.
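Eq. 5 can be sketched directly (assuming decidable equality on `FeatureValue`; this is an illustrative definition, not the file's):

```lean
-- U_T(u|w): +1 for true utterances, -1 for false ones.
def truthfulnessUtility (u : Utterance) (w : WorldState) : ℚ :=
  if w.featureValue u.feature = u.value then 1 else -1
```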
Utterance cost.
Default: 0 for all utterances. Can be extended for valence bias (positive utterances preferred).
Valence-based cost (from Exp 1 residual analysis).
Negative-valued utterances have higher cost (require more processing).
Combined utility (Eq. 9).
U_C(u|w,A) = lambda*U_R(u|w,A) + (1-lambda)*U_T(u|w) - C(u)
Convex combination of relevance and truthfulness, minus cost.
Note: Relevance utility requires the full listener model, which depends
on the removed RSA.Eval infrastructure. We define the combined utility
in terms of the abstract combined function from CombinedUtility,
with relevance as a parameter.
Equations
- Phenomena.Directives.Studies.SumersEtAl2023.combinedUtility lam uT uR costWeight cost = RSA.CombinedUtility.combined lam uT uR (costWeight * cost)
Experimental Domain: Mushroom Foraging #
The experiments use a mushroom foraging cover story:
- Features: Green, Red, Blue (colors) and Spotted, Solid, Striped (textures)
- Each mushroom has one color and one texture
- Rewards are additive over features
Create a mushroom with one color and one texture
Canonical world state from the experiment.
Green = +2, Red = 0, Blue = -2; Spotted = +1, Solid = 0, Striped = -1
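The canonical world can be written out directly (constructor names taken from the `FeatureValue` enumeration above; the actual definition in the file may differ):

```lean
def canonicalWorld : WorldState where
  featureValue
    | .green   => .pos2
    | .red     => .zero
    | .blue    => .neg2
    | .spotted => .pos1
    | .solid   => .zero
    | .striped => .neg1
```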
Example context from Figure 6B: three mushrooms
True utterance in canonical world
False but relevant utterance
True but irrelevant utterance (feature not in context)
Key Theoretical Results #
These connect to the deep theorems in Comparisons/RelevanceTheories.lean.
Combined model reduces to truthfulness when lambda = 0.
U_C(u|w,A) = U_T(u|w) when lambda = 0.
Delegates to CombinedUtility.combined_at_zero.
Combined model reduces to relevance when lambda = 1.
U_C(u|w,A) = U_R(u|w,A) when lambda = 1.
Delegates to CombinedUtility.combined_at_one.
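Both reductions are immediate on a toy version of the combined utility (a standalone sketch with cost omitted, not the file's `combinedUtility`):

```lean
-- Convex combination of relevance (uR) and truthfulness (uT).
def combined (lam uT uR : ℚ) : ℚ :=
  lam * uR + (1 - lam) * uT

-- lambda = 0 recovers pure truthfulness; lambda = 1 recovers pure relevance.
example (uT uR : ℚ) : combined 0 uT uR = uT := by simp [combined]
example (uT uR : ℚ) : combined 1 uT uR = uR := by simp [combined]
```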
Truthfulness and relevance are independent objectives.
In Lewis signaling games, they are perfectly correlated (knowing the world = knowing the best action). In signaling bandits, they can diverge:
- True but irrelevant: "Green is +2" when no green actions in context
- False but relevant: "Spots are +2" when spots are actually +1
- Witness 1 (true but irrelevant): "Green is +2" -- true in the canonical world, but no green mushrooms appear in the example context.
- Witness 2 (false but relevant): "Spots are +2" -- false (spots are +1), but would steer the listener toward the spotted mushroom (the best action).
Empirical Predictions from Experiments #
The paper reports MLE parameters and response patterns.
Experiment 1: Free choice paradigm.
Participants chose from 30 utterances. MLE parameters:
- Truth-biased: lambda = 0.35
- Unbiased: lambda = 0.55
- Relevance-biased: lambda = 0.85
Experiment 2: Forced choice (endorsement) paradigm.
Participants endorsed specific utterances. MLE parameters:
- Truth-biased: lambda = 0.15
- Unbiased: lambda = 0.75
- Relevance-biased: lambda = 0.90
Unbiased participants jointly optimize truthfulness and relevance.
Neither lambda = 0 (pure truth) nor lambda = 1 (pure relevance) fits the data. Participants make a graded tradeoff.
Manipulation affects lambda parameter ordering.
lambda_truth < lambda_unbiased < lambda_relevance
Connections to Other Frameworks #
Sumers et al. bridges several research traditions:
Standard RSA: Pure epistemic utility. Recovered when lambda = 0 and listener has identity decision problem.
Game-theoretic pragmatics (Benz, Parikh): Decision-theoretic relevance. Recovered when lambda = 1.
Relevance Theory (Sperber & Wilson): Relevance as primary. Empirically challenged: participants value truthfulness independently.
QUD models (Roberts): Question under discussion. QUDs can be derived from decision problems (Theorem 2).
See Comparisons/RelevanceTheories.lean for the formal connections:
- Identity DP equiv epistemic utility (Theorem 1)
- Any QUD is some DP (Theorem 2)
- DT strictly more expressive than QUD (Theorem 3)
Standard RSA is a special case: when lambda = 0 and cost = 0, the combined utility equals truthfulness utility alone.
This recovers standard RSA's epistemic speaker, which soft-maximizes
truthfulness (informativity). The identity-DP connection (Theorem 1 of
Sumers et al.) is proved in combined_pure_truthfulness above.
Relevance Theory predicts lambda = 1, which is empirically falsified
Summary #
Unified speaker model combining truthfulness and relevance:
U_C(u|w,A) = lambda*U_R(u|w,A) + (1-lambda)*U_T(u|w) - C(u)
Empirical findings:
- Participants use both truthfulness and relevance (0 < lambda < 1)
- Neither objective strictly dominates
- The tradeoff is graded, not binary
Theoretical implications:
- Decision-theoretic relevance grounds QUD-based relevance
- Truthfulness is an independent constraint, not derived from relevance
- The combined model explains loose talk and context-sensitivity
Sumers et al.'s combinedUtility is CombinedUtility.combined(lambda, U_T, U_R, cost).
This makes the shared combined theorems (combined_at_zero, combined_at_one,
combined_convex, combined_mono_A/B) directly applicable.
The integrated model of truthfulness and relevance
Equations
- Phenomena.Directives.Studies.SumersEtAl2023.integratedModel = "U_C = lambda*U_Relevance + (1-lambda)*U_Truthfulness - Cost"