Linglib.Theories.Pragmatics.RSA.Implementations.SumersEtAl2023

Signaling Bandits #

@cite{frank-goodman-2012} @cite{sumers-hawkins-2023}

Unlike Lewis signaling games, where the world state directly determines the correct action, signaling bandits separate abstract knowledge (feature values) from concrete decisions (which action to take).

Features that characterize actions (e.g., colors, textures)

      Feature values in the experimental range

          All feature values

            All features

              World state: mapping from features to values.

              In the mushroom experiment, this defines how valuable each feature is. Example: {Green -> +2, Red -> 0, Blue -> -2, Spotted -> +1, Solid -> 0, Striped -> -1}

                Get the rational value of a feature in a world

Action (mushroom) characterized by the features it has

• hasFeature : Feature → Bool

  Which features this action has (e.g., a green spotted mushroom)

• name : String

  Human-readable name

                    Reward for taking an action in a world state.

                    R(a,w) = Sum_f [a has f] * w(f)

                    Linear combination of feature values for features the action has.
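
The reward formula above can be sketched in core Lean 4 as follows. This is an illustration under stated assumptions, not the module's code: `allFeatures`, `canonicalWorld`, and `greenSpotted` are hypothetical names, the feature constructors are guessed from the mushroom example, and values are `Int` here for brevity even though the module speaks of rational values.

```lean
-- Hypothetical stand-ins; the module's real definitions may differ.
inductive Feature where
  | green | red | blue | spotted | solid | striped
  deriving DecidableEq, Repr

def allFeatures : List Feature :=
  [.green, .red, .blue, .spotted, .solid, .striped]

/-- World state as a map from features to values (integers here, an assumption). -/
abbrev WorldState := Feature → Int

structure Action where
  hasFeature : Feature → Bool
  name : String

/-- R(a,w) = Sum_f [a has f] * w(f): add w(f) for every feature the action has. -/
def reward (a : Action) (w : WorldState) : Int :=
  allFeatures.foldl (fun (acc : Int) f => if a.hasFeature f then acc + w f else acc) 0

/-- The example world from the text: Green +2, Red 0, Blue -2, Spotted +1, Solid 0, Striped -1. -/
def canonicalWorld : WorldState := fun f =>
  match f with
  | .green => 2
  | .red => 0
  | .blue => -2
  | .spotted => 1
  | .solid => 0
  | .striped => -1

def greenSpotted : Action :=
  { hasFeature := fun f => f == .green || f == .spotted, name := "green spotted" }

#eval reward greenSpotted canonicalWorld  -- 3 (green +2, spotted +1)
```

Summing over an explicit feature list keeps the definition computable, so `#eval` can check small cases directly.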

                      Decision context: subset of available actions

Utterance: claim about a feature's value.

Example: "Spots are +1" = {feature := .spotted, value := .pos1}

                                All possible utterances (30 = 6 features x 5 values)

                                  Truth of an utterance in a world state
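
A rough core Lean 4 sketch of utterances and their truth in a world, assuming an utterance is true exactly when the world assigns the claimed value to the named feature. Only `.spotted` and `.pos1` appear in the text; the remaining constructor names, `allUtterances`, `utteranceTruth`, and `canonicalWorld` are hypothetical.

```lean
-- Hypothetical stand-ins; only `.spotted` and `.pos1` are attested in the text.
inductive Feature where
  | green | red | blue | spotted | solid | striped
  deriving DecidableEq, Repr

inductive FeatureValue where
  | neg2 | neg1 | zero | pos1 | pos2
  deriving DecidableEq, Repr

structure Utterance where
  feature : Feature
  value : FeatureValue
  deriving DecidableEq, Repr

abbrev WorldState := Feature → FeatureValue

def allFeatures : List Feature :=
  [.green, .red, .blue, .spotted, .solid, .striped]

def allFeatureValues : List FeatureValue :=
  [.neg2, .neg1, .zero, .pos1, .pos2]

/-- Every (feature, value) pair: 6 features x 5 values. -/
def allUtterances : List Utterance :=
  allFeatures.foldr
    (fun f (acc : List Utterance) =>
      allFeatureValues.map (fun v => ({ feature := f, value := v } : Utterance)) ++ acc)
    []

/-- Assumed semantics: the claim is true iff the world gives that feature that value. -/
def utteranceTruth (u : Utterance) (w : WorldState) : Bool :=
  w u.feature == u.value

def canonicalWorld : WorldState := fun f =>
  match f with
  | .green => .pos2
  | .red => .zero
  | .blue => .neg2
  | .spotted => .pos1
  | .solid => .zero
  | .striped => .neg1

#eval allUtterances.length                                                   -- 30
#eval utteranceTruth { feature := .spotted, value := .pos1 } canonicalWorld  -- true
#eval utteranceTruth { feature := .spotted, value := .pos2 } canonicalWorld  -- false
```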

Model parameters for the Sumers et al. speaker model

• βS :

  Speaker rationality (soft-max inverse temperature)

• βL :

  Listener rationality

• lam :

  Tradeoff: 0 = pure truthfulness, 1 = pure relevance

• costWeight :

  Weight applied to the utterance cost

                                          Default parameters (matches Exp 1 Unbiased MLE)

                                            Truth-biased parameters (Exp 1 MLE)

                                              Relevance-biased parameters (Exp 1 MLE)

                                                Speaker Utilities #

                                                Three components:

                                                1. Truthfulness (Eq. 5): epistemic preference for true utterances
                                                2. Relevance (Eq. 8): decision-theoretic preference for action-improving utterances
                                                3. Cost: production/processing effort

Truthfulness utility (Eq. 5).

U_T(u|w) = +1 if [u] = true
U_T(u|w) = -1 if [u] = false

Note: This is a soft constraint via βS (false utterances are penalized, not excluded), not a hard filter.
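
To illustrate the soft constraint, here is a small core Lean 4 sketch assuming the speaker chooses utterances by a soft-max over utilities with rationality βS. The helper names `softmaxProb` and `exampleUtilities` are hypothetical, and `Float` is used only so the values can be evaluated.

```lean
/-- U_T: +1 for a true utterance, -1 for a false one. -/
def truthfulnessUtility (isTrue : Bool) : Float :=
  if isTrue then 1.0 else -1.0

/-- Soft-max probability of option `i` (hypothetical helper):
    exp (β * U_i) divided by the sum of exp (β * U_j) over all options. -/
def softmaxProb (β : Float) (utils : List Float) (i : Nat) : Float :=
  let weights := utils.map (fun u => Float.exp (β * u))
  let total := weights.foldl (· + ·) 0.0
  (weights.getD i 0.0) / total

/-- One true and one false utterance. -/
def exampleUtilities : List Float :=
  [truthfulnessUtility true, truthfulnessUtility false]

-- Probability of choosing the false utterance (index 1).
#eval softmaxProb 1.0 exampleUtilities 1  -- about 0.119: unlikely, but not ruled out
#eval softmaxProb 5.0 exampleUtilities 1  -- much smaller, still above zero
```

Raising βS pushes the false utterance's probability toward zero without ever reaching it, which is the sense in which the constraint is soft.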

                                                  Utterance cost.

                                                  Default: 0 for all utterances. Can be extended for valence bias (positive utterances preferred).

                                                    def RSA.SumersEtAl2023.valenceCost (u : Utterance) (ν : := 1 / 4) :

                                                    Valence-based cost (from Exp 1 residual analysis).

                                                    Negative-valued utterances have higher cost (require more processing).

                                                      def RSA.SumersEtAl2023.combinedUtility (lam uT uR costWeight cost : ) :

Combined utility (Eq. 9).

U_C(u|w,A) = lambda*U_R(u|w,A) + (1-lambda)*U_T(u|w) - C(u)

Convex combination of relevance and truthfulness, minus cost. In this definition the cost term C(u) is costWeight * cost.

Note: The relevance utility requires the full listener model, which depends on the removed RSA.Eval infrastructure. The combined utility is therefore defined in terms of the abstract combined function from CombinedUtility, with relevance passed in as a parameter.
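
A minimal numeric sketch of Eq. 9 in core Lean 4. It reuses the argument order shown in the signature (`lam uT uR costWeight cost`) but works over `Float`, which is an assumption made so the endpoints can be evaluated; the module's definition goes through `CombinedUtility.combined` instead.

```lean
/-- U_C = lam * U_R + (1 - lam) * U_T - costWeight * cost (Float-valued sketch). -/
def combinedUtility (lam uT uR costWeight cost : Float) : Float :=
  lam * uR + (1 - lam) * uT - costWeight * cost

-- With zero cost, lam = 0 leaves only the truthfulness term (uT = 1.0 here).
#eval combinedUtility 0.0 1.0 0.5 0.0 0.0

-- With zero cost, lam = 1 leaves only the relevance term (uR = 0.5 here).
#eval combinedUtility 1.0 1.0 0.5 0.0 0.0

-- An intermediate tradeoff for a false (uT = -1) but useful (uR = 2) utterance with some cost.
#eval combinedUtility 0.55 (-1.0) 2.0 1.0 0.25
```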

                                                        Experimental Domain: Mushroom Foraging #

The experiments use a mushroom foraging cover story.

                                                        def RSA.SumersEtAl2023.makeMushroom (color texture : Feature) (name : String := "mushroom") :

                                                        Create a mushroom with one color and one texture

Canonical world state from the experiment.

Green = +2, Red = 0, Blue = -2
Spotted = +1, Solid = 0, Striped = -1

                                                            Example context from Figure 6B: three mushrooms

                                                              Key Theoretical Results #

                                                              These connect to Comparisons/RelevanceTheories.lean for the deep theorems.

                                                              Combined model reduces to truthfulness when lambda = 0.

                                                              U_C(u|w,A) = U_T(u|w) when lambda = 0. Delegates to CombinedUtility.combined_at_zero.

                                                              Combined model reduces to relevance when lambda = 1.

                                                              U_C(u|w,A) = U_R(u|w,A) when lambda = 1. Delegates to CombinedUtility.combined_at_one.

Truthfulness and relevance are independent objectives.

In Lewis signaling games, they are perfectly correlated (knowing the world = knowing the best action). In signaling bandits, they can diverge:

• True but irrelevant: "Green is +2" when there are no green actions in the context
• False but relevant: "Spots are +2" when spots are actually +1

Witness 1 (true but irrelevant): "Green is +2" is true in the canonical world, but no green mushrooms appear in the example context.

Witness 2 (false but relevant): "Spots are +2" is false (spots are +1), but it would steer the listener toward the spotted mushroom (the best action).
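
The two witnesses can be checked on a toy context in core Lean 4. Everything here is a sketch under assumptions: the three mushrooms in `exampleContext` are invented for illustration (the contents of the Figure 6B context are not given in this page), values are `Int`, and the code only tests truth and whether a feature occurs in the context. It does not compute relevance utility, which needs the listener model.

```lean
-- Hypothetical stand-ins; the module's real definitions may differ.
inductive Feature where
  | green | red | blue | spotted | solid | striped
  deriving DecidableEq, Repr

abbrev WorldState := Feature → Int

structure Action where
  hasFeature : Feature → Bool
  name : String

structure Utterance where
  feature : Feature
  value : Int

def allFeatures : List Feature :=
  [.green, .red, .blue, .spotted, .solid, .striped]

def reward (a : Action) (w : WorldState) : Int :=
  allFeatures.foldl (fun (acc : Int) f => if a.hasFeature f then acc + w f else acc) 0

def canonicalWorld : WorldState := fun f =>
  match f with
  | .green => 2
  | .red => 0
  | .blue => -2
  | .spotted => 1
  | .solid => 0
  | .striped => -1

def makeMushroom (color texture : Feature) (name : String) : Action :=
  { hasFeature := fun f => f == color || f == texture, name := name }

/-- Invented context with no green mushroom and one spotted mushroom. -/
def exampleContext : List Action :=
  [makeMushroom .red .spotted "red spotted",
   makeMushroom .red .solid "red solid",
   makeMushroom .blue .striped "blue striped"]

def isTrue (u : Utterance) (w : WorldState) : Bool :=
  w u.feature == u.value

/-- Does any action in the context carry the feature the utterance talks about? -/
def featureInContext (u : Utterance) (ctx : List Action) : Bool :=
  ctx.any (fun a => a.hasFeature u.feature)

-- Witness 1: "Green is +2" is true, but no action in the context is green.
#eval isTrue ⟨.green, 2⟩ canonicalWorld            -- true
#eval featureInContext ⟨.green, 2⟩ exampleContext  -- false

-- Witness 2: "Spots are +2" is false, yet it points at the best action's feature.
#eval isTrue ⟨.spotted, 2⟩ canonicalWorld          -- false
#eval exampleContext.map (fun a => (a.name, reward a canonicalWorld))  -- rewards 1, 0, -3
```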

                                                              Empirical Predictions from Experiments #

                                                              The paper reports MLE parameters and response patterns.

Experiment 1: Free choice paradigm.

Participants chose from 30 utterances. MLE parameters:

• Truth-biased: lambda = 0.35
• Unbiased: lambda = 0.55
• Relevance-biased: lambda = 0.85

• truthBiased_lam :
• unbiased_lam :
• relevanceBiased_lam :

Experiment 2: Forced choice (endorsement) paradigm.

Participants endorsed specific utterances. MLE parameters:

• Truth-biased: lambda = 0.15
• Unbiased: lambda = 0.75
• Relevance-biased: lambda = 0.90

• truthBiased_lam :
• unbiased_lam :
• relevanceBiased_lam :

                                                                  Unbiased participants jointly optimize truthfulness and relevance.

                                                                  Neither lambda = 0 (pure truth) nor lambda = 1 (pure relevance) fits the data. Participants make a graded tradeoff.

The bias manipulation orders the fitted lambda values.

lambda_truth < lambda_unbiased < lambda_relevance
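
As a quick check, the ordering can be tested against the MLE values listed for both experiments. This core Lean 4 sketch stores each lambda in hundredths as a `Nat` to keep the comparison exact; `LambdaFits`, `exp1`, `exp2`, and `ordered` are hypothetical names, not the module's structures.

```lean
/-- Fitted lambda values in hundredths (0.35 is stored as 35). Hypothetical structure. -/
structure LambdaFits where
  truthBiased : Nat
  unbiased : Nat
  relevanceBiased : Nat

def exp1 : LambdaFits := ⟨35, 55, 85⟩  -- Experiment 1 values from the text
def exp2 : LambdaFits := ⟨15, 75, 90⟩  -- Experiment 2 values from the text

/-- lambda_truth < lambda_unbiased < lambda_relevance -/
def ordered (p : LambdaFits) : Bool :=
  decide (p.truthBiased < p.unbiased) && decide (p.unbiased < p.relevanceBiased)

#eval ordered exp1  -- true
#eval ordered exp2  -- true
```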

                                                                  Connections to Other Frameworks #

The Sumers et al. model bridges several research traditions:

1. Standard RSA: Pure epistemic utility. Recovered when lambda = 0 and the listener has an identity decision problem.

2. Game-theoretic pragmatics (Benz, Parikh): Decision-theoretic relevance. Recovered when lambda = 1.

3. Relevance Theory (Sperber & Wilson): Relevance as primary. Empirically challenged: participants value truthfulness independently.

4. QUD models (Roberts): Question under discussion. QUDs can be derived from decision problems (Theorem 2).

See Comparisons/RelevanceTheories.lean for the formal connections.

                                                                  Standard RSA is a special case: when lambda = 0 and cost = 0, the combined utility equals truthfulness utility alone.

                                                                  This recovers standard RSA's epistemic speaker, which soft-maximizes truthfulness (informativity). The identity-DP connection (Theorem 1 of Sumers et al.) is proved in combined_pure_truthfulness above.

Relevance Theory predicts lambda = 1, which is empirically falsified: every fitted lambda reported above is below 1.

                                                                  Summary #

                                                                  Unified speaker model combining truthfulness and relevance:

                                                                  U_C(u|w,A) = lambda*U_R(u|w,A) + (1-lambda)*U_T(u|w) - C(u)

                                                                  Empirical findings:

                                                                  1. Participants use both truthfulness and relevance (0 < lambda < 1)
                                                                  2. Neither objective strictly dominates
                                                                  3. The tradeoff is graded, not binary

                                                                  Theoretical implications:

                                                                  theorem RSA.SumersEtAl2023.sumers_uses_combined (lam uT uR costWeight cost : ) :
                                                                  combinedUtility lam uT uR costWeight cost = CombinedUtility.combined lam uT uR (costWeight * cost)

Sumers et al.'s combinedUtility is CombinedUtility.combined(lambda, U_T, U_R, costWeight * cost).

                                                                  This makes the shared combined theorems (combined_at_zero, combined_at_one, combined_convex, combined_mono_A/B) directly applicable.

                                                                  The integrated model of truthfulness and relevance
