Documentation

Linglib.Theories.Pragmatics.RSA.Core.CompositionalRSA

structure RSA.Compositional.LocalAltNode (World Alt : Type) :

A node in a compositional derivation with local alternatives.

  • meaning : AltWorldBool

    Meaning of each alternative at each world

  • prejacent : Alt

    The prejacent (what was actually uttered)

  • alternatives : List Alt

    The available alternatives at this node

Instances For
    def RSA.Compositional.LocalAltNode.toAltSet {World Alt : Type} (node : LocalAltNode World Alt) :
    Set (Prop' World)

    Convert local alternatives to a set for exhMW

    Equations
    Instances For
      def RSA.Compositional.LocalAltNode.prejacent_prop {World Alt : Type} (node : LocalAltNode World Alt) :
      Prop' World

      The prejacent proposition

      Equations
      Instances For
        def RSA.Compositional.localExhMW {World Alt : Type} (node : LocalAltNode World Alt) :
        Prop' World

        Local exhMW: exhaustification using only local alternatives.

        Equations
        Instances For

          LU-RSA: What It Can and Cannot Do #

          @cite{potts-etal-2016} @cite{bergen-levy-goodman-2016}

          The LU Approach #

          LU-RSA models uncertainty over the lexicon:

          The listener marginalizes over lexica: L(w | m) ∝ P(w) × Σ_L P(L) × S₁(m | w, L)

          What LU Can Do #

          For simple cases like "Exactly one player hit some of his shots":

          LU does capture simple embedded implicatures.

          What LU Cannot Do: The Global Lexicon Problem #

          LU applies one lexicon to all occurrences.

          Consider: "Every student read some book and some paper"

          With LU:

          LU cannot express: "every student read some-but-not-all books AND at-least-some papers" (independent exhaustification).

          Scope vs Lexicon #

          For "Every student read some book":

          EXH analysis (scope-sensitive):

          LU analysis (scope-blind):

          For single scalar items, these are observationally equivalent. For multiple scalar items or nested structures, they diverge.

          ApproachWhat variesMultiple scalars?Scope interactions?
          Standard RSANothingNoNo
          LU-RSALexicon globallySame for allNo
          Compositional RSAWhere EXH appliesIndependentYes

          LU treats implicature as a lexical property (of words), while EXH treats it as a structural property (of positions).

          Per-student world

          Instances For
            Equations
            • One or more equations did not get rendered due to their size.
            Instances For

              Lexical refinements for "some" (LU approach)

              Instances For
                Equations
                • One or more equations did not get rendered due to their size.
                Instances For

                  The Single-Scalar Equivalence (Potts et al.'s Success Case) #

                  For "every student read some book" with a single scalar item, LU with strong lexicon gives the same result as local EXH.

                  LU would give:

                  This matches @cite{potts-levy-2015}, who show strong empirical fits.

                  Where LU Fails: The Principled Derivation Problem #

                  1. No principled lexicon selection: With flat priors, why prefer strong? Potts et al. rely on neo-Gricean constraints on refinement sets.

                  2. Global lexicon: All occurrences of "some" get the same meaning. EXH can apply at different positions independently.

                  3. No scope interaction: LU cannot model EXH interacting with quantifier scope, negation scope, etc.

                  The theorem lu_strong_equals_local below shows the equivalence for single-scalar cases. The limitation emerges with multiple scalars.

                  LU-RSA scenario for "Every student read some book"

                  Instances For

                    LU with weak lexicon allows all worlds (like literal L0)

                    LU with strong lexicon gives local-EXH-like result

                    The Single-Scalar Success #

                    For single-scalar cases, LU achieves the same result as local EXH:

                    The Compositional Question: Where Does Marginalization Happen? #

                    LU approach (Potts et al.):

                    Compositional RSA/EXH approach:

                    The question: Does top-level marginalization over lexica give the same results as node-by-node pragmatic reasoning?

                    How Multiple Uncertainties Compose #

                    In LU, when you have multiple scalar items, each can be refined. The space of lexica is the product of refinement choices.

                    A single lexicon L is used for the whole utterance. The listener reasons about which L the speaker is using globally.

                    The structural question is whether pragmatic reasoning should happen:

                    1. Globally: marginalize over complete lexica at the end (LU)
                    2. Locally: apply RSA at each compositional node, then compose

                    For simple cases these may coincide. The divergence appears when:

                    The architectural difference:

                    • LU: P(L) is a distribution over complete lexica
                    • Compositional: each node has its own local alternatives

                    This affects how multiple sources of uncertainty interact.

                    • lexicon : SomeLexicon

                      A lexicon assigns meanings to all scalar items

                    • globalMarg : Bool

                      Marginalization happens at the top

                    Instances For
                      • localExh : List Bool

                        Each node can have its own EXH choice

                      • localRSA : Bool

                        RSA reasoning at each node

                      Instances For

                        The Structural Difference #

                        What LU gets right:

                        The architectural question:

                        For the cases Potts et al. test (single scalar under quantifier), these approaches are empirically equivalent. The question is whether they diverge for more complex compositional structures.

                        Open question (cf. @cite{franke-bergen-2020} "Global Intentions"): Does the locus of pragmatic reasoning (global vs. local) matter for predicting human behavior in complex embedded contexts?

                        Compositional RSA #

                        EXH is a structural operator, not a lexical one. Modeling where in the derivation EXH applies requires a compositional approach:

                        1. Build a derivation tree with local alternatives at each node
                        2. Apply RSA/EXH at each node with its local alternatives
                        3. Compose the results via standard semantics

                        For "Every student read some book" #

                        Step 1: At the "some" node (per student)

                        Step 2: Compose with "every"

                        Why This Works #

                        The @cite{franke-2011} limit theorem is parametric in the alternative set: IBR(alternatives) → exhMW(alternatives)

                        Instantiating with local alternatives at each node yields local exhMW at each node. Composition gives the local reading.

                        Inner alternatives: some vs all

                        Instances For
                          Equations
                          • One or more equations did not get rendered due to their size.
                          Instances For

                            Local node for a single student's reading behavior

                            Equations
                            • One or more equations did not get rendered due to their size.
                            Instances For

                              LU vs Compositional RSA #

                              LU-RSA (Potts, Lassiter, Levy, Frank):

                              Compositional RSA:

                              The Expressivity Hierarchy #

                              Standard RSA ⊂ LU-RSA ≈ Standard RSA ⊂ Compositional RSA ≈ EXH
                                   ↓ ↓ ↓
                               scope-blind scope-blind scope-sensitive
                                   ↓ ↓ ↓
                               global only global only global AND local
                              

                              LU-RSA and standard RSA have the same structural expressivity. LU adds uncertainty over word meanings, not scope.

                              LU-RSA is scope-blind: it cannot distinguish global from local based on structural scope position.

                              Proof: LU's two lexicons (weak, strong) correspond to (no SI, local-like SI), but the choice is lexical not structural. There is no principled way to derive the local reading.

                              The expressivity gain: Compositional RSA can express local readings that LU-RSA cannot principally derive.

                              Why Standard RSA Cannot Derive Local Readings #

                              No sentence-level alternative can distinguish SS from SA using global worlds. This limitation motivates either:

                              1. Stipulating EXH as a grammatical primitive (the standard view)
                              2. Developing "local RSA" that operates inside composition

                              The Setup #

                              For "every student read some book":

                              The Problem #

                              RSA excludes a world w when hearing utterance u if: ∃ u' ∈ Alt(u). ⟦u'⟧(w) ∧ informativity(u') > informativity(u)

                              To exclude SA when hearing "every...some", we'd need an alternative u' where: ⟦u'⟧(SA) = true AND ⟦u'⟧(SS) = false (or u' more informative)

                              No such sentence exists in the nested Aristotelians.

                              Consequences #

                              If RSA cannot distinguish SS from SA at the sentence level, then either:

                              1. EXH is a separate grammatical mechanism (stipulation)
                              2. RSA must operate sub-sententially with "local" alternatives

                              Option 2 is "RSA all the way down" -- deriving EXH-like behavior from RSA applied at each compositional node.

                              Equations
                              • One or more equations did not get rendered due to their size.
                              Instances For

                                Key observation: SS and SA are indistinguishable by sentence-level meaning.

                                For every nested Aristotelian sentence, if it is true at SS, it is also true at SA (or vice versa in a way that does not help RSA exclude SA).

                                SA "dominates" SS: if some students did X in SS, then some students did X in SA (with potentially more).

                                Sentences true at SA but not SS don't help RSA exclude SA.

                                "some...all" is true at SA and false at SS. But this does not help: RSA excludes worlds where stronger alternatives are true. "some...all" is weaker than "some...some" for the inner quantifier, so it does not trigger exclusion.

                                The only sentence that could help exclude SA would be one that:

                                1. Is true at SS
                                2. Is false at SA
                                3. Is stronger than "some...some"

                                No such sentence exists.

                                The RSA exclusion principle: RSA excludes world w when hearing u if there exists a stronger alternative u' true at w.

                                For "some...some" (SS sentence), the only stronger alternative is "all...all". "all...all" is false at both SS and SA. Therefore, RSA cannot exclude SA when hearing "some...some".

                                The core theorem: Standard RSA with sentence-level alternatives cannot derive the local reading {SS}.

                                Proof: RSA's posterior after hearing "some...some" includes all worlds where "some...some" is true and no strictly stronger alternative is true.

                                • "some...some" is true at: SS, SA, AS, AA (literally: everyone drank some)
                                • "all...all" is the strongest alternative, true only at AA
                                • RSA excludes AA (stronger alternative was available)
                                • RSA keeps: SS, SA, AS -- this is the global reading, not local

                                To get the local reading {SS}, we would need to exclude SA and AS. No sentence-level alternative can do this.

                                The Expressivity Gap #

                                Standard RSA posterior for "some...some":
                                  {w : ⟦some...some⟧(w) ∧ ¬⟦all...all⟧(w)} = {SS, SA, AS}
                                
                                Local EXH reading:
                                  {SS}
                                
                                Gap: {SA, AS} -- worlds RSA cannot exclude without sub-sentential reasoning
                                

                                This gap is why linguists posit EXH as a grammatical primitive. "Local RSA" -- RSA applied at each compositional node -- can derive the local reading without stipulating EXH. EXH then emerges as the α → ∞ limit of local RSA.

                                @cite{franke-bergen-2020}: Global Intentions Model #

                                Franke & Bergen's "Global Intentions" (GI) model provides machinery for reasoning over where EXH applies.

                                The GI Architecture #

                                1. Grammar generates parses: For each utterance, the grammar generates multiple readings based on where EXH is inserted:

                                  • lit: no EXH
                                  • M: EXH at matrix position
                                  • O: EXH at outer quantifier
                                  • I: EXH at inner quantifier
                                  • MO, MI, OI, MOI: combinations
                                2. Speaker chooses (utterance, parse) jointly:

                                  P_S(m, p | t; α) ∝ [P(t | ⟦m⟧^p)]^α
                                  

                                  The speaker picks both what to say and how to mean it.

                                3. Listener infers (world, parse) jointly:

                                  P_L(t, p | m; α) ∝ P(t) × P_S(m, p | t; α)
                                  

                                Finding #

                                GI outperforms simpler models (vanilla RSA, LU) because it can use strong readings like ⟦SS⟧^M = {██-} that uniquely identify states. However, GI assumes EXH is a grammatical primitive.

                                The "RSA All The Way Down" Reinterpretation #

                                Bergen & Franke's "parse" variable can be reinterpreted as "where RSA reasoning applies":

                                Bergen & Franke: RSA All The Way Down:
                                ───────────────── ─────────────────────────
                                parse p ∈ {lit,M,O,I,...} config c ∈ {RSA application sites}
                                ⟦m⟧^p = EXH at positions ⟦m⟧^c = RSA(α) at positions
                                Speaker chooses parse Speaker chooses where to be pragmatic
                                

                                The Equivalence #

                                In the α → ∞ limit:

                                Bergen & Franke's GI model equals "RSA all the way down" in the limit.

                                The Derivation Direction #

                                ApproachStarting PointPragmatic Reasoning
                                Bergen & FrankeEXH is primitiveRSA reasons over EXH positions
                                RSA All The Way DownRSA is primitiveEXH emerges in α → ∞ limit

                                The compositional structure is the same. The difference is foundational:

                                Consequences #

                                1. EXH is not a separate grammatical mechanism, but emergent behavior of rational communication at high precision.

                                2. GI's empirical success (Bayes factor analysis) is evidence for the compositional structure, not for EXH-as-primitive.

                                3. One mechanism (RSA) at different α values, rather than two mechanisms (grammar-EXH + pragmatics-RSA).

                                Formal Correspondence #

                                The GI model's joint distribution over (utterance, parse) corresponds to compositional RSA's joint distribution over (utterance, enrichment-sites).

                                A parse in Bergen & Franke's sense: where EXH is inserted

                                Instances For
                                  Equations
                                  • One or more equations did not get rendered due to their size.
                                  Instances For

                                    An RSA application configuration: where RSA reasoning happens

                                    • matrixRSA : Bool

                                      Apply RSA at matrix level?

                                    • outerRSA : Bool

                                      Apply RSA at outer quantifier?

                                    • innerRSA : Bool

                                      Apply RSA at inner quantifier?

                                    Instances For
                                      Equations
                                      • One or more equations did not get rendered due to their size.
                                      Instances For
                                        Equations
                                        • One or more equations did not get rendered due to their size.
                                        Instances For

                                          Convert RSA config to equivalent parse (in the α → ∞ limit)

                                          Equations
                                          Instances For

                                            The Global Intentions model: speaker chooses (utterance, parse)

                                            • utterance : Utt
                                            • parse : Parse
                                            Instances For

                                              RSA-All-The-Way-Down: speaker chooses (utterance, config)

                                              Instances For

                                                In the α → ∞ limit, RSAATWD choice maps to GI choice

                                                Equations
                                                Instances For

                                                  The Limit Theorem (Conceptual) #

                                                  theorem rsaatwd_limit_is_gi :
                                                      ∀ (scenario :...) (m : Utt) (t : World),
                                                        lim_{α→∞} P_RSAATWD(m, c | t; α) = P_GI(m, c.toParse | t)
                                                  

                                                  Bergen & Franke's machinery for reasoning over parses emerges naturally from "RSA at every compositional node" in the limit.

                                                  Implications #

                                                  1. GI's empirical success (they show GI >> LI >> LU >> RSA in Bayes factors) is evidence for compositional pragmatic reasoning, interpretable as:

                                                    • B&F interpretation: Grammar generates EXH, RSA selects among readings
                                                    • Our interpretation: RSA applies compositionally, EXH-like behavior emerges
                                                  2. GI wins because ⟦SS⟧^M uniquely identifies state ██-. This reading requires matrix exhaustification.

                                                    • B&F: Matrix EXH is a grammatical option
                                                    • Us: High-α RSA at the matrix level produces the same strengthening
                                                  3. Parse selection = RSA site selection: B&F's listener inferring the intended parse is equivalent to inferring where the speaker applied pragmatic reasoning.

                                                  Bergen & Franke's key empirical finding: GI model assigns high probability to 'SS' for state ██- because ⟦SS⟧^M = {██-}.

                                                  In RSAATWD terms: with high-α RSA at matrix level, the speaker strongly prefers 'SS' for this state because it's uniquely identifying.

                                                  Algebraic Structure of Compositional RSA #

                                                  Does the joint distribution over (utterance, config) decompose in a principled algebraic way?

                                                  The Product Hypothesis #

                                                  If pragmatic reasoning at different nodes is independent, then:

                                                  P_S(m, c | t; α) ∝ ∏_{node n ∈ c} P_RSA(choice_n | local_state_n; α)
                                                  

                                                  This would mean:

                                                  1. Each node contributes independently to the overall probability
                                                  2. Informativity MULTIPLIES across nodes
                                                  3. The joint inference FACTORS into local inferences

                                                  Informativity Multiplication #

                                                  For a sentence with multiple scalar items, if informativity multiplies:

                                                  informativity(m, c) = ∏_{n ∈ c} informativity_n(choice_n)
                                                  

                                                  Then the softmax over (m, c) decomposes:

                                                  P_S(m, c | t; α) ∝ exp(α × log informativity(m, c))
                                                                   = exp(α × Σ_n log informativity_n(choice_n))
                                                                   = ∏_n exp(α × log informativity_n(choice_n))
                                                                   = ∏_n P_RSA_n(choice_n | local_t_n; α)
                                                  

                                                  This is the algebraic signature of compositional RSA.

                                                  Connection to GradedMonad (RSAFree) #

                                                  The RSAFree graded monad (see GradedMonad.lean) has exactly this structure:

                                                  -- RSAFree is a graded monad where the grade tracks alternatives
                                                  -- Binding (seq) composes RSA computations
                                                  -- Informativity multiplies under seq
                                                  
                                                  seq : RSAFree W A → (A → RSAFree W B) → RSAFree W B
                                                  

                                                  The monad laws ensure:

                                                  1. Associativity: (m >>= f) >>= g = m >>= (λx. f x >>= g)
                                                  2. Identity: return x >>= f = f x

                                                  These correspond to:

                                                  1. Composition is associative: parsing order doesn't matter
                                                  2. Literal meaning is identity: no RSA = base meaning

                                                  The Decomposition Theorem (Conceptual) #

                                                  theorem informativity_multiplicative :
                                                      ∀ (m₁ : RSAFree W A) (m₂ : A → RSAFree W B) (a : A) (b : B),
                                                        informativity (seq m₁ m₂) (a, b) =
                                                        informativity m₁ a × informativity (m₂ a) b
                                                  

                                                  This says: the informativity of a composed utterance (with choices at multiple levels) is the product of the local informativities.

                                                  The joint distribution therefore factors.

                                                  Why Decomposition Matters #

                                                  1. Computational tractability: If the distribution factors, inference is polynomial rather than exponential in the number of nodes.

                                                  2. Theoretical parsimony: Local RSA at each node, composed via standard semantic composition. No special global mechanism.

                                                  3. Empirical predictions: Local RSA predicts independence effects that global reasoning would not.

                                                  4. Connection to grammar: The algebraic structure mirrors compositional semantics. RSA "rides along" with semantic composition.

                                                  The Monoid of Informativities #

                                                  Informativity values form a multiplicative monoid:

                                                  This monoid structure is what makes compositional RSA work:

                                                  Informativity as a value (for algebraic reasoning)

                                                  Instances For

                                                    The multiplicative monoid structure on informativities

                                                    Equations
                                                    Instances For
                                                      theorem RSA.Compositional.informativity_value_mul (i₁ i₂ : Informativity) :
                                                      (i₁ * i₂).value = i₁.value * i₂.value

                                                      Informativity values multiply like rationals

                                                      theorem RSA.Compositional.informativity_mul_assoc_value (i₁ i₂ i₃ : Informativity) :
                                                      (i₁ * i₂ * i₃).value = (i₁ * (i₂ * i₃)).value

                                                      Informativity multiplication is associative (on values)

                                                      @[simp]

                                                      Value of unit informativity is 1

                                                      Informativity multiplication has unit (on values)

                                                      The Factorization Picture #

                                                              Sentence: "Every student read some book"
                                                                               │
                                                                          ┌────┴────┐
                                                                          │ Matrix │ ← RSA here? (config.matrixRSA)
                                                                          │ EXH? │
                                                                          └────┬────┘
                                                                               │
                                                                    ┌──────────┴──────────┐
                                                                    │ │
                                                               ┌────┴────┐ ┌────┴────┐
                                                               │ "every" │ │ VP │
                                                               │ student │ │ │
                                                               └─────────┘ └────┬────┘
                                                                                          │
                                                                                ┌─────────┴─────────┐
                                                                                │ │
                                                                           ┌────┴────┐ ┌────┴────┐
                                                                           │ "read" │ │ "some" │ ← RSA here?
                                                                           └─────────┘ │ book │ (config.innerRSA)
                                                                                               └─────────┘
                                                      
                                                      P_S(m, c | t; α) = P_RSA_matrix(EXH? |...; α) × P_RSA_inner(EXH? |...; α)
                                                      

                                                      The distribution factors along the tree structure. Each node's RSA decision is (conditionally) independent.

                                                      Connection to Bergen & Franke #

                                                      Their finding that GI >> LI >> LU >> RSA can be reinterpreted:

                                                      GI wins because it has the right algebraic structure: the full product over all compositional nodes.

                                                      Summary: The Two Perspectives #

                                                      @cite{franke-bergen-2020} #

                                                      RSA All The Way Down (This File) #

                                                      The Mathematical Connection #

                                                      The joint inference machinery is identical:

                                                      P_S(m, x | t; α) ∝ [P(t | meaning(m, x))]^α
                                                      P_L(t, x | m; α) ∝ P(t) × P_S(m, x | t; α)
                                                      

                                                      where x = parse (B&F) or x = RSA-config (RSAATWD).

                                                      In the α → ∞ limit, these are the same model, because local RSA → local EXH at each node.

                                                      Bergen & Franke's empirical validation of GI is also validation of compositional RSA, without requiring EXH as a grammatical primitive.