
Linglib.Phenomena.Reference.Studies.SikosEtAl2021

@cite{sikos-etal-2021} #

Sikos, L., Venhuizen, N. J., Drenhaus, H. & Crocker, M. W. (2021). Reevaluating pragmatic reasoning in language games. PLOS ONE 16(3): e0248388.

Core Contribution #

Replicates @cite{frank-goodman-2012} reference games and tests whether RSA's recursive reasoning (S1→L1) adds predictive value beyond a simpler baseline model that uses only the prior and literal semantics (= L0).

The paper reports three experiments with increasing pragmatic demands, summarized below.
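In standard RSA notation (notation mine, not quoted from the paper; α is the speaker rationality parameter, and FG2012-style speakers also include an utterance cost term omitted here), the two models being compared are:

$$L_0(o \mid u) \;\propto\; [\![u]\!](o)\,P(o), \qquad S_1(u \mid o) \;\propto\; L_0(o \mid u)^{\alpha}, \qquad L_1(o \mid u) \;\propto\; S_1(u \mid o)\,P(o).$$

The baseline is $L_0$ alone; the question is whether the extra $S_1 \to L_1$ recursion improves fit.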

Key Arguments #

  1. Prior-driven variance dominates. In Experiments 1–2, most of the correlation between model and data is driven by object priors and literal semantics, not pragmatic reasoning. Trivially true items (where L0 = L1) inflate the correlation.

  2. Methodology critique. Correlation-based evaluation across all items conflates two sources of variance: (a) prior-driven (which any model with the right priors gets right) and (b) pragmatic (where L0 and L1 differ). Removing trivially-predicted items collapses RSA's advantage.

  3. Pragmatically informative contexts (Experiment 3). Even in contexts designed to maximize the L0/L1 difference, RSA does not significantly outperform the baseline.

  4. Typicality priors matter. The paper uses empirically-measured typicality priors (not uniform), which do substantial predictive work independent of pragmatic reasoning.

Relationship to RSA #

The baseline model is, mathematically, RSA's own L0 (literal listener with priors). Both sides agree on this. The critique is that the additional layers of recursive reasoning (S1, L1) don't add empirical value — the first step of RSA may be all that's needed.

Context Types #

Sikos et al. classify reference game contexts by how much pragmatic reasoning they require. This taxonomy is central to their argument: FG2012's stimuli are dominated by trivial contexts.

Classification of reference game contexts by pragmatic demands.

  • trivial : ContextType

    Only one object matches the utterance. L0 = L1 trivially.

  • pragSolvable : ContextType

    Multiple objects match, but pragmatic reasoning can break the tie. L0 ≠ L1: this is where RSA should add value.

  • pragReducible : ContextType

    Multiple objects match; pragmatic reasoning helps but cannot fully disambiguate (e.g., symmetry among speakers).

  • ambiguous : ContextType

    Multiple objects match and pragmatic reasoning cannot help. L0 ≈ L1 even with full RSA.
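A minimal Lean rendering of this four-way taxonomy (constructor names as documented above; the `deriving` clause is my assumption, not necessarily the library's):

```lean
/-- Classification of reference game contexts by pragmatic demands. -/
inductive ContextType where
  | trivial        -- only one object matches the utterance; L0 = L1 trivially
  | pragSolvable   -- several match, but pragmatic reasoning breaks the tie; L0 ≠ L1
  | pragReducible  -- reasoning helps but cannot fully disambiguate
  | ambiguous      -- reasoning cannot help; L0 ≈ L1 even with full RSA
  deriving Repr, DecidableEq
```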


Model Fit Data #

Correlation coefficients for the two competing models across experiments. The key comparison: baseline (= L0 with priors) vs full RSA (L1).

Model fit for one experiment, comparing baseline and RSA correlations. Correlations are stored as thousandths (e.g., 988 = r = 0.988).

  • experiment : Nat

    Experiment number (1, 2, or 3)

  • description : String

    Brief description of the experiment

  • nItems : Nat

    Number of unique context–utterance items

  • baselineR_thou : Nat

    Pearson r × 1000: baseline model (prior × literal semantics = L0)

  • rsaR_thou : Nat

    Pearson r × 1000: full RSA model (L1)
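The record sketched by these fields can be written in Lean as follows (field names as documented above; `deriving Repr` is my assumption):

```lean
/-- Model fit for one experiment; correlations stored as thousandths,
    so 988 encodes r = 0.988. -/
structure ModelFit where
  experiment     : Nat     -- experiment number (1, 2, or 3)
  description    : String  -- brief description of the experiment
  nItems         : Nat     -- number of unique context–utterance items
  baselineR_thou : Nat     -- Pearson r × 1000: baseline (prior × literal semantics = L0)
  rsaR_thou      : Nat     -- Pearson r × 1000: full RSA (L1)
  deriving Repr
```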


Experiment 1: Replication of FG2012. 3-object contexts. Both models fit almost identically (r = 0.988 vs 0.992).


Experiment 2: Extended to 4-object contexts. Still baseline ≈ RSA (r = 0.990 vs 0.992).


Experiment 3: Pragmatically informative contexts designed to maximize L0/L1 divergence. RSA's advantage is non-significant (r = 0.77 vs 0.82). This is the critical test of the critique.


All three experiments.
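Using the correlations quoted above, the three fits can be recorded as a simple list (a sketch; the tuple layout is mine, and item counts are omitted because they are not quoted here):

```lean
/-- (experiment, baseline r × 1000, RSA r × 1000), from the figures quoted above. -/
def reportedFits : List (Nat × Nat × Nat) :=
  [ (1, 988, 992)    -- Exp. 1: FG2012 replication, 3-object contexts
  , (2, 990, 992)    -- Exp. 2: 4-object contexts
  , (3, 770, 820) ]  -- Exp. 3: pragmatically informative contexts
```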


Key Empirical Findings #

In Experiment 1, the baseline fits nearly as well as RSA (the difference is only 4 thousandths of a correlation point).

In Experiment 3 (the critical test), the difference between models is 50 thousandths: small and non-significant.

RSA never dramatically outperforms the baseline in any experiment (gap < 100 thousandths = 0.100 correlation points in all cases).
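These gaps are simple differences in thousandths, which Lean can check by reflection:

```lean
example : 992 - 988 = 4  := rfl   -- Experiment 1 gap
example : 820 - 770 = 50 := rfl   -- Experiment 3 gap
-- every gap stays under 100 thousandths (0.100 correlation points)
example : 992 - 988 < 100 ∧ 992 - 990 < 100 ∧ 820 - 770 < 100 := by decide
```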

Context Composition #

Sikos et al. show that FG2012's stimuli are dominated by trivially-predicted items, which inflate correlations for any model with the right priors.

Proportion of items in FG2012 that are trivially predicted. Stored as tenths of a percent (780 = 78.0%). The exact value depends on the counting method; the paper reports that the majority of items in Experiments 1–2 are trivially predicted.


Competing Interpretations #

Two interpretations of the finding that baseline ≈ RSA.

  • rsaUnnecessary : Interpretation

    RSA's recursive reasoning is empirically unnecessary: the literal listener with priors suffices, and the additional S1→L1 computation adds no predictive value. (Sikos et al.'s interpretation)

  • baselineIsL0 : Interpretation

    RSA's L0 IS the baseline model, so high baseline fit is consistent with RSA. The question is whether L1 adds value in contexts where L0 ≠ L1. Sikos et al.'s Experiment 3 suggests it may not, though the test has limited statistical power. (Structural observation)


Structural relationships between models #

  1. The baseline model (prior × literal semantics) IS RSA's L0.
  2. In trivial contexts (unique referent), L1 = L0.
  3. In pragmatically solvable contexts, L1 ≠ L0: RSA's recursive reasoning makes different predictions.

These are mathematical facts about the models, not empirical claims.
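Fact 2 follows directly from the standard RSA definitions $L_0(o \mid u) \propto [\![u]\!](o)P(o)$, $S_1(u \mid o) \propto L_0(o \mid u)^\alpha$, $L_1(o \mid u) \propto S_1(u \mid o)P(o)$ (notation mine, not quoted from the paper). If exactly one object $o^*$ satisfies $u$, then after normalization

$$L_0(o \mid u) = \begin{cases} 1 & o = o^* \\ 0 & \text{otherwise,} \end{cases}$$

so $S_1(u \mid o) = 0$ for every $o \neq o^*$, and $L_1$ is therefore the same point mass on $o^*$ as $L_0$.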

What this does NOT show: that RSA is empirically vindicated. Sikos et al.'s Experiment 3 tested contexts specifically designed to be pragmatically solvable (where L0 ≠ L1), and RSA still did not significantly outperform the baseline.

Colors used in the experiments.


Shapes used in the experiments.


An object in the reference game.


A feature predicate: either a color or a shape word.


A context–utterance pair is trivial when exactly one object matches.
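A minimal Lean sketch of this predicate, abstracting over a hypothetical literal semantics `applies` (names are illustrative, not necessarily the library's):

```lean
/-- A context–utterance pair is trivial when exactly one object matches. -/
def isTrivial {Object Utterance : Type}
    (applies : Utterance → Object → Bool)
    (ctx : List Object) (u : Utterance) : Bool :=
  (ctx.filter (applies u)).length == 1
```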


Trivial context: each utterance uniquely identifies its referent. {blue_square, green_circle, red_triangle}


Utterances for the trivial context.


FG2012's classic solvable context: {blue_square, blue_circle, green_square}. "square" applies to two objects; pragmatic reasoning breaks the tie.


Utterances for the solvable context.


                                                      "blue" uniquely identifies blue_square in the trivial context.

                                                      "square" is ambiguous in the solvable context (matches 2 objects).

                                                      The trivial context has all utterances trivially predicted.
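The first two claims can be checked concretely. The following self-contained sketch encodes the two contexts and the literal semantics (type and constructor names are mine, not necessarily the library's):

```lean
inductive Color where | blue | green | red
  deriving BEq
inductive Shape where | square | circle | triangle
  deriving BEq

structure Obj where
  color : Color
  shape : Shape

inductive Word where
  | colorW (c : Color)
  | shapeW (s : Shape)

/-- Literal semantics: a word applies to an object iff the feature matches. -/
def applies : Word → Obj → Bool
  | .colorW c, o => o.color == c
  | .shapeW s, o => o.shape == s

def trivialCtx  : List Obj := [⟨.blue, .square⟩, ⟨.green, .circle⟩, ⟨.red, .triangle⟩]
def solvableCtx : List Obj := [⟨.blue, .square⟩, ⟨.blue, .circle⟩, ⟨.green, .square⟩]

-- "blue" uniquely identifies blue_square in the trivial context
example : (trivialCtx.filter (applies (.colorW .blue))).length = 1 := rfl
-- "square" matches two objects in the solvable context
example : (solvableCtx.filter (applies (.shapeW .square))).length = 2 := rfl
```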