
Linglib.Phenomena.Generics.Studies.TesslerGoodman2019

@cite{tessler-goodman-2019}: The Language of Generalization #

Formalizes @cite{tessler-goodman-2019}, building on the uncertain-threshold semantics of @cite{lassiter-goodman-2017}.

Psychological Review, 126(3), 395–436.

Core Insight #

Generics ("Robins lay eggs") use the SAME uncertain threshold semantics as gradable adjectives. The scale is prevalence rather than height/degree:

⟦gen⟧(p, θ) = 1 if prevalence p > threshold θ

This IS positiveMeaning from Semantics.Degree — the generic meaning is grounded in scalar adjective semantics by construction, not by bridge theorem.

Model #

Interpretation model (L0, Eq. 1): L(p, θ | u) ∝ δ_{⟦u⟧(p,θ)} · P(θ) · P(p)

Endorsement model (S1, Eq. 3): S(u | p) ∝ (∫_θ L(p, θ | u) dθ)^λ

The threshold θ is marginalized BEFORE exponentiation (matching the paper). With N discrete thresholds, the marginalized L0 is: L0(p | generic) ∝ P(p) · |{θ : p > θ}| = P(p) · p.toNat

This analytical marginalization eliminates the latent variable entirely, so the RSAConfig has Latent = Unit. S1 then exponentiates the marginalized L0, exactly matching the paper's endorsement model.
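The marginalized model is small enough to sketch directly. Below is a Python sketch of this structure (exact rational arithmetic; it mirrors the equations above, not the Lean code itself), using an assumed uniform prior for illustration:

```python
from fractions import Fraction as F

BINS = 21      # prevalence bins 0%, 5%, ..., 100%
N_THETA = 20   # thresholds at bins 0..19, so |{θ : p > θ}| = p's bin index

def tc(utt, k):
    """Threshold count: how many θ make the utterance true at bin k."""
    return k if utt == "generic" else N_THETA   # "silent" is true under every θ

def l0(prior, k, utt):
    """Marginalized L0 (Eq. 1 with θ summed out), normalized over bins."""
    z = sum(prior[w] * tc(utt, w) for w in range(BINS))
    return prior[k] * tc(utt, k) / z

def s1(prior, k, lam=1):
    """Endorsement model (Eq. 3): exponentiate the marginalized L0 by λ."""
    scores = {u: l0(prior, k, u) ** lam for u in ("generic", "silent")}
    z = sum(scores.values())
    return {u: s / z for u, s in scores.items()}

uniform = [F(1, BINS)] * BINS
# Under a uniform prior E[k] = 10 (the 50% bin), so the generic wins
# exactly above 50% prevalence:
assert s1(uniform, 15)["generic"] > F(1, 2)   # 75% prevalence: endorsed
assert s1(uniform, 10)["generic"] == F(1, 2)  # 50%: exactly borderline
assert s1(uniform, 5)["generic"] < F(1, 2)    # 25%: not endorsed
```

With a uniform prior the cutoff sits at the mean bin, a special case of the analytical endorsement condition derived later in this file.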

Parameters #

All parameters from the paper's code (analysis/model-simulations.Rmd, exampleParameters list, GitHub: mhtess/genlang-paper):

| Property | Stable Beta | φ (mix) | Ref. prev. | Paper endorse. |
|----------------|-------------|---------|------------|------|
| bark | Beta(5,1) | 0.4 | 95% | 0.88 |
| hasSpots | Beta(5,1) | 0.7 | 10% | 0.02 |
| dontEatPeople | Beta(10,1)* | 1.0 | 80% | 0.41 |
| laysEggs | Beta(10,10) | 0.2 | 50% | 0.95 |
| isFemale | Beta(10,10) | 1.0 | 50% | 0.50 |
| carriesMalaria | Beta(1,30) | 0.1 | 10% | 0.97 |

*Paper uses Beta(50,1); we use Beta(10,1) for tractable arithmetic (avoids k^49 terms). Both give the same qualitative prediction.

Prior Model #

Prevalence priors are mixtures of two Beta distributions (Figure 2): P(p) = φ · Beta_stable(p) / Z_s + (1-φ) · Beta_null(p) / Z_n

where φ is the probability a category has the stable causal mechanism, Beta_stable varies per property, and Beta_null = Beta(1,50) for all properties (representing categories lacking the property mechanism).

Each component is NORMALIZED before mixing (matching the WebPPL code, which uses categorical to normalize each component independently). We achieve this without ℚ division by computing: P(p) ∝ φ · BW_s(p) · Z_n + (1-φ) · BW_n(p) · Z_s
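The division-free mixing trick can be checked numerically. A Python sketch with exact rationals (`beta_w` and `mixture_prior` are illustrative helpers, not the Lean definitions; the bark parameters Beta(5,1), φ = 0.4 are from the table above):

```python
from fractions import Fraction as F

BINS = 21

def beta_w(a, b, k):
    """Unnormalized Beta(a,b) weight at bin k (x = k/20)."""
    x = F(k, 20)
    return x ** (a - 1) * (1 - x) ** (b - 1)

def mixture_prior(phi, a_s, b_s, a_n=1, b_n=50):
    """Division-free mixture: cross-multiply each component by the other's
    total weight, P(k) ∝ φ·BW_s(k)·Z_n + (1-φ)·BW_n(k)·Z_s."""
    z_s = sum(beta_w(a_s, b_s, k) for k in range(BINS))
    z_n = sum(beta_w(a_n, b_n, k) for k in range(BINS))
    return [phi * beta_w(a_s, b_s, k) * z_n + (1 - phi) * beta_w(a_n, b_n, k) * z_s
            for k in range(BINS)]

# "bark": stable Beta(5,1), φ = 0.4
bark = mixture_prior(F(2, 5), 5, 1)

# The unnormalized prior sums to Z_s · Z_n, so dividing by that total recovers
# exactly the component-wise normalized mixture:
z_s = sum(beta_w(5, 1, k) for k in range(BINS))
z_n = sum(beta_w(1, 50, k) for k in range(BINS))
for k in range(BINS):
    assert bark[k] / (z_s * z_n) == F(2, 5) * beta_w(5, 1, k) / z_s \
        + F(3, 5) * beta_w(1, 50, k) / z_n
```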

Verified Predictions #

| # | Finding | Prior | p_ref | Theorem |
|---|---------|-------|-------|---------|
| 1 | "Dogs bark" endorsed | bark | 95% | bark_endorsed |
| 2 | "Kangaroos have spots" NOT endorsed | hasSpots | 10% | spots_not_endorsed |
| 3 | "Sharks don't eat people" NOT endorsed | dontEatPeople | 80% | dontEatPeople_not_endorsed |
| 4 | "Robins lay eggs" endorsed despite 50% | laysEggs | 50% | laysEggs_endorsed |
| 5 | "Robins are female" borderline at 50% | isFemale | 50% | isFemale_borderline |
| 6 | "Mosquitos carry malaria" endorsed at 10% | carriesMalaria | 10% | malaria_endorsed |
| 7 | Max prevalence satisfies all thresholds | — | — | generic_top_true |
| 8 | Zero prevalence fails all thresholds | — | — | generic_zero_false |
| 9 | Only rareWeak endorsed at 20% | all four causal | 20% | causal_20pct_pattern |
| 10 | 3/4 causal conditions endorsed at 70% | all four causal | 70% | causal_70pct_pattern |
| 11 | Endorsement ⟺ referent bin exceeds E[k ∣ prior] | — | — | endorsement_iff_exceeds_expected |
@[reducible, inline]

Discretized prevalence: 0%, 5%, ..., 100% (21 values). Structurally identical to @cite{lassiter-goodman-2017}'s Height.

    @[reducible, inline]

    Threshold values θ₀–θ₁₉ (20 values).


      Prevalence at p% (bins at 5% increments, so p must be a multiple of 5). Uses a macro so the division is computed at elaboration time.


        Threshold at t% (bins at 5% increments, so t must be a multiple of 5). Uses a macro so the division is computed at elaboration time.


          Generic vs null utterance. The endorsement model decides between producing the generalization and staying silent.


              ⟦gen⟧(p, θ) = p > θ.

              This IS positiveMeaning from Semantics.Degree — the generic meaning function is literally the positive scalar adjective meaning applied to the prevalence scale. Grounded by construction.


                Mixture-of-Betas infrastructure #

                The paper models prevalence priors as mixtures of two Beta distributions: a stable component (property-specific) and a null component (Beta(1,50), representing categories without the causal mechanism).

                Each component is normalized before mixing (matching the WebPPL code where categorical normalizes each component independently). We achieve this without ℚ division by computing:

                P(k) ∝ φ · BW_stable(k) · Z_null + (1-φ) · BW_null(k) · Z_stable
                

                This is proportional to the correctly normalized mixture since: P(k) = Z_n · Z_s · [φ · BW_s(k)/Z_s + (1-φ) · BW_n(k)/Z_n]

                Unnormalized Beta(a,b) weight at bin k ∈ {0,...,20}. Proportional to Beta(a,b) PDF at x = k/20.


                  Normalized mixture-of-Betas prevalence prior, discretized to 21 bins.

                  • Stable component: Beta(as, bs) with mixture weight φ
                  • Null component: Beta(na, nb) with mixture weight (1-φ)

                  Each component is normalized before mixing by cross-multiplying with the other component's total weight:

                  P(k) ∝ φ · BW_stable(k) · Z_null + (1-φ) · BW_null(k) · Z_stable
                  

                  This avoids ℚ division while preserving the correct mixture ratio.


                    "Bark" prior: bimodal at 0 and ~90% (Figure 2, column 1). Stable Beta(5,1), φ = 0.4.


                      "Have spots" prior: bimodal at 0 and ~90% (Figure 2, column 2). Stable Beta(5,1), φ = 0.7. Higher φ than bark — more animal categories can have spots than bark.


                        "Don't eat people" prior: near-unimodal at ~90% (Figure 2, column 3). Stable Beta(10,1), φ = 1.0. Paper uses Beta(50,1); we use Beta(10,1) for tractable arithmetic (avoids k^49 terms). Both predict NOT endorsed at 80%.


                          "Lays eggs" prior: bimodal at 0 and ~50% (Figure 2, column 4). Stable Beta(10,10), φ = 0.2. Most animal categories don't have egg-layers (peak at 0); among those that do, only females lay eggs (~50% prevalence).


                            "Is female" prior: unimodal at ~50% (Figure 2, column 5). Stable Beta(10,10), φ = 1.0. Almost all animal categories have ~50% female members.


                              "Carries malaria" prior: extreme low prevalence (Figure 2, column 6). Stable Beta(1,30), φ = 0.1. Very few animal categories carry diseases (90% null component). Among those that do, prevalence is very low (Beta(1,30) peaked near 0).


                                Cast a ℚ-valued prior to ℝ.

                                  @[reducible]

                                  Parametric RSAConfig for threshold-based generic endorsement.

                                  The threshold θ is marginalized analytically into the meaning function: meaning(u, p) = P(p) · |{θ : ⟦u⟧(p,θ) = true}|

                                  This matches the paper's endorsement model structure (Eq. 3): S(u | p) ∝ (∫_θ L(p, θ | u) dθ)^λ

                                  The marginalization happens BEFORE exponentiation (matching the paper), not after (as would happen with θ as a latent variable in RSAConfig). With Latent = Unit, S1 scores the marginalized L0 directly.

                                  The paper uses α = 2 (experimental fit: 2.47), but the binary comparison S1(generic) > S1(silent) is α-invariant for any α > 0, since rpow preserves order. We use α = 1 for tractable interval arithmetic.
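A quick numerical check of the α-invariance claim (the two scores are illustrative, not values from the Lean development):

```python
# With positive scores, rpow preserves order: x^α > y^α ⟺ x > y for any α > 0.
# So the binary endorsement comparison is the same at α = 1, 2, or 2.47.
x, y = 0.07, 0.05   # assumed L0 scores for generic and silent
for alpha in (0.5, 1.0, 2.0, 2.47):
    assert (x ** alpha > y ** alpha) == (x > y)
```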

                                    @[reducible]

                                    "Bark" config: peaked high prior (Figure 2, column 1).

                                      @[reducible]

                                      "Have spots" config: peaked high prior (Figure 2, column 2).

                                        @[reducible]

                                        "Don't eat people" config: peaked very high prior (Figure 2, column 3).

                                          @[reducible]

                                          "Lays eggs" config: bimodal prior (Figure 2, column 4).

                                            @[reducible]

                                            "Is female" config: unimodal prior at 50% (Figure 2, column 5).

                                              @[reducible]

                                              "Carries malaria" config: extreme low prior (Figure 2, column 6).


                                                Prevalence 100% satisfies the generic for all thresholds.

                                                Generic meaning at prevalence 0% is false for all thresholds.

                                                The bimodal "lays eggs" prior peaks at zero prevalence.

                                                Endorsement model (Eq. 3) #

                                                The paper's key predictions are endorsement rates: given referent prevalence p for a kind k, does the speaker produce the generic?

                                                S(u | p) ∝ (∫_θ L(p,θ|u) dθ)^λ
                                                

                                                Endorsement > 50% ⟺ S1(generic | p) > S1(silent | p).

                                                The binary comparison is equivalent to tc(p) > E[tc | prior], i.e., the referent prevalence (in threshold-count units) exceeds the prior expected prevalence. This is the paper's central insight: the SAME prevalence can produce different endorsement rates depending on the prior (Figure 2).

                                                "Dogs bark" endorsed at 95% prevalence (Table 1: 95%; Figure 2, column 1: 0.88).

                                                "Robins lay eggs" endorsed at 50% prevalence (Figure 2, column 4: 0.95). Despite only 50% prevalence, the bimodal prior (peaked at 0 and 50%) makes the generic highly informative — it rules out the absent component.

                                                "Mosquitos carry malaria" endorsed at 10% prevalence (Figure 2, column 6: 0.97). The prior expects near-zero prevalence, so even low prevalence is highly informative. This is the model's explanation of "striking property" generics: rare properties have low prior expectations.

                                                "Kangaroos have spots" NOT endorsed at 10% prevalence (Figure 2, column 2: 0.02). Even though the prior has a null component, φ = 0.7 means 70% of the prior mass comes from the stable Beta(5,1) peaked near 100%. At 10% prevalence, the generic is uninformative relative to this high-prevalence expectation.

                                                "Sharks don't eat people" NOT endorsed at 80% prevalence (Figure 2, column 3: 0.41). Even though 80% is high in absolute terms, the prior (φ=1, Beta(10,1)) concentrates nearly all mass above 80%. The generic is uninformative because the listener already expects very high prevalence.

                                                "Robins are female" borderline at 50% prevalence (Figure 2, column 5: 0.50). The unimodal prior peaks at 50% with φ = 1.0, so the prior expected prevalence is exactly 50%. At the referent prevalence of 50%, the generic is exactly as informative as silence — endorsement is 0.5.

                                                Analytical endorsement condition #

                                                The paper's central analytical result (Appendix A) is that the endorsement comparison reduces to a cue validity test:

                                                S1(generic | p) > S1(silent | p) ⟺ p.toNat > E[k | prior]
                                                

                                                i.e., the referent prevalence bin exceeds the prior expected bin.

                                                Proof sketch: S1(u|p) ∝ rpow(L0(p|u), α). Since rpow is monotone for α > 0, the comparison reduces to L0(p|generic) > L0(p|silent). Expanding:

                                                L0(p|u) = meaning(u,p) / Z_u = prior(p) · tc(u,p) / Z_u
                                                

                                                For the generic, Z_gen = Σ_w prior(w) · w.toNat; for silence, Z_sil = 20 · Z_prior. Dividing by prior(p) > 0 and cross-multiplying:

                                                p.toNat / Z_gen > 20 / Z_sil ⟺ p.toNat > Z_gen / Z_prior = E[k | prior]
                                                

                                                Expected prevalence bin under a prior: E[k | prior] = Σ_k k·P(k) / Σ_k P(k).
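The equivalence can be checked exhaustively for a concrete prior. A Python sketch (exact rationals; `mix`, `l0`, and `endorsed` are illustrative helpers, not the Lean definitions) verifying the condition bin-by-bin for the bark prior:

```python
from fractions import Fraction as F

BINS, N_THETA = 21, 20

def beta_w(a, b, k):
    """Unnormalized Beta(a,b) weight at bin k (x = k/20)."""
    x = F(k, 20)
    return x ** (a - 1) * (1 - x) ** (b - 1)

def mix(phi, a_s, b_s):
    """Mixture prior with Beta(1,50) null, normalized by cross-multiplication."""
    z_s = sum(beta_w(a_s, b_s, k) for k in range(BINS))
    z_n = sum(beta_w(1, 50, k) for k in range(BINS))
    return [phi * beta_w(a_s, b_s, k) * z_n + (1 - phi) * beta_w(1, 50, k) * z_s
            for k in range(BINS)]

def l0(prior, k, tc):
    """Normalized marginalized L0; tc(w) = utterance's threshold count at w."""
    z = sum(prior[w] * tc(w) for w in range(BINS))
    return prior[k] * tc(k) / z

def endorsed(prior, k):
    """Direct comparison: L0(k | generic) > L0(k | silent)."""
    return l0(prior, k, lambda w: w) > l0(prior, k, lambda w: N_THETA)

def expected_bin(prior):
    return sum(k * p for k, p in enumerate(prior)) / sum(prior)

bark = mix(F(2, 5), 5, 1)
for k in range(BINS):   # the direct and analytical conditions agree on every bin
    assert endorsed(bark, k) == (k > expected_bin(bark))
```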

theorem Phenomena.Generics.Studies.TesslerGoodman2019.endorsement_iff_exceeds_expected (prior : Prevalence → ℚ) (hp : ∀ (p : Prevalence), 0 ≤ prior p) (p : Prevalence) (hp_pos : 0 < prior p) (hZ : 0 < ∑ w : Prevalence, prior w) :

                                                  The endorsement condition reduces to a cue validity comparison: a generic is endorsed iff the referent prevalence bin exceeds the prior expected bin. This is the paper's central analytical result (Appendix A).

                                                  Proof: S1 policy comparison reduces to S1 score comparison (same denominator at world p), which equals L0 policy (rpow with α=1). The L0 policy comparison cross-multiplies to p.toNat × Σ prior > Σ prior × toNat, i.e., p.toNat > E[k|prior].

                                                  The classic prevalence asymmetry is EXPLAINED by the endorsement model: same prevalence (50%), different prior shapes → different S1 endorsement rates.

                                                  "Robins lay eggs" (true, ~50% prevalence) vs "Robins are female" (odd, ~50% prevalence). @cite{leslie-2008} documents the empirical observation; @cite{tessler-goodman-2019} derives the asymmetry from prior shape differences.

                                                  laysEggs_endorsed and isFemale_borderline (above) derive the predictions.
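The asymmetry can be reproduced numerically from the table's parameters. A Python sketch with exact rationals (helper names are illustrative, not the Lean definitions): the Beta(10,10) stable component is symmetric about bin 10, so with φ = 1.0 the prior expectation lands exactly on the referent bin, while φ = 0.2 pulls it far below:

```python
from fractions import Fraction as F

BINS = 21

def beta_w(a, b, k):
    """Unnormalized Beta(a,b) weight at bin k (x = k/20)."""
    x = F(k, 20)
    return x ** (a - 1) * (1 - x) ** (b - 1)

def mix(phi, a_s, b_s):
    """Stable Beta(a_s,b_s) with weight φ, null Beta(1,50) with weight 1-φ."""
    z_s = sum(beta_w(a_s, b_s, k) for k in range(BINS))
    z_n = sum(beta_w(1, 50, k) for k in range(BINS))
    return [phi * beta_w(a_s, b_s, k) * z_n + (1 - phi) * beta_w(1, 50, k) * z_s
            for k in range(BINS)]

def expected_bin(prior):
    return sum(k * p for k, p in enumerate(prior)) / sum(prior)

lays_eggs = mix(F(1, 5), 10, 10)   # φ = 0.2: bimodal at 0 and 50%
is_female = mix(F(1, 1), 10, 10)   # φ = 1.0: unimodal at 50%

# Same referent prevalence (bin 10 = 50%), opposite verdicts:
assert expected_bin(lays_eggs) < 10   # "lays eggs": bin exceeds expectation → endorsed
assert expected_bin(is_female) == 10  # "is female": exactly borderline
```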

                                                  As α → ∞, the endorsement model sharpens to a categorical decision: endorsed generics get probability 1, non-endorsed get probability 0.

                                                  By `rpow_luce_eq_softmax` (Core), every rpow-based Luce choice rule IS
                                                  softmax over log scores. The endorsement model inherits all softmax
                                                  limit theorems for free. 
                                                  

                                                  L0 score for utterance u at prevalence p (unnormalized).

theorem Phenomena.Generics.Studies.TesslerGoodman2019.endorsement_eq_softmax (prior : Prevalence → ℝ) (p : Prevalence) (α : ℝ) (hl0 : ∀ (u : Utterance), 0 < l0Score prior u p) :
l0Score prior Utterance.generic p ^ α / ∑ u : Utterance, l0Score prior u p ^ α = Core.softmax (fun (u : Utterance) => Real.log (l0Score prior u p)) α Utterance.generic

                                                    The endorsement rate equals softmax over log-L0 scores. Immediate from rpow_luce_eq_softmax: the endorsement model IS softmax.

                                                    When l0_gen > l0_sil (endorsed generic), the endorsement rate → 1 as α → ∞. Direct corollary of Softmax.tendsto_softmax_infty_at_max.

                                                    When l0_gen < l0_sil (non-endorsed generic), the endorsement rate → 0.
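The Luce/softmax identity and the α → ∞ sharpening can be illustrated numerically (floating point; the L0 scores are assumed, and `luce`/`softmax_first` are sketches, not the Core definitions):

```python
import math

def luce(scores, alpha):
    """rpow-based Luce rule: probability of the first option, s₀^α / Σ sᵢ^α."""
    powered = [s ** alpha for s in scores]
    return powered[0] / sum(powered)

def softmax_first(scores, alpha):
    """Softmax over log-scores at rationality α (max-subtraction for stability)."""
    logs = [alpha * math.log(s) for s in scores]
    m = max(logs)
    exps = [math.exp(v - m) for v in logs]
    return exps[0] / sum(exps)

l0_gen, l0_sil = 0.07, 0.05   # illustrative L0 scores (assumed values)

# the rpow Luce rule IS softmax over log scores, at every rationality:
for alpha in (1.0, 2.0, 10.0):
    assert math.isclose(luce([l0_gen, l0_sil], alpha),
                        softmax_first([l0_gen, l0_sil], alpha))

# sharpening: since l0_gen > l0_sil here, endorsement tends to 1 as α grows
assert softmax_first([l0_gen, l0_sil], 200.0) > 0.999
```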

                                                    Case Study 2: Habitual Language #

                                                    @cite{tessler-goodman-2019} (Case Study 2) extend the generic endorsement model to habituals. The key insight: habituals ("John runs") use the same threshold semantics as generics ("Birds fly"), with Prevalence now interpreted as frequency of activity across occasions rather than proportion of a kind with a property.

                                                    Paper's actual prior model (Eq. 4): The paper uses a log-normal + delta mixture:

                                                    φ ~ Beta(γ, ξ)
                                                    ln(frequency) ~ Gaussian(μ, σ)  with probability φ
                                                    frequency = 0.01               with probability (1 - φ)
                                                    

                                                    The Beta parameters (γ, ξ) and Gaussian parameters (μ, σ) are fit to empirical frequency estimates from participants. We approximate the fitted priors with Beta mixtures that capture the qualitative predictions:

                                                    The paper reports a model fit of r²(93) = 0.894 on habitual endorsement data.

                                                    See also: Semantics.Lexical.Verb.Habituals.hab_reduces_to_threshold for the formal bridge from the traditional HAB operator to threshold semantics, completing the pipeline: HAB → threshold → uncertain threshold → RSA endorsement.

                                                    Frequency prior for "runs": moderate expectation. Approximates the paper's fitted log-normal prior with a Beta(5,3) mixture. The paper fits (γ, ξ, μ, σ) to participant frequency estimates; the exact fitted values are in analysis/model-simulations.Rmd.


                                                      Frequency prior for "climbs mountains": rare activity. Approximates the paper's fitted log-normal prior with a Beta(2,6) mixture.


                                                        Frequency prior for "drinks coffee": high-frequency activity. Approximates the paper's fitted log-normal prior with a Beta(7,2) mixture.


                                                                "John runs" endorsed at 75% frequency (moderate freq exceeds moderate prior).

                                                                Habitual prior asymmetry: at the same 25% frequency, "climbs mountains" is endorsed but "drinks coffee" is not — paralleling the generic prevalence asymmetry.
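A Python sketch of this asymmetry under the stated Beta parameters. The mixture weight φ is not given in this file, so φ = 1/2 is an assumption here; with it, the prior-expected frequency bins fall on opposite sides of the 25% referent bin:

```python
from fractions import Fraction as F

BINS = 21

def beta_w(a, b, k):
    """Unnormalized Beta(a,b) weight at frequency bin k (x = k/20)."""
    x = F(k, 20)
    return x ** (a - 1) * (1 - x) ** (b - 1)

def mix(phi, a_s, b_s):
    """Stable component mixed with a Beta(1,50) null, division-free."""
    z_s = sum(beta_w(a_s, b_s, k) for k in range(BINS))
    z_n = sum(beta_w(1, 50, k) for k in range(BINS))
    return [phi * beta_w(a_s, b_s, k) * z_n + (1 - phi) * beta_w(1, 50, k) * z_s
            for k in range(BINS)]

def expected_bin(prior):
    return sum(k * p for k, p in enumerate(prior)) / sum(prior)

PHI = F(1, 2)                 # ASSUMED mixture weight (not stated in this file)
runs   = mix(PHI, 5, 3)
climbs = mix(PHI, 2, 6)
drinks = mix(PHI, 7, 2)

assert expected_bin(runs) < 15    # "John runs" endorsed at 75% (bin 15)
assert expected_bin(climbs) < 5   # "climbs mountains" endorsed at 25% (bin 5)
assert expected_bin(drinks) > 5   # "drinks coffee" not endorsed at 25%
```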

                                                                Case Study 3: Causal Language #

                                                                @cite{tessler-goodman-2019} (Case Study 3, Experiments 3A–3B) extend the model to causal generics ("Herb X makes cheebas sleepy"). Here Prevalence is reinterpreted as the causal rate — the proportion of cases where the cause produces the effect.

                                                                Experimental design: In Experiment 3A, participants see "previous experimental results" (a table of substances tested on 100 subjects) that follow one of four distributions, manipulated between subjects:

                                                                In Experiment 3B, participants see one of two referent causal rates (20% or 70%) and judge whether the causal generalization holds ("Herb C makes cheebas sleepy").

                                                                We model the four conditions as different prevalence priors, varying the mixture weight φ (common → high φ, rare → low φ) and the stable Beta parameters (strong → high-mean Beta, weak → low-mean Beta). These are approximations of the empirically elicited priors from Experiment 3A, not exact replications.

                                                                The paper reports a model fit of r²(8) = 0.835 on causal endorsement data (Figure 11B).

                                                                Prior for common-strong cause: most categories have the mechanism (φ=0.75), and the mechanism is highly effective (Beta(10,1) peaked near 100%).


                                                                  Prior for common-weak cause: most categories have the mechanism (φ=0.75), but the mechanism is weakly effective (Beta(2,8) peaked near 20%).


                                                                    Prior for rare-strong cause: few categories have the mechanism (φ=0.25), but when present it is highly effective (Beta(10,1)).


                                                                      Prior for rare-weak cause: few categories have the mechanism (φ=0.25), and the mechanism is weakly effective (Beta(2,8)).


                                                                                Rare-weak cause endorsed at 20% causal rate: low prior expectation makes even 20% informative.

                                                                                Common-strong cause NOT endorsed at 20% causal rate: high prior expectation (peaked near 100%) makes 20% uninformative.

                                                                                Common-strong cause NOT endorsed at 50% causal rate: high prior (Beta(10,1), φ=0.75) puts expected rate near 70%, so 50% is uninformative. Note: the paper tests at 20% and 70%. At 70%, the comparison is borderline (E[k|prior] ≈ 14 ≈ bin(70%)), matching the paper's ~50% endorsement rate at referent prevalence 0.7 for common-strong (Figure 11B).

                                                                                Rare-strong cause NOT endorsed at 20% causal rate (Figure 11B: ~35% endorsement). Despite fewer competing causes than common-strong, the prior still concentrates enough mass above 20% (via Beta(10,1)) to make 20% uninformative.

                                                                                Common-weak cause endorsed at 70% causal rate (Figure 11B: ~75% endorsement). With Beta(2,8) peaked near 20%, a referent rate of 70% far exceeds the prior expectation.

                                                                                Causal prior asymmetry (Experiment 3B): at 20% referent rate, only rare-weak is endorsed; the other three conditions are not. This matches the paper's Figure 11B (left panel).

                                                                                At 70% referent rate, all conditions except common-strong are endorsed (Figure 11B). Common-strong is borderline (~50% endorsement in the paper), matching our model's E[k|prior] ≈ bin(70%).

                                                                                Cue Validity and Endorsement #

                                                                                @cite{tessler-goodman-2019} (pp. 29-30, Appendix A) show that endorsement in the infinite-rationality limit reduces to a cue validity comparison:

                                                                                endorsed ⟺ prevalence(f, k_ref) > E_prior[prevalence]
                                                                                         ⟺ cue_validity(f, k_ref) > 1
                                                                                

                                                                                where cue_validity(f, k) = prevalence(f, k) / E[prevalence].

                                                                                This connects the RSA model to the classical notion from @cite{rosch-mervis-1975}: a feature is diagnostic of a category exactly when the feature is more prevalent in that category than expected across categories — i.e., when cue validity > 1.

                                                                                In mkGenericCfg, the endorsement condition S1(generic | p_ref) > S1(silent | p_ref) reduces to p_ref.toNat > E[k | prior] after L0 normalization cancels the common factor. This is exactly the cue validity condition when the expected bin E[k | prior] serves as the denominator.

def Phenomena.Generics.Studies.TesslerGoodman2019.cueValidity (referentPrevalence expectedPrior : ℚ) : ℚ

                                                                                Cue validity: ratio of referent prevalence to expected prevalence under the prior.

theorem Phenomena.Generics.Studies.TesslerGoodman2019.endorsed_iff_cue_validity_gt_one (referentPrev expectedPrior : ℚ) (hE : 0 < expectedPrior) :
expectedPrior < referentPrev ↔ 1 < cueValidity referentPrev expectedPrior

                                                                                  A generic is endorsed (prevalence exceeds prior expectation) iff cue validity > 1.
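A minimal numeric illustration (the expected-prevalence values 0.02 and 0.90 are assumed, chosen to mirror the malaria and sharks cases above; `cue_validity` is a sketch of the definition, not the Lean code):

```python
def cue_validity(referent_prev, expected_prev):
    """prevalence(f, k) / E[prevalence]: diagnosticity of feature f for kind k."""
    return referent_prev / expected_prev

assert cue_validity(0.10, 0.02) > 1   # 10% prevalence vs ~2% expectation: endorsed
assert cue_validity(0.80, 0.90) < 1   # 80% prevalence vs ~90% expectation: not endorsed
```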

                                                                                  Unified Architecture #

                                                                                  All three domains — generics, habituals, and causal language — are instances of mkGenericCfg with different prevalence priors. The threshold semantics, RSA inference, and endorsement mechanism are shared; only the prior varies.

                                                                                  This unification is structural (by construction), not proven post hoc. The integration pipeline is:

                                                                                  1. Traditional operator (GEN/HAB) reduces to threshold semantics (CovertQuantifier.reduces_to_threshold, Habituals.hab_reduces_to_threshold)
                                                                                  2. Threshold semantics with uncertain threshold → marginalized L0
                                                                                  3. RSA endorsement (mkGenericCfg) decides between generic and silence
                                                                                  4. Endorsement ≈ cue validity (endorsed_iff_cue_validity_gt_one)
theorem Phenomena.Generics.Studies.TesslerGoodman2019.unification :
(∃ (pr : Prevalence → ℚ) (hp : ∀ (p : Prevalence), 0 ≤ pr p), barkCfg = mkGenericCfg pr hp) ∧ (∃ (pr : Prevalence → ℚ) (hp : ∀ (p : Prevalence), 0 ≤ pr p), runsCfg = mkGenericCfg pr hp) ∧ ∃ (pr : Prevalence → ℚ) (hp : ∀ (p : Prevalence), 0 ≤ pr p), rareWeakCfg = mkGenericCfg pr hp

                                                                                  All three case studies use mkGenericCfg — the prior is the only free parameter.