@cite{tessler-goodman-2019}: The Language of Generalization #
@cite{tessler-goodman-2019} @cite{lassiter-goodman-2017}
Psychological Review, 126(3), 395–436.
Core Insight #
Generics ("Robins lay eggs") use the SAME uncertain threshold semantics as gradable adjectives. The scale is prevalence rather than height/degree:
⟦gen⟧(p, θ) = 1 if prevalence p > threshold θ
This IS positiveMeaning from Semantics.Degree — the generic meaning is
grounded in scalar adjective semantics by construction, not by bridge theorem.
Model #
Interpretation model (L0, Eq. 1): L(p, θ | u) ∝ δ_{⟦u⟧(p,θ)} · P(θ) · P(p)
Endorsement model (S1, Eq. 3): S(u | p) ∝ (∫_θ L(p, θ | u) dθ)^λ
The threshold θ is marginalized BEFORE exponentiation (matching the paper). With N discrete thresholds, the marginalized L0 is: L0(p | generic) ∝ P(p) · |{θ : p > θ}| = P(p) · p.toNat
This analytical marginalization eliminates the latent variable entirely,
so the RSAConfig has Latent = Unit. S1 then exponentiates the
marginalized L0, exactly matching the paper's endorsement model.
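A minimal numeric sketch of this marginalization (function names here are illustrative, not the Lean identifiers; the 21-bin discretization and the 20-threshold count follow the description above):

```python
# Marginalized L0 over 21 prevalence bins k = 0..20 (p = k/20).
# L0(p | generic) ∝ prior(p) · |{θ : p > θ}| = prior(p) · k
# L0(p | silent)  ∝ prior(p) · 20   (every threshold passes)

def threshold_count(utterance, k):
    """Number of thresholds θ ∈ {0,...,19} with k > θ."""
    return k if utterance == "generic" else 20

def marginalized_l0(prior, utterance):
    """Unnormalized marginalized-L0 scores over the 21 prevalence bins."""
    return [prior[k] * threshold_count(utterance, k) for k in range(21)]

uniform = [1] * 21
l0_gen = marginalized_l0(uniform, "generic")
assert l0_gen[0] == 0    # zero prevalence fails every threshold
assert l0_gen[20] == 20  # max prevalence satisfies all 20
```

With a uniform prior the generic's L0 simply grows linearly in the bin index, which is the `P(p) · p.toNat` form stated above.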
Parameters #
All parameters from the paper's code (analysis/model-simulations.Rmd,
exampleParameters list, GitHub: mhtess/genlang-paper):
- α = 2 in the paper (experimental fit: 2.47). We use α = 1 since the binary comparison S1(generic) > S1(silent) is α-invariant for α > 0
- Bins: paper uses 98 bins (0.01–0.98); we use 21 bins (0%, 5%, ..., 100%) for exact rational arithmetic. Qualitative predictions are preserved.
- Null component: Beta(1, 50)
| Property | Stable Beta | φ (mix) | Ref. prevalence | Paper endorsement |
|---|---|---|---|---|
| bark | Beta(5,1) | 0.4 | 95% | 0.88 |
| hasSpots | Beta(5,1) | 0.7 | 10% | 0.02 |
| dontEatPeople | Beta(10,1)* | 1.0 | 80% | 0.41 |
| laysEggs | Beta(10,10) | 0.2 | 50% | 0.95 |
| isFemale | Beta(10,10) | 1.0 | 50% | 0.50 |
| carriesMalaria | Beta(1,30) | 0.1 | 10% | 0.97 |
*Paper uses Beta(50,1); we use Beta(10,1) for tractable arithmetic (avoids k^49 terms). Both give the same qualitative prediction.
Prior Model #
Prevalence priors are mixtures of two Beta distributions (Figure 2): P(p) = φ · Beta_stable(p) / Z_s + (1-φ) · Beta_null(p) / Z_n
where φ is the probability a category has the stable causal mechanism, Beta_stable varies per property, and Beta_null = Beta(1,50) for all properties (representing categories lacking the property mechanism).
Each component is NORMALIZED before mixing (matching the WebPPL code,
which uses categorical to normalize each component independently).
We achieve this without ℚ division by computing:
P(p) ∝ φ · BW_s(p) · Z_n + (1-φ) · BW_n(p) · Z_s
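A sketch of this division-free mixing in exact rational arithmetic. The assumption here is that `betaWeight(a, b, k)` is the unnormalized Beta density at p = k/20 (with 0⁰ taken as 1 at the endpoints); the function names are illustrative, not the Lean identifiers:

```python
from fractions import Fraction

def beta_weight(a, b, k):
    """Unnormalized Beta(a, b) density at p = k/20 (assumption: 0**0 == 1)."""
    p = Fraction(k, 20)
    return p ** (a - 1) * (1 - p) ** (b - 1)

def mixture_prior(a_s, b_s, a_n, b_n, phi, k):
    """Division-free mix: phi*BW_s(k)*Z_n + (1-phi)*BW_n(k)*Z_s."""
    zs = sum(beta_weight(a_s, b_s, j) for j in range(21))
    zn = sum(beta_weight(a_n, b_n, j) for j in range(21))
    return (phi * beta_weight(a_s, b_s, k) * zn
            + (1 - phi) * beta_weight(a_n, b_n, k) * zs)

# "Bark" parameters: stable Beta(5,1), null Beta(1,50), phi = 0.4
phi = Fraction(2, 5)
zs = sum(beta_weight(5, 1, j) for j in range(21))
zn = sum(beta_weight(1, 50, j) for j in range(21))
for k in range(21):
    unnorm = mixture_prior(5, 1, 1, 50, phi, k)
    # Equals Z_s * Z_n times the properly normalized mixture, exactly.
    assert unnorm == zs * zn * (phi * beta_weight(5, 1, k) / zs
                                + (1 - phi) * beta_weight(1, 50, k) / zn)
```

The loop confirms the cross-multiplied form is the normalized mixture scaled by the constant Z_s · Z_n, so it induces the same distribution.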
Verified Predictions #
| # | Finding | Prior | p_ref | Theorem |
|---|---|---|---|---|
| 1 | "Dogs bark" endorsed | bark | 95% | bark_endorsed |
| 2 | "Kangaroos have spots" NOT endorsed | hasSpots | 10% | spots_not_endorsed |
| 3 | "Sharks don't eat people" NOT endorsed | dontEatPeople | 80% | dontEatPeople_not_endorsed |
| 4 | "Robins lay eggs" endorsed despite 50% | laysEggs | 50% | laysEggs_endorsed |
| 5 | "Robins are female" borderline at 50% | isFemale | 50% | isFemale_borderline |
| 6 | "Mosquitos carry malaria" endorsed at 10% | carriesMalaria | 10% | malaria_endorsed |
| 7 | Max prevalence satisfies all thresholds | — | — | generic_top_true |
| 8 | Zero prevalence fails all thresholds | — | — | generic_zero_false |
| 9 | Only rareWeak endorsed at 20% | all four causal | 20% | causal_20pct_pattern |
| 10 | 3/4 causal conditions endorsed at 70% | all four causal | 70% | causal_70pct_pattern |
| 11 | Endorsement ⟺ exceeds E[k \| prior] | — | — | — |
Discretized prevalence: 0%, 5%, ..., 100% (21 values). Structurally identical to @cite{lassiter-goodman-2017}'s Height.
Instances For
Threshold values θ₀–θ₁₉ (20 values).
Instances For
Prevalence at p% (bins at 5% increments, so p must be a multiple of 5). Uses a macro so the division is computed at elaboration time.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Threshold at t% (bins at 5% increments, so t must be a multiple of 5). Uses a macro so the division is computed at elaboration time.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Generic vs null utterance. The endorsement model decides between producing the generalization and staying silent.
Instances For
Equations
- One or more equations did not get rendered due to their size.
Instances For
Equations
- One or more equations did not get rendered due to their size.
⟦gen⟧(p, θ) = p > θ.
This IS positiveMeaning from Semantics.Degree — the generic meaning
function is literally the positive scalar adjective meaning applied to the
prevalence scale. Grounded by construction.
Equations
Instances For
Full meaning function: utterance × threshold → prevalence → Bool.
Equations
- Phenomena.Generics.Studies.TesslerGoodman2019.meaning Phenomena.Generics.Studies.TesslerGoodman2019.Utterance.generic θ p = Phenomena.Generics.Studies.TesslerGoodman2019.genericMeaning θ p
- Phenomena.Generics.Studies.TesslerGoodman2019.meaning Phenomena.Generics.Studies.TesslerGoodman2019.Utterance.silent θ p = true
Instances For
Mixture-of-Betas infrastructure #
The paper models prevalence priors as mixtures of two Beta distributions: a stable component (property-specific) and a null component (Beta(1,50), representing categories without the causal mechanism).
Each component is normalized before mixing (matching the WebPPL code where
categorical normalizes each component independently). We achieve this
without ℚ division by computing:
P(k) ∝ φ · BW_stable(k) · Z_null + (1-φ) · BW_null(k) · Z_stable
This is proportional to the correctly normalized mixture since: P(k) = Z_n · Z_s · [φ · BW_s(k)/Z_s + (1-φ) · BW_n(k)/Z_n]
Sum of Beta(a,b) weights across all 21 bins.
Equations
- Phenomena.Generics.Studies.TesslerGoodman2019.betaTotal a b = List.foldl (fun (acc k : ℕ) => acc + Phenomena.Generics.Studies.TesslerGoodman2019.betaWeight a b k) 0 (List.range 21)
Instances For
Normalized mixture-of-Betas prevalence prior, discretized to 21 bins.
- Stable component: Beta(as, bs) with mixture weight φ
- Null component: Beta(na, nb) with mixture weight (1-φ)
Each component is normalized before mixing by cross-multiplying with the other component's total weight:
P(k) ∝ φ · BW_stable(k) · Z_null + (1-φ) · BW_null(k) · Z_stable
This avoids ℚ division while preserving the correct mixture ratio.
Equations
- One or more equations did not get rendered due to their size.
Instances For
"Bark" prior: bimodal at 0 and ~90% (Figure 2, column 1). Stable Beta(5,1), φ = 0.4.
Equations
Instances For
"Have spots" prior: bimodal at 0 and ~90% (Figure 2, column 2). Stable Beta(5,1), φ = 0.7. Higher φ than bark — more animal categories can have spots than bark.
Equations
Instances For
"Don't eat people" prior: near-unimodal at ~90% (Figure 2, column 3). Stable Beta(10,1), φ = 1.0. Paper uses Beta(50,1); we use Beta(10,1) for tractable arithmetic (avoids k^49 terms). Both predict NOT endorsed at 80%.
Equations
Instances For
"Lays eggs" prior: bimodal at 0 and ~50% (Figure 2, column 4). Stable Beta(10,10), φ = 0.2. Most animal categories don't have egg-layers (peak at 0); among those that do, only females lay eggs (~50% prevalence).
Equations
Instances For
"Is female" prior: unimodal at ~50% (Figure 2, column 5). Stable Beta(10,10), φ = 1.0. Almost all animal categories have ~50% female members.
Equations
Instances For
"Carries malaria" prior: extreme low prevalence (Figure 2, column 6). Stable Beta(1,30), φ = 0.1. Very few animal categories carry diseases (90% null component). Among those that do, prevalence is very low (Beta(1,30) peaked near 0).
Equations
Instances For
Cast a ℚ-valued prior to ℝ.
Equations
- Phenomena.Generics.Studies.TesslerGoodman2019.priorR prior p = ↑(prior p)
Instances For
Number of thresholds θ ∈ {0,...,19} satisfying p > θ.
For generic: count = p.toNat (0 for p=0, 1 for p=1, ..., 20 for p=20). For silence: count = 20 (all thresholds pass).
Equations
Instances For
Parametric RSAConfig for threshold-based generic endorsement.
The threshold θ is marginalized analytically into the meaning function: meaning(u, p) = P(p) · |{θ : ⟦u⟧(p,θ) = true}|
This matches the paper's endorsement model structure (Eq. 3): S(u | p) ∝ (∫_θ L(p, θ | u) dθ)^λ
The marginalization happens BEFORE exponentiation (matching the paper),
not after (as would happen with θ as a latent variable in RSAConfig).
With Latent = Unit, S1 scores the marginalized L0 directly.
The paper uses α = 2 (experimental fit: 2.47), but the binary comparison S1(generic) > S1(silent) is α-invariant for any α > 0, since rpow preserves order. We use α = 1 for tractable interval arithmetic.
Equations
- One or more equations did not get rendered due to their size.
Instances For
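The α-invariance of the binary comparison can be checked numerically. A sketch with toy L0 scores (illustrative values, not model output):

```python
# The binary comparison S1(generic) > S1(silent) is invariant under the
# choice of alpha > 0, because x ↦ x**alpha is strictly increasing on
# nonnegative reals, so it preserves the ordering of the two scores.
def s1_scores(l0_gen, l0_sil, alpha):
    g, s = l0_gen ** alpha, l0_sil ** alpha
    z = g + s
    return g / z, s / z

l0_gen, l0_sil = 0.7, 0.3  # toy marginalized-L0 scores
for alpha in (0.5, 1.0, 2.0, 2.47, 10.0):
    p_gen, p_sil = s1_scores(l0_gen, l0_sil, alpha)
    assert p_gen > p_sil  # the ordering never flips
```

Larger α sharpens the probabilities but never reverses which utterance wins, which is why α = 1 suffices for the binary theorems.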
"Bark" config: peaked high prior (Figure 2, column 1).
Equations
- One or more equations did not get rendered due to their size.
Instances For
"Have spots" config: peaked high prior (Figure 2, column 2).
Equations
- One or more equations did not get rendered due to their size.
Instances For
"Don't eat people" config: peaked very high prior (Figure 2, column 3).
Equations
- One or more equations did not get rendered due to their size.
Instances For
"Lays eggs" config: bimodal prior (Figure 2, column 4).
Equations
- One or more equations did not get rendered due to their size.
Instances For
"Is female" config: unimodal prior at 50% (Figure 2, column 5).
Equations
- One or more equations did not get rendered due to their size.
Instances For
"Carries malaria" config: extreme low prior (Figure 2, column 6).
Equations
- One or more equations did not get rendered due to their size.
Instances For
Prevalence 100% satisfies the generic for all thresholds.
Generic meaning at prevalence 0% is false for all thresholds.
The bimodal "lays eggs" prior peaks at zero prevalence.
The unimodal "is female" prior peaks at 50%.
Endorsement model (Eq. 3) #
The paper's key predictions are endorsement rates: given referent prevalence p for a kind k, does the speaker produce the generic?
S(u | p) ∝ (∫_θ L(p,θ|u) dθ)^λ
Endorsement > 50% ⟺ S1(generic | p) > S1(silent | p).
The binary comparison is equivalent to tc(p) > E[tc | prior], i.e., the referent prevalence (in threshold-count units) exceeds the prior expected prevalence. This is the paper's central insight: the SAME prevalence can produce different endorsement rates depending on the prior (Figure 2).
"Dogs bark" endorsed at 95% prevalence (Table 1: 95%; Figure 2, column 1: 0.88).
"Robins lay eggs" endorsed at 50% prevalence (Figure 2, column 4: 0.95). Despite only 50% prevalence, the bimodal prior (peaked at 0 and 50%) makes the generic highly informative — it rules out the absent component.
"Mosquitos carry malaria" endorsed at 10% prevalence (Figure 2, column 6: 0.97). The prior expects near-zero prevalence, so even low prevalence is highly informative. This is the model's explanation of "striking property" generics: rare properties have low prior expectations.
"Kangaroos have spots" NOT endorsed at 10% prevalence (Figure 2, column 2: 0.02). Even though the prior has a null component, φ = 0.7 means 70% of the prior mass comes from the stable Beta(5,1) peaked near 100%. At 10% prevalence, the generic is uninformative relative to this high-prevalence expectation.
"Sharks don't eat people" NOT endorsed at 80% prevalence (Figure 2, column 3: 0.41). Even though 80% is high in absolute terms, the prior (φ=1, Beta(10,1)) concentrates nearly all mass above 80%. The generic is uninformative because the listener already expects very high prevalence.
"Robins are female" borderline at 50% prevalence (Figure 2, column 5: 0.50). The unimodal prior peaks at 50% with φ = 1.0, so the prior expected prevalence is exactly 50%. At the referent prevalence of 50%, the generic is exactly as informative as silence — endorsement is 0.5.
Analytical endorsement condition #
The paper's central analytical result (Appendix A) is that the endorsement comparison reduces to a cue validity test:
S1(generic | p) > S1(silent | p) ⟺ p.toNat > E[k | prior]
i.e., the referent prevalence bin exceeds the prior expected bin.
Proof sketch: S1(u|p) ∝ rpow(L0(p|u), α). Since rpow is monotone for α > 0, the comparison reduces to L0(p|generic) > L0(p|silent). Expanding:
L0(p|u) = meaning(u,p) / Z_u = prior(p) · tc(u,p) / Z_u
For the generic, Z_gen = Σ_w prior(w) · w.toNat; for silence, Z_sil = 20 · Z_prior. Dividing by prior(p) > 0 and cross-multiplying:
p.toNat / Z_gen > 20 / Z_sil ⟺ p.toNat > Z_gen / Z_prior = E[k | prior]
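The cross-multiplication step can be verified exhaustively for an arbitrary positive prior in exact rational arithmetic (the prior values below are arbitrary, chosen only for the check):

```python
from fractions import Fraction

# Arbitrary strictly positive prior over the 21 bins.
prior = [Fraction(3 + (7 * k) % 11, 5) for k in range(21)]

z_gen = sum(prior[k] * k for k in range(21))   # Z_gen = Σ_w prior(w) · w
z_sil = sum(prior[k] * 20 for k in range(21))  # Z_sil = 20 · Z_prior
z_prior = sum(prior)
expected = z_gen / z_prior                     # E[k | prior] = Z_gen / Z_prior

for k in range(21):
    l0_gen = prior[k] * k / z_gen    # L0(p | generic)
    l0_sil = prior[k] * 20 / z_sil   # L0(p | silent)
    # L0(p|generic) > L0(p|silent) iff k > E[k | prior], exactly.
    assert (l0_gen > l0_sil) == (k > expected)
```

Because every quantity is a `Fraction`, the equivalence holds exactly at every bin, including the borderline case where the two sides tie.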
Expected prevalence bin under a prior: E[k | prior] = Σ_k k·P(k) / Σ_k P(k).
Equations
- One or more equations did not get rendered due to their size.
Instances For
The endorsement condition reduces to a cue validity comparison: a generic is endorsed iff the referent prevalence bin exceeds the prior expected bin. This is the paper's central analytical result (Appendix A).
Proof: S1 policy comparison reduces to S1 score comparison (same denominator at world p), which equals L0 policy (rpow with α=1). The L0 policy comparison cross-multiplies to p.toNat × Σ prior > Σ prior × toNat, i.e., p.toNat > E[k|prior].
The classic prevalence asymmetry is EXPLAINED by the endorsement model: same prevalence (50%), different prior shapes → different S1 endorsement rates.
"Robins lay eggs" (true, ~50% prevalence) vs "Robins are female" (odd, ~50% prevalence). @cite{leslie-2008} documents the empirical observation; @cite{tessler-goodman-2019} derives the asymmetry from prior shape differences.
laysEggs_endorsed and isFemale_borderline (above) derive the predictions.
As α → ∞, the endorsement model sharpens to a categorical decision: endorsed generics get probability 1, non-endorsed get probability 0.
By `rpow_luce_eq_softmax` (Core), every rpow-based Luce choice rule IS
softmax over log scores. The endorsement model inherits all softmax
limit theorems for free.
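A numeric sketch of both facts: the rpow-based Luce rule coincides with softmax over log scores, and the endorsed utterance's rate tends to 1 as α grows (toy scores, illustrative only):

```python
import math

def luce(l0_a, l0_b, alpha):
    """rpow-based Luce choice rule: l0_a**alpha / (l0_a**alpha + l0_b**alpha)."""
    a, b = l0_a ** alpha, l0_b ** alpha
    return a / (a + b)

def softmax_log(l0_a, l0_b, alpha):
    """Softmax over log scores at temperature 1/alpha."""
    a = math.exp(alpha * math.log(l0_a))
    b = math.exp(alpha * math.log(l0_b))
    return a / (a + b)

l0_gen, l0_sil = 0.6, 0.4
# The two rules agree (up to float rounding): rpow-Luce IS softmax.
assert abs(luce(l0_gen, l0_sil, 2.0) - softmax_log(l0_gen, l0_sil, 2.0)) < 1e-9
# Sharpening: endorsed generic → 1, non-endorsed → 0 as alpha grows.
assert luce(l0_gen, l0_sil, 100.0) > 0.999
assert luce(l0_sil, l0_gen, 100.0) < 0.001
```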
L0 score for utterance u at prevalence p (unnormalized).
Equations
Instances For
The endorsement rate equals softmax over log-L0 scores.
Immediate from rpow_luce_eq_softmax: the endorsement model IS softmax.
When l0_gen > l0_sil (endorsed generic), the endorsement rate → 1
as α → ∞. Direct corollary of Softmax.tendsto_softmax_infty_at_max.
When l0_gen < l0_sil (non-endorsed generic), the endorsement rate → 0.
Case Study 2: Habitual Language #
@cite{tessler-goodman-2019} (Case Study 2) extend the generic endorsement model
to habituals. The key insight: habituals ("John runs") use the same threshold
semantics as generics ("Birds fly"), with Prevalence now interpreted as frequency
of activity across occasions rather than proportion of a kind with a property.
Paper's actual prior model (Eq. 4): The paper uses a log-normal + delta mixture:
φ ~ Beta(γ, ξ)
ln(frequency) ~ Gaussian(μ, σ) with probability φ
frequency = 0.01 with probability (1 - φ)
The Beta parameters (γ, ξ) and Gaussian parameters (μ, σ) are fit to empirical frequency estimates from participants. We approximate the fitted priors with Beta mixtures that capture the qualitative predictions:
- Rare-activity priors (e.g., "climbs mountains", "writes novels"): most people never do this → low expected frequency
- High-frequency priors (e.g., "drinks coffee", "drives to work"): common daily activity → high expected frequency
- Moderate priors (e.g., "runs", "cooks dinner"): regular but not constant
The paper reports a model fit of r²(93) = 0.894 on habitual endorsement data.
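A generative sketch of the Eq. 4 prior (the parameter values below are illustrative placeholders, not the paper's fitted values, which live in analysis/model-simulations.Rmd):

```python
import math
import random

def sample_frequency(gamma, xi, mu, sigma, rng):
    """One draw from the log-normal + delta mixture of Eq. 4."""
    phi = rng.betavariate(gamma, xi)           # phi ~ Beta(gamma, xi)
    if rng.random() < phi:
        return math.exp(rng.gauss(mu, sigma))  # ln(freq) ~ Gaussian(mu, sigma)
    return 0.01                                # non-doers: frequency floor

rng = random.Random(0)
samples = [sample_frequency(2, 2, math.log(0.3), 0.5, rng) for _ in range(5000)]
# The resulting prior is bimodal: a spike at 0.01 plus a log-normal bump.
assert any(s == 0.01 for s in samples)
assert any(s > 0.1 for s in samples)
```

The spike-plus-bump shape is what the Beta-mixture approximations above are standing in for.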
See also: Semantics.Lexical.Verb.Habituals.hab_reduces_to_threshold for
the formal bridge from the traditional HAB operator to threshold semantics,
completing the pipeline: HAB → threshold → uncertain threshold → RSA endorsement.
Frequency prior for "runs": moderate expectation.
Approximates the paper's fitted log-normal prior with a Beta(5,3) mixture.
The paper fits (γ, ξ, μ, σ) to participant frequency estimates;
the exact fitted values are in analysis/model-simulations.Rmd.
Equations
Instances For
Frequency prior for "climbs mountains": rare activity. Approximates the paper's fitted log-normal prior with a Beta(2,6) mixture.
Equations
Instances For
Frequency prior for "drinks coffee": high-frequency activity. Approximates the paper's fitted log-normal prior with a Beta(7,2) mixture.
Equations
Instances For
Equations
- One or more equations did not get rendered due to their size.
Instances For
Equations
- One or more equations did not get rendered due to their size.
Instances For
Equations
- One or more equations did not get rendered due to their size.
Instances For
"John runs" endorsed at 75% frequency (moderate freq exceeds moderate prior).
"John climbs mountains" endorsed at 25% frequency (low freq exceeds rare-activity prior).
"John drinks coffee" NOT endorsed at 25% frequency (low freq below high-frequency prior).
Habitual prior asymmetry: at the same 25% frequency, "climbs mountains" is endorsed but "drinks coffee" is not — paralleling the generic prevalence asymmetry.
Case Study 3: Causal Language #
@cite{tessler-goodman-2019} (Case Study 3, Experiments 3A–3B) extend the model to
causal generics ("Herb X makes cheebas sleepy"). Here Prevalence is
reinterpreted as the causal rate — the proportion of cases where the cause
produces the effect.
Experimental design: In Experiment 3A, participants see "previous experimental results" (a table of substances tested on 100 subjects) that follow one of four distributions, manipulated between subjects:
- common: all substances show similar efficacy (unimodal distribution)
- rare: some substances show no efficacy, others show high (bimodal distribution)
- strong: effective substances produce strong effects (avg ~98%)
- weak: effective substances produce weak effects (avg ~20%)
In Experiment 3B, participants see one of two referent causal rates (20% or 70%) and judge whether the causal generalization holds ("Herb C makes cheebas sleepy").
We model the four conditions as different prevalence priors, varying the mixture weight φ (common → high φ, rare → low φ) and the stable Beta parameters (strong → high-mean Beta, weak → low-mean Beta). These are approximations of the empirically elicited priors from Experiment 3A, not exact replications.
The paper reports a model fit of r²(8) = 0.835 on causal endorsement data (Figure 11B).
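The four causal conditions differ only in φ and the stable Beta. A sketch computing approximate expected bins under the assumed discretization (`betaWeight` as the unnormalized Beta density at p = k/20 is an assumption; these are approximations, not the Lean values):

```python
from fractions import Fraction

def beta_weight(a, b, k):
    """Unnormalized Beta(a, b) density at p = k/20 (assumption: 0**0 == 1)."""
    p = Fraction(k, 20)
    return p ** (a - 1) * (1 - p) ** (b - 1)

def mixture(a_s, b_s, phi):
    """Division-free mixture with the Beta(1,50) null component."""
    zs = sum(beta_weight(a_s, b_s, j) for j in range(21))
    zn = sum(beta_weight(1, 50, j) for j in range(21))
    return [phi * beta_weight(a_s, b_s, k) * zn
            + (1 - phi) * beta_weight(1, 50, k) * zs for k in range(21)]

def expected_bin(prior):
    return sum(k * prior[k] for k in range(21)) / sum(prior)

common_strong = expected_bin(mixture(10, 1, Fraction(3, 4)))
rare_strong   = expected_bin(mixture(10, 1, Fraction(1, 4)))
common_weak   = expected_bin(mixture(2, 8, Fraction(3, 4)))
rare_weak     = expected_bin(mixture(2, 8, Fraction(1, 4)))

assert common_strong > rare_strong > rare_weak
assert common_strong > common_weak > rare_weak
assert rare_weak < 4 < common_strong  # at 20% (bin 4): rare-weak endorsed,
                                      # common-strong not
assert 13 < common_strong < 15        # near bin 14, i.e. borderline at 70%
```

The last assertion matches the note below that common-strong is borderline at a 70% referent rate, where E[k | prior] ≈ 14 ≈ bin(70%).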
Prior for common-strong cause: most categories have the mechanism (φ=0.75), and the mechanism is highly effective (Beta(10,1) peaked near 100%).
Equations
Instances For
Prior for common-weak cause: most categories have the mechanism (φ=0.75), but the mechanism is weakly effective (Beta(2,8) peaked near 20%).
Equations
Instances For
Prior for rare-strong cause: few categories have the mechanism (φ=0.25), but when present it is highly effective (Beta(10,1)).
Equations
Instances For
Prior for rare-weak cause: few categories have the mechanism (φ=0.25), and the mechanism is weakly effective (Beta(2,8)).
Equations
Instances For
Equations
- One or more equations did not get rendered due to their size.
Instances For
Equations
- One or more equations did not get rendered due to their size.
Instances For
Equations
- One or more equations did not get rendered due to their size.
Instances For
Equations
- One or more equations did not get rendered due to their size.
Instances For
Rare-weak cause endorsed at 20% causal rate: low prior expectation makes even 20% informative.
Common-strong cause NOT endorsed at 20% causal rate: high prior expectation (peaked near 100%) makes 20% uninformative.
Rare-weak cause endorsed at 70% causal rate.
Common-strong cause NOT endorsed at 50% causal rate: high prior (Beta(10,1), φ=0.75) puts expected rate near 70%, so 50% is uninformative. Note: the paper tests at 20% and 70%. At 70%, the comparison is borderline (E[k|prior] ≈ 14 ≈ bin(70%)), matching the paper's ~50% endorsement rate at referent prevalence 0.7 for common-strong (Figure 11B).
Rare-strong cause NOT endorsed at 20% causal rate (Figure 11B: ~35% endorsement). Despite fewer competing causes than common-strong, the prior still concentrates enough mass above 20% (via Beta(10,1)) to make 20% uninformative.
Rare-strong cause endorsed at 70% causal rate (Figure 11B: ~90% endorsement).
Common-weak cause endorsed at 70% causal rate (Figure 11B: ~75% endorsement). With Beta(2,8) peaked near 20%, a referent rate of 70% far exceeds the prior expectation.
Causal prior asymmetry (Experiment 3B): at 20% referent rate, only rare-weak is endorsed; the other three conditions are not. This matches the paper's Figure 11B (left panel).
At 70% referent rate, all conditions except common-strong are endorsed (Figure 11B). Common-strong is borderline (~50% endorsement in the paper), matching our model's E[k|prior] ≈ bin(70%).
Cue Validity and Endorsement #
@cite{tessler-goodman-2019} (pp. 29-30, Appendix A) show that endorsement in the infinite-rationality limit reduces to a cue validity comparison:
endorsed ⟺ prevalence(f, k_ref) > E_prior[prevalence]
⟺ cue_validity(f, k_ref) > 1
where cue_validity(f, k) = prevalence(f, k) / E[prevalence].
This connects the RSA model to the classical notion from @cite{rosch-mervis-1975}: a feature is diagnostic of a category exactly when the feature is more prevalent in that category than expected across categories — i.e., when cue validity > 1.
In mkGenericCfg, the endorsement condition
S1(generic | p_ref) > S1(silent | p_ref) reduces to
p_ref.toNat > E[k | prior] after L0 normalization cancels the common factor.
This is exactly the cue validity condition when the expected bin E[k | prior]
serves as the denominator.
Cue validity: ratio of referent prevalence to expected prevalence under the prior.
Equations
- Phenomena.Generics.Studies.TesslerGoodman2019.cueValidity referentPrevalence expectedPrior = referentPrevalence / expectedPrior
Instances For
A generic is endorsed (prevalence exceeds prior expectation) iff cue validity > 1.
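A small sketch of the cue validity equivalence (prevalence and expectation values below are illustrative, loosely echoing the malaria and sharks cases above):

```python
from fractions import Fraction

def cue_validity(referent_prevalence, expected_prior):
    """Ratio of referent prevalence to the prior expected prevalence."""
    return referent_prevalence / expected_prior

# Low prevalence, even lower expectation: endorsed (cue validity > 1).
assert cue_validity(Fraction(1, 10), Fraction(1, 50)) > 1
# High prevalence, higher expectation: not endorsed (cue validity < 1).
assert cue_validity(Fraction(4, 5), Fraction(9, 10)) < 1
# The equivalence itself: p > E iff p / E > 1, for E > 0.
for p, e in [(Fraction(1, 2), Fraction(1, 5)), (Fraction(1, 2), Fraction(1, 2))]:
    assert (p > e) == (cue_validity(p, e) > 1)
```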
Unified Architecture #
All three domains — generics, habituals, and causal language — are instances of
mkGenericCfg with different prevalence priors. The threshold semantics, RSA
inference, and endorsement mechanism are shared; only the prior varies.
This unification is structural (by construction), not proven post hoc. The integration pipeline is:
- Traditional operator (GEN/HAB) reduces to threshold semantics
  (CovertQuantifier.reduces_to_threshold, Habituals.hab_reduces_to_threshold)
- Threshold semantics with uncertain threshold → marginalized L0
- RSA endorsement (mkGenericCfg) decides between generic and silence
- Endorsement ≈ cue validity (endorsed_iff_cue_validity_gt_one)
All three case studies use mkGenericCfg — the prior is the only free parameter.