@cite{herbstritt-franke-2019} #
@cite{herbstritt-franke-2019}
Complex probability expressions & higher-order uncertainty: Compositional semantics, probabilistic pragmatics & experimental data. Cognition 186 (2019) 50–71.
Overview #
Simple probability expressions (probably, certainly, possibly) convey first-order uncertainty about the probability of an event. Complex/nested expressions (certainly likely, probably possible, might be unlikely) convey higher-order uncertainty — uncertainty about uncertainty — with compositional threshold semantics following @cite{fagin-halpern-1994}.
The paper presents an RSA model extending @cite{goodman-stuhlmuller-2013}
to an urn scenario with 10 balls, where both the proportion of red balls
(first-order) and the number of balls observed (higher-order) are
manipulated. The key innovation over @cite{goodman-stuhlmuller-2013}:
Hellinger distance replaces KL divergence as the speaker utility measure,
because KL-divergence assigns infinite disutility to "true enough" messages
(see Core.Divergence for the theoretical analysis).
Model (Eq. 12–18) #
P_rat.bel(s|o, a) ∝ Hypergeometric(o|a, s, 10) · P_prior(s) (Eq. 12)
⟦probably(p)⟧ = {s ∈ S | s/10 > θ_probably} (Eq. 13)
P_LL(s|m) ∝ δ_{s∈⟦m⟧} · P_prior(s) (Eq. 15)
EU(m, o, a) = −HD[P_rat.bel(·|o, a), P_LL(·|m)] (Eq. 16)
P_S(m|o, a) ∝ exp(λ · EU(m, o, a)) (Eq. 17)
P_PL(s, o, a|m) ∝ P_S(m|o, a) · Hyp(o|a, s, 10) · P(a) · P(s) (Eq. 18)
rsa_predict Stress Test #
This file uses Hellinger distance in the speaker's utility (Eq. 16),
which requires Real.sqrt. The rsa_predict tactic's interval arithmetic
engine (Core.Interval.ReflectInterval.RExpr) supports exp, log,
rpow (ℕ exponent), and basic arithmetic — but NOT sqrt. All
predictions therefore use sorry, documenting a concrete gap in
rsa_predict's coverage. To close these, RExpr would need a .rsqrt
constructor with matching QInterval.sqrt evaluation and soundness proof.
Inferred Parameters (Table 6) #
θ_possibly = 0.247 [0.200, 0.299]
θ_probably = 0.549 [0.500, 0.594]
θ_certainly = 0.949 [0.904, 1.000]
λ = 4.873 [4.583, 5.174]
Urn state: number of red balls in the urn (0..10).
Instances For
The proportion of red balls: s/10. This is the first-order probability of drawing a red ball, and the measure function for simple expressions.
Equations
Instances For
Observation count: how many red balls the speaker observed out of
access balls drawn. For access level a, valid observations are
0..a; the observation model assigns probability 0 to obs > a.
Equations
Instances For
Unnormalized observation likelihood (ℝ-valued for RSAConfig).
P(obs | access, state) = C(state, obs) · C(10−state, access−obs) / C(10, access)
Returns 0 when obs > access (impossible observation). Uses
Nat.choose which returns 0 for impossible combinations (k > K),
making explicit guards unnecessary except for the Nat subtraction issue
(truncated subtraction gives wrong results when obs > access).
Instantiates Core.Distributions.hypergeometric for N=10.
Equations
Instances For
Speaker's posterior belief over urn states given observation count.
P_rat.bel(s|o, a) = P(o|a, s) / Σ_{s'} P(o|a, s')
With uniform prior, this is the normalized hypergeometric (Eq. 12). Used in the Hellinger distance computation (Eq. 16).
Equations
- One or more equations did not get rendered due to their size.
Instances For
ℚ-valued speaker belief for decidable verification.
Equations
- One or more equations did not get rendered due to their size.
Instances For
The observation model is an instance of the general hypergeometric
from Core.Distributions (N=10).
Semantic threshold for "possibly" (posterior mean from Table 6). HDI: [0.200, 0.299].
Equations
Instances For
Semantic threshold for "probably" (posterior mean from Table 6). HDI: [0.500, 0.594].
Note: this differs from the LaBToM threshold (0.70) used in
Attitudes.EpistemicThreshold.EpistemicEntry.likely_. The discrepancy
may reflect differences between the production task here and the
Theory-of-Mind task in @cite{ying-zhi-xuan-wong-mansinghka-tenenbaum-2025},
or genuine differences between "probably" and "likely".
Equations
Instances For
Semantic threshold for "certainly" (posterior mean from Table 6). HDI: [0.904, 1.000].
Equations
Instances For
The inferred threshold ordering matches the theoretical prediction: certainly > probably > possibly.
The five simple expressions from Experiments 2 and 3.
- certainlyNot : SimpleExpr
- probablyNot : SimpleExpr
- possibly : SimpleExpr
- probably : SimpleExpr
- certainly : SimpleExpr
Instances For
Equations
Instances For
Equations
- One or more equations did not get rendered due to their size.
Instances For
Equations
- One or more equations did not get rendered due to their size.
Simple expression meaning using the paper's inferred thresholds.
Eq. 13: ⟦X(RED)⟧ = {s ∈ S | s/10 > θ_X} (positive expressions) Eq. 14: ⟦X not(RED)⟧ = {s ∈ S | s/10 < 1 − θ_X} (negated expressions)
With the inferred thresholds, the cutoffs are:
- certainly: s/10 > 0.949 → s = 10 only
- probably: s/10 > 0.549 → s ≥ 6
- possibly: s/10 > 0.247 → s ≥ 3
- probably not: s/10 < 0.451 → s ≤ 4
- certainly not: s/10 < 0.051 → s = 0 only
Equations
- One or more equations did not get rendered due to their size.
Instances For
Expression meanings are nested: certainly ⊂ probably ⊂ possibly.
Posterior probability that an urn-state proposition φ holds, given the speaker's observation. ℚ-valued for decidable verification.
posteriorProb(access, obs, φ) = Σ_{s∈⟦φ⟧} P_rat.bel(s | obs, access)
Equations
- One or more equations did not get rendered due to their size.
Instances For
The three inner expressions from Experiment 3 (Eq. 22).
"likely" and "unlikely" use the same threshold as "probably"/"probably not" from the simple model. Footnote 18: "θ_probably from the simpler model should be mapped onto θ_likely in Table 9 because the latter represents the threshold of the inner expressions likely/probably."
Instances For
Equations
- One or more equations did not get rendered due to their size.
Instances For
Inner expression meaning (over urn states, Eq. 22).
Equations
- One or more equations did not get rendered due to their size.
Instances For
Equations
Instances For
Equations
- One or more equations did not get rendered due to their size.
Instances For
Outer modifier threshold (Table 9, complex model).
The paper infers separate thresholds for outer modifiers in the complex expression model (Experiment 2, Table 9). These are distinct from the inner expression thresholds (θ_possibly, θ_probably, θ_certainly from Table 6) and were inferred jointly with the complex model parameters.
Note: "is X" (bare copula) is treated as a simple expression in the paper's model, not as a complex expression with an outer modifier. We include it here for completeness, with θ_is = θ_likely (the inner threshold), which makes "is likely" equivalent to "likely" when the posterior probability of the inner proposition already exceeds θ_likely.
Equations
- Phenomena.Modality.Studies.HerbstrittFranke2019.OuterMod.mightBe.threshold = 332 / 1000
- Phenomena.Modality.Studies.HerbstrittFranke2019.OuterMod.is_.threshold = Phenomena.Modality.Studies.HerbstrittFranke2019.θ_probably
- Phenomena.Modality.Studies.HerbstrittFranke2019.OuterMod.isProbably.threshold = 690 / 1000
- Phenomena.Modality.Studies.HerbstrittFranke2019.OuterMod.isCertainly.threshold = 982 / 1000
Instances For
Complex expression Y(X(RED)) (Eq. 23).
⟦Y(X(RED))⟧(⟨o, a⟩) ⟺ Σ_{s∈⟦X(RED)⟧} P_rat.bel(s|o, a) > θ_Y
The inner expression creates a proposition about urn states, and the outer expression checks whether the speaker's posterior probability of that proposition exceeds the outer threshold. This follows @cite{fagin-halpern-1994}'s nested probability semantics.
Connection to nestedThreshold: The theory-layer operator
Semantics.Modality.EpistemicProbability.nestedThreshold captures the
same pattern for homogeneous world types. Here the types are
heterogeneous: the inner proposition is over UrnState while the outer
evaluation is over (Obs, Access) pairs. This prevents a direct
type-level instantiation but the compositional structure is the same —
threshold comparison of a posterior probability over a derived proposition.
> vs ≥: This file uses strict > (matching Eq. 13, 23 in the
paper), while nestedThreshold uses ≥ (matching Fagin & Halpern's
convention). For H&F's thresholds, the two are extensionally equivalent:
see strict_threshold_equiv_ge below.
Equations
- One or more equations did not get rendered due to their size.
Instances For
For the paper's specific thresholds, strict > and non-strict ≥
give the same extension on UrnState. This is because no proportion
s/10 (for s ∈ {0,...,10}) exactly equals any of the thresholds
(247/1000, 549/1000, 949/1000).
This justifies using > (matching the paper's Eq. 13) even though
the theory-layer nestedThreshold uses ≥ (Fagin & Halpern's
convention).
@cite{herbstritt-franke-2019} RSA model for simple probability expressions.
Eq. 15: P_LL(s|m) ∝ ⟦m⟧(s) (literal listener) Eq. 16: EU(m, o, a) = −HD[P_rat.bel(·|o, a), P_LL(·|m)] (Hellinger utility) Eq. 17: P_S(m|o, a) ∝ exp(λ · EU(m, o, a)) (softmax speaker) Eq. 18: P_PL(s, o, a|m) ∝ P_S · Hyp(o|a, s) · P(a) · P(s) (pragmatic listener)
The speaker utility uses Hellinger distance (not KL divergence),
imported from Core.Divergence.negHellingerDist. This is necessary
because KL divergence assigns infinite disutility to "true enough"
messages — a speaker who is 95% sure of RED can never say "certainly"
under KL, but CAN under Hellinger (see Core.Divergence §5 for the
theoretical comparison).
The model is parametric in access (number of balls the speaker draws).
Each access level yields a separate RSAConfig, with L1 marginalizing
over possible observation counts (0..10, with prior = 0 for obs > access).
Flat priors (P_prior(s) uniform) for simplicity; the paper infers beta-binomial priors (α_s ≈ 3.25, β_s ≈ 3.05) jointly with thresholds.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Model Predictions #
The model's qualitative predictions are documented here but not proved via
rsa_predict because the Hellinger distance S1 score creates three nested
Finset.sum expansions (11 terms each), making reification too slow for
the generic reifier. The model's correctness is established by the
computable ℚ-level verifications above (belief formation, threshold
semantics, complex expressions).
Production (S1):
- Full access (a=10), 8/10 red: S1 prefers "probably" over "certainly" (s=8 is outside "certainly"'s extension, so HD(δ₈, L0("certainly")) = 1)
- Partial access (a=8), 8/8 red: S1 prefers "certainly" over "probably" (belief concentrates on s=9–10, close to L0("certainly") = δ₁₀)
- Full access, 5/10 red: S1 prefers "possibly" over "probably"
- Low uncertainty (a=8), 6/8 red: S1 prefers "probably" over "possibly"
- High uncertainty (a=4), 3/4 red: S1 prefers "probably" over "certainly"
Interpretation (L1):
- After "certainly" (a=10): L1 concentrates mass on s=10
- After "probably" (a=8): L1 assigns more mass to s=8 than s=4
- After "possibly" (a=10): scalar implicature — L1 assigns more mass to s=5 than s=9 (parallels "some" implicature in @cite{goodman-stuhlmuller-2013})
Inferred semantic threshold parameters from the simple expression model (Experiment 1 data, Table 6).
These are posterior means with 95% highest density intervals, inferred via Bayesian parameter estimation in JAGS.
Instances For
Equations
- One or more equations did not get rendered due to their size.
Instances For
Equations
- Phenomena.Modality.Studies.HerbstrittFranke2019.θ_possibly_inferred = { mean := 247 / 1000, hdi_lower := 200 / 1000, hdi_upper := 299 / 1000 }
Instances For
Equations
- Phenomena.Modality.Studies.HerbstrittFranke2019.θ_probably_inferred = { mean := 549 / 1000, hdi_lower := 500 / 1000, hdi_upper := 594 / 1000 }
Instances For
Equations
- Phenomena.Modality.Studies.HerbstrittFranke2019.θ_certainly_inferred = { mean := 949 / 1000, hdi_lower := 904 / 1000, hdi_upper := 1 }
Instances For
Equations
- One or more equations did not get rendered due to their size.
Instances For
Equations
- Phenomena.Modality.Studies.HerbstrittFranke2019.corr_expression = { dimension := "expression", r := 861 / 1000, hdi_lower := 824 / 1000, hdi_upper := 902 / 1000 }
Instances For
Equations
- Phenomena.Modality.Studies.HerbstrittFranke2019.corr_state = { dimension := "state", r := 741 / 1000, hdi_lower := 666 / 1000, hdi_upper := 824 / 1000 }
Instances For
Equations
- Phenomena.Modality.Studies.HerbstrittFranke2019.corr_access = { dimension := "access", r := 798 / 1000, hdi_lower := 736 / 1000, hdi_upper := 858 / 1000 }
Instances For
Equations
- Phenomena.Modality.Studies.HerbstrittFranke2019.corr_observation = { dimension := "observation", r := 819 / 1000, hdi_lower := 766 / 1000, hdi_upper := 868 / 1000 }
Instances For
All simple model correlations are substantial (r > 0.65).
Inferred outer modifier thresholds from the complex expression model (Experiment 2 data, Table 9). The inner thresholds (θ_possible, θ_likely) overlap with those from Table 6, confirming cross-experiment stability.
Footnote 18: "θ_probably from the simpler model should be mapped onto θ_likely in Table 9 because the latter represents the threshold of the inner expressions likely/probably."
Equations
- Phenomena.Modality.Studies.HerbstrittFranke2019.θ_possible_complex = { mean := 251 / 1000, hdi_lower := 200 / 1000, hdi_upper := 295 / 1000 }
Instances For
Equations
- Phenomena.Modality.Studies.HerbstrittFranke2019.θ_might_complex = { mean := 332 / 1000, hdi_lower := 295 / 1000, hdi_upper := 336 / 1000 }
Instances For
Equations
- Phenomena.Modality.Studies.HerbstrittFranke2019.θ_likely_complex = { mean := 549 / 1000, hdi_lower := 504 / 1000, hdi_upper := 599 / 1000 }
Instances For
Equations
- Phenomena.Modality.Studies.HerbstrittFranke2019.θ_probably_complex = { mean := 690 / 1000, hdi_lower := 629 / 1000, hdi_upper := 745 / 1000 }
Instances For
Equations
- Phenomena.Modality.Studies.HerbstrittFranke2019.θ_certainly_complex = { mean := 982 / 1000, hdi_lower := 969 / 1000, hdi_upper := 988 / 1000 }
Instances For
Model–data correlation for the complex expression model (Table 10). Correlations are lower than the simple model but still substantial, with the best fit in the observation dimension.
Equations
- Phenomena.Modality.Studies.HerbstrittFranke2019.corr_complex_expression = { dimension := "expression", r := 654 / 1000, hdi_lower := 596 / 1000, hdi_upper := 719 / 1000 }
Instances For
Equations
- Phenomena.Modality.Studies.HerbstrittFranke2019.corr_complex_state = { dimension := "state", r := 849 / 1000, hdi_lower := 812 / 1000, hdi_upper := 883 / 1000 }
Instances For
Equations
- Phenomena.Modality.Studies.HerbstrittFranke2019.corr_complex_access = { dimension := "access", r := 853 / 1000, hdi_lower := 818 / 1000, hdi_upper := 888 / 1000 }
Instances For
Equations
- Phenomena.Modality.Studies.HerbstrittFranke2019.corr_complex_observation = { dimension := "observation", r := 934 / 1000, hdi_lower := 915 / 1000, hdi_upper := 952 / 1000 }
Instances For
Architectural Comparison with @cite{goodman-stuhlmuller-2013} #
This model is a direct extension of the @cite{goodman-stuhlmuller-2013}
architecture formalized in ScalarImplicatures/Studies/GoodmanStuhlmuller2013.lean.
| Component | G&S 2013 (N=3) | H&F 2019 (N=10) |
|---|---|---|
| State space | {0,1,2,3} objects | {0,...,10} red balls |
| Observation model | hypergeometric | hypergeometric |
| Utterances | quantifiers/numerals | probability expressions |
| Meaning | Boolean (some, all) | threshold (probably, certainly) |
| Utility | KL divergence | Hellinger distance |
| RSAConfig.s1Score | exp(α * Σ bel * log(l0)) | exp(α * negHellingerDist bel l0) |
rsa_predict | all 11 findings proved | too slow (3 nested Σ₁₁) |
| Higher-order | access modulates implicature | access modulates expression choice |
The key structural difference is the speaker utility:
- G&S 2013: expected log-likelihood ∝ −D_KL, uses
Real.log(inRExpr) - H&F 2019: negative Hellinger distance, uses
Real.sqrt(inRExprvia.rsqrt)
Both models use Core.Distributions.hypergeometric for the observation model
and share the same RSAConfig pattern (access-parametric, Latent = Obs).
The hypergeometric observation model generalizes across both papers.
Hellinger vs KL: Why the Divergence Measure Matters #
The choice of divergence measure is not a free parameter — it determines
which messages the speaker can consider. See Core.Divergence §5 for the
full theoretical analysis.
Example: Consider a speaker who observes 9/10 red balls (access=10). Her belief is a point mass at s=9.
"certainly RED": L0 assigns probability 1 to s=10 and 0 to s=9. Under KL: EU = −D_KL(δ₉ ∥ L0) = −log(L0(9|"certainly")) = −log(0) = −∞. The speaker can NEVER say "certainly" unless s=10. Under Hellinger: HD(δ₉, L0) = √(1 − √(0·1)) = 1, EU = −1. Finite disutility — the speaker CAN consider "certainly".
"probably RED": L0 assigns positive probability to s=6..10. Under both KL and Hellinger, "probably" has finite utility at s=9.
@cite{herbstritt-franke-2019} argues that "certainly" IS pragmatically appropriate when the speaker is nearly but not absolutely sure, and that Hellinger distance correctly captures this by making "certainly" available as a message with bounded disutility.
Modal Concord #
The paper notes that the compositional model consistently overpredicts the frequency of "might be possible" in production data (Section 6). This is because the compositional semantics assigns "might be possible" a very weak truth condition (posterior probability of "possible" exceeds θ_might), making it true in almost all conditions.
The explanation: participants may give "might be possible" a modal
concord reading, collapsing the two modals to a single "possible".
See Phenomena.Modality.ModalConcord for the general phenomenon.
Modal concord occurs when two modals of the same logical type combine to
express a single modal meaning (e.g., "must necessarily" ≈ "must").
In the context of @cite{herbstritt-franke-2019}, the candidates for modal concord are "probably likely" and "might be possible", both of which combine modals of similar strength.