Model Components #
Decision Problem #
A decision problem D = ⟨W, A, U, π_Q^W⟩ consists of:
- W: world states
- A: actions
- U: W × A → ℝ utility function
- π_Q^W: questioner's prior beliefs over worlds
Base-level Respondent R0 #
Selects true AND safe responses uniformly: R0(r | w, q) ∝ 1 if r is true in w & safe for q, else 0
Questioner Q #
Chooses question by soft-maximizing expected value after responses: Q(q | D) = SM_α(E_w[E_r~R0[V(D|r,q) - w_c·C(r)]])
Pragmatic Respondent R1 #
Updates beliefs about decision problem via Bayesian ToM: π_R1^D|q(D) ∝ Q(q|D) π_R1^D(D)
Chooses response by soft-maximizing: (1-β)·(-KL) + β·V(D|r,q) - w_c·C(r)
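Both Q and R1 select options via the soft-max operator SM_α, which assigns probability proportional to exp(α · utility). A minimal Python sketch (the utilities and α values here are illustrative, not from the paper):

```python
import math

def softmax(utilities, alpha=1.0):
    """SM_alpha: choice probabilities proportional to exp(alpha * utility)."""
    weights = [math.exp(alpha * u) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

# As alpha grows, the soft-max concentrates on the argmax option.
probs_soft = softmax([1.0, 2.0, 3.0], alpha=1.0)
probs_hard = softmax([1.0, 2.0, 3.0], alpha=20.0)
```

At low α the agent is noisy but ordering-respecting; at high α it approaches deterministic maximization.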
Case Study 1: Credit Cards #
Replication/extension of @cite{clark-1979}, N = 25 participants.
Conditions #
- (3) "Do you accept American Express?" → "Yes, we accept AE and [exhaustive list]"
- (4) "Do you accept American Express?" → "No, we accept [exhaustive list]"
- (5) "Do you accept credit cards?" → "Yes, we accept [exhaustive list]"
Finding #
Probability of exhaustive-list answers: (4) ≥ (5) > (3)
Model predictions for exhaustive responses (Case Study 1, p. 6). The paper reports the CS1 empirical data via regression coefficients (β = 3.39 for (5) > (3), β = 0.13 for (4) ≥ (5)); the model predictions stated on p. 6 are "0.75 for (4), 0.66 for (5) and 0.12 for (3)".
Key prediction: unavailable (4) ≥ general (5) > available (3)
Case Study 2: Iced Tea #
N = 162 participants, 30 vignettes.
Example: "You are a bartender. The bar serves soda, iced coffee and Chardonnay. A woman asks: 'Do you have iced tea?'"
Options #
- Competitor: iced coffee (most useful alternative)
- Same-category: soda (similar but less useful)
- Other-category: Chardonnay (unrelated)
Finding #
Response preference ordering: competitor > taciturn ≥ same-category > exhaustive
Human response rates averaged across 30 vignettes.
UNVERIFIED: raw data in data/human/case_study_2/ at
https://github.com/polina-tsvilodub/prior-pq uses different
category labels (e.g., "alternative", "fullList") that require
recoding to match these five categories.
Model rates (p. 8): "62% for competitor, 22% for taciturn, 14% for same-category, < 1% for other-category and exhaustive". The < 1% values are approximated as 1/100 here.
Equations
- Phenomena.Questions.Studies.HawkinsEtAl2025.cs2_model_rates = { competitor := 62 / 100, taciturn := 22 / 100, sameCategory := 14 / 100, exhaustive := 1 / 100, otherCategory := 1 / 100 }
The model captures the qualitative ordering.
Case Study 3: Context-Sensitivity #
12 paired vignettes testing whether the SAME question with the SAME alternatives elicits different responses in different contexts.
Example:
- Context 1 (sleepover): "Do you have a blanket?" → sleeping bag preferred
- Context 2 (transportation): "Do you have a blanket?" → bubble wrap preferred
Finding #
Participants mentioned context-congruent competitor significantly more often.
- Context 1 competitor in context 1 vs 2: β = -2.14 [-2.60, -1.71]
- Context 2 competitor in context 2 vs 1: β = 1.34 [0.92, 1.77]
Effect sizes for context-sensitivity (log odds)
Credible intervals exclude zero → significant context effects
Best-fitting model parameters from Table S2 of electronic supplementary material.
Fit by MCMC (100 burn-in, 5000 samples) to minimize error between model and human answer distributions. Parameters vary by case study.
CS2 fitted parameters (Table S2, supplement p. 5). β ≈ 1 means almost pure action-relevance.
Equations
- Phenomena.Questions.Studies.HawkinsEtAl2025.cs2Params = { α_respondent := 887 / 100, α_questioner := 373 / 100, α_policy := 4, β := 96 / 100, w_c := 96 / 100, U_fail := some (34 / 10) }
CS1 fitted parameters (Table S2, supplement p. 5).
Equations
- Phenomena.Questions.Studies.HawkinsEtAl2025.cs1Params = { α_respondent := 5, α_questioner := 3 / 2, α_policy := 5 / 2, β := 9 / 10, w_c := 3 / 10, U_fail := none }
CS3 fitted parameters (Table S2, supplement p. 5).
β = 0.29 means mostly epistemic (contrast with CS2's β = 0.96).
NOTE: the GitHub repo (params_case_study_3.csv) has different values
from a different fitting run; we use the published supplement values.
Architectural contribution #
PRIOR-PQ models how respondents produce overinformative answers to polar questions. The respondent R₁ maps to RSAConfig.S1 and the questioner Q is modeled separately (§7 below, not as RSAConfig.L1):
| PRIOR-PQ agent | RSAConfig role | Knows | Uncertain about |
|---|---|---|---|
| R₁ (respondent) | S1 (speaker) | world w | decision problem D |
| Q (questioner) | (separate model, §7) | DP D | world w |
The outer inference loop (Q → DP posterior → R₁) is NOT an RSAConfig.L1: RSAConfig captures R₁'s response selection, while Q's question-selection model lives in the multi-question formalization below (§7–§10).
Decision-problem marginalization is baked into s1Score (Latent = Unit),
making R₁ a standard RSAConfig. This shows that the same machinery
handles both assertion-based RSA and question-answering RSA.
Model equations #
R₁'s utility for response r in world w:
U(r, w) = (1−β)·log L0(w|r) + β·E_D[V(D, r)] − w_c·C(r)
- log L0(w|r): standard RSA informativity (surprisal at true world)
- E_D[V(D, r)]: expected action-relevance under inferred DP posterior
- C(r): response cost (utterance length)
The DP posterior π(D|q) is derived from the Q model (§2(c)): asking about iced tea signals wanting the target item, concentrating the posterior on wantTarget.
Simplified model #
The RSAConfig below is a simplified abstraction: 5 responses × 8 worlds × 4 DPs,
with pre-computed expectedActionValue from DP posterior weights (5:1:1:1). The
full computational model has 30+ responses × 16 worlds and a Q₁ pipeline computing
DP posteriors from the questioner's rationality model. Action values and fitted
parameters (β = 24/25, w_c = 24/25) are from the paper's supplementary material.
The DP posterior weights (5:1:1:1) are calibrated for the simplified model; the
full model derives them from Q₁.
Decision problem D = ⟨W, A, U, π_Q^W⟩ (defined in §2(b)). Each DP is defined by which item type the questioner wants. The utility function U(w, a) is elicited empirically (Table S1).
A response is true iff mentioned items are actually available.
Equations
- Phenomena.Questions.Studies.HawkinsEtAl2025.responseTruth Phenomena.Questions.Studies.HawkinsEtAl2025.Response.taciturn x✝ = true
- Phenomena.Questions.Studies.HawkinsEtAl2025.responseTruth Phenomena.Questions.Studies.HawkinsEtAl2025.Response.mentionIC x✝ = x✝.hasIC
- Phenomena.Questions.Studies.HawkinsEtAl2025.responseTruth Phenomena.Questions.Studies.HawkinsEtAl2025.Response.mentionSoda x✝ = x✝.hasSoda
- Phenomena.Questions.Studies.HawkinsEtAl2025.responseTruth Phenomena.Questions.Studies.HawkinsEtAl2025.Response.mentionChard x✝ = x✝.hasChard
- Phenomena.Questions.Studies.HawkinsEtAl2025.responseTruth Phenomena.Questions.Studies.HawkinsEtAl2025.Response.exhaustive x✝ = (x✝.hasIC && x✝.hasSoda && x✝.hasChard)
Action relevance V(D, r): utility of the item revealed by response r,
given decision problem D. Taciturn reveals nothing: V = U_fail = 3.4.
Exhaustive reveals all: V = max utility for that DP.
wantTarget values from Table S1 (supplement p. 3, ÷ 10).
Cross-DP values from prior elicitation means (see itemUtility).
NOTE: This ℝ definition is unused; actionValueQ (ℚ, §8) is the
authoritative version used in theorems.
Equations
- Phenomena.Questions.Studies.HawkinsEtAl2025.actionValue x✝ Phenomena.Questions.Studies.HawkinsEtAl2025.Response.taciturn = 17 / 5
- Phenomena.Questions.Studies.HawkinsEtAl2025.actionValue Phenomena.Questions.Studies.HawkinsEtAl2025.DP.wantTarget Phenomena.Questions.Studies.HawkinsEtAl2025.Response.mentionIC = 5693 / 1000
- Phenomena.Questions.Studies.HawkinsEtAl2025.actionValue Phenomena.Questions.Studies.HawkinsEtAl2025.DP.wantTarget Phenomena.Questions.Studies.HawkinsEtAl2025.Response.mentionSoda = 3611 / 1000
- Phenomena.Questions.Studies.HawkinsEtAl2025.actionValue Phenomena.Questions.Studies.HawkinsEtAl2025.DP.wantTarget Phenomena.Questions.Studies.HawkinsEtAl2025.Response.mentionChard = 2369 / 1000
- Phenomena.Questions.Studies.HawkinsEtAl2025.actionValue Phenomena.Questions.Studies.HawkinsEtAl2025.DP.wantTarget Phenomena.Questions.Studies.HawkinsEtAl2025.Response.exhaustive = 5693 / 1000
- Phenomena.Questions.Studies.HawkinsEtAl2025.actionValue Phenomena.Questions.Studies.HawkinsEtAl2025.DP.wantSameCat Phenomena.Questions.Studies.HawkinsEtAl2025.Response.mentionIC = 3959 / 1000
- Phenomena.Questions.Studies.HawkinsEtAl2025.actionValue Phenomena.Questions.Studies.HawkinsEtAl2025.DP.wantSameCat Phenomena.Questions.Studies.HawkinsEtAl2025.Response.mentionSoda = 9504 / 1000
- Phenomena.Questions.Studies.HawkinsEtAl2025.actionValue Phenomena.Questions.Studies.HawkinsEtAl2025.DP.wantSameCat Phenomena.Questions.Studies.HawkinsEtAl2025.Response.exhaustive = 9504 / 1000
- Phenomena.Questions.Studies.HawkinsEtAl2025.actionValue Phenomena.Questions.Studies.HawkinsEtAl2025.DP.wantOtherCat Phenomena.Questions.Studies.HawkinsEtAl2025.Response.mentionIC = 2547 / 1000
- Phenomena.Questions.Studies.HawkinsEtAl2025.actionValue Phenomena.Questions.Studies.HawkinsEtAl2025.DP.wantOtherCat Phenomena.Questions.Studies.HawkinsEtAl2025.Response.exhaustive = 9565 / 1000
DP posterior π(D|q_tea) ∝ Q(q|D) · π₀(D) (§2(c), unnumbered). Unnormalized weights approximating the full Q₁ posterior. Asking "Do you have iced tea?" most benefits wantTarget (the questioner probably wants what they asked for), so wantTarget dominates 5:1:1:1.
Equations
- Phenomena.Questions.Studies.HawkinsEtAl2025.dpPrior Phenomena.Questions.Studies.HawkinsEtAl2025.DP.wantTarget = 5
- Phenomena.Questions.Studies.HawkinsEtAl2025.dpPrior Phenomena.Questions.Studies.HawkinsEtAl2025.DP.wantCompetitor = 1
- Phenomena.Questions.Studies.HawkinsEtAl2025.dpPrior Phenomena.Questions.Studies.HawkinsEtAl2025.DP.wantSameCat = 1
- Phenomena.Questions.Studies.HawkinsEtAl2025.dpPrior Phenomena.Questions.Studies.HawkinsEtAl2025.DP.wantOtherCat = 1
Expected action relevance E_D[V(D, r)], marginalized over DPs.
Precomputed from dpPrior (5:1:1:1, total = 8) and actionValue:
- taciturn: 17/5 = 3.4 (U_fail, same for all DPs)
- mentionIC: (5·5693 + 9521 + 3959 + 2547) / 8000 = 11123/2000 ≈ 5.56
- mentionSoda: (5·3611 + 3815 + 9504 + 2537) / 8000 = 33911/8000 ≈ 4.24
- mentionChard: (5·2369 + 2485 + 2615 + 9565) / 8000 = 2651/800 ≈ 3.31
- exhaustive: (5·5693 + 9521 + 9504 + 9565) / 8000 = 11411/1600 ≈ 7.13
Equations
- Phenomena.Questions.Studies.HawkinsEtAl2025.expectedActionValue Phenomena.Questions.Studies.HawkinsEtAl2025.Response.taciturn = 17 / 5
- Phenomena.Questions.Studies.HawkinsEtAl2025.expectedActionValue Phenomena.Questions.Studies.HawkinsEtAl2025.Response.mentionIC = 11123 / 2000
- Phenomena.Questions.Studies.HawkinsEtAl2025.expectedActionValue Phenomena.Questions.Studies.HawkinsEtAl2025.Response.mentionSoda = 33911 / 8000
- Phenomena.Questions.Studies.HawkinsEtAl2025.expectedActionValue Phenomena.Questions.Studies.HawkinsEtAl2025.Response.mentionChard = 2651 / 800
- Phenomena.Questions.Studies.HawkinsEtAl2025.expectedActionValue Phenomena.Questions.Studies.HawkinsEtAl2025.Response.exhaustive = 11411 / 1600
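The marginalization can be replayed with exact rational arithmetic. The sketch below uses the per-DP values listed in the sums above (÷1000) and the 5:1:1:1 dpPrior weights; the table layout is just for this check:

```python
from fractions import Fraction as F

# Per-DP action values (÷1000), rows ordered wantTarget, wantCompetitor,
# wantSameCat, wantOtherCat; entries as listed in the docstring sums.
V = {
    "mentionIC":    [5693, 9521, 3959, 2547],
    "mentionSoda":  [3611, 3815, 9504, 2537],
    "mentionChard": [2369, 2485, 2615, 9565],
    "exhaustive":   [5693, 9521, 9504, 9565],
}
weights = [5, 1, 1, 1]  # dpPrior, total 8

def expected_action_value(r):
    # E_D[V(D, r)] = sum_D weight(D) * V(D, r) / total weight
    return sum(w * F(v, 1000) for w, v in zip(weights, V[r])) / sum(weights)

assert expected_action_value("mentionIC") == F(11123, 2000)
assert expected_action_value("mentionSoda") == F(33911, 8000)
assert expected_action_value("mentionChard") == F(2651, 800)
assert expected_action_value("exhaustive") == F(11411, 1600)
```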
Response cost C(r) = number of items mentioned. Taciturn ("No") mentions 0 items; single mentions cost 1; exhaustive ("No, but we have IC, soda, Chardonnay") costs 3.
Equations
- Phenomena.Questions.Studies.HawkinsEtAl2025.cost Phenomena.Questions.Studies.HawkinsEtAl2025.Response.taciturn = 0
- Phenomena.Questions.Studies.HawkinsEtAl2025.cost Phenomena.Questions.Studies.HawkinsEtAl2025.Response.mentionIC = 1
- Phenomena.Questions.Studies.HawkinsEtAl2025.cost Phenomena.Questions.Studies.HawkinsEtAl2025.Response.mentionSoda = 1
- Phenomena.Questions.Studies.HawkinsEtAl2025.cost Phenomena.Questions.Studies.HawkinsEtAl2025.Response.mentionChard = 1
- Phenomena.Questions.Studies.HawkinsEtAl2025.cost Phenomena.Questions.Studies.HawkinsEtAl2025.Response.exhaustive = 3
β: weight on action-relevance vs informativity. Fitted value: 0.96 ≈ 24/25 (Table S2). Almost pure action-relevance: the respondent optimizes for the questioner's inferred decision problem.
w_c: cost weight. Fitted value: 0.96 ≈ 24/25 (Table S2). Each mentioned item incurs substantial cost in the utility function.
PRIOR-PQ as RSAConfig.
The respondent (R₁) IS S1. Decision-problem marginalization is baked into s1Score (Latent = Unit). The questioner (Q) is modeled separately in §7 below, not as RSAConfig.L1.
s1Score(L0, α, w, r) = if L0(w|r) = 0 then 0 else exp(α · ((1−β)·log L0(w|r) + β·E_D[V(D,r)] − w_c·C(r)))
The actual world: all 3 alternatives in stock.
Prediction 1: Competitor (iced coffee) preferred over taciturn.
MentionIC wins on action-relevance (E[V] = 11123/2000 ≈ 5.56 vs 17/5 = 3.4). MentionIC also has higher informativity: log L0(w|mentionIC) = log(1/4) > log(1/8) = log L0(w|taciturn) (mentionIC narrows to 4 worlds; taciturn is consistent with all 8). Despite higher cost (1 vs 0), the action-relevance advantage dominates with β = 24/25.
Prediction 2: Taciturn preferred over same-category (soda).
Despite soda's higher informativity (L0 = 1/4 vs 1/8) and action-relevance (E[V] = 33911/8000 ≈ 4.24 vs 17/5 = 3.4), taciturn wins on cost (0 vs 1). With β = w_c = 24/25, the inequality reduces to (1−β)·log 2 < w_c·1 − β·(33911/8000 − 17/5), i.e. log 2 < 24 · 1289/8000 ≈ 3.87.
Prediction 3: Competitor > same-category.
Both have same informativity (L0 = 1/4) and cost (1), but mentionIC has higher action-relevance (11123/2000 vs 33911/8000) because the DP posterior concentrates on wantTarget, where competitor is the best available substitute. Pure rational comparison: 44492 > 33911.
Prediction 4: Same-category > other-category (chardonnay).
Both have same informativity (L0 = 1/4) and cost (1). MentionSoda has higher action-relevance (33911/8000 vs 2651/800) because the DP posterior favors wantTarget, where soda (same-category) is a better substitute than Chardonnay (other-category). Pure rational comparison: 33911 > 26510.
Prediction 5: Competitor > exhaustive.
Despite exhaustive having much higher action-relevance (E[V] = 11411/1600 ≈ 7.13 vs 11123/2000 ≈ 5.56), mentionIC wins because exhaustive incurs 3× the cost (3 vs 1). With w_c = 24/25, the cost difference outweighs the action-relevance gain.
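All five predictions can be checked numerically from the quantities above, using the fitted β = w_c = 24/25. One assumption in this sketch: L0(w|exhaustive) = 1, since among the 8 simplified worlds the exhaustive response is true only in the actual all-in-stock world (the L0 values for taciturn and single mentions are as stated in Prediction 1):

```python
import math

BETA, W_C = 24 / 25, 24 / 25  # fitted CS2 weights (Table S2)

# Per-response tuple: (L0 at the actual world, expected action value, cost).
R = {
    "taciturn":     (1 / 8, 17 / 5,       0),
    "mentionIC":    (1 / 4, 11123 / 2000, 1),
    "mentionSoda":  (1 / 4, 33911 / 8000, 1),
    "mentionChard": (1 / 4, 2651 / 800,   1),
    "exhaustive":   (1.0,   11411 / 1600, 3),  # L0 = 1 is an assumption here
}

def utility(r):
    l0, ev, cost = R[r]
    return (1 - BETA) * math.log(l0) + BETA * ev - W_C * cost

u = {r: utility(r) for r in R}
assert u["mentionIC"] > u["taciturn"]        # Prediction 1
assert u["taciturn"] > u["mentionSoda"]      # Prediction 2
assert u["mentionIC"] > u["mentionSoda"]     # Prediction 3
assert u["mentionSoda"] > u["mentionChard"]  # Prediction 4
assert u["mentionIC"] > u["exhaustive"]      # Prediction 5
```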
Q selects questions to maximize expected decision value #
PRIOR-PQ's Q (eq. 2.3) IS an optimal experiment designer:
- Experiment = question q
- Observation = R₀'s response r
- Observation model = R₀'s literal semantics (truth-conditional)
- Value function = expected decision value under DP posterior
The connection is structural: Q's utility U_Q(q) = E_{w~prior}[E_{r~R₀}[V(D,r,q)]] is exactly the EIG of the experiment q under the observation model R₀.
This section makes the connection explicit by constructing the observation model
from R₀ and showing Q is an optimalExperiment instance.
Number of true responses in world w (for uniform R₀ normalization).
R₀ as an observation model: the literal respondent's truth-conditional semantics define a stochastic observation model.
P(r|w,q) = 1/|{r : responseTruth r w}| if responseTruth r w, else 0.
R₀ selects uniformly among true responses (literal respondent). The experiment is trivial (Unit) because we model a single question.
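A sketch of this uniform observation model in Python, assuming the responseTruth truth conditions over the 8 simplified worlds:

```python
from itertools import product

# Worlds: availability of (IC, soda, chard). Truth conditions per responseTruth.
worlds = list(product([True, False], repeat=3))
RESPONSES = ["taciturn", "mentionIC", "mentionSoda", "mentionChard", "exhaustive"]

def is_true(resp, w):
    ic, soda, chard = w
    return {"taciturn": True, "mentionIC": ic, "mentionSoda": soda,
            "mentionChard": chard, "exhaustive": ic and soda and chard}[resp]

def r0(resp, w):
    """P(r | w): uniform over the responses true in w, zero elsewhere."""
    true_resps = [r for r in RESPONSES if is_true(r, w)]
    return 1 / len(true_resps) if is_true(resp, w) else 0.0

actual = (True, True, True)  # all three alternatives in stock
# In the actual world all five responses are true, so each gets probability 1/5.
assert r0("mentionIC", actual) == 0.2
assert abs(sum(r0(r, actual) for r in RESPONSES) - 1) < 1e-9
```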
All responses, as a concrete list for dpValueR iteration.
Expected decision value: the value of holding posterior beliefs post.
V(post) = max_r Σ_w post(w) · E_D[V(D, r)]
where E_D[V(D, r)] = expectedActionValue r (marginalized over DPs
using dpPrior). This is the value function for Q's experiment
design problem: how useful is it to hold beliefs post?
The questioner Q IS an optimal experiment designer.
Q selects questions to maximize expected decision value after observing R₀'s response. This is eq. 2.3 of @cite{hawkins-etal-2025}:
U_Q(q) = E_{w~prior}[E_{r~R₀(·|w,q)}[V(D^{r,q})]]
which is exactly eig r0ObservationModel worldPrior questionerValue.
Deriving the DP posterior from the questioner model #
The DP posterior π(D|q) is the paper's core innovation (§2(c)):
π(D|q) ∝ Q(q|D) · π₀(D)
where Q(q|D) = SM_αQ(EU_Q(q, D)) is a softmax over the set of questions (eq. 2.3). The questioner chooses which question to ask based on their DP.
The key structural argument for why π(D|q_tea) concentrates on wantTarget:
Each DP has a preferred question. For wantTarget, asking "Do you have iced tea?" directly addresses the goal. For wantCompetitor, asking "Do you have iced coffee?" would be strictly better.
Q(q|D) is high when q matches D. By the symmetry of the scenario (each item has its own question and DP), Q(q_X|wantX) > Q(q_X|wantY) for Y ≠ X. The person asking about iced tea is most likely someone who wants iced tea.
The posterior inverts Q. Since Q(q_tea|wantTarget) > Q(q_tea|D) for D ≠ wantTarget, and π₀ is uniform, the posterior concentrates on wantTarget. The 5:1:1:1 weights in dpPrior approximate this.
To formalize this, we define a multi-question Q model with 4 questions (one per item), compute the expected value of each (question, DP) pair, and prove that each DP's target question dominates.
V(D^{r,q}): value of the updated decision problem #
After hearing response r to question q, the questioner updates beliefs about the world (eq. 2.4): π_Q^{W|r,q}(w) ∝ R₀(r|w,q) · π_Q^W(w). The value V(D^{r,q}) is the maximum expected utility under updated beliefs, using an argmax action policy (α_κ → ∞ simplification):
V(D^{r,q}) = max_a Σ_w π_Q^{W|r,q}(w) · U(w, a)
For q_tea, response r reveals information about the true world. With 4 items and 2^4 = 16 full worlds, each response partitions worlds by which items are mentioned as available.
Whether a question's target item is available in world w.
Equations
- Phenomena.Questions.Studies.HawkinsEtAl2025.questionTarget Phenomena.Questions.Studies.HawkinsEtAl2025.Question.tea x✝ = x✝.hasTea
- Phenomena.Questions.Studies.HawkinsEtAl2025.questionTarget Phenomena.Questions.Studies.HawkinsEtAl2025.Question.ic x✝ = x✝.hasIC
- Phenomena.Questions.Studies.HawkinsEtAl2025.questionTarget Phenomena.Questions.Studies.HawkinsEtAl2025.Question.soda x✝ = x✝.hasSoda
- Phenomena.Questions.Studies.HawkinsEtAl2025.questionTarget Phenomena.Questions.Studies.HawkinsEtAl2025.Question.chard x✝ = x✝.hasChard
Utility U(w, a) for DP D: the value of choosing item a in world w. Values on 0-100 slider scale (stored as centesimals). U = item utility if available, else U_fail = 34/10 (Table S2). Actions: choose target, choose IC, choose soda, choose chard, or leave.
Whether an item is available in the full world.
Equations
- Phenomena.Questions.Studies.HawkinsEtAl2025.itemAvailable Phenomena.Questions.Studies.HawkinsEtAl2025.Item.tea x✝ = x✝.hasTea
- Phenomena.Questions.Studies.HawkinsEtAl2025.itemAvailable Phenomena.Questions.Studies.HawkinsEtAl2025.Item.ic x✝ = x✝.hasIC
- Phenomena.Questions.Studies.HawkinsEtAl2025.itemAvailable Phenomena.Questions.Studies.HawkinsEtAl2025.Item.soda x✝ = x✝.hasSoda
- Phenomena.Questions.Studies.HawkinsEtAl2025.itemAvailable Phenomena.Questions.Studies.HawkinsEtAl2025.Item.chard x✝ = x✝.hasChard
- Phenomena.Questions.Studies.HawkinsEtAl2025.itemAvailable Phenomena.Questions.Studies.HawkinsEtAl2025.Item.leave x✝ = true
Utility of choosing item a when you have DP D and item is available.
If unavailable, U_fail = 34/10. wantTarget row verified against
Table S1 (supplement p. 3): Target=96.18, Competitor=56.93,
Same=36.11, Other=23.69. Cross-DP rows (wantCompetitor, wantSameCat,
wantOtherCat) are from the prior elicitation experiment but not shown
in Table S1; values are per-scenario means from the raw data at
https://github.com/polina-tsvilodub/prior-pq
After hearing the answer to question q, the questioner's posterior beliefs concentrate on worlds consistent with the answer. P(w | answer, q) ∝ 1 if answer consistent with w, else 0. (R₀ answers truthfully, so the answer is deterministic given w.)
Equations
- Phenomena.Questions.Studies.HawkinsEtAl2025.answerConsistent q Phenomena.Questions.Studies.HawkinsEtAl2025.PolarAnswer.yes w = Phenomena.Questions.Studies.HawkinsEtAl2025.questionTarget q w
- Phenomena.Questions.Studies.HawkinsEtAl2025.answerConsistent q Phenomena.Questions.Studies.HawkinsEtAl2025.PolarAnswer.no w = !Phenomena.Questions.Studies.HawkinsEtAl2025.questionTarget q w
V(D^{answer,q}): value of the updated DP after hearing the answer to question q:
V(D^{answer,q}) = max_item Σ_{w consistent} (1/|consistent|) · U(w, item)
Uses the argmax policy (α_κ → ∞ simplification of eq. 2.2).
EU_Q(q, D): the questioner's expected utility for asking question q given DP D:
EU_Q(q, D) = Σ_w π(w) · [V(D^{answer(w,q), q}) − w_c · C(q)]
Question cost C(q) = 0 (all questions are equally costly). Since the answer is deterministic given w, this simplifies to:
EU_Q(q, D) = Σ_w (1/16) · V(D^{answer(w,q), q})
Each DP's target question yields the strictly highest EU. Q(q_X|wantX) > Q(q_X|wantY) because asking about X directly addresses the wantX goal.
DP posterior concentration: Q(q_tea|wantTarget) > Q(q_tea|D). Since Q is softmax and exp is monotone, this follows from EU_Q(tea, wantTarget) > EU_Q(tea, D). With uniform π₀, the posterior π(D|q_tea) ∝ Q(q_tea|D) concentrates on wantTarget.
The 5:1:1:1 weights in dpPrior are consistent with the derived posterior
concentration.
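The dominance claim can be illustrated numerically for wantTarget. This sketch assumes the Table S1 utilities ÷ 10 (tea 9.618, IC 5.693, soda 3.611, chard 2.369), U_fail = 3.4 when the chosen item is unavailable, a uniform prior over the 16 worlds, and ignores the "leave" action; asking about tea should come out strictly best:

```python
from itertools import product

UTIL = {"tea": 9.618, "ic": 5.693, "soda": 3.611, "chard": 2.369}
U_FAIL = 3.4
ITEMS = list(UTIL)

worlds = [dict(zip(ITEMS, avail)) for avail in product([True, False], repeat=4)]

def utility(world, item):
    return UTIL[item] if world[item] else U_FAIL

def dp_value(consistent):
    # argmax policy: best single item under uniform beliefs over consistent worlds
    return max(sum(utility(w, a) for w in consistent) / len(consistent)
               for a in ITEMS)

def questioner_eu(q):
    # The answer is deterministic given w: "yes" iff the target item is available,
    # so each world's posterior is uniform over the worlds sharing its answer.
    return sum(dp_value([v for v in worlds if v[q] == w[q]]) for w in worlds) / len(worlds)

eus = {q: questioner_eu(q) for q in ITEMS}
# The target question dominates for wantTarget.
assert all(eus["tea"] > eus[q] for q in ("ic", "soda", "chard"))
```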
Action value V(D, r) in ℚ for decidable computation.
Same values as actionValue (see itemUtility docstring for sources).
Equations
- Phenomena.Questions.Studies.HawkinsEtAl2025.actionValueQ x✝ Phenomena.Questions.Studies.HawkinsEtAl2025.Response.taciturn = 17 / 5
- Phenomena.Questions.Studies.HawkinsEtAl2025.actionValueQ Phenomena.Questions.Studies.HawkinsEtAl2025.DP.wantTarget Phenomena.Questions.Studies.HawkinsEtAl2025.Response.mentionIC = 5693 / 1000
- Phenomena.Questions.Studies.HawkinsEtAl2025.actionValueQ Phenomena.Questions.Studies.HawkinsEtAl2025.DP.wantTarget Phenomena.Questions.Studies.HawkinsEtAl2025.Response.mentionSoda = 3611 / 1000
- Phenomena.Questions.Studies.HawkinsEtAl2025.actionValueQ Phenomena.Questions.Studies.HawkinsEtAl2025.DP.wantTarget Phenomena.Questions.Studies.HawkinsEtAl2025.Response.exhaustive = 5693 / 1000
- Phenomena.Questions.Studies.HawkinsEtAl2025.actionValueQ Phenomena.Questions.Studies.HawkinsEtAl2025.DP.wantSameCat Phenomena.Questions.Studies.HawkinsEtAl2025.Response.mentionIC = 3959 / 1000
- Phenomena.Questions.Studies.HawkinsEtAl2025.actionValueQ Phenomena.Questions.Studies.HawkinsEtAl2025.DP.wantSameCat Phenomena.Questions.Studies.HawkinsEtAl2025.Response.exhaustive = 9504 / 1000
- Phenomena.Questions.Studies.HawkinsEtAl2025.actionValueQ Phenomena.Questions.Studies.HawkinsEtAl2025.DP.wantOtherCat Phenomena.Questions.Studies.HawkinsEtAl2025.Response.mentionIC = 2547 / 1000
DP prior weights in ℚ (unnormalized, 5:1:1:1).
Equations
- Phenomena.Questions.Studies.HawkinsEtAl2025.dpPriorQ Phenomena.Questions.Studies.HawkinsEtAl2025.DP.wantTarget = 5
- Phenomena.Questions.Studies.HawkinsEtAl2025.dpPriorQ Phenomena.Questions.Studies.HawkinsEtAl2025.DP.wantCompetitor = 1
- Phenomena.Questions.Studies.HawkinsEtAl2025.dpPriorQ Phenomena.Questions.Studies.HawkinsEtAl2025.DP.wantSameCat = 1
- Phenomena.Questions.Studies.HawkinsEtAl2025.dpPriorQ Phenomena.Questions.Studies.HawkinsEtAl2025.DP.wantOtherCat = 1
E_D[V(D, r)] computed by marginalizing over DPs with dpPriorQ weights.
Verifies the pre-computed expectedActionValue values.
The ℚ marginalization matches the pre-computed ℝ values.
Action value ordering for wantTarget: IC is the best substitute, then soda, then Chardonnay. This drives the competitor preference in the response ordering.
Each DP's own item has the highest action value (diagonal dominance). This is why the DP posterior matters: if the questioner wants IC, mentioning IC has utility 95.21 vs 56.93 if they want tea.
PRIOR-PQ's Q IS Van Rooy's rational questioner #
@cite{van-rooy-2003} defines the expected utility value of a question Q:
EUV(Q) = Σ_{cell ∈ Q} P(cell) · UV(cell)
where UV(cell) = V(D|cell) - V(D) is the utility value of learning cell
(Core.DecisionTheory.utilityValue).
PRIOR-PQ's questionerEU(q, D) computes the expected value of asking q:
EU_Q(q, D) = Σ_w π(w) · V(D^{answer(w,q), q})
Since the answer to a polar question deterministically partitions worlds into "yes" and "no" cells, this sum decomposes as:
EU_Q(q, D) = P(yes) · V(D|yes) + P(no) · V(D|no)
= EUV(Q_q, D) + V(D)
where Q_q is the binary partition induced by question q.
This correspondence shows that @cite{hawkins-etal-2025}'s questioner IS @cite{van-rooy-2003}'s rational questioner, specialized to polar questions. The softmax (eq. 2.3) adds probabilistic selection on top of Van Rooy's deterministic framework.
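The decomposition can be sanity-checked with illustrative numbers (a half/half yes-no partition with hypothetical cell values and a hypothetical baseline V(D); the identity holds for any such numbers since the cell probabilities sum to 1):

```python
# Toy check of: sum_cell P(cell) * V(D|cell) = EUV(Q) + V(D).
p = {"yes": 0.5, "no": 0.5}           # cell probabilities of the binary partition
v_cell = {"yes": 9.618, "no": 4.5465} # illustrative V(D|cell) values
v_prior = 6.509                       # illustrative baseline value V(D)

eu_q = sum(p[c] * v_cell[c] for c in p)             # PRIOR-PQ's questionerEU form
euv = sum(p[c] * (v_cell[c] - v_prior) for c in p)  # Van Rooy's EUV
assert abs(eu_q - (euv + v_prior)) < 1e-12
```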
Map PRIOR-PQ decision problem to Core.DecisionTheory.DecisionProblem.
Uniform prior over 16 full worlds.
All items as a list for Core.DecisionTheory functions.
The uniform prior on toCoreDP sums to 1.
questionerEU computes the weighted sum of valueAfterLearning:
questionerEU(q, D) = Σ_{cell} P(cell) · V(D|cell).
This is the computational linkage between the Hawkins-specific definitions
(dpValueAfterAnswer, questionerEU) and Core.DecisionTheory's
generic types (valueAfterLearning, cellProbability).
Van Rooy correspondence: PRIOR-PQ's questionerEU equals Van Rooy's
questionUtility plus the baseline decision problem value.
questionerEU(q, D) = EUV(Q_q, D) + V(D)
Proved in two steps:
1. questionerEU computes the weighted sum Σ P(cell)·V(D|cell) (questionerEU_eq_weighted_value, verified by native_decide).
2. Σ P(cell)·V(D|cell) = EUV + V(D) for any binary partition (binary_question_value_decomposition, a structural algebraic identity from Core.DecisionTheory).
This connects @cite{hawkins-etal-2025}'s Q model to @cite{van-rooy-2003}'s decision-theoretic question framework.
Question ordering is preserved: since dpValue depends only on D (not q),
comparing questionerEU across questions (same D) is equivalent to
comparing Van Rooy's questionUtility.
Softmax questioner at α → ∞ recovers Van Rooy's deterministic questioner #
@cite{van-rooy-2003}'s framework selects questions deterministically by
argmax of questionUtility. @cite{hawkins-etal-2025}'s PRIOR-PQ uses a
softmax questioner Q(q|D) = SM_{αQ}(questionerEU(q, D)), adding noise.
Two mathematical facts connect them:
1. Translation invariance (dpPosterior_eq_vanRooy): since questionerEU = questionUtility + dpValue (§9) and softmax is translation-invariant, dpValue(D) drops out. The DP posterior using questionerEU IS the posterior using Van Rooy's questionUtility, for ALL α, not just in the limit.
2. Limit concentration (dpPosterior_tendsto_one): by softmaxObserver_tendsto_one, π(wantTarget | q_tea, αQ) → 1 as αQ → ∞. At high questioner rationality, hearing "Do you have iced tea?" gives near-certain evidence that the questioner wants iced tea, recovering Van Rooy's deterministic framework.
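Both facts are easy to see numerically. The sketch below uses illustrative EU values (not the paper's numbers, but with the first entry strictly largest, as for the target question/DP): translation invariance holds for every α, and concentration appears as α grows:

```python
import math

def softmax(xs, alpha=1.0):
    ws = [math.exp(alpha * x) for x in xs]
    total = sum(ws)
    return [w / total for w in ws]

# Illustrative EU values across the four DPs for a fixed question.
eu = [7.08, 6.51, 6.51, 6.51]

# 1. Translation invariance: subtracting a constant baseline (like dpValue(D))
#    leaves the softmax distribution unchanged, for every alpha.
for alpha in (0.5, 2.0, 10.0):
    p1 = softmax(eu, alpha)
    p2 = softmax([x - 3.4 for x in eu], alpha)
    assert all(abs(a - b) < 1e-9 for a, b in zip(p1, p2))

# 2. Limit concentration: as alpha grows, mass concentrates on the argmax.
assert softmax(eu, 50.0)[0] > 0.999
```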
questionerEU cast to ℝ for softmax limit theorems.
Van Rooy's questionUtility cast to ℝ.
Baseline dpValue cast to ℝ (constant across questions for fixed D).
Named to avoid shadowing Core.ExperimentDesign.dpValueR.
Uniform prior over DPs (ℝ).
Van Rooy decomposition in ℝ.
Translation invariance: the DP posterior using PRIOR-PQ's questionerEU
IS the posterior using Van Rooy's questionUtility, for ALL α.
dpValue(D) is absorbed by softmax_add_const.
Tea is the uniquely optimal question for wantTarget (strict, ℚ).
For each D ≠ wantTarget, some question strictly beats tea (ℚ).
Strict alignment cast to ℝ.
Strict non-optimality cast to ℝ.
Main limit theorem: π(wantTarget | q_tea, αQ) → 1 as αQ → ∞.
The respondent's DP posterior concentrates on wantTarget — the DP
for which asking about tea maximizes Van Rooy's questionUtility.
PRIOR-PQ's soft BToM inference recovers Van Rooy's deterministic
questioner in the high-rationality limit.
Connections to other modules #
Decision theory (Core.Agent.DecisionTheory): The toCoreDP bridge (§9)
maps PRIOR-PQ's decision problems to Core.DecisionTheory.DecisionProblem,
enabling reuse of questionUtility, dpValue, and
binary_question_value_decomposition. The vanRooy_correspondence theorem
proves this mapping is faithful.
Experiment design (Core.Agent.ExperimentDesign): The questioner_as_experiment
definition (§6) constructs Q as an optimalExperiment instance, showing that
question selection IS optimal experiment design. The observation model
r0ObservationModel IS R₀'s literal semantics.
Relevance theories (Comparisons/RelevanceTheories): The Van Rooy
correspondence (§9) instantiates the general result that QUD-based and
decision-theoretic relevance coincide (Blackwell bridge). PRIOR-PQ's polar
question partition is a binary QUD; the vanRooy_question_ordering theorem
shows that comparing questionerEU across questions reduces to comparing
Van Rooy's questionUtility — exactly the ordering that
Comparisons.Relevance.blackwell_unifies_relevance proves is equivalent to
QUD refinement.
Pragmatic answerhood (Phenomena/Questions/PragmaticAnswerhood): PRIOR-PQ's
respondent R₁ selects pragmatic answers sensitive to the questioner's inferred
decision problem. The "iced coffee" answer is pragmatically optimal because
R₁ infers that the questioner wants the target item (via BToM over Q), making
the competitor the most action-relevant alternative. This is a formal instance
of G&S's observation that pragmatic answerhood depends on the questioner's
information state — here, their decision problem replaces their factual
information set J.
Polar answers (Phenomena/Questions/PolarAnswers): The base-level respondent
R₀ produces literal polar answers (taciturn = "No"). R₁'s overinformative
responses ("No, but we have iced coffee") go beyond the polar answer, adding
a mention response. The responseTruth predicate ensures mentioned items are
truthful, connecting to G&S's requirement that answers be true in the
actual world.
When does cost select mention-some over mention-all? #
@cite{van-rooy-2003}'s value saturation shows that mention-some and mention-all partitions extract equal decision-relevant information. But the standard RSA informativity term (log L0) DOES distinguish them: the finer mention-all answer is more informative in the Shannon sense.
PRIOR-PQ's s1Score has three components:
score(u, w) = (1−β) · log L0(w|u) + β · V(D,u) − w_c · C(u)
              └─ informativity ─┘   └relevance┘  └─ cost ─┘
Given value saturation (V equal), the mention-some vs mention-all comparison reduces to a trade-off between informativity loss and cost saving. The precise boundary is:
w_c · ΔC > (1 − β) · Δ(log L0)
where ΔC = C(mention-all) − C(mention-some) > 0 (cost saving) and Δ(log L0) = log L0(w|ma) − log L0(w|ms) ≥ 0 (informativity gap).
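The boundary condenses to a one-line predicate. A sketch with hypothetical ΔC and Δ(log L0) values (the function name is ours, not the formalization's):

```python
import math

def prefers_mention_some(w_c, beta, delta_cost, delta_loginfo):
    # Cost-dominance boundary: mention-some wins iff the cost saving
    # outweighs the (1 - beta)-discounted informativity gap.
    return w_c * delta_cost > (1 - beta) * delta_loginfo

# Hypothetical numbers: mention-all costs 1 unit more, and is
# log(2) nats more informative (it pins down one extra world).
dC, dI = 1.0, math.log(2)

# Near the Van Rooy limit (beta = 0.96), a modest cost weight suffices.
assert prefers_mention_some(w_c=0.5, beta=0.96, delta_cost=dC, delta_loginfo=dI)
# At pure informativity (beta = 0), a small cost weight does not.
assert not prefers_mention_some(w_c=0.1, beta=0.0, delta_cost=dC, delta_loginfo=dI)
```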
Special cases (corollaries of priorPQ_cost_dominance):
- β = 1 (pure action-relevance): the informativity term vanishes, so any positive cost difference suffices (pure_action_relevance). This is the Van Rooy limit: question interpretation is entirely determined by the decision problem.
- β = 0 (pure informativity): cost must overcome the full informativity gap. Mention-some wins only when the cost saving exceeds the Shannon information gained by being more specific.
- 0 < β < 1: the informativity gap is discounted by (1−β); higher β makes it easier for cost to dominate. Hawkins's CS2 fitted value β = 0.96 is near the Van Rooy limit.
From score to S1: Since s1Score = exp(α · score) and exp is
strictly monotone (exp_lt_exp), S1(u₁|w) > S1(u₂|w) iff
score(u₁,w) > score(u₂,w) (when both L0(w|u) > 0). So the
score-level characterization fully determines the S1 preference.
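Since the normalizer is shared across utterances at a fixed world, the soft-max preserves the score ordering exactly. A small sketch with illustrative scores:

```python
import math

def s1(scores, alpha):
    # S1(u | w) ∝ exp(alpha * score(u, w)); the normalizer Z is shared
    # across utterances, so probabilities order exactly as scores do.
    exps = [math.exp(alpha * s) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Illustrative scores for two utterances at a fixed world.
scores = [0.8, 0.3]
probs = s1(scores, alpha=2.0)
assert (probs[0] > probs[1]) == (scores[0] > scores[1])
```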
The PRIOR-PQ score for a single (utterance, world) pair.
This is the exponent in PRIOR-PQ's s1Score (when L0(w|u) > 0):
s1Score = exp(α · priorPQScore ...).
The three components are explicitly separated:
- logInfo: log L0(w|u), Shannon informativity
- actionVal: E_D[V(D,u)], action-relevance (from the DP)
- utterCost: C(u), utterance cost
Cost-dominance characterization (iff).
Given value saturation (equal action-relevance), one utterance scores higher than another if and only if its cost saving exceeds the informativity gap discounted by (1 − β).
This is the precise boundary between mention-some and mention-all preference in any PRIOR-PQ style model.
Score comparison lifts to S1 comparison via exp monotonicity.
Since S1(u|w) = exp(α · score(u,w)) / Z and Z is constant across utterances at a fixed world, S1(u₁|w) > S1(u₂|w) iff exp(α · score₁) > exp(α · score₂) iff score₁ > score₂.
Combined: S1 prefers u₁ over u₂ (at score level) iff cost-dominance holds.
This chains exp monotonicity with the cost-dominance characterization:
the S1 comparison, given value saturation, reduces to the single
inequality w_c · ΔC > (1 − β) · Δ(log L0).
Corollary (β = 1): At pure action-relevance, informativity drops out entirely. Any positive cost difference suffices.
This is the formal content of @cite{van-rooy-2003}'s economy argument: when the speaker cares only about the questioner's decision problem, the cheapest adequate answer always wins. The mention-some preference follows from value saturation alone — no parameter tuning needed.
Corollary (β < 1): At mixed action-relevance/informativity, the cost saving must exceed the informativity gap scaled by (1 − β).
The higher β, the smaller the effective informativity gap, and the easier it is for cost to dominate. Hawkins's CS2 fitted β = 0.96 means the informativity gap is discounted to 4% of its raw value.
Monotonicity in β: increasing action-relevance weight weakly increases the mention-some advantage (when mention-some is cheaper and mention-all is more informative).
The advantage score(ms) - score(ma) = w_c·ΔC - (1-β)·Δ(logL0) is
increasing in β (when Δ(logL0) ≥ 0). So raising β always pushes
toward mention-some.
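The β-monotonicity is a one-line computation on the advantage. A sketch with illustrative parameter values (`ms_advantage` is our name, not the formalization's):

```python
def ms_advantage(beta, w_c=1.0, delta_cost=1.0, delta_loginfo=0.7):
    # score(ms) - score(ma) = w_c * dC - (1 - beta) * d(log L0),
    # under value saturation (the V terms cancel).
    return w_c * delta_cost - (1 - beta) * delta_loginfo

betas = [0.0, 0.5, 0.96, 1.0]
advs = [ms_advantage(b) for b in betas]
# With d(log L0) > 0, the advantage is strictly increasing in beta.
assert all(a < b for a, b in zip(advs, advs[1:]))
```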
Concrete instance: the newspaper example #
The newspaper scenario from @cite{van-rooy-2003} is the β = 1 case. The questioner's DP fully determines the question interpretation, and cost selects mention-some.
In the newspaper scenario, "At Shop A" (cost 1) is strictly preferred over "At Shop A and Shop B" (cost 2) for any w_c > 0.
This is the β = 1 instantiation of priorPQ_cost_dominance:
value saturation (newspaper_value_saturation_A) cancels the
partitionValue terms, leaving the cost difference as sole discriminant.
The mention-some advantage is exactly one unit of cost: the savings from mentioning one fewer shop.
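The β = 1 instantiation can be checked directly: the log L0 term is weighted by zero, value saturation cancels the V terms, and the advantage reduces to the unit cost saving. A sketch (the saturated value V = 1.0 is an illustrative placeholder):

```python
def score(value, cost, beta=1.0, w_c=0.2, log_l0=0.0):
    # beta = 1: the log L0 term is weighted by (1 - beta) = 0 and drops out.
    return (1 - beta) * log_l0 + beta * value - w_c * cost

V = 1.0  # saturated decision value: both answers resolve the DP equally
for w_c in (0.01, 0.5, 10.0):
    shop_a = score(V, cost=1, w_c=w_c)    # "At Shop A"
    shop_ab = score(V, cost=2, w_c=w_c)   # "At Shop A and Shop B"
    assert shop_a > shop_ab               # mention-some wins for any w_c > 0
    assert abs((shop_a - shop_ab) - w_c) < 1e-12  # advantage = w_c * (2 - 1)
```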