Tsvilodub, Mulligan, Snider, Hawkins & Franke (2026) #
@cite{tsvilodub-etal-2026}
Act or Clarify? Modeling Sensitivity to Uncertainty and Cost in Communication.
Overview #
When should an agent ask a clarification question (CQ) vs. act under uncertainty? This paper predicts and confirms that the decision depends on both uncertainty (ε) about the interlocutor's goals and the cost (δ) of available actions. The interaction is captured by expected regret (= EVPI, @cite{raiffa-schlaifer-1961}).
Two-layer model #
The paper's model is a direct softmax over expected utility (NOT RSA):
- CQ gate: P(CQ) = Logistic(τ · (ExpRegret(r*) − c)), which decides when to clarify
- Behavioral policy: π(r) = SoftMax(α · EU(r)), which decides what to say
We reinterpret the behavioral policy through RSA, connecting it to @cite{hawkins-etal-2025}'s action-utility scoring.
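The two layers can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: it assumes a two-goal prior (1 − ε, ε), the bartender utility table used on this page (1 for a matching mention-some response, 0 for a mismatch, 1 − δ for the exhaustive response), and computes expected regret with the standard EVPI formula. The function names and the values of τ, c and α are hypothetical.

```python
import math

def expected_utility(prior, utility, r):
    """EU(r) = sum over goals of P(g) * U(g, r)."""
    return sum(p * utility[g][r] for g, p in prior.items())

def expected_regret(prior, utility):
    """EVPI: expected utility with the goal known minus the best EU without."""
    responses = next(iter(utility.values())).keys()
    informed = sum(p * max(utility[g].values()) for g, p in prior.items())
    uninformed = max(expected_utility(prior, utility, r) for r in responses)
    return informed - uninformed

def p_clarify(regret, tau, c):
    """CQ gate: Logistic(tau * (ExpRegret(r*) - c))."""
    return 1.0 / (1.0 + math.exp(-tau * (regret - c)))

def policy(prior, utility, alpha):
    """Behavioral policy: SoftMax(alpha * EU(r))."""
    responses = next(iter(utility.values())).keys()
    weights = {r: math.exp(alpha * expected_utility(prior, utility, r))
               for r in responses}
    total = sum(weights.values())
    return {r: w / total for r, w in weights.items()}

eps, delta = 0.49, 0.32  # high uncertainty, large option space
prior = {"g1": 1 - eps, "g2": eps}
utility = {
    "g1": {"ms1": 1.0, "ms2": 0.0, "exh": 1 - delta},
    "g2": {"ms1": 0.0, "ms2": 1.0, "exh": 1 - delta},
}

regret = expected_regret(prior, utility)
gate = p_clarify(regret, tau=10.0, c=0.2)  # tau and c are illustrative values
pi = policy(prior, utility, alpha=5.0)     # alpha = 5 is illustrative
```

With these inputs the best response without clarification is exh, and the regret of committing to it equals δ, so the gate's output depends on how δ compares with the assumed clarification cost c.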
Connection to @cite{hawkins-etal-2025} #
The behavioral policy π corresponds to @cite{hawkins-etal-2025}'s respondent R₁ at β = 1, w_c = 0: pure action utility. R₁ scores responses by:
(1 − β) · log L0(w|r) + β · E_D[V(D, r)] − w_c · C(r)
At β = 1, w_c = 0 this reduces to E_D[V(D, r)], the expected action value.
For the bartender, V(g, r) = U(g, r) is the paper's utility table: 1 for a
matching mention-some response, 0 for a mismatching one, and 1−δ for the
exhaustive response.
The L0 gate (if L0 = 0 then 0) ensures truth-conditional consistency:
S1 never recommends a response that doesn't apply to the goal. This is
structurally identical to @cite{hawkins-etal-2025}'s responseTruth gate.
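The reduction at β = 1, w_c = 0 can be checked numerically. The sketch below is an illustration under stated assumptions: `r1_score` is a hypothetical name, the log L0 values and response costs are made up, and a response with L0 = 0 is given a log-score of −∞ (zero weight after exponentiation) to mimic the gate.

```python
import math

def r1_score(log_l0, action_value, cost, beta, w_c):
    """(1 - beta) * log L0(w|r) + beta * E_D[V(D, r)] - w_c * C(r)."""
    # Responses ruled out by the literal listener (L0 = 0) keep a score of
    # -inf, i.e. zero weight after exponentiation; this also avoids 0 * -inf.
    if log_l0 == -math.inf:
        return -math.inf
    return (1.0 - beta) * log_l0 + beta * action_value - w_c * cost

delta = 0.32
# Responder facing goal g1. The log L0 values and costs are illustrative
# assumptions, not taken from the paper.
candidates = {
    "ms1": {"log_l0": 0.0,           "action_value": 1.0,         "cost": 1.0},
    "ms2": {"log_l0": -math.inf,     "action_value": 0.0,         "cost": 1.0},
    "exh": {"log_l0": math.log(0.5), "action_value": 1.0 - delta, "cost": 3.0},
}

# beta = 1, w_c = 0: informativity and cost terms drop out.
pure_action = {r: r1_score(beta=1.0, w_c=0.0, **v)
               for r, v in candidates.items()}
```

For every applicable response, `pure_action[r]` equals the utility table entry U(g₁, r), whatever informativity and cost values were supplied.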
Action utility and δ-sensitivity #
With action-utility scoring, S1 preferences are δ-sensitive: the
exhaustive response's S1 score is exp(α · (1−δ)), which decreases
with δ. At high δ (large option space), S1 strongly prefers targeted
responses over exh. At low δ (small option space), exh is nearly as
good as targeted — S1's preference weakens.
This connects to the paper's experimental predictions (Exp 1: TL;JUSTASK, NONEEDTOASK, JUSTLISTTHEMALL, TOOMANYTOLIST; Exp 2: WORTHASKING, etc.) through the S1 behavioral policy:
- At high δ, exh is expensive → S1 concentrates on targeted responses → commitment under uncertainty is riskier → more to gain from clarifying
- At low δ, exh is cheap → S1 assigns near-equal weight to exh and targeted → safe to commit even under uncertainty → less need to clarify
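To make the δ-sensitivity concrete, the following sketch computes the normalized S1 probability of exh for a responder facing g₁. It assumes the scoring rule described on this page (exp(α · U), with the mismatching response gated to 0) and α = 1; the function name is hypothetical. Under these assumptions the probability simplifies to 1 / (1 + exp(α · δ)).

```python
import math

def s1_exh_given_g1(delta, alpha=1.0):
    """Normalized S1 probability of the exhaustive response for goal g1."""
    targeted = math.exp(alpha * 1.0)              # ms1 matches g1: U = 1
    exhaustive = math.exp(alpha * (1.0 - delta))  # exh: U = 1 - delta
    # ms2 is gated out (score 0), so it does not enter the normalizer.
    return exhaustive / (targeted + exhaustive)

for delta in (0.11, 0.32):
    print(f"delta={delta:.2f}  S1(exh|g1)={s1_exh_given_g1(delta):.3f}")
# delta=0.11  S1(exh|g1)=0.473
# delta=0.32  S1(exh|g1)=0.421
```

At δ = 0.11 exh is almost as likely as the targeted response; at δ = 0.32 the gap widens, matching the two bullets above.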
The questioner's latent goal (e.g., preferred drink category).
Boolean applicability: does response r address goal g?
Parallel to @cite{hawkins-etal-2025}'s responseTruth.
Equations
- Phenomena.Clarification.Studies.TsvilodubEtAl2026.respApplies Phenomena.Clarification.Studies.TsvilodubEtAl2026.Response.ms1 Phenomena.Clarification.Studies.TsvilodubEtAl2026.Goal.g₁ = true
- Phenomena.Clarification.Studies.TsvilodubEtAl2026.respApplies Phenomena.Clarification.Studies.TsvilodubEtAl2026.Response.ms2 Phenomena.Clarification.Studies.TsvilodubEtAl2026.Goal.g₂ = true
- Phenomena.Clarification.Studies.TsvilodubEtAl2026.respApplies Phenomena.Clarification.Studies.TsvilodubEtAl2026.Response.exh x✝ = true
- Phenomena.Clarification.Studies.TsvilodubEtAl2026.respApplies x✝¹ x✝ = false
Bartender RSA with action-utility scoring.
Parameterized by exhVal = 1 − δ (utility of exhaustive response)
and goal prior weights.
s1Score(L0, α, g, r) = if L0(g|r) = 0 then 0 else exp(α · U(g, r))
This is @cite{hawkins-etal-2025}'s priorPQScore at β = 1, w_c = 0:
pure action utility. The L0 gate ensures truth-conditional consistency.
Note: the paper's α has prior N(5, 1); we use α = 1 here. The
qualitative predictions (all the strict-inequality theorems below) hold for any α > 0.
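The gated score and the claim that the orderings hold for any α > 0 can be sketched as follows. This is a Python illustration, not the formalization itself: `resp_applies`, `utility`, `s1_score` and `s1` are hypothetical names, and the L0 gate is taken to fire exactly when the response does not apply to the goal.

```python
import math

RESPONSES = ("ms1", "ms2", "exh")

def resp_applies(r, g):
    """Boolean applicability: exh applies to every goal, ms_i only to g_i."""
    return r == "exh" or (r, g) in {("ms1", "g1"), ("ms2", "g2")}

def utility(g, r, exh_val):
    """Bartender utility table: 1 for a match, 0 for a mismatch, exh_val for exh."""
    if r == "exh":
        return exh_val
    return 1.0 if resp_applies(r, g) else 0.0

def s1_score(alpha, g, r, exh_val):
    # L0 gate: an inapplicable response has L0(g|r) = 0 and scores 0.
    if not resp_applies(r, g):
        return 0.0
    return math.exp(alpha * utility(g, r, exh_val))

def s1(alpha, g, exh_val):
    """Normalize the gated scores into a distribution over responses."""
    scores = {r: s1_score(alpha, g, r, exh_val) for r in RESPONSES}
    total = sum(scores.values())
    return {r: s / total for r, s in scores.items()}

# The ordering ms1 > exh > ms2 = 0 for goal g1 is checked at several
# positive alpha values (exh_val = 1 - 0.32).
orderings_hold = all(
    s1(a, "g1", 0.68)["ms1"] > s1(a, "g1", 0.68)["exh"] > s1(a, "g1", 0.68)["ms2"] == 0.0
    for a in (0.5, 1.0, 5.0)
)
```

Because exp is strictly increasing, changing α rescales the gap between ms1 and exh but never reverses it, which is why α = 1 is enough for the qualitative results.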
Large option space (δ = 0.32), low uncertainty (ε = 0.17).
Large option space (δ = 0.32), high uncertainty (ε = 0.49).
Small option space (δ = 0.11), low uncertainty (ε = 0.17).
Small option space (δ = 0.11), high uncertainty (ε = 0.49).
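For orientation, the four configurations can be tabulated together with the expected regret of acting without clarification. This is a sketch under assumptions: the condition labels are hypothetical, the prior is (1 − ε, ε), and regret is computed as 1 minus the best expected utility, which for ε ≤ 0.5 simplifies to min(ε, δ).

```python
# (delta, epsilon) for each configuration; the labels are illustrative.
CONDITIONS = {
    "large/low":  (0.32, 0.17),
    "large/high": (0.32, 0.49),
    "small/low":  (0.11, 0.17),
    "small/high": (0.11, 0.49),
}

def expected_regret(delta, eps):
    """1 - max EU, where EU(ms1) = 1 - eps, EU(ms2) = eps, EU(exh) = 1 - delta."""
    best_eu = max(1.0 - eps, eps, 1.0 - delta)
    # With the goal known, the matching targeted response always earns 1.
    return 1.0 - best_eu

regrets = {name: round(expected_regret(d, e), 2)
           for name, (d, e) in CONDITIONS.items()}
print(regrets)
```

Under these assumptions, raising ε increases the regret only in the large option space (0.17 to 0.32); in the small option space it stays at δ = 0.11, because listing everything remains the best fallback. This is the uncertainty-by-cost interaction described in the overview.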
S1 captures the behavioral policy: how the responder acts when they know
the goal. With action-utility scoring, S1 preferences are δ-sensitive:
the exhaustive response scores exp(α · (1−δ)), which drops with δ.
S1 facing g₁ prefers ms1 over exh in the large option space. U(g₁, ms1) = 1 > U(g₁, exh) = 0.68.
S1 facing g₁ prefers ms1 over exh in the small option space too. U(g₁, ms1) = 1 > U(g₁, exh) = 0.89. The gap is smaller than at large δ.
S1 facing g₁ never produces ms2 (mismatching response). L0(g₁|ms2) = 0, so the L0 gate zeros the score.
The key RSA prediction: exh is relatively more viable in the small option space (δ = 0.11) than in the large one (δ = 0.32). This captures the EVPI effect through action utility: when the safe response is cheap, there's less need to clarify.
WorthAsking at the S1 level: exh is more viable in the small option space. S1(exh|g₁, small) > S1(exh|g₁, large) because exp(0.89α) > exp(0.68α).
Targeted responses (ms1, ms2) are fully informative: S1 never produces ms1 for g₂ (L0 gate), so L1(g₁|ms1) = 1 regardless of prior or δ.
The exhaustive response transmits the prior: S1(exh|g₁) = S1(exh|g₂) (symmetric utility), so L1(g|exh) ∝ P(g). Under an asymmetric prior (ε < 0.5), exh reveals the listener's prior belief about the goal.
L1 hearing ms1 infers g₁ with certainty.
L1 hearing ms2 infers g₂ (symmetric).
L1 hearing exh leans toward g₁ (prior = 83:17). Since S1(exh|g₁) = S1(exh|g₂), L1(g|exh) ∝ P(g) — exh transmits the prior rather than being uninformative.
Targeted responses remain fully informative at high uncertainty.
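The L1 results above follow from Bayes' rule, L1(g|r) ∝ P(g) · S1(r|g). The sketch below reproduces them in Python under the same assumptions as the rest of this page (gated exp(α · U) scores, α = 1, prior 83:17); all names are hypothetical.

```python
import math

RESPONSES = ("ms1", "ms2", "exh")
APPLIES = {("ms1", "g1"), ("ms2", "g2"), ("exh", "g1"), ("exh", "g2")}

def s1(g, exh_val, alpha=1.0):
    """Normalized S1 distribution over responses for a known goal g."""
    def score(r):
        if (r, g) not in APPLIES:
            return 0.0  # L0 gate
        u = exh_val if r == "exh" else 1.0
        return math.exp(alpha * u)
    total = sum(score(r) for r in RESPONSES)
    return {r: score(r) / total for r in RESPONSES}

def l1(r, prior, exh_val, alpha=1.0):
    """Pragmatic listener: L1(g|r) proportional to P(g) * S1(r|g)."""
    joint = {g: p * s1(g, exh_val, alpha)[r] for g, p in prior.items()}
    total = sum(joint.values())
    return {g: j / total for g, j in joint.items()}

prior = {"g1": 0.83, "g2": 0.17}       # low uncertainty, epsilon = 0.17
after_ms1 = l1("ms1", prior, exh_val=0.68)
after_exh = l1("exh", prior, exh_val=0.68)
```

Here `after_ms1` puts all mass on g₁, while `after_exh` returns the prior up to floating-point error, since S1(exh|g) is the same for both goals and cancels in the normalization.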