Linglib.Phenomena.Clarification.Studies.TsvilodubEtAl2026

Tsvilodub, Mulligan, Snider, Hawkins & Franke (2026)

@cite{tsvilodub-etal-2026}

Act or Clarify? Modeling Sensitivity to Uncertainty and Cost in Communication.

Overview

When should an agent ask a clarification question (CQ) rather than act under uncertainty? This paper predicts and confirms that the decision depends on both the uncertainty (ε) about the interlocutor's goals and the cost (δ) of the available actions. Their interaction is captured by expected regret, which equals the expected value of perfect information (EVPI, @cite{raiffa-schlaifer-1961}).

Two-layer model

The paper's model is a direct softmax over expected utility (NOT RSA):

CQ gate: P(CQ) = Logistic(τ · (ExpRegret(r*) − c)) — when to clarify
Behavioral policy: π(r) = SoftMax(α · EU(r)) — what to say
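The CQ gate can be sketched numerically in core Lean 4 with `Float`. This is an illustration, not the library's definition: it reads ExpRegret(r*) as EVPI (the utility when the goal is known, 1, minus the best expected utility), assumes P(g₂) = ε, and uses the 1 / 0 / 1 − δ utility table described on this page. The names `logistic`, `fmax`, `expRegret`, `cqProb`, `τ₀`, and `c₀` are hypothetical, and the values τ = 10 and c = 0.1 are placeholders rather than fitted parameters.

```lean
namespace CQGateSketch

/-- Logistic function 1 / (1 + exp(−x)). -/
def logistic (x : Float) : Float :=
  1.0 / (1.0 + Float.exp (-x))

/-- Larger of two floats, written out to keep the sketch dependency-free. -/
def fmax (a b : Float) : Float :=
  if a < b then b else a

/-- Expected regret of the best non-CQ response r*, read here as EVPI.
    Assumes P(g₂) = ε and the utility table 1 / 0 / 1 − δ. -/
def expRegret (ε δ : Float) : Float :=
  let euMs1 := 1.0 - ε      -- ms1 pays off only under g₁
  let euMs2 := ε            -- ms2 pays off only under g₂
  let euExh := 1.0 - δ      -- exh pays the same under both goals
  1.0 - fmax euMs1 (fmax euMs2 euExh)

/-- CQ gate: P(CQ) = Logistic(τ · (ExpRegret(r*) − c)). -/
def cqProb (τ c ε δ : Float) : Float :=
  logistic (τ * (expRegret ε δ - c))

-- Hypothetical gate parameters, not taken from the paper.
def τ₀ : Float := 10.0
def c₀ : Float := 0.1

-- Large option space (δ = 0.32): higher uncertainty means higher regret.
#eval decide (expRegret 0.49 0.32 > expRegret 0.17 0.32)          -- true
#eval decide (cqProb τ₀ c₀ 0.49 0.32 > cqProb τ₀ c₀ 0.17 0.32)    -- true
-- Small option space (δ = 0.11): exh is the best response at both ε,
-- so the regret is identical.
#eval expRegret 0.49 0.11 == expRegret 0.17 0.11                  -- true

end CQGateSketch
```

Under these assumptions the regret works out to min(ε, 1 − ε, δ), which makes the ε-by-δ interaction visible: uncertainty only raises P(CQ) when the exhaustive fallback is costly enough.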

We reinterpret the behavioral policy through RSA, connecting it to @cite{hawkins-etal-2025}'s action-utility scoring.

Connection to @cite{hawkins-etal-2025}

The behavioral policy π corresponds to @cite{hawkins-etal-2025}'s respondent R₁ at β = 1, w_c = 0: pure action utility. R₁ scores responses by:

(1 − β) · log L0(w|r) + β · E_D[V(D, r)] − w_c · C(r)

At β = 1, w_c = 0 this reduces to E_D[V(D, r)], the expected action value; when the goal is known, that is simply V(g, r). For the bartender, V(g, r) = U(g, r) is the paper's utility table: 1 for matching mention-some, 0 for mismatching, 1−δ for exhaustive.
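As a small illustration of this reduction, the sketch below encodes the utility table and the R₁ score in core Lean 4. It is an assumption-laden toy: `U` and `r1Score` are hypothetical names, the expectation E_D[V(D, r)] is passed in as a single number, and the log-likelihood and cost arguments are arbitrary finite values chosen only to show that they drop out.

```lean
namespace ActionUtilitySketch

inductive Goal where
  | g₁ | g₂
  deriving DecidableEq, Repr

inductive Response where
  | ms1 | ms2 | exh
  deriving DecidableEq, Repr

/-- Utility table from the text: 1 for a matching mention-some response,
    0 for a mismatching one, 1 − δ for the exhaustive response. -/
def U (δ : Float) : Goal → Response → Float
  | .g₁, .ms1 => 1.0
  | .g₂, .ms2 => 1.0
  | _,   .exh => 1.0 - δ
  | _,   _    => 0.0

/-- (1 − β) · log L0 + β · value − w_c · cost, with the expected action
    value supplied directly as `value`. -/
def r1Score (β wc logL0 value cost : Float) : Float :=
  (1.0 - β) * logL0 + β * value - wc * cost

-- At β = 1, w_c = 0 the informativity and cost terms vanish, leaving the
-- action value (here for exh under g₁ with δ = 0.32).
#eval r1Score 1.0 0.0 (-0.7) (U 0.32 .g₁ .exh) 2.0 == U 0.32 .g₁ .exh   -- true

end ActionUtilitySketch
```

The check uses a finite log-likelihood on purpose: with L0 = 0 the log term is −∞ and the product 0 · (−∞) is undefined in floating point, which is one way to see why a separate truth gate is convenient.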

The L0 gate (if L0 = 0 then 0) ensures truth-conditional consistency: S1 never recommends a response that doesn't apply to the goal. This is structurally identical to @cite{hawkins-etal-2025}'s responseTruth gate.

Action utility and δ-sensitivity

With action-utility scoring, S1 preferences are δ-sensitive: the exhaustive response's S1 score is exp(α · (1−δ)), which decreases with δ. At high δ (large option space), S1 strongly prefers targeted responses over exh. At low δ (small option space), exh is nearly as good as targeted — S1's preference weakens.

This connects to the paper's experimental predictions (Exp 1: TL;JUSTASK, NONEEDTOASK, JUSTLISTTHEMALL, TOOMANYTOLIST; Exp 2: WORTHASKING, etc.) through the S1 behavioral policy:

At high δ, exh is expensive → S1 concentrates on targeted responses → commitment under uncertainty is riskier → more to gain from clarifying
At low δ, exh is cheap → S1 assigns near-equal weight to exh and targeted → safe to commit even under uncertainty → less need to clarify
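The two cases above can be checked with a few lines of core Lean 4. The sketch assumes that S1 normalizes exp(α · U) over the two responses that apply to the goal (the matching targeted response and exh); `s1Exh` is a hypothetical helper, not a library function.

```lean
namespace DeltaSensitivitySketch

/-- S1 probability of exh for a known goal, assuming normalization of
    exp(α · U) over the matching targeted response (U = 1) and exh (U = 1 − δ). -/
def s1Exh (α δ : Float) : Float :=
  let exhScore := Float.exp (α * (1.0 - δ))
  let msScore := Float.exp (α * 1.0)
  exhScore / (msScore + exhScore)

#eval s1Exh 1.0 0.32   -- large option space
#eval s1Exh 1.0 0.11   -- small option space

-- exh gets more weight when it is cheap ...
#eval decide (s1Exh 1.0 0.11 > s1Exh 1.0 0.32)   -- true
-- ... but the targeted response is still preferred.
#eval decide (s1Exh 1.0 0.11 < 0.5)              -- true

end DeltaSensitivitySketch
```

At α = 1 the two probabilities come out at about 0.42 and 0.47, so the preference for the targeted response weakens but does not reverse as δ shrinks.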

inductive Phenomena.Clarification.Studies.TsvilodubEtAl2026.Goal :

The questioner's latent goal (e.g., preferred drink category).

g₁ : Goal
g₂ : Goal

instance Phenomena.Clarification.Studies.TsvilodubEtAl2026.instDecidableEqGoal :

DecidableEq Goal

Equations

Phenomena.Clarification.Studies.TsvilodubEtAl2026.instDecidableEqGoal x✝ y✝ = if h : x✝.ctorIdx = y✝.ctorIdx then isTrue ⋯ else isFalse ⋯

instance Phenomena.Clarification.Studies.TsvilodubEtAl2026.instReprGoal :

Repr Goal

Equations

Phenomena.Clarification.Studies.TsvilodubEtAl2026.instReprGoal = { reprPrec := Phenomena.Clarification.Studies.TsvilodubEtAl2026.instReprGoal.repr }

def Phenomena.Clarification.Studies.TsvilodubEtAl2026.instReprGoal.repr :

Goal → ℕ → Std.Format

Equations

One or more equations did not get rendered due to their size.

instance Phenomena.Clarification.Studies.TsvilodubEtAl2026.instFintypeGoal :

Fintype Goal

Equations

One or more equations did not get rendered due to their size.

inductive Phenomena.Clarification.Studies.TsvilodubEtAl2026.Response :

Available non-CQ responses.

ms1 : Response
ms2 : Response
exh : Response

instance Phenomena.Clarification.Studies.TsvilodubEtAl2026.instDecidableEqResponse :

DecidableEq Response

Equations

Phenomena.Clarification.Studies.TsvilodubEtAl2026.instDecidableEqResponse x✝ y✝ = if h : x✝.ctorIdx = y✝.ctorIdx then isTrue ⋯ else isFalse ⋯

instance Phenomena.Clarification.Studies.TsvilodubEtAl2026.instReprResponse :

Repr Response

Equations

Phenomena.Clarification.Studies.TsvilodubEtAl2026.instReprResponse = { reprPrec := Phenomena.Clarification.Studies.TsvilodubEtAl2026.instReprResponse.repr }

def Phenomena.Clarification.Studies.TsvilodubEtAl2026.instReprResponse.repr :

Response → ℕ → Std.Format

Equations

One or more equations did not get rendered due to their size.

instance Phenomena.Clarification.Studies.TsvilodubEtAl2026.instFintypeResponse :

Fintype Response

Equations

One or more equations did not get rendered due to their size.

def Phenomena.Clarification.Studies.TsvilodubEtAl2026.respApplies :

Response → Goal → Bool

Boolean applicability: does response r address goal g? Parallel to @cite{hawkins-etal-2025}'s responseTruth.
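The rendered page does not show the equations for `respApplies`, so the following is only a plausible reconstruction from the surrounding prose: each targeted response applies to its own goal, and exh applies to both. The types are redeclared locally so the snippet compiles in core Lean 4 on its own.

```lean
namespace RespAppliesSketch

inductive Goal where
  | g₁ | g₂
  deriving DecidableEq, Repr

inductive Response where
  | ms1 | ms2 | exh
  deriving DecidableEq, Repr

/-- Assumed applicability table: ms1 addresses g₁, ms2 addresses g₂,
    and exh addresses either goal. -/
def respApplies : Response → Goal → Bool
  | .ms1, .g₁ => true
  | .ms2, .g₂ => true
  | .exh, _   => true
  | _,    _   => false

#eval respApplies .ms2 .g₁   -- false
#eval respApplies .exh .g₂   -- true

end RespAppliesSketch
```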

noncomputable def Phenomena.Clarification.Studies.TsvilodubEtAl2026.mkBartenderRSA (exhVal prior_g1 prior_g2 : ℝ) (hExh : 0 ≤ exhVal) (h1 : 0 ≤ prior_g1) (h2 : 0 ≤ prior_g2) :

RSA.RSAConfig Response Goal

Bartender RSA with action-utility scoring.

Parameterized by exhVal = 1 − δ (utility of exhaustive response) and goal prior weights.

s1Score(L0, α, g, r) = if L0(g|r) = 0 then 0 else exp(α · U(g, r))

This is @cite{hawkins-etal-2025}'s priorPQScore at β = 1, w_c = 0: pure action utility. The L0 gate ensures truth-conditional consistency.

Note: the paper's α has prior N(5, 1); we use α = 1 here. The qualitative predictions (every strict-inequality theorem below) hold for any α > 0.
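A minimal `Float` sketch of the gated score follows. It is not the `RSAConfig` field itself: `s1Score` here is a hypothetical stand-alone function, `l0` stands in for L0(g|r), and any nonzero `l0` (1.0 below) is treated as "the response applies".

```lean
namespace S1ScoreSketch

/-- Gated action-utility score: zero when the literal listener rules the
    goal out, exp(α · U) otherwise. -/
def s1Score (l0 α u : Float) : Float :=
  if l0 == 0.0 then 0.0 else Float.exp (α * u)

-- Mismatching response (U = 0, L0 = 0): without the gate the score
-- would be exp(0) = 1; with the gate it is 0.
#eval Float.exp (1.0 * 0.0) == 1.0          -- true
#eval s1Score 0.0 1.0 0.0 == 0.0            -- true

-- ms1 outscores exh in the large option space (U(exh) = 0.68),
-- both at α = 1 and at α = 5.
#eval decide (s1Score 1.0 1.0 1.0 > s1Score 1.0 1.0 0.68)   -- true
#eval decide (s1Score 1.0 5.0 1.0 > s1Score 1.0 5.0 0.68)   -- true

end S1ScoreSketch
```

Because exp is strictly increasing, changing α rescales the gaps between scores but cannot reorder them, which is the intuition behind the α-independence remark above.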

Equations

One or more equations did not get rendered due to their size.

noncomputable def Phenomena.Clarification.Studies.TsvilodubEtAl2026.cfgLargeLow :

RSA.RSAConfig Response Goal

Large option space (δ = 0.32), low uncertainty (ε = 0.17).

Equations

One or more equations did not get rendered due to their size.

noncomputable def Phenomena.Clarification.Studies.TsvilodubEtAl2026.cfgLargeHigh :

RSA.RSAConfig Response Goal

Large option space (δ = 0.32), high uncertainty (ε = 0.49).

Equations

One or more equations did not get rendered due to their size.

noncomputable def Phenomena.Clarification.Studies.TsvilodubEtAl2026.cfgSmallLow :

RSA.RSAConfig Response Goal

Small option space (δ = 0.11), low uncertainty (ε = 0.17).

Equations

One or more equations did not get rendered due to their size.

noncomputable def Phenomena.Clarification.Studies.TsvilodubEtAl2026.cfgSmallHigh :

RSA.RSAConfig Response Goal

Small option space (δ = 0.11), high uncertainty (ε = 0.49).

Equations

One or more equations did not get rendered due to their size.

S1 captures the behavioral policy: how the responder acts when they know the goal. With action-utility scoring, S1 preferences are δ-sensitive: the exhaustive response scores exp(α · (1−δ)), which drops with δ.

theorem Phenomena.Clarification.Studies.TsvilodubEtAl2026.S1_large_g1_prefers_ms1 :

cfgLargeLow.S1 () Goal.g₁ Response.ms1 > cfgLargeLow.S1 () Goal.g₁ Response.exh

S1 facing g₁ prefers ms1 over exh in the large option space. U(g₁, ms1) = 1 > U(g₁, exh) = 0.68.

theorem Phenomena.Clarification.Studies.TsvilodubEtAl2026.S1_small_g1_prefers_ms1 :

cfgSmallLow.S1 () Goal.g₁ Response.ms1 > cfgSmallLow.S1 () Goal.g₁ Response.exh

S1 facing g₁ prefers ms1 over exh in the small option space too. U(g₁, ms1) = 1 > U(g₁, exh) = 0.89. The utility gap (0.11) is smaller than in the large option space (0.32).

theorem Phenomena.Clarification.Studies.TsvilodubEtAl2026.S1_g1_avoids_ms2 :

cfgLargeLow.S1 () Goal.g₁ Response.ms1 > cfgLargeLow.S1 () Goal.g₁ Response.ms2

S1 facing g₁ never produces ms2 (mismatching response). L0(g₁|ms2) = 0, so the L0 gate zeros the score.

The key RSA prediction: exh is relatively more viable in the small option space (δ = 0.11) than the large (δ = 0.32). This captures the EVPI effect through action utility: when the safe response is cheap, there's less need to clarify.

theorem Phenomena.Clarification.Studies.TsvilodubEtAl2026.S1_small_exh_more_viable :

cfgSmallLow.S1 () Goal.g₁ Response.exh > cfgLargeLow.S1 () Goal.g₁ Response.exh

WorthAsking at the S1 level: exh is more viable in the small option space. S1(exh|g₁, small) > S1(exh|g₁, large) because exp(0.89α) > exp(0.68α) while the competing ms1 score exp(α) is the same in both conditions.

Targeted responses (ms1, ms2) are fully informative: S1 never produces ms1 for g₂ (L0 gate), so L1(g₁|ms1) = 1 regardless of prior or δ.

The exhaustive response transmits the prior: S1(exh|g₁) = S1(exh|g₂) (symmetric utility), so L1(g|exh) ∝ P(g). Under an asymmetric prior (ε < 0.5), exh reveals the listener's prior belief about the goal.
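Both observations follow from Bayes' rule over two goals, which can be sketched in core Lean 4 as below. `posteriorG1` is a hypothetical helper; the prior weights 0.83 and 0.17 correspond to the low-uncertainty condition, and the S1 likelihood 0.42 is an illustrative value, since only the equality of the two exh likelihoods matters.

```lean
namespace L1Sketch

/-- L1 posterior on g₁ after a response, given the S1 likelihood of that
    response under each goal and the prior weights. -/
def posteriorG1 (s1GivenG1 s1GivenG2 priorG1 priorG2 : Float) : Float :=
  let j1 := priorG1 * s1GivenG1
  let j2 := priorG2 * s1GivenG2
  j1 / (j1 + j2)

-- ms1: S1 never produces it for g₂, so the posterior on g₁ is 1
-- whatever the prior is.
#eval posteriorG1 0.58 0.0 0.83 0.17 == 1.0                          -- true
#eval posteriorG1 0.58 0.0 0.51 0.49 == 1.0                          -- true

-- exh: equal likelihoods cancel, so the posterior equals the prior.
#eval decide (Float.abs (posteriorG1 0.42 0.42 0.83 0.17 - 0.83) < 1e-9)   -- true
#eval decide (posteriorG1 0.42 0.42 0.83 0.17 > 0.5)                       -- true

end L1Sketch
```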

theorem Phenomena.Clarification.Studies.TsvilodubEtAl2026.L1_ms1_infers_g1 :

cfgLargeLow.L1 Response.ms1 Goal.g₁ > cfgLargeLow.L1 Response.ms1 Goal.g₂

L1 hearing ms1 infers g₁ with certainty.

theorem Phenomena.Clarification.Studies.TsvilodubEtAl2026.L1_ms2_infers_g2 :

cfgLargeLow.L1 Response.ms2 Goal.g₂ > cfgLargeLow.L1 Response.ms2 Goal.g₁

L1 hearing ms2 infers g₂ (symmetric).

theorem Phenomena.Clarification.Studies.TsvilodubEtAl2026.L1_exh_transmits_prior :

cfgLargeLow.L1 Response.exh Goal.g₁ > cfgLargeLow.L1 Response.exh Goal.g₂

L1 hearing exh leans toward g₁ (prior = 83:17). Since S1(exh|g₁) = S1(exh|g₂), L1(g|exh) ∝ P(g) — exh transmits the prior rather than being uninformative.

theorem Phenomena.Clarification.Studies.TsvilodubEtAl2026.L1_high_ms1_still_certain :

cfgLargeHigh.L1 Response.ms1 Goal.g₁ > cfgLargeHigh.L1 Response.ms1 Goal.g₂

Targeted responses remain fully informative at high uncertainty.