Linglib.Phenomena.Clarification.Studies.TsvilodubEtAl2026

Tsvilodub, Mulligan, Snider, Hawkins & Franke (2026) #

@cite{tsvilodub-etal-2026}

Act or Clarify? Modeling Sensitivity to Uncertainty and Cost in Communication.

Overview #

When should an agent ask a clarification question (CQ) vs. act under uncertainty? This paper predicts and confirms that the decision depends on both uncertainty (ε) about the interlocutor's goals and the cost (δ) of available actions. The interaction is captured by expected regret (= EVPI, @cite{raiffa-schlaifer-1961}).

Two-layer model #

The paper's model is a direct softmax over expected utility (NOT RSA):

  1. CQ gate: P(CQ) = Logistic(τ · (ExpRegret(r*) − c)) — when to clarify
  2. Behavioral policy: π(r) = SoftMax(α · EU(r)) — what to say
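The two layers above can be sketched numerically. This is a minimal Python illustration, not the Lean formalization. The values of τ, c, and α are placeholders, and the function names are hypothetical. The two-goal prior (1 − ε, ε) and the three-response utility table follow the bartender example used on this page. Expected regret is computed as EVPI: the utility achievable with the goal known, minus the expected utility of the best response r*.

```python
import math

# Bartender utility table: 1 for a matching mention-some response,
# 0 for a mismatching one, 1 - delta for the exhaustive response.
def expected_utilities(eps, delta):
    prior = {"g1": 1.0 - eps, "g2": eps}
    utility = {
        "ms1": {"g1": 1.0, "g2": 0.0},
        "ms2": {"g1": 0.0, "g2": 1.0},
        "exh": {"g1": 1.0 - delta, "g2": 1.0 - delta},
    }
    return {r: sum(prior[g] * u[g] for g in prior) for r, u in utility.items()}

def expected_regret(eps, delta):
    # Utility achievable with the goal known is 1 (the matching response),
    # so EVPI = 1 - EU(r*), where r* maximizes expected utility.
    return 1.0 - max(expected_utilities(eps, delta).values())

def p_clarify(eps, delta, tau=10.0, c=0.2):
    # CQ gate; tau and c are placeholder values, not fitted parameters.
    return 1.0 / (1.0 + math.exp(-tau * (expected_regret(eps, delta) - c)))

def behavioral_policy(eps, delta, alpha=1.0):
    # Softmax over expected utility; alpha = 1 is a placeholder value.
    eu = expected_utilities(eps, delta)
    z = sum(math.exp(alpha * v) for v in eu.values())
    return {r: math.exp(alpha * v) / z for r, v in eu.items()}

for eps, delta in [(0.17, 0.32), (0.49, 0.32), (0.17, 0.11), (0.49, 0.11)]:
    print(eps, delta, round(expected_regret(eps, delta), 2),
          round(p_clarify(eps, delta), 3))
```

Under these assumptions expected regret equals min(ε, δ) whenever ε ≤ 0.5, so the gate opens most when both uncertainty and cost are high.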

We reinterpret the behavioral policy through RSA, connecting it to @cite{hawkins-etal-2025}'s action-utility scoring.

Connection to @cite{hawkins-etal-2025} #

The behavioral policy π corresponds to @cite{hawkins-etal-2025}'s respondent R₁ at β = 1, w_c = 0: pure action utility. R₁ scores responses by:

(1 − β) · log L0(w|r) + β · E_D[V(D, r)] − w_c · C(r)

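At β = 1, w_c = 0 this reduces to E_D[V(D, r)], the expected action value; once the goal D = g is fixed, that is simply V(g, r). For the bartender, V(g, r) = U(g, r) is the paper's utility table: 1 for a matching mention-some response, 0 for a mismatching one, and 1−δ for the exhaustive response.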
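As a rough illustration of this reduction, the scoring rule can be written as a small Python function. The function and argument names are hypothetical and come from neither paper nor the Lean source. The L0 probability and cost passed in are arbitrary illustrative numbers.

```python
import math

def r1_score(beta, w_c, l0_prob, expected_value, cost):
    # (1 - beta) * log L0(w|r) + beta * E_D[V(D, r)] - w_c * C(r)
    # l0_prob must be positive here; the L0 = 0 case is handled by a separate gate.
    return (1.0 - beta) * math.log(l0_prob) + beta * expected_value - w_c * cost

def bartender_utility(goal, response, delta):
    # Exhaustive -> 1 - delta; mention-some -> 1 if it matches the goal, else 0.
    if response == "exh":
        return 1.0 - delta
    matches = {"ms1": "g1", "ms2": "g2"}
    return 1.0 if matches[response] == goal else 0.0

delta = 0.32
value = bartender_utility("g1", "exh", delta)
# With beta = 1 and w_c = 0 the informativity and cost terms drop out,
# so the score equals the action value itself.
print(r1_score(beta=1.0, w_c=0.0, l0_prob=0.5, expected_value=value, cost=3.0))
```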

The L0 gate (if L0 = 0 then 0) ensures truth-conditional consistency: S1 never recommends a response that doesn't apply to the goal. This is structurally identical to @cite{hawkins-etal-2025}'s responseTruth gate.

Action utility and δ-sensitivity #

With action-utility scoring, S1 preferences are δ-sensitive: the exhaustive response's S1 score is exp(α · (1−δ)), which decreases with δ. At high δ (large option space), S1 strongly prefers targeted responses over exh. At low δ (small option space), exh is nearly as good as targeted — S1's preference weakens.

This connects to the paper's experimental predictions (Exp 1: TL;JUSTASK, NONEEDTOASK, JUSTLISTTHEMALL, TOOMANYTOLIST; Exp 2: WORTHASKING, etc.) through the S1 behavioral policy.

The questioner's latent goal (e.g., preferred drink category).

Available non-CQ responses.

noncomputable def Phenomena.Clarification.Studies.TsvilodubEtAl2026.mkBartenderRSA (exhVal prior_g1 prior_g2 : ) (hExh : 0 exhVal) (h1 : 0 prior_g1) (h2 : 0 prior_g2) :

Bartender RSA with action-utility scoring.

Parameterized by exhVal = 1 − δ (the utility of the exhaustive response) and the goal prior weights.

s1Score(L0, α, g, r) = if L0(g|r) = 0 then 0 else exp(α · U(g, r))

This is @cite{hawkins-etal-2025}'s priorPQScore at β = 1, w_c = 0: pure action utility. The L0 gate ensures truth-conditional consistency.

Note: the paper's α has prior N(5, 1); we use α = 1 here. The qualitative predictions (all of the strict-inequality theorems below) hold for any α > 0.
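The gated score and its insensitivity to the choice of α can be checked with a short Python sketch. This is an illustration under stated assumptions rather than the Lean definition: `s1_score` is a hypothetical stand-in for s1Score, and the L0 value 0.5 passed for the exhaustive response is an arbitrary positive number, since only L0 = 0 versus L0 > 0 matters to the gate.

```python
import math

def s1_score(l0_prob, alpha, utility):
    # L0 gate: a response that does not apply to the goal gets score 0.
    if l0_prob == 0:
        return 0.0
    return math.exp(alpha * utility)

# alpha = 5.0 is the mean of the paper's prior; 0.5 is an arbitrary extra value.
for alpha in (0.5, 1.0, 5.0):
    ms1 = s1_score(1.0, alpha, 1.0)          # matching mention-some
    exh_small = s1_score(0.5, alpha, 0.89)   # exhaustive, delta = 0.11
    exh_large = s1_score(0.5, alpha, 0.68)   # exhaustive, delta = 0.32
    ms2 = s1_score(0.0, alpha, 0.0)          # mismatching, gated out
    print(alpha, ms1 > exh_small > exh_large > ms2)
```

Because exp is strictly increasing, rescaling α changes the size of the gaps but not their direction.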

Large option space (δ = 0.32), low uncertainty (ε = 0.17).

Large option space (δ = 0.32), high uncertainty (ε = 0.49).

Small option space (δ = 0.11), low uncertainty (ε = 0.17).

Small option space (δ = 0.11), high uncertainty (ε = 0.49).

S1 captures the behavioral policy: how the responder acts when they know the goal. With action-utility scoring, S1 preferences are δ-sensitive: the exhaustive response scores exp(α · (1−δ)), which drops with δ.

S1 facing g₁ prefers ms1 over exh in the large option space. U(g₁, ms1) = 1 > U(g₁, exh) = 0.68.

S1 facing g₁ prefers ms1 over exh in the small option space too. U(g₁, ms1) = 1 > U(g₁, exh) = 0.89. The gap is smaller than in the large option space.

S1 facing g₁ never produces ms2 (the mismatching response). L0(g₁|ms2) = 0, so the L0 gate zeros the score.

The key RSA prediction: exh is relatively more viable in the small option space (δ = 0.11) than in the large one (δ = 0.32). This captures the EVPI effect through action utility: when the safe response is cheap, there's less need to clarify.

WorthAsking at S1 level: exh is more viable in the small option space. S1(exh|g₁, small) > S1(exh|g₁, large) because exp(0.89α) > exp(0.68α).
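The S1 statements above can be illustrated by normalizing the gated scores over the three non-CQ responses. This is a hedged Python sketch: the helper name is hypothetical, α = 1 matches the value used on this page, and normalizing over exactly {ms1, ms2, exh} is an assumption about the response set.

```python
import math

def s1_given_g1(delta, alpha=1.0):
    # Gated scores for goal g1: ms1 matches, ms2 is ruled out by the L0 gate,
    # and exh has utility 1 - delta.
    scores = {
        "ms1": math.exp(alpha * 1.0),
        "ms2": 0.0,
        "exh": math.exp(alpha * (1.0 - delta)),
    }
    total = sum(scores.values())
    return {r: s / total for r, s in scores.items()}

large = s1_given_g1(delta=0.32)  # large option space
small = s1_given_g1(delta=0.11)  # small option space
print({r: round(p, 3) for r, p in large.items()})
print({r: round(p, 3) for r, p in small.items()})
```

In both conditions ms1 receives more probability than exh and ms2 receives none, while the share assigned to exh is larger when δ is small.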

Targeted responses (ms1, ms2) are fully informative: S1 never produces ms1 for g₂ (L0 gate), so L1(g₁|ms1) = 1 regardless of prior or δ.

Exhaustive response transmits the prior: S1(exh|g₁) = S1(exh|g₂) (symmetric utility), so L1(g|exh) ∝ P(g). Under an asymmetric prior (ε < 0.5), exh reveals the listener's prior belief about the goal.

L1 hearing exh leans toward g₁ (prior = 83:17). Since S1(exh|g₁) = S1(exh|g₂), L1(g|exh) ∝ P(g): exh transmits the prior rather than being uninformative.
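The L1 claims follow from Bayes' rule applied to S1, which a small Python sketch can make concrete. The function names are hypothetical, α = 1 and δ = 0.32 are example values, and the S1 normalization over {ms1, ms2, exh} is an assumption rather than something taken from the Lean source.

```python
import math

def s1(goal, delta, alpha=1.0):
    # Speaker distribution over responses for a known goal.
    match = {"g1": "ms1", "g2": "ms2"}[goal]
    scores = {r: 0.0 for r in ("ms1", "ms2", "exh")}
    scores[match] = math.exp(alpha * 1.0)             # matching mention-some
    scores["exh"] = math.exp(alpha * (1.0 - delta))   # exhaustive
    total = sum(scores.values())
    return {r: s / total for r, s in scores.items()}

def l1(response, prior, delta, alpha=1.0):
    # Bayesian listener: L1(g | r) is proportional to P(g) * S1(r | g).
    joint = {g: p * s1(g, delta, alpha)[response] for g, p in prior.items()}
    total = sum(joint.values())
    return {g: v / total for g, v in joint.items()}

prior = {"g1": 0.83, "g2": 0.17}
print(l1("exh", prior, delta=0.32))  # posterior after the exhaustive response
print(l1("ms1", prior, delta=0.32))  # posterior after a targeted response
```

Since S1(exh|g) is the same for both goals, it cancels in the normalization and the posterior after exh equals the prior; after ms1 all posterior mass sits on g₁.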