RSAEval — Sound QInterval-Based RSA Evaluation #
QInterval evaluator for the L0→S1→L1 RSA pipeline, with bottom-up soundness proofs composed from QInterval operation lemmas.
Replaces the sorry'd Bounds-based pipeline in RSAVerify.lean.
Architecture #
Each evaluation function computes a QInterval that provably contains
the corresponding ℝ value from RSAConfig:
evalL0Exact : exact ℚ (no interval) ✓ L0agent.policy
evalS1Score : QInterval ✓ S1ScoreSpec.toS1Score
evalS1Policy : QInterval ✓ S1agent.policy
evalL1Score : QInterval ✓ L1agent.score
Separation checks reduce to hi₂ < lo₁ on ℚ intervals, yielding
Bool-valued functions decidable by native_decide.
Sum QIntervals over a Fintype: fold add over Finset.univ.
Equations
Instances For
Compute L0 policy as exact ℚ: meaning(l,u,w) / Σ_w' meaning(l,u,w'). Returns 0 if total is 0 (no world has nonzero meaning).
Equations
- RSA.Eval.evalL0Exact meaning l u w = if Finset.univ.sum (meaning l u) = 0 then 0 else meaning l u w / Finset.univ.sum (meaning l u)
Instances For
Compute S1 score as QInterval, dispatching on the scoring specification.
For beliefBased: exact when α ∈ ℕ (lo = hi = L0^α),
otherwise exp(α · log L0) via interval arithmetic.
For beliefAction: exact base × Padé exp discount.
For actionBased: Padé exp directly.
For beliefWeighted: full interval pipeline (sum of log intervals).
The qudProj parameter applies QUD projection to L0 before scoring:
when some project, L0(w|u) becomes Σ_{w'∼w} L0(w'|u).
Equations
- One or more equations did not get rendered due to their size.
- RSA.Eval.evalS1Score RSA.S1ScoreSpec.beliefBased meaning α l w u = RSA.Eval.powQInterval✝ (RSA.Eval.evalL0Exact meaning l u w) α
- RSA.Eval.evalS1Score RSA.S1ScoreSpec.beliefBased meaning α l w u (some project) = RSA.Eval.powQInterval✝ ({w' : W | project w' l = project w l}.sum (RSA.Eval.evalL0Exact meaning l u)) α
- RSA.Eval.evalS1Score (RSA.S1ScoreSpec.actionBased cost) meaning α l w u = Linglib.Interval.expPoint (α * (RSA.Eval.evalL0Exact meaning l u w - cost u))
Instances For
Compute S1 policy as QInterval: score(u) / Σ_{u'} score(u').
Conservative fallback to [0, 1] when total.lo ≤ 0 (can't prove positivity). Exact [1, 1] when sole-utterance shortcut applies.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Compute L1 unnormalized score as QInterval: L1_score(u,w) = worldPrior(w) · Σ_l latentPrior(w,l) · S1_policy(l,w,u).
Equations
- One or more equations did not get rendered due to their size.
Instances For
Check that L1 score for (u₁,w₁) is strictly greater than for (u₂,w₂).
Equations
- RSA.Eval.checkL1ScoreGt d u₁ w₁ u₂ w₂ = decide ((RSA.Eval.evalL1Score d u₂ w₂).hi < (RSA.Eval.evalL1Score d u₁ w₁).lo)
Instances For
Check that L1 score for (u₁,w₁) is NOT strictly greater than for (u₂,w₂).
Equations
- RSA.Eval.checkL1ScoreNotGt d u₁ w₁ u₂ w₂ = decide ((RSA.Eval.evalL1Score d u₁ w₁).hi ≤ (RSA.Eval.evalL1Score d u₂ w₂).lo)
Instances For
Check that S1 policy for (l,w,u₁) is strictly greater than for (l,w,u₂). Same (l,w) → same denominator → score ordering = policy ordering.
Equations
- RSA.Eval.checkS1PolicyGt d l w u₁ u₂ = decide ((RSA.Eval.evalS1Score d.s1Spec d.meaning d.α l w u₂ d.qudProject).hi < (RSA.Eval.evalS1Score d.s1Spec d.meaning d.α l w u₁ d.qudProject).lo)
Instances For
Check that S1 policy for (l,w,u₁) is NOT strictly greater than for (l,w,u₂).
Equations
- One or more equations did not get rendered due to their size.
Instances For
L0 exact computation matches the ℝ L0 policy.
S1 score interval contains the ℝ S1 score (after effectiveL0 projection).
S1 policy interval contains the ℝ S1 policy.
L1 score interval contains the ℝ L1 unnormalized score.
L1 score comparison implies L1 posterior comparison. The shared normalizer Z(u) = Σ_w L1_score(u,w) cancels.
Same-utterance L1 comparison: L1(u, w₁) > L1(u, w₂). Follows from score comparison since both share the normalizer Z(u).
If checkL1ScoreNotGt returns true, then ¬(L1 u w₁ > L1 u w₂).
If checkS1PolicyGt returns true, then S1 l w u₁ > S1 l w u₂.
If checkS1PolicyNotGt returns true, then ¬(S1 l w u₁ > S1 l w u₂).
Compute S2 score as QInterval, dispatching on S2ScoreSpec. Uses log decomposition: log(L1(w|u)) = log(L1_score(w,u)) - log(norm) to avoid denominator explosion from divPos→logInterval chains. Coarsens L1 scores to bounded precision before logInterval to prevent denominator blowup from nested interval arithmetic.
Equations
- One or more equations did not get rendered due to their size.
- RSA.Eval.evalS2Score d RSA.S2ScoreSpec.endorsement w u = RSA.Eval.evalL1Score d u w
Instances For
Check that S2 score for (w,u₁) is strictly greater than for (w,u₂).
Equations
- One or more equations did not get rendered due to their size.
Instances For
Soundness: if checkS2ScoreGt returns true, then S2(u₁|w) > S2(u₂|w).