RSAEval — Sound QInterval-Based RSA Evaluation #

QInterval evaluator for the L0→S1→L1 RSA pipeline, with bottom-up soundness proofs composed from QInterval operation lemmas.

Replaces the sorry'd Bounds-based pipeline in RSAVerify.lean.

Architecture #

Each evaluation function computes a QInterval that provably contains the corresponding ℝ value from RSAConfig:

evalL0Exact : exact ℚ (no interval) ✓ L0agent.policy
evalS1Score : QInterval ✓ S1ScoreSpec.toS1Score
evalS1Policy : QInterval ✓ S1agent.policy
evalL1Score : QInterval ✓ L1agent.score

Separation checks reduce to hi₂ < lo₁ on ℚ intervals, yielding Bool-valued functions decidable by native_decide.

source

def RSA.Eval.sumFinset {α : Type u_1} [Fintype α] (f : α → Linglib.Interval.QInterval) :

Linglib.Interval.QInterval

Sum QIntervals over a Fintype: fold add over Finset.univ.

Equations

RSA.Eval.sumFinset f = Finset.fold Linglib.Interval.QInterval.add (Linglib.Interval.QInterval.exact 0) f Finset.univ

Instances For

source

def RSA.Eval.evalL0Exact {U : Type u_1} {W : Type u_2} {L : Type u_3} [Fintype W] (meaning : L → U → W → ℚ) (l : L) (u : U) (w : W) :

ℚ

Compute L0 policy as exact ℚ: meaning(l,u,w) / Σ_w' meaning(l,u,w'). Returns 0 if total is 0 (no world has nonzero meaning).

Equations

RSA.Eval.evalL0Exact meaning l u w = if Finset.univ.sum (meaning l u) = 0 then 0 else meaning l u w / Finset.univ.sum (meaning l u)

Instances For

source

def RSA.Eval.evalS1Score {U : Type u_1} {W : Type u_2} {L : Type u_3} [Fintype W] [DecidableEq W] [DecidableEq L] (spec : S1ScoreSpec U W L) (meaning : L → U → W → ℚ) (α : ℚ) (l : L) (w : W) (u : U) (qudProj : Option (W → L → ℕ) := none) :

Linglib.Interval.QInterval

Compute S1 score as QInterval, dispatching on the scoring specification.

For beliefBased: exact when α ∈ ℕ (lo = hi = L0^α), otherwise exp(α · log L0) via interval arithmetic. For beliefAction: exact base × Padé exp discount. For actionBased: Padé exp directly. For beliefWeighted: full interval pipeline (sum of log intervals).

The qudProj parameter applies QUD projection to L0 before scoring: when some project, L0(w|u) becomes Σ_{w'∼w} L0(w'|u).

Equations

One or more equations did not get rendered due to their size.
RSA.Eval.evalS1Score RSA.S1ScoreSpec.beliefBased meaning α l w u = RSA.Eval.powQInterval✝ (RSA.Eval.evalL0Exact meaning l u w) α
RSA.Eval.evalS1Score RSA.S1ScoreSpec.beliefBased meaning α l w u (some project) = RSA.Eval.powQInterval✝ ({w' : W | project w' l = project w l}.sum (RSA.Eval.evalL0Exact meaning l u)) α
RSA.Eval.evalS1Score (RSA.S1ScoreSpec.actionBased cost) meaning α l w u = Linglib.Interval.expPoint (α * (RSA.Eval.evalL0Exact meaning l u w - cost u))

Instances For

source

def RSA.Eval.evalS1Policy {U : Type u_1} {W : Type u_2} {L : Type u_3} [Fintype U] [Fintype W] [DecidableEq U] [DecidableEq W] [DecidableEq L] (spec : S1ScoreSpec U W L) (meaning : L → U → W → ℚ) (α : ℚ) (l : L) (w : W) (u : U) (qudProj : Option (W → L → ℕ) := none) :

Linglib.Interval.QInterval

Compute S1 policy as QInterval: score(u) / Σ_{u'} score(u').

Conservative fallback to [0, 1] when total.lo ≤ 0 (can't prove positivity). Exact [1, 1] when sole-utterance shortcut applies.

Equations

One or more equations did not get rendered due to their size.

Instances For

source

def RSA.Eval.evalL1Score {U : Type u_1} {W : Type u_2} [Fintype U] [Fintype W] [DecidableEq U] [DecidableEq W] (d : RSAConfigData U W) (u : U) (w : W) :

Linglib.Interval.QInterval

Compute L1 unnormalized score as QInterval: L1_score(u,w) = worldPrior(w) · Σ_l latentPrior(w,l) · S1_policy(l,w,u).

Equations

One or more equations did not get rendered due to their size.

Instances For

source

def RSA.Eval.checkL1ScoreGt {U : Type u_1} {W : Type u_2} [Fintype U] [Fintype W] [DecidableEq U] [DecidableEq W] (d : RSAConfigData U W) (u₁ : U) (w₁ : W) (u₂ : U) (w₂ : W) :

Bool

Check that L1 score for (u₁,w₁) is strictly greater than for (u₂,w₂).

Equations

RSA.Eval.checkL1ScoreGt d u₁ w₁ u₂ w₂ = decide ((RSA.Eval.evalL1Score d u₂ w₂).hi < (RSA.Eval.evalL1Score d u₁ w₁).lo)

Instances For

source

def RSA.Eval.checkL1ScoreNotGt {U : Type u_1} {W : Type u_2} [Fintype U] [Fintype W] [DecidableEq U] [DecidableEq W] (d : RSAConfigData U W) (u₁ : U) (w₁ : W) (u₂ : U) (w₂ : W) :

Bool

Check that L1 score for (u₁,w₁) is NOT strictly greater than for (u₂,w₂).

Equations

RSA.Eval.checkL1ScoreNotGt d u₁ w₁ u₂ w₂ = decide ((RSA.Eval.evalL1Score d u₁ w₁).hi ≤ (RSA.Eval.evalL1Score d u₂ w₂).lo)

Instances For

source

def RSA.Eval.checkS1PolicyGt {U : Type u_1} {W : Type u_2} [Fintype U] [Fintype W] [DecidableEq U] [DecidableEq W] (d : RSAConfigData U W) (l : d.Latent) (w : W) (u₁ u₂ : U) :

Bool

Check that S1 policy for (l,w,u₁) is strictly greater than for (l,w,u₂). Same (l,w) → same denominator → score ordering = policy ordering.

Equations

RSA.Eval.checkS1PolicyGt d l w u₁ u₂ = decide ((RSA.Eval.evalS1Score d.s1Spec d.meaning d.α l w u₂ d.qudProject).hi < (RSA.Eval.evalS1Score d.s1Spec d.meaning d.α l w u₁ d.qudProject).lo)

Instances For

source

def RSA.Eval.checkS1PolicyNotGt {U : Type u_1} {W : Type u_2} [Fintype U] [Fintype W] [DecidableEq U] [DecidableEq W] (d : RSAConfigData U W) (l : d.Latent) (w : W) (u₁ u₂ : U) :

Bool

Check that S1 policy for (l,w,u₁) is NOT strictly greater than for (l,w,u₂).

Equations

One or more equations did not get rendered due to their size.

Instances For

source

theorem RSA.Eval.evalL0Exact_sound {U : Type u_1} {W : Type u_2} [Fintype U] [Fintype W] [DecidableEq U] [DecidableEq W] (d : RSAConfigData U W) (l : d.Latent) (u : U) (w : W) :

↑(evalL0Exact d.meaning l u w) = (d.toRSAConfig.L0agent l).policy u w

L0 exact computation matches the ℝ L0 policy.

source

theorem RSA.Eval.evalS1Score_sound {U : Type u_1} {W : Type u_2} [Fintype U] [Fintype W] [DecidableEq U] [DecidableEq W] (d : RSAConfigData U W) (l : d.Latent) (w : W) (u : U) :

(evalS1Score d.s1Spec d.meaning d.α l w u d.qudProject).containsReal ((d.toRSAConfig.S1agent l).score w u)

S1 score interval contains the ℝ S1 score (after effectiveL0 projection).

source

theorem RSA.Eval.evalS1Policy_sound {U : Type u_1} {W : Type u_2} [Fintype U] [Fintype W] [DecidableEq U] [DecidableEq W] (d : RSAConfigData U W) (l : d.Latent) (w : W) (u : U) :

(evalS1Policy d.s1Spec d.meaning d.α l w u d.qudProject).containsReal (d.toRSAConfig.S1 l w u)

S1 policy interval contains the ℝ S1 policy.

source

theorem RSA.Eval.evalL1Score_sound {U : Type u_1} {W : Type u_2} [Fintype U] [Fintype W] [DecidableEq U] [DecidableEq W] (d : RSAConfigData U W) (u : U) (w : W) :

(evalL1Score d u w).containsReal (d.toRSAConfig.L1agent.score u w)

L1 score interval contains the ℝ L1 unnormalized score.

source

theorem RSA.Eval.l1_score_gt_of_check {U : Type u_1} {W : Type u_2} [Fintype U] [Fintype W] [DecidableEq U] [DecidableEq W] (d : RSAConfigData U W) (u₁ : U) (w₁ : W) (u₂ : U) (w₂ : W) (h : checkL1ScoreGt d u₁ w₁ u₂ w₂ = true) :

d.toRSAConfig.L1agent.score u₁ w₁ > d.toRSAConfig.L1agent.score u₂ w₂

L1 score comparison implies L1 posterior comparison. The shared normalizer Z(u) = Σ_w L1_score(u,w) cancels.

source

theorem RSA.Eval.l1_gt_of_check {U : Type u_1} {W : Type u_2} [Fintype U] [Fintype W] [DecidableEq U] [DecidableEq W] (d : RSAConfigData U W) (u : U) (w₁ w₂ : W) (h : checkL1ScoreGt d u w₁ u w₂ = true) :

d.toRSAConfig.L1 u w₁ > d.toRSAConfig.L1 u w₂

Same-utterance L1 comparison: L1(u, w₁) > L1(u, w₂). Follows from score comparison since both share the normalizer Z(u).

source

theorem RSA.Eval.l1_not_gt_of_check {U : Type u_1} {W : Type u_2} [Fintype U] [Fintype W] [DecidableEq U] [DecidableEq W] (d : RSAConfigData U W) (u : U) (w₁ w₂ : W) (h : checkL1ScoreNotGt d u w₁ u w₂ = true) :

¬d.toRSAConfig.L1 u w₁ > d.toRSAConfig.L1 u w₂

If checkL1ScoreNotGt returns true, then ¬(L1 u w₁ > L1 u w₂).

source

theorem RSA.Eval.s1_gt_of_check {U : Type u_1} {W : Type u_2} [Fintype U] [Fintype W] [DecidableEq U] [DecidableEq W] (d : RSAConfigData U W) (l : d.Latent) (w : W) (u₁ u₂ : U) (h : checkS1PolicyGt d l w u₁ u₂ = true) :

d.toRSAConfig.S1 l w u₁ > d.toRSAConfig.S1 l w u₂

If checkS1PolicyGt returns true, then S1 l w u₁ > S1 l w u₂.

source

theorem RSA.Eval.s1_not_gt_of_check {U : Type u_1} {W : Type u_2} [Fintype U] [Fintype W] [DecidableEq U] [DecidableEq W] (d : RSAConfigData U W) (l : d.Latent) (w : W) (u₁ u₂ : U) (h : checkS1PolicyNotGt d l w u₁ u₂ = true) :

¬d.toRSAConfig.S1 l w u₁ > d.toRSAConfig.S1 l w u₂

If checkS1PolicyNotGt returns true, then ¬(S1 l w u₁ > S1 l w u₂).

source

theorem RSA.Eval.l1_gt_of_check_ext {U : Type u_1} {W : Type u_2} [Fintype U] [Fintype W] [DecidableEq U] [DecidableEq W] (cfg : RSAConfig U W) (d : RSAConfigData U W) (h_eq : d.toRSAConfig = cfg) (u : U) (w₁ w₂ : W) (h : checkL1ScoreGt d u w₁ u w₂ = true) :

cfg.L1 u w₁ > cfg.L1 u w₂

source

theorem RSA.Eval.l1_score_gt_of_check_ext {U : Type u_1} {W : Type u_2} [Fintype U] [Fintype W] [DecidableEq U] [DecidableEq W] (cfg : RSAConfig U W) (d : RSAConfigData U W) (h_eq : d.toRSAConfig = cfg) (u₁ : U) (w₁ : W) (u₂ : U) (w₂ : W) (h : checkL1ScoreGt d u₁ w₁ u₂ w₂ = true) :

cfg.L1agent.score u₁ w₁ > cfg.L1agent.score u₂ w₂

source

theorem RSA.Eval.l1_not_gt_of_check_ext {U : Type u_1} {W : Type u_2} [Fintype U] [Fintype W] [DecidableEq U] [DecidableEq W] (cfg : RSAConfig U W) (d : RSAConfigData U W) (h_eq : d.toRSAConfig = cfg) (u : U) (w₁ w₂ : W) (h : checkL1ScoreNotGt d u w₁ u w₂ = true) :

¬cfg.L1 u w₁ > cfg.L1 u w₂

source

theorem RSA.Eval.s1_gt_of_check_ext {U : Type u_1} {W : Type u_2} [Fintype U] [Fintype W] [DecidableEq U] [DecidableEq W] (cfg : RSAConfig U W) (d : RSAConfigData U W) (h_eq : d.toRSAConfig = cfg) (h_lat : d.Latent = cfg.Latent) (l : cfg.Latent) (w : W) (u₁ u₂ : U) (h : checkS1PolicyGt d (⋯ ▸ l) w u₁ u₂ = true) :

cfg.S1 l w u₁ > cfg.S1 l w u₂

source

theorem RSA.Eval.s1_not_gt_of_check_ext {U : Type u_1} {W : Type u_2} [Fintype U] [Fintype W] [DecidableEq U] [DecidableEq W] (cfg : RSAConfig U W) (d : RSAConfigData U W) (h_eq : d.toRSAConfig = cfg) (h_lat : d.Latent = cfg.Latent) (l : cfg.Latent) (w : W) (u₁ u₂ : U) (h : checkS1PolicyNotGt d (⋯ ▸ l) w u₁ u₂ = true) :

¬cfg.S1 l w u₁ > cfg.S1 l w u₂

source

def RSA.Eval.evalS2Score {U : Type u_3} {W : Type u_4} [Fintype U] [Fintype W] [DecidableEq U] [DecidableEq W] (d : RSAConfigData U W) (spec : S2ScoreSpec U W d.Latent) (w : W) (u : U) :

Linglib.Interval.QInterval

Compute S2 score as QInterval, dispatching on S2ScoreSpec. Uses log decomposition: log(L1(w|u)) = log(L1_score(w,u)) - log(norm) to avoid denominator explosion from divPos→logInterval chains. Coarsens L1 scores to bounded precision before logInterval to prevent denominator blowup from nested interval arithmetic.

Equations

One or more equations did not get rendered due to their size.
RSA.Eval.evalS2Score d RSA.S2ScoreSpec.endorsement w u = RSA.Eval.evalL1Score d u w

Instances For

source

def RSA.Eval.checkS2ScoreGt {U : Type u_3} {W : Type u_4} [Fintype U] [Fintype W] [DecidableEq U] [DecidableEq W] (d : RSAConfigData U W) (w : W) (u₁ u₂ : U) :

Bool

Check that S2 score for (w,u₁) is strictly greater than for (w,u₂).

Equations

One or more equations did not get rendered due to their size.

Instances For

source

theorem RSA.Eval.s2_gt_of_check {U : Type u_1} {W : Type u_2} [Fintype U] [Fintype W] [DecidableEq U] [DecidableEq W] (d : RSAConfigData U W) (w : W) (u₁ u₂ : U) (_h : checkS2ScoreGt d w u₁ u₂ = true) :

True

Soundness: if checkS2ScoreGt returns true, then S2(u₁|w) > S2(u₂|w).

Documentation

Linglib.Core.Interval.RSAEval

RSAEval — Sound QInterval-Based RSA Evaluation #

Architecture #