Documentation

Linglib.Core.Agent.SoftmaxOptimality

Softmax Optimality: The Decision-Theoretic Foundation of Soft-Rational Agents #

The deepest claim in the RSA architecture is that the softmax agent IS the entropy-regularized expected-utility maximizer. This file makes that connection explicit by bridging DecisionProblem (ℚ) from DecisionTheory.lean with RationalAction (ℝ) from RationalAction.lean.

Results #

  1. Construction (§1): A DecisionProblem yields a RationalAction via softmax over expected utility, bridging the ℚ/ℝ gap.

  2. Monotonicity (§2): Higher EU ⟹ higher choice probability (for α > 0).

  3. Entropy-regularized optimality (§3): The softmax agent uniquely maximizes ∑ p(a)·EU(a) + (1/α)·H(p) — the Gibbs Variational Principle in decision-theoretic language.

  4. Hard-max convergence (§4): As α → ∞, the softmax agent converges to the EU-optimal deterministic policy.

  5. Round-trip (§5): Extracting utility from a softmax agent and reconstructing recovers the same policy.
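For orientation, the softmax rule that all five results revolve around can be sketched as follows. This is an assumed standard form, not the verbatim definition from RationalAction.lean:

```lean
import Mathlib

/-- Sketch (assumed form): softmax over real-valued scores with rationality
parameter `α`. The results below instantiate the score `s` with expected
utility. -/
noncomputable def softmaxSketch {A : Type*} [Fintype A] (s : A → ℝ) (α : ℝ)
    (a : A) : ℝ :=
  Real.exp (α * s a) / ∑ b, Real.exp (α * s b)
```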

noncomputable def Core.expectedUtilityR {W : Type u_1} {A : Type u_2} [Fintype W] (dp : DecisionTheory.DecisionProblem W A) (a : A) :
ℝ

Expected utility cast to ℝ for interfacing with softmax/RationalAction.

Equations
Instances For
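The equation is elided above; a plausible unfolding, assuming `dp.prior : W → ℚ` and `dp.utility : W → A → ℚ`, is the prior-weighted sum cast into ℝ:

```lean
-- Sketch (assumption, not the verbatim source):
-- expectedUtilityR dp a = ∑ w, (dp.prior w : ℝ) * (dp.utility w a : ℝ)
```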
    theorem Core.expectedUtilityR_nonneg {W : Type u_1} {A : Type u_2} [Fintype W] (dp : DecisionTheory.DecisionProblem W A) (a : A) (hpr : ∀ (w : W), 0 ≤ dp.prior w) (hu : ∀ (w : W), 0 ≤ dp.utility w a) :
    0 ≤ expectedUtilityR dp a

    EU in ℝ is non-negative when utility and prior are non-negative.

    ℚ-ordering of EU is preserved under the cast to ℝ.

    noncomputable def Core.RationalAction.fromDecisionProblem {W : Type u_1} {A : Type u_2} [Fintype W] [Fintype A] (dp : DecisionTheory.DecisionProblem W A) (α : ℝ) :
    RationalAction Unit A

    Softmax agent from a decision problem: score(a) = exp(α · EU(a)).

    The state type is Unit because the decision problem's prior already encodes the agent's beliefs — there is no external state to condition on.

    Equations
    Instances For
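With the equation elided, one natural reading of the construction, consistent with the docstring's score(a) = exp(α · EU(a)), is:

```lean
-- Sketch (assumption): delegate to the existing `fromSoftmax` constructor,
-- using the decision problem's expected utility as the state-independent score.
-- fromDecisionProblem dp α =
--   fromSoftmax (fun (_ : Unit) (a : A) => expectedUtilityR dp a) α
```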
      theorem Core.fromDP_policy_mono {W : Type u_1} {A : Type u_2} [Fintype W] [Fintype A] [Nonempty A] (dp : DecisionTheory.DecisionProblem W A) {α : ℝ} (hα : 0 < α) (a₁ a₂ : A) (h : expectedUtilityR dp a₁ ≤ expectedUtilityR dp a₂) :
      (RationalAction.fromDecisionProblem dp α).policy () a₁ ≤ (RationalAction.fromDecisionProblem dp α).policy () a₂

      Higher EU implies higher choice probability (for α > 0).

      theorem Core.fromDP_policy_strict_mono {W : Type u_1} {A : Type u_2} [Fintype W] [Fintype A] [Nonempty A] (dp : DecisionTheory.DecisionProblem W A) {α : ℝ} (hα : 0 < α) (a₁ a₂ : A) (h : expectedUtilityR dp a₁ < expectedUtilityR dp a₂) :
      (RationalAction.fromDecisionProblem dp α).policy () a₁ < (RationalAction.fromDecisionProblem dp α).policy () a₂

      Strict version: strictly higher EU implies strictly higher probability.

      theorem Core.softmax_maximizes_EU_plus_entropy {W : Type u_1} {A : Type u_2} [Fintype W] [Fintype A] [Nonempty A] (dp : DecisionTheory.DecisionProblem W A) {α : ℝ} (hα : 0 < α) (p : A → ℝ) (hp_nn : ∀ (a : A), 0 ≤ p a) (hp_sum : ∑ a : A, p a = 1) :
      entropyRegObjective (fun (a : A) => expectedUtilityR dp a) α p ≤ entropyRegObjective (fun (a : A) => expectedUtilityR dp a) α fun (x : A) => (RationalAction.fromDecisionProblem dp α).policy () x

      The softmax agent uniquely maximizes entropy-regularized expected utility: p* = argmax_p [∑ p(a)·EU(a) + (1/α)·H(p)].

      This is the decision-theoretic content of the Gibbs Variational Principle. The objective entropyRegObjective from RationalAction.lean is exactly ∑ p(a)·s(a) + (1/α)·H(p) — we just instantiate s = EU.
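Written out, the objective and its maximizer (the standard Gibbs variational argument, included here for reference):

```latex
J(p) = \sum_{a} p(a)\,\mathrm{EU}(a) + \tfrac{1}{\alpha} H(p),
\qquad H(p) = -\sum_{a} p(a)\,\log p(a).
```

Setting the derivative of the Lagrangian `J(p) - λ(∑ p(a) - 1)` to zero gives `EU(a) - (1/α)(log p(a) + 1) - λ = 0`, hence `p*(a) ∝ exp(α·EU(a))`: exactly the softmax policy.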

      theorem Core.softmax_unique_EU_maximizer {W : Type u_1} {A : Type u_2} [Fintype W] [Fintype A] [Nonempty A] (dp : DecisionTheory.DecisionProblem W A) {α : ℝ} (hα : 0 < α) (p : A → ℝ) (hp_nn : ∀ (a : A), 0 ≤ p a) (hp_sum : ∑ a : A, p a = 1) (h_max : entropyRegObjective (fun (a : A) => expectedUtilityR dp a) α p = entropyRegObjective (fun (a : A) => expectedUtilityR dp a) α (softmax (fun (a : A) => expectedUtilityR dp a) α)) :
      p = softmax (fun (a : A) => expectedUtilityR dp a) α

      The softmax agent is the UNIQUE maximizer: any distribution achieving the same objective value must equal the softmax policy.

      theorem Core.fromDP_converges_to_optimal {W : Type u_1} {A : Type u_2} [Fintype W] [Fintype A] [Nonempty A] [DecidableEq A] (dp : DecisionTheory.DecisionProblem W A) (a_opt : A) (h_opt : ∀ (a : A), a ≠ a_opt → expectedUtilityR dp a < expectedUtilityR dp a_opt) :
      Filter.Tendsto (fun (α : ℝ) => (RationalAction.fromDecisionProblem dp α).policy () a_opt) Filter.atTop (nhds 1)

      As α → ∞, the softmax agent converges to the EU-optimal deterministic policy: the probability of the unique EU-maximizing action → 1.

      This is the decision-theoretic content of tendsto_softmax_infty_at_max from Softmax.Limits.
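The mechanism behind the limit is a one-line calculation. Dividing numerator and denominator of the softmax policy by the optimal action's score,

```latex
p_\alpha(a_{\mathrm{opt}})
  = \frac{1}{1 + \sum_{a \neq a_{\mathrm{opt}}}
      \exp\bigl(\alpha\,(\mathrm{EU}(a) - \mathrm{EU}(a_{\mathrm{opt}}))\bigr)}
  \longrightarrow 1 \quad (\alpha \to \infty),
```

since each exponent `EU(a) − EU(a_opt)` is strictly negative by `h_opt`.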

      theorem Core.fromDP_nonoptimal_vanishes {W : Type u_1} {A : Type u_2} [Fintype W] [Fintype A] [Nonempty A] [DecidableEq A] (dp : DecisionTheory.DecisionProblem W A) (a_opt a : A) (h_opt : ∀ (a' : A), a' ≠ a_opt → expectedUtilityR dp a' < expectedUtilityR dp a_opt) (ha : a ≠ a_opt) :
      Filter.Tendsto (fun (α : ℝ) => (RationalAction.fromDecisionProblem dp α).policy () a) Filter.atTop (nhds 0)

      As α → ∞, any non-optimal action gets probability → 0.

      noncomputable def Core.RationalAction.toUtility {A : Type u_1} [Fintype A] (ra : RationalAction Unit A) (α : ℝ) (a : A) (_ha : 0 < ra.score () a) :
      ℝ

      Extract the implicit utility function from a softmax-parameterized agent.

      If ra = fromSoftmax u α, then toUtility ra α a = u a + const (up to the translation invariance of softmax).
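With the equation elided, a natural reading of the extraction, consistent with the round-trip theorems that follow, is the log of the score rescaled by α:

```lean
-- Sketch (assumption): invert score () a = exp (α * u a) pointwise,
-- using positivity of the score (`_ha`) to make the log well-defined.
-- toUtility ra α a _ha = Real.log (ra.score () a) / α
```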

      Equations
      Instances For
        theorem Core.fromSoftmax_toUtility_eq {A : Type u_1} [Fintype A] (utility : Unit → A → ℝ) (α : ℝ) (hα : α ≠ 0) (a : A) :
        (RationalAction.fromSoftmax utility α).toUtility α a ⋯ = utility () a

        Round-trip: constructing via fromSoftmax and extracting utility recovers the original utility (the log inverts the exp, and dividing by α cancels).
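Under the `log score / α` reading of `toUtility` sketched above, the cancellation is immediate:

```latex
\frac{1}{\alpha}\,\log\bigl(\exp(\alpha \cdot u(a))\bigr) = u(a).
```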

        theorem Core.fromSoftmax_roundtrip_policy {A : Type u_1} [Fintype A] [Nonempty A] (utility : Unit → A → ℝ) (α : ℝ) (hα : α ≠ 0) (a : A) :
        (RationalAction.fromSoftmax utility α).policy () a = (RationalAction.fromSoftmax (fun (x : Unit) (a' : A) => (RationalAction.fromSoftmax utility α).toUtility α a' ⋯) α).policy () a

        Round-trip for the full agent: fromSoftmax ∘ toUtility recovers the same policy (the softmax is translation-invariant so the constant cancels).
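The translation invariance invoked here is the standard softmax identity, stated for reference: a constant shift c in the utility multiplies every numerator and the denominator by the same factor, which cancels under normalization,

```latex
\frac{\exp(\alpha\,(u(a) + c))}{\sum_{b} \exp(\alpha\,(u(b) + c))}
  = \frac{e^{\alpha c}\,\exp(\alpha\,u(a))}{e^{\alpha c}\,\sum_{b} \exp(\alpha\,u(b))}
  = \frac{\exp(\alpha\,u(a))}{\sum_{b} \exp(\alpha\,u(b))}.
```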