Linglib.Core.Agent.RationalAction

Rational Action @cite{luce-1959} #

The mathematical foundation for all soft-rational agents: RSA speakers/listeners, BToM agents, and decision-theoretic actors (@cite{cover-thomas-2006}; @cite{zaslavsky-hu-levy-2020}; @cite{adams-messick-1958}).

Architecture #

A RationalAction agent selects actions with probability proportional to a non-negative score function — the Luce choice rule. This is the unique choice rule satisfying IIA (independence of irrelevant alternatives): the relative probability of two actions depends only on their scores.

The key mathematical results characterizing this choice rule are:

  1. Softmax (§2): The exponential parameterization score = exp(α · utility) gives policy = softmax(utility, α). This is the standard form in RSA.

  2. Gibbs Variational Principle (§3): Softmax uniquely maximizes H(p) + α · ⟨p, s⟩ on the probability simplex. This is the mathematical foundation for RSA convergence.

  3. Maximum Entropy (§4): Softmax is the max-entropy distribution subject to an expected-utility constraint. Equivalently, it minimizes free energy (the Boltzmann distribution from statistical mechanics).

  4. Bayesian Optimality (§5): The Bayesian posterior maximizes expected log-likelihood. This is the listener half of RSA convergence.

structure Core.RationalAction (State : Type u_1) (Action : Type u_2) [Fintype Action] :
Type (max u_1 u_2)

A rational action agent: selects actions with probability ∝ score(state, action).

This is the Luce choice rule. The score function is unnormalized; normalization to a proper distribution is derived (see policy).
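As a quick numeric illustration of the Luce rule (a Python sketch, separate from the Lean development; the `prior`/`meaning` tables are hypothetical), the policy divides each score by the per-state total and returns 0 everywhere when the total vanishes:

```python
# Illustrative Luce-rule agent (not the Lean code). `score` is an
# unnormalized, non-negative function of (state, action); `policy`
# normalizes per state, returning 0 for every action when the total is 0.
def policy(score, state, actions):
    total = sum(score(state, a) for a in actions)
    if total == 0:
        return {a: 0.0 for a in actions}
    return {a: score(state, a) / total for a in actions}

# Hypothetical L0-style score: prior(w) * meaning(u, w).
prior = {"w1": 0.5, "w2": 0.5}
meaning = {("some", "w1"): 1, ("some", "w2"): 1,
           ("all", "w1"): 0, ("all", "w2"): 1}
score = lambda u, w: prior[w] * meaning[(u, w)]

assert policy(score, "some", ["w1", "w2"]) == {"w1": 0.5, "w2": 0.5}
assert policy(score, "all", ["w1", "w2"]) == {"w1": 0.0, "w2": 1.0}
```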

Key instances:

  • RSA L0: score(u, w) = prior(w) · meaning(u, w)
  • RSA S1: score(w, u) = rpow(informativity(w, u), α) · exp(-α · cost(u))
  • BToM agent: score(state, action) = exp(β · Q(state, action))
  • score : State → Action → ℝ

    Unnormalized score: policy(a|s) ∝ score(s, a).

  • score_nonneg (s : State) (a : Action) : 0 ≤ self.score s a

    Scores are non-negative.

Instances For
    noncomputable def Core.RationalAction.totalScore {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) :
    ℝ

    Total score across all actions in a state (normalization constant).

    Equations
    Instances For
      theorem Core.RationalAction.totalScore_nonneg {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) :
      0 ≤ ra.totalScore s
      noncomputable def Core.RationalAction.policy {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) (a : A) :
      ℝ

      Normalized policy: P(a|s) = score(s,a) / Σ_a' score(s,a'). Returns 0 for all actions if totalScore = 0.

      Equations
      Instances For
        theorem Core.RationalAction.policy_nonneg {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) (a : A) :
        0 ≤ ra.policy s a
        theorem Core.RationalAction.policy_sum_eq_one {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) (h : ra.totalScore s ≠ 0) :
        ∑ a : A, ra.policy s a = 1

        Policy sums to 1 when totalScore is nonzero.

        theorem Core.RationalAction.policy_monotone {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) (a₁ a₂ : A) (h : ra.score s a₁ ≤ ra.score s a₂) :
        ra.policy s a₁ ≤ ra.policy s a₂

        Policy monotonicity: higher score → higher probability.

        theorem Core.RationalAction.policy_eq_zero_of_score_eq_zero {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) (a : A) (h : ra.score s a = 0) :
        ra.policy s a = 0

        Zero score implies zero policy, regardless of whether totalScore is zero.

        theorem Core.RationalAction.policy_eq_of_score_eq {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) (a₁ a₂ : A) (h : ra.score s a₁ = ra.score s a₂) :
        ra.policy s a₁ = ra.policy s a₂

        Policy respects score equality: equal scores → equal probabilities. Follows directly from the Luce rule: both sides are score / totalScore with the same denominator.

        theorem Core.RationalAction.policy_eq_one_of_totalScore_eq {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) (a : A) (h_sum : ra.totalScore s = ra.score s a) (h_pos : 0 < ra.score s a) :
        ra.policy s a = 1

        When totalScore equals the score of action a, the policy for a is 1. Used by the compositional proof builder when all other scores are zero, so totalScore = score a + 0 + ... + 0 = score a, making policy = 1.

        theorem Core.RationalAction.policy_not_gt_of_score_le {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) (a₁ a₂ : A) (h : ra.score s a₁ ≤ ra.score s a₂) :
        ¬ra.policy s a₁ > ra.policy s a₂

        Score ordering implies ¬(policy strict ordering). Used by compositional proof builder for ¬(L1 w₁ > L1 w₂) goals.

        theorem Core.RationalAction.policy_gt_of_score_gt {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) (a₁ a₂ : A) (hgt : ra.score s a₁ > ra.score s a₂) :
        ra.policy s a₁ > ra.policy s a₂

        Strict policy monotonicity: strictly higher score → strictly higher probability.

        Used by rsa_decide to eliminate shared denominator computations: when comparing policy s a₁ > policy s a₂ (same state), it suffices to show score s a₁ > score s a₂, skipping the expensive totalScore computation in the proof term.

        theorem Core.RationalAction.policy_gt_cross {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s₁ s₂ : S) (a : A) (h_pos₁ : 0 < ra.totalScore s₁) (h_pos₂ : 0 < ra.totalScore s₂) (h_cross : ra.score s₁ a * ra.totalScore s₂ > ra.score s₂ a * ra.totalScore s₁) :
        ra.policy s₁ a > ra.policy s₂ a

        Cross-state policy comparison: compares policy values at different states (different denominators). Used for S2 cross-world comparisons where S2(u|w₁) vs S2(u|w₂) have different normalization constants.

        The cross-product condition score(s₁,a) * total(s₂) > score(s₂,a) * total(s₁) is equivalent to score(s₁,a)/total(s₁) > score(s₂,a)/total(s₂) when both totals are positive.
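The equivalence is easy to spot-check numerically; the following Python sketch (illustrative only, separate from the Lean proof) uses exact rational arithmetic so the comparison cannot be disturbed by rounding:

```python
from fractions import Fraction
import random

# Spot-check: for positive totals, the cross-product test
#   score(s1,a) * total(s2) > score(s2,a) * total(s1)
# agrees with the direct ratio comparison
#   score(s1,a) / total(s1) > score(s2,a) / total(s2).
random.seed(0)
for _ in range(500):
    sc1 = [Fraction(random.randint(0, 20)) for _ in range(4)]
    sc2 = [Fraction(random.randint(0, 20)) for _ in range(4)]
    t1, t2 = sum(sc1), sum(sc2)
    if t1 == 0 or t2 == 0:
        continue  # the equivalence assumes both totals are positive
    a = random.randrange(4)
    assert (sc1[a] * t2 > sc2[a] * t1) == (sc1[a] / t1 > sc2[a] / t2)
```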

        theorem Core.RationalAction.policy_gt_cross_of_cross_gt {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s₁ s₂ : S) (a : A) (h_cross : ra.score s₁ a * ra.totalScore s₂ > ra.score s₂ a * ra.totalScore s₁) :
        ra.policy s₁ a > ra.policy s₂ a

        Cross-state policy comparison with positivity derived from the cross-product.

        Like policy_gt_cross but derives the totalScore > 0 conditions from the cross-product inequality itself: if score(s₁,a) * total(s₂) > score(s₂,a) * total(s₁) ≥ 0, then score(s₁,a) * total(s₂) > 0, so both score(s₁,a) > 0 and total(s₂) > 0. And score(s₁,a) ≤ total(s₁), so total(s₁) > 0.

        Used by rsa_predict for cross-utterance L1 comparisons where the two sides have different normalization constants.

        theorem Core.RationalAction.policy_list_sum_gt {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) (as₁ as₂ : List A) (h : (List.map (ra.score s) as₁).sum > (List.map (ra.score s) as₂).sum) (htot : 0 < ra.totalScore s) :
        (List.map (ra.policy s) as₁).sum > (List.map (ra.policy s) as₂).sum

        Score-sum ordering implies policy-sum ordering when both sides share the same state (same denominator). Used by rsa_predict for marginal L1 comparisons where the worlds being summed differ but the utterance and config are shared.

        theorem Core.RationalAction.finset_sum_policy_gt_of_sum_score_gt {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) (F₁ F₂ : Finset A) (h : F₁.sum (ra.score s) > F₂.sum (ra.score s)) :
        F₁.sum (ra.policy s) > F₂.sum (ra.policy s)

        Finset-sum ordering implies policy-sum ordering when both sides share the same state (same denominator). Like policy_list_sum_gt but for Finset.sum.

        Derives totalScore positivity from the score ordering itself, so no extra hypothesis is needed: if Σ_{F₁} score > Σ_{F₂} score ≥ 0, then some score is positive, so totalScore > 0.

        Used by rsa_predict for denominator cancellation in marginal comparisons.

        Luce's Choice Axiom #

        Luce @cite{luce-1959} showed that the ratio rule P(a|s) = v(a)/Σv(b) is characterized by the independence of irrelevant alternatives (IIA): the relative probability of two actions depends only on their scores, not on what other actions are available.

        We formalize:

        theorem Core.RationalAction.policy_ratio {S : Type u_3} {A : Type u_4} [Fintype A] (ra : RationalAction S A) (s : S) (a₁ a₂ : A) :
        ra.policy s a₁ * ra.score s a₂ = ra.policy s a₂ * ra.score s a₁

        Constant Ratio Rule (Theorem 2): policy(a₁) · score(a₂) = policy(a₂) · score(a₁). The odds ratio policy(a₁)/policy(a₂) = score(a₁)/score(a₂).

        noncomputable def Core.RationalAction.pChoice {S : Type u_3} {A : Type u_4} [Fintype A] [DecidableEq A] (ra : RationalAction S A) (s : S) (T : Finset A) (a : A) :
        ℝ

        Choice probability from a subset: pChoice(a, T) = score(a) / Σ_{b∈T} score(b). Returns 0 if a ∉ T or the subset total is 0.

        Equations
        Instances For
          theorem Core.RationalAction.pChoice_univ {S : Type u_3} {A : Type u_4} [Fintype A] [DecidableEq A] (ra : RationalAction S A) (s : S) (a : A) :
          ra.pChoice s Finset.univ a = ra.policy s a

          pChoice on the full set equals policy.

          theorem Core.RationalAction.pChoice_nonneg {S : Type u_3} {A : Type u_4} [Fintype A] [DecidableEq A] (ra : RationalAction S A) (s : S) (T : Finset A) (a : A) :
          0 ra.pChoice s T a

          pChoice is non-negative.

          theorem Core.RationalAction.pChoice_sum_eq_one {S : Type u_3} {A : Type u_4} [Fintype A] [DecidableEq A] (ra : RationalAction S A) (s : S) (T : Finset A) (hT : ∃ b ∈ T, ra.score s b ≠ 0) :
          ∑ a ∈ T, ra.pChoice s T a = 1

          pChoice sums to 1 over the subset when the subset total is nonzero.

          theorem Core.RationalAction.pChoice_ratio {S : Type u_3} {A : Type u_4} [Fintype A] [DecidableEq A] (ra : RationalAction S A) (s : S) (T : Finset A) (a₁ a₂ : A) (h₁ : a₁ T) (h₂ : a₂ T) :
          ra.pChoice s T a₁ * ra.score s a₂ = ra.pChoice s T a₂ * ra.score s a₁

          IIA core: the ratio of pChoice values in any subset equals the score ratio. For a₁, a₂ ∈ T with score(a₂) > 0: pChoice(a₁, T) · score(a₂) = pChoice(a₂, T) · score(a₁) (Axiom 1).

          theorem Core.RationalAction.iia {S : Type u_3} {A : Type u_4} [Fintype A] [DecidableEq A] (ra : RationalAction S A) (s : S) (S' T : Finset A) (hST : S' ⊆ T) (a : A) (ha : a ∈ S') (hS_pos : ∃ b ∈ S', ra.score s b ≠ 0) (hT_pos : ∃ b ∈ T, ra.score s b ≠ 0) :
          ra.pChoice s S' a = ra.pChoice s T a / ∑ b ∈ S', ra.pChoice s T b

          IIA (Axiom 1): P(a, S) = P(a, T) / Σ_{b∈S} P(b, T) for S ⊆ T. Choice probability from a subset is the conditional probability.

          theorem Core.RationalAction.product_rule {S : Type u_3} {A : Type u_4} [Fintype A] [DecidableEq A] (ra : RationalAction S A) (s : S) (S' T : Finset A) (hST : S' ⊆ T) (a : A) (ha : a ∈ S') (hS_pos : ∃ b ∈ S', ra.score s b ≠ 0) (hT_pos : ∃ b ∈ T, ra.score s b ≠ 0) :
          ra.pChoice s T a = ra.pChoice s S' a * ((∑ b ∈ S', ra.score s b) / ∑ b ∈ T, ra.score s b)

          Product rule (Theorem 1): P(a, T) = P(a, S) · P(S, T) for a ∈ S ⊆ T, where P(S, T) = Σ_{b∈S} score(b) / Σ_{b∈T} score(b).

          noncomputable def Core.RationalAction.scaleBy {S : Type u_3} {A : Type u_4} [Fintype A] (ra : RationalAction S A) (k : ℝ) (hk : 0 < k) :
          RationalAction S A

          Scale all scores by a positive constant k.

          Equations
          • ra.scaleBy k hk = { score := fun (s : S) (a : A) => k * ra.score s a, score_nonneg := }
          Instances For
            theorem Core.RationalAction.scaleBy_policy {S : Type u_3} {A : Type u_4} [Fintype A] (ra : RationalAction S A) (s : S) (a : A) (k : ℝ) (hk : 0 < k) :
            (ra.scaleBy k hk).policy s a = ra.policy s a

            Scale invariance (Theorem 5): scaling scores by k > 0 preserves policy.

            theorem Core.RationalAction.policy_eq_of_proportional {S : Type u_3} {A : Type u_4} [Fintype A] (ra ra' : RationalAction S A) (s : S) (k : ℝ) (hk : 0 < k) (h : ∀ (a : A), ra'.score s a = k * ra.score s a) (a : A) :
            ra'.policy s a = ra.policy s a

            Uniqueness (forward direction, Theorem 4): If scores are proportional (score'(s,a) = k · score(s,a) for some k > 0), then both agents have the same policy.

            Softmax Function #

            The softmax function σ(s, α)ᵢ = exp(α · sᵢ) / Σⱼ exp(α · sⱼ) is the exponential parameterization of the Luce choice rule. Following Franke & Degen (submitted), we establish Facts 1–8.
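A minimal numeric counterpart of the definition (a Python sketch, separate from the Lean development, which works over `ι → ℝ`):

```python
import math

# softmax(s, alpha)_i = exp(alpha * s_i) / sum_j exp(alpha * s_j)
def softmax(s, alpha):
    z = [math.exp(alpha * x) for x in s]
    total = sum(z)
    return [x / total for x in z]

p = softmax([1.0, 2.0, 3.0], 1.0)
assert abs(sum(p) - 1.0) < 1e-12       # Fact 1: a probability distribution
assert all(x > 0 for x in p)           # Fact 1: strictly positive
# Fact 2: odds depend only on score differences
assert abs(p[2] / p[0] - math.exp(1.0 * (3.0 - 1.0))) < 1e-9
```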

             noncomputable def Core.softmax {ι : Type u_3} [Fintype ι] (s : ι → ℝ) (α : ℝ) :
             ι → ℝ

            The softmax function: softmax(s, α)ᵢ = exp(α · sᵢ) / Σⱼ exp(α · sⱼ).

            Equations
            Instances For
               noncomputable def Core.partitionFn {ι : Type u_3} [Fintype ι] (s : ι → ℝ) (α : ℝ) :
               ℝ

              The partition function (normalizing constant) Z = Σⱼ exp(α · sⱼ).

              Equations
              Instances For
                 noncomputable def Core.logSumExp {ι : Type u_3} [Fintype ι] (s : ι → ℝ) (α : ℝ) :
                 ℝ

                Log-sum-exp: log of partition function.

                Equations
                Instances For
                   theorem Core.partitionFn_pos {ι : Type u_3} [Fintype ι] [Nonempty ι] (s : ι → ℝ) (α : ℝ) :
                   0 < partitionFn s α

                   The partition function is always positive.

                   theorem Core.partitionFn_ne_zero {ι : Type u_3} [Fintype ι] [Nonempty ι] (s : ι → ℝ) (α : ℝ) :
                   partitionFn s α ≠ 0
                   theorem Core.softmax_pos {ι : Type u_3} [Fintype ι] [Nonempty ι] (s : ι → ℝ) (α : ℝ) (i : ι) :
                  0 < softmax s α i

                  Each softmax probability is positive. (Fact 1, part 1)

                   theorem Core.softmax_sum_eq_one {ι : Type u_3} [Fintype ι] [Nonempty ι] (s : ι → ℝ) (α : ℝ) :
                   ∑ i : ι, softmax s α i = 1

                  Softmax probabilities sum to 1. (Fact 1, part 2)

                   theorem Core.softmax_nonneg {ι : Type u_3} [Fintype ι] [Nonempty ι] (s : ι → ℝ) (α : ℝ) (i : ι) :
                   0 ≤ softmax s α i

                  Softmax is non-negative.

                   theorem Core.softmax_le_one {ι : Type u_3} [Fintype ι] [Nonempty ι] (s : ι → ℝ) (α : ℝ) (i : ι) :
                   softmax s α i ≤ 1

                  Softmax is at most 1.

                   theorem Core.softmax_odds {ι : Type u_3} [Fintype ι] [Nonempty ι] (s : ι → ℝ) (α : ℝ) (i j : ι) :
                  softmax s α i / softmax s α j = Real.exp (α * (s i - s j))

                  Fact 2: Odds are determined by score differences: pᵢ/pⱼ = exp(α(sᵢ - sⱼ)).

                   theorem Core.log_softmax_odds {ι : Type u_3} [Fintype ι] [Nonempty ι] (s : ι → ℝ) (α : ℝ) (i j : ι) :
                  Real.log (softmax s α i / softmax s α j) = α * (s i - s j)

                  Log-odds equal scaled score difference.

                   theorem Core.softmax_ratio {ι : Type u_3} [Fintype ι] [Nonempty ι] (s : ι → ℝ) (α : ℝ) (i j : ι) :
                  softmax s α i = softmax s α j * Real.exp (α * (s i - s j))

                  Ratio form of Fact 2.

                   noncomputable def Core.logistic (x : ℝ) :
                   ℝ

                  The logistic (sigmoid) function: S(x) = 1 / (1 + exp(−x)).

                  Equations
                  Instances For
                     noncomputable def Core.logit (p : ℝ) :
                     ℝ

                    The logit function: L(p) = log(p / (1 − p)). Inverse of logistic on (0, 1).

                    Equations
                    Instances For
                       theorem Core.logit_logistic (x : ℝ) :
                       logit (logistic x) = x

                      logit inverts logistic: logit(logistic(x)) = x.

                       theorem Core.logistic_logit {p : ℝ} (hp0 : 0 < p) (hp1 : p < 1) :
                       logistic (logit p) = p

                      logistic inverts logit for 0 < p < 1: logistic(logit(p)) = p.

                       theorem Core.softmax_binary (s : Fin 2 → ℝ) (α : ℝ) :
                      softmax s α 0 = logistic (α * (s 0 - s 1))

                      Fact 3: For n = 2, softmax reduces to logistic.

                       theorem Core.logit_softmax_binary (s : Fin 2 → ℝ) (α : ℝ) :
                      logit (softmax s α 0) = α * (s 0 - s 1)

                      Softmax log-odds equals logit of the binary softmax probability (when there are exactly two alternatives).
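The two-alternative facts can be checked numerically (a Python sketch, separate from the Lean development; `softmax2` is a hypothetical helper for the binary case):

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    return math.log(p / (1.0 - p))

def softmax2(s0, s1, alpha):
    z0, z1 = math.exp(alpha * s0), math.exp(alpha * s1)
    return z0 / (z0 + z1)

# Fact 3: binary softmax is the logistic of the scaled score difference,
# so its logit recovers alpha * (s0 - s1).
for s0, s1, a in [(2.0, 1.0, 1.5), (0.3, 0.9, 4.0)]:
    assert abs(softmax2(s0, s1, a) - logistic(a * (s0 - s1))) < 1e-12
    assert abs(logit(softmax2(s0, s1, a)) - a * (s0 - s1)) < 1e-9
assert abs(logit(logistic(0.7)) - 0.7) < 1e-9  # logit inverts logistic
```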

                       theorem Core.softmax_add_const {ι : Type u_3} [Fintype ι] (s : ι → ℝ) (α c : ℝ) :
                      softmax (fun (i : ι) => s i + c) α = softmax s α

                      Fact 6: Softmax is translation invariant.

                       theorem Core.softmax_scale {ι : Type u_3} [Fintype ι] (s : ι → ℝ) (α a : ℝ) (ha : a ≠ 0) :
                      softmax (fun (i : ι) => a * s i) (α / a) = softmax s α

                      Fact 8: Multiplicative scaling can be absorbed into α.

                       theorem Core.softmax_mono {ι : Type u_3} [Fintype ι] [Nonempty ι] (s : ι → ℝ) {α : ℝ} (hα : 0 < α) (i j : ι) (hij : s i ≤ s j) :
                       softmax s α i ≤ softmax s α j

                      Higher scores get higher probabilities (for α > 0).

                       theorem Core.softmax_strict_mono {ι : Type u_3} [Fintype ι] [Nonempty ι] (s : ι → ℝ) {α : ℝ} (hα : 0 < α) (i j : ι) (hij : s i < s j) :
                      softmax s α i < softmax s α j

                      Strict monotonicity.

                       theorem Core.softmax_zero {ι : Type u_3} [Fintype ι] [Nonempty ι] (s : ι → ℝ) :
                       softmax s 0 = fun (x : ι) => 1 / ↑(Fintype.card ι)

                      At α = 0, softmax is uniform.

                       theorem Core.softmax_neg_mono {ι : Type u_3} [Fintype ι] [Nonempty ι] (s : ι → ℝ) {α : ℝ} (hα : α < 0) (i j : ι) (hij : s i ≤ s j) :
                       softmax s α j ≤ softmax s α i

                      For α < 0, lower scores get higher probabilities.

                       theorem Core.log_softmax {ι : Type u_3} [Fintype ι] [Nonempty ι] (s : ι → ℝ) (α : ℝ) (i : ι) :
                      Real.log (softmax s α i) = α * s i - Real.log (partitionFn s α)

                      Log of softmax = score minus log partition function.

                       noncomputable def Core.softmax1 {ι : Type u_3} [Fintype ι] (s : ι → ℝ) :
                       ι → ℝ

                      Softmax with default α = 1.

                      Equations
                      Instances For
                         noncomputable def Core.softmaxTemp {ι : Type u_3} [Fintype ι] (s : ι → ℝ) (τ : ℝ) :
                         ι → ℝ

                        Temperature form: τ = 1/α.

                        Equations
                        Instances For
                           theorem Core.softmax_exponential_family {ι : Type u_3} [Fintype ι] (s : ι → ℝ) (α : ℝ) (i : ι) [Nonempty ι] :
                          softmax s α i = Real.exp (α * s i - logSumExp s α)

                          Softmax is an exponential family distribution.

                           theorem Core.rpow_luce_eq_softmax {ι : Type u_3} [Fintype ι] [Nonempty ι] (f : ι → ℝ) (α : ℝ) (hf : ∀ (i : ι), 0 < f i) (i : ι) :
                           f i ^ α / ∑ j : ι, f j ^ α = softmax (fun (j : ι) => Real.log (f j)) α i

                          Luce choice with rpow scores equals softmax over log scores.

                          f(i)^α / Σⱼ f(j)^α = softmax(log ∘ f, α)(i) when all f(i) > 0.

                          This is the general identity connecting belief-based RSA (which uses rpow) to the softmax framework (which uses exp). Every S1 model with s1Score = rpow(l0, α) inherits all softmax limit theorems via this identity: as α → ∞, rpow-based Luce choice concentrates on the argmax of f, i.e., the most informative utterance.
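The identity is straightforward to verify numerically (a Python sketch, illustrative only):

```python
import math

def softmax(s, alpha):
    z = [math.exp(alpha * x) for x in s]
    total = sum(z)
    return [x / total for x in z]

# f(i)^alpha / sum_j f(j)^alpha == softmax(log . f, alpha)_i when f > 0.
f = [0.2, 0.5, 0.3]
alpha = 4.0
powers = [x ** alpha for x in f]
total = sum(powers)
rpow_luce = [x / total for x in powers]
via_softmax = softmax([math.log(x) for x in f], alpha)
assert all(abs(a - b) < 1e-12 for a, b in zip(rpow_luce, via_softmax))
```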

                          Why Softmax? The Fechnerian Characterization #

                          The exponential parameterization score = exp(α · utility) is not a design choice — it is the unique transformation connecting Luce's ratio scale to a utility (interval) scale (§2.A; @cite{adams-messick-1958}).

                          Ratio vs interval scales. Luce's Axiom 1 (IIA) yields a ratio scale v: only ratios v(a)/v(b) are meaningful (Theorem 4). Fechner's psychophysics requires an interval scale u: only differences u(a) - u(b) are meaningful. The question: how are v and u related?

                           Derivation. From P(a,b) = v(a)/(v(a)+v(b)), the odds ratio is v(a)/v(b) = g(u(a) - u(b)) for some function g. Transitivity of ratios (v(a)/v(c) = [v(a)/v(b)] · [v(b)/v(c)]) forces Cauchy's multiplicative functional equation: g(s + t) = g(s) · g(t). The unique strictly monotone increasing solution is g(s) = exp(k · s) (cauchy_mul_exp), giving:

                           v(a) = v(a₀) · exp(k · (u(a) - u(a₀)))

                           The parameter k > 0 is the rationality parameter α in RSA.

                           theorem Core.cauchy_mul_exp (g : ℝ → ℝ) (hg_mul : ∀ (s t : ℝ), g (s + t) = g s * g t) (hg_mono : StrictMono g) :
                           ∃ (k : ℝ), 0 < k ∧ g 0 = 1 ∧ ∀ (s : ℝ), g s = Real.exp (k * s)

                          Cauchy's multiplicative functional equation (classical): If g : ℝ → ℝ satisfies g(s + t) = g(s) · g(t) and is strictly monotone increasing, then g(s) = exp(k · s) for some k > 0.

                          The proof reduces to the additive Cauchy equation via log: setting h = log ∘ g, the multiplicative equation becomes h(s+t) = h(s) + h(t). The key lemma (cauchy_monotone_additive_linear) shows that a strictly monotone additive function must be linear, by density of ℚ in ℝ: h agrees with x ↦ k·x on rationals (by induction), and any deviation on an irrational x would violate monotonicity via a rational witness between x and h(x)/k.

                           theorem Core.luce_fechnerian_exp {X : Type u_3} (v u : X → ℝ) (g : ℝ → ℝ) (hv_pos : ∀ (x : X), 0 < v x) (h_ratio : ∀ (x y : X), v x / v y = g (u x - u y)) (hg_mul : ∀ (s t : ℝ), g (s + t) = g s * g t) (hg_mono : StrictMono g) :
                           ∃ (k : ℝ), 0 < k ∧ ∀ (x₀ x : X), v x = v x₀ * Real.exp (k * (u x - u x₀))

                          Fechnerian uniqueness (§2.A; @cite{adams-messick-1958}): If a ratio scale v and interval scale u represent the same ordering via v(x)/v(y) = g(u(x) - u(y)) for a strictly monotone multiplicative g, then v is the exponential of u.

                          This is WHY fromSoftmax uses exp(α · utility): the exponential is forced by the requirement that log-odds be linear in utility differences. It is the unique bridge between Luce's ratio scale (Chapter 1) and Fechner's interval scale (Chapter 2).

                           noncomputable def Core.RationalAction.fromSoftmax {S : Type u_1} {A : Type u_2} [Fintype A] (utility : S → A → ℝ) (α : ℝ) :
                           RationalAction S A

                          Construct a RationalAction from a utility function via softmax.

                          The score is exp(α · utility(s, a)), so policy = softmax(utility, α). The exponential parameterization is forced by the Fechnerian characterization (luce_fechnerian_exp): it is the unique bridge from Luce's ratio scale to an additive utility scale.

                          Equations
                          Instances For
                             theorem Core.RationalAction.fromSoftmax_policy_eq {S : Type u_1} {A : Type u_2} [Fintype A] [Nonempty A] (utility : S → A → ℝ) (α : ℝ) (s : S) (a : A) :
                            (fromSoftmax utility α).policy s a = softmax (utility s) α a

                            The policy of a softmax agent equals the softmax function.

                            KL Divergence and the Gibbs Variational Principle #

                            The softmax distribution uniquely maximizes entropy + expected score on the probability simplex. This is the mathematical foundation for RSA convergence (@cite{zaslavsky-hu-levy-2020}, Proposition 1).

                            Proof strategy #

                            The Gibbs VP reduces to KL non-negativity via three identities:

                            1. H(p) + KL(p‖q) = -∑ pᵢ log qᵢ (negMulLog + KL term telescope)
                            2. -∑ pᵢ log qᵢ = -α⟨p,s⟩ + log Z (substitute log qᵢ = α sᵢ - log Z)
                            3. H(q) + α⟨q,s⟩ = log Z (softmax self-information)

                            Combining: H(p) + α⟨p,s⟩ + KL = log Z = H(q) + α⟨q,s⟩, so KL ≥ 0 ⟹ LHS ≤ RHS.
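The inequality can be sanity-checked numerically (a Python sketch, separate from the Lean proof): the softmax attains log Z, and random distributions never exceed it.

```python
import math, random

def softmax(s, alpha):
    z = [math.exp(alpha * x) for x in s]
    total = sum(z)
    return [x / total for x in z]

def gibbs_objective(p, s, alpha):
    # H(p) + alpha * <p, s>, with the convention 0 * log 0 = 0
    h = -sum(pi * math.log(pi) for pi in p if pi > 0)
    return h + alpha * sum(pi * si for pi, si in zip(p, s))

s, alpha = [0.5, -1.0, 2.0], 1.3
log_z = math.log(sum(math.exp(alpha * x) for x in s))
q = softmax(s, alpha)
assert abs(gibbs_objective(q, s, alpha) - log_z) < 1e-9  # softmax attains log Z
random.seed(1)
for _ in range(200):
    raw = [random.random() for _ in s]
    p = [x / sum(raw) for x in raw]
    assert gibbs_objective(p, s, alpha) <= log_z + 1e-9   # never exceeded
```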

                             noncomputable def Core.klFinite {ι : Type u_3} [Fintype ι] (p q : ι → ℝ) :
                             ℝ

                            Finite KL divergence: KL(p ‖ q) = Σ pᵢ · log(pᵢ / qᵢ). Convention: 0 · log(0/q) = 0.

                            Equations
                            Instances For
                               theorem Core.kl_eq_sum_klFun {ι : Type u_3} [Fintype ι] (p q : ι → ℝ) (hq : ∀ (i : ι), 0 < q i) (hp : ∀ (i : ι), 0 ≤ p i) (hsum : ∑ i : ι, p i = ∑ i : ι, q i) :
                               klFinite p q = ∑ i : ι, q i * InformationTheory.klFun (p i / q i)

                              Finite KL divergence equals Σ qᵢ · klFun(pᵢ/qᵢ) when Σpᵢ = Σqᵢ.

                               theorem Core.kl_nonneg {ι : Type u_3} [Fintype ι] (p q : ι → ℝ) (hq : ∀ (i : ι), 0 < q i) (hp : ∀ (i : ι), 0 ≤ p i) (hsum : ∑ i : ι, p i = ∑ i : ι, q i) :
                               0 ≤ klFinite p q

                              Gibbs' inequality (finite form): KL(p ‖ q) ≥ 0.

                              For distributions p, q with qᵢ > 0, pᵢ ≥ 0, and Σpᵢ = Σqᵢ: Σᵢ pᵢ · log(pᵢ/qᵢ) ≥ 0

                              Proof via Mathlib's klFun_nonneg: klFun(x) ≥ 0 for x ≥ 0.

                               theorem Core.kl_nonneg' {ι : Type u_3} [Fintype ι] [Nonempty ι] {p q : ι → ℝ} (hp_nonneg : ∀ (i : ι), 0 ≤ p i) (hq_pos : ∀ (i : ι), 0 < q i) (hp_sum : ∑ i : ι, p i = 1) (hq_sum : ∑ i : ι, q i = 1) :
                               0 ≤ klFinite p q

                              Alternative KL non-negativity with distribution hypotheses.

                               noncomputable def Core.speakerObj {ι : Type u_3} [Fintype ι] (v : ι → ℝ) (α : ℝ) (s : ι → ℝ) :
                               ℝ

                              The per-meaning speaker objective: f(s) = Σᵤ [negMulLog(sᵤ) + α · sᵤ · vᵤ].

                              This is the function that the speaker maximizes for each meaning m, where vᵤ = log L(m|u) is the listener utility of utterance u.

                              Equations
                              Instances For
                                 theorem Core.speakerObj_at_softmax {ι : Type u_3} [Fintype ι] [Nonempty ι] (v : ι → ℝ) (α : ℝ) :
                                speakerObj v α (softmax v α) = logSumExp v α

                                The softmax achieves f(s*) = log Z, where Z is the partition function.

                                 theorem Core.gibbs_variational {ι : Type u_3} [Fintype ι] [Nonempty ι] (s : ι → ℝ) (α : ℝ) (p : ι → ℝ) (hp_nonneg : ∀ (i : ι), 0 ≤ p i) (hp_sum : ∑ i : ι, p i = 1) :
                                 ∑ i : ι, (p i).negMulLog + α * ∑ i : ι, p i * s i ≤ ∑ i : ι, (softmax s α i).negMulLog + α * ∑ i : ι, softmax s α i * s i

                                Gibbs Variational Principle: softmax maximizes entropy + expected score.

                                For any distribution p on ι and scores s : ι → ℝ: H(p) + α·⟨p, s⟩ ≤ H(q) + α·⟨q, s⟩ where q = softmax(s, α) and H(p) = Σ negMulLog(pᵢ).

                                Softmax Concentration at High Rationality #

                                 As the rationality parameter α → ∞, softmax concentrates all probability mass on the action with highest utility; that is, softmax converges to argmax. This connects soft optimization (MaxEnt) to hard optimization (OT) in the high-rationality limit.

                                Proof sketch #

                                From softmax_odds, we have σᵢ / σⱼ = exp(α(sᵢ − sⱼ)). When sᵢ > sⱼ, this ratio → ∞ as α → ∞, so σⱼ / σᵢ → 0. Since Σ σₖ = 1, the maximizer's probability → 1 by squeezing: 1 - σ_max = Σ_{k≠max} σₖ, and each non-maximal term → 0 (bounded by exp(-α · gap) where gap = sᵢ - sⱼ > 0).
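A numeric illustration of the limit (Python, illustrative only):

```python
import math

def softmax(s, alpha):
    z = [math.exp(alpha * x) for x in s]
    total = sum(z)
    return [x / total for x in z]

s = [1.0, 2.0, 3.5]  # unique maximizer at index 2
masses = [softmax(s, a)[2] for a in (0.0, 1.0, 5.0, 20.0, 50.0)]
assert abs(masses[0] - 1 / 3) < 1e-12  # alpha = 0: uniform
assert masses == sorted(masses)        # mass on the argmax grows with alpha
assert masses[-1] > 0.999999           # and approaches 1
```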

                                 theorem Core.softmax_argmax_limit {ι : Type u_3} [Fintype ι] [Nonempty ι] [DecidableEq ι] (s : ι → ℝ) (i_max : ι) (h_max : ∀ (j : ι), j ≠ i_max → s j < s i_max) (ε : ℝ) :
                                 ε > 0 → ∃ (α₀ : ℝ), ∀ α > α₀, |softmax s α i_max - 1| < ε

                                Softmax → argmax limit: as α → ∞, softmax concentrates on the unique maximizer. For any i_max with s i_max strictly greater than all other scores, softmax(s, α)_{i_max} → 1.

                                This is the formal connection between MaxEnt (soft optimization) and OT (hard optimization): OT is the α → ∞ limit of MaxEnt.

                                 theorem Core.softmax_nonmax_limit {ι : Type u_3} [Fintype ι] (s : ι → ℝ) (i_max : ι) (h_max : ∀ (j : ι), j ≠ i_max → s j < s i_max) (j : ι) (hj : j ≠ i_max) (ε : ℝ) :
                                 ε > 0 → ∃ (α₀ : ℝ), ∀ α > α₀, softmax s α j < ε

                                Complement of the limit: non-maximal actions get probability → 0.

                                 noncomputable def Core.shannonEntropy {ι : Type u_3} [Fintype ι] (p : ι → ℝ) :
                                 ℝ

                                Shannon entropy: H(p) = -Σᵢ pᵢ log pᵢ.

                                For a ℚ-valued counterpart suitable for decidable computation, see Core.InformationTheory.entropy in Linglib/Core/InformationTheory.lean.

                                Equations
                                Instances For
                                   theorem Core.shannonEntropy_nonneg {ι : Type u_3} [Fintype ι] (p : ι → ℝ) (hp_nonneg : ∀ (i : ι), 0 ≤ p i) (hp_sum : ∑ i : ι, p i = 1) :
                                   0 ≤ shannonEntropy p

                                  Entropy is non-negative for probability distributions.

                                   theorem Core.shannonEntropy_le_log_card {ι : Type u_3} [Fintype ι] [Nonempty ι] (p : ι → ℝ) (hp_nonneg : ∀ (i : ι), 0 ≤ p i) (hp_sum : ∑ i : ι, p i = 1) :
                                   shannonEntropy p ≤ Real.log ↑(Fintype.card ι)

                                  Maximum entropy is achieved by uniform distribution.

                                  Proof: KL(p ‖ uniform) ≥ 0, and KL(p ‖ uniform) = log n - H(p).
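A numeric spot-check of the bound (Python, illustrative only):

```python
import math, random

def shannon_entropy(p):
    # H(p) = -sum p_i log p_i, with the convention 0 * log 0 = 0
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

n = 5
# The uniform distribution attains the maximum, log n.
assert abs(shannon_entropy([1.0 / n] * n) - math.log(n)) < 1e-12
random.seed(2)
for _ in range(200):
    raw = [random.random() for _ in range(n)]
    p = [x / sum(raw) for x in raw]
    assert 0 <= shannon_entropy(p) <= math.log(n) + 1e-12
```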

                                   theorem Core.shannonEntropy_uniform {ι : Type u_3} [Fintype ι] [Nonempty ι] :
                                   (shannonEntropy fun (x : ι) => 1 / ↑(Fintype.card ι)) = Real.log ↑(Fintype.card ι)

                                  Entropy of uniform distribution.

                                   theorem Core.shannonEntropy_softmax {ι : Type u_3} [Fintype ι] [Nonempty ι] (s : ι → ℝ) (α : ℝ) :
                                   shannonEntropy (softmax s α) = Real.log (partitionFn s α) - α * ∑ i : ι, softmax s α i * s i

                                  Entropy of softmax: H(softmax(s, α)) = log Z - α · 𝔼[s].
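The identity H(softmax(s, α)) = log Z − α · 𝔼[s] can be checked directly (a Python sketch, illustrative only):

```python
import math

def softmax(s, alpha):
    z = [math.exp(alpha * x) for x in s]
    total = sum(z)
    return [x / total for x in z]

s, alpha = [0.2, -1.1, 0.7, 2.0], 0.8
q = softmax(s, alpha)
entropy = -sum(qi * math.log(qi) for qi in q)
log_z = math.log(sum(math.exp(alpha * x) for x in s))
expected_score = sum(qi * si for qi, si in zip(q, s))
assert abs(entropy - (log_z - alpha * expected_score)) < 1e-9
```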

                                   noncomputable def Core.entropyRegObjective {ι : Type u_3} [Fintype ι] (s : ι → ℝ) (α : ℝ) (p : ι → ℝ) :
                                   ℝ

                                  Entropy-regularized objective: G_α(p, s) = ⟨s, p⟩ + (1/α) H(p).

                                  Equations
                                  Instances For
                                     theorem Core.entropyRegObjective_softmax {ι : Type u_3} [Fintype ι] [Nonempty ι] (s : ι → ℝ) (α : ℝ) (hα : 0 < α) :
                                     entropyRegObjective s α (softmax s α) = logSumExp s α / α

                                    The maximum value of the entropy-regularized objective.

                                     theorem Core.softmax_maximizes_entropyReg {ι : Type u_3} [Fintype ι] [Nonempty ι] (s : ι → ℝ) (α : ℝ) (hα : 0 < α) (p : ι → ℝ) (hp_nonneg : ∀ (i : ι), 0 ≤ p i) (hp_sum : ∑ i : ι, p i = 1) :
                                     entropyRegObjective s α p ≤ entropyRegObjective s α (softmax s α)

                                    Fact 5: Softmax maximizes the entropy-regularized objective.

                                    Proof: gibbs_variational gives H(p) + α⟨p,s⟩ ≤ H(q) + α⟨q,s⟩; dividing by α > 0 yields the result.

                                     theorem Core.softmax_unique_maximizer {ι : Type u_3} [Fintype ι] [Nonempty ι] (s : ι → ℝ) (α : ℝ) (hα : 0 < α) (p : ι → ℝ) (hp_nonneg : ∀ (i : ι), 0 ≤ p i) (hp_sum : ∑ i : ι, p i = 1) (h_max : entropyRegObjective s α p = entropyRegObjective s α (softmax s α)) :
                                    p = softmax s α

                                    Softmax is the unique maximizer.

                                    Proof: equality in the objective ⟹ KL(p ‖ softmax) = 0 (via speakerObj_plus_kl), hence p = softmax (via kl_eq_zero_imp_eq).

noncomputable def Core.freeEnergy {ι : Type u_3} [Fintype ι] (s : ι → ℝ) (α : ℝ) (p : ι → ℝ) : ℝ

                                    Free energy (from statistical mechanics).

                                    Equations
                                    Instances For
theorem Core.softmax_minimizes_freeEnergy {ι : Type u_3} [Fintype ι] [Nonempty ι] (s : ι → ℝ) (α : ℝ) (hα : 0 < α) (p : ι → ℝ) (hp_nonneg : ∀ (i : ι), 0 ≤ p i) (hp_sum : ∑ i : ι, p i = 1) :
freeEnergy s α (softmax s α) ≤ freeEnergy s α p

                                      Softmax is the Boltzmann distribution: minimizes free energy.

theorem Core.logSumExp_convex {ι : Type u_3} [Fintype ι] [Nonempty ι] (s : ι → ℝ) :
ConvexOn ℝ Set.univ fun (α : ℝ) => logSumExp s α

                                      The log-partition function is convex in α.

Proof: By Hölder's inequality. For 0 < a, b with a + b = 1:

∑ exp(x·sᵢ)^a · exp(y·sᵢ)^b ≤ (∑ exp(x·sᵢ))^a · (∑ exp(y·sᵢ))^b

Since exp(x·sᵢ)^a · exp(y·sᵢ)^b = exp((ax+by)·sᵢ), taking logs gives logSumExp(s, ax+by) ≤ a·logSumExp(s, x) + b·logSumExp(s, y).

theorem Core.deriv_logSumExp {ι : Type u_3} [Fintype ι] [Nonempty ι] (s : ι → ℝ) (α : ℝ) :
deriv (fun (α : ℝ) => logSumExp s α) α = ∑ i : ι, softmax s α i * s i

                                      Derivative of log-partition gives expected value: d/dα log(Σ exp(α sᵢ)) = Σ softmax(s,α)ᵢ · sᵢ.

                                      Proof via chain rule on log ∘ Σ exp(α · sᵢ), then hasDerivAt_finset_sum.
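A finite-difference check of this derivative identity (a Python sketch, not from the source; the max-shift in `log_sum_exp` is a standard numerical-stability trick):

```python
import math

def log_sum_exp(s, alpha):
    """log sum_i exp(alpha * s_i), computed with a max-shift for stability."""
    m = max(alpha * si for si in s)
    return m + math.log(sum(math.exp(alpha * si - m) for si in s))

def softmax(s, alpha):
    z = [math.exp(alpha * si) for si in s]
    t = sum(z)
    return [zi / t for zi in z]

s, alpha, h = [0.5, -1.0, 2.2], 0.8, 1e-6
# central difference approximation of d/dalpha logSumExp(s, alpha)
numeric = (log_sum_exp(s, alpha + h) - log_sum_exp(s, alpha - h)) / (2 * h)
# the theorem's closed form: expected score under softmax(s, alpha)
analytic = sum(pi * si for pi, si in zip(softmax(s, alpha), s))
assert abs(numeric - analytic) < 1e-8
```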

noncomputable def Core.partitionFnOffset {ι : Type u_3} [Fintype ι] (s r : ι → ℝ) (α : ℝ) : ℝ

                                      Partition function with per-element offsets: Z(α) = Σⱼ exp(α · sⱼ + rⱼ).

                                      Equations
                                      Instances For
theorem Core.partitionFnOffset_pos {ι : Type u_3} [Fintype ι] [Nonempty ι] (s r : ι → ℝ) (α : ℝ) :
0 < partitionFnOffset s r α

noncomputable def Core.logSumExpOffset {ι : Type u_3} [Fintype ι] (s r : ι → ℝ) (α : ℝ) : ℝ

                                        Log-sum-exp with offsets: log Σ exp(α · sᵢ + rᵢ).

                                        Equations
                                        Instances For
noncomputable def Core.softmaxOffset {ι : Type u_3} [Fintype ι] (s r : ι → ℝ) (α : ℝ) (i : ι) : ℝ

                                          Softmax with offsets: exp(α · sᵢ + rᵢ) / Z(α).

                                          Equations
                                          Instances For
theorem Core.hasDerivAt_logSumExpOffset {ι : Type u_3} [Fintype ι] [Nonempty ι] (s r : ι → ℝ) (α : ℝ) :
HasDerivAt (logSumExpOffset s r) (∑ i : ι, softmaxOffset s r α i * s i) α

                                            Derivative of offset log-partition gives weighted expected value: d/dα log(Σ exp(α·sᵢ + rᵢ)) = Σ softmaxOffset(s,r,α)ᵢ · sᵢ.

theorem Core.logSumExpOffset_convex {ι : Type u_3} [Fintype ι] [Nonempty ι] (s r : ι → ℝ) :
ConvexOn ℝ Set.univ fun (α : ℝ) => logSumExpOffset s r α

                                            The offset log-partition function is convex in α.

                                            Same Hölder argument as logSumExp_convex: the key factoring exp((ax+by)·sᵢ + rᵢ) = exp(x·sᵢ + rᵢ)^a · exp(y·sᵢ + rᵢ)^b holds because a + b = 1, absorbing the offset.

theorem Core.logConditional_concaveOn {ι : Type u_3} [Fintype ι] [Nonempty ι] (s r : ι → ℝ) (y : ι) :
ConcaveOn ℝ Set.univ fun (α : ℝ) => α * s y + r y - logSumExpOffset s r α

                                            The log conditional likelihood α ↦ (α · sᵧ + rᵧ) − logSumExpOffset(s,r,α) is concave (affine minus convex).

theorem Core.hasDerivAt_logConditional {ι : Type u_3} [Fintype ι] [Nonempty ι] (s r : ι → ℝ) (y : ι) (α : ℝ) :
HasDerivAt (fun (w : ℝ) => w * s y + r y - logSumExpOffset s r w) (s y - ∑ i : ι, softmaxOffset s r α i * s i) α

                                            The derivative of log conditional likelihood α ↦ (α·sᵧ + rᵧ) − logSumExpOffset is the observed feature value minus the expected value: d/dα = sᵧ − Σᵢ softmaxOffset(s,r,α)ᵢ · sᵢ.
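This "observed minus expected" gradient can be checked by finite differences (a Python sketch, not from the source; helper names are mine):

```python
import math

def lse_offset(s, r, alpha):
    """log sum_i exp(alpha * s_i + r_i)."""
    return math.log(sum(math.exp(alpha * si + ri) for si, ri in zip(s, r)))

def softmax_offset(s, r, alpha):
    z = [math.exp(alpha * si + ri) for si, ri in zip(s, r)]
    t = sum(z)
    return [zi / t for zi in z]

s, r, y, alpha, h = [1.0, -0.5, 0.7], [0.2, 0.1, -0.4], 1, 0.6, 1e-6

def log_conditional(a):
    """alpha -> (alpha * s_y + r_y) - logSumExpOffset(s, r, alpha)."""
    return a * s[y] + r[y] - lse_offset(s, r, a)

numeric = (log_conditional(alpha + h) - log_conditional(alpha - h)) / (2 * h)
# observed feature value minus its expectation under softmaxOffset
analytic = s[y] - sum(pi * si for pi, si in zip(softmax_offset(s, r, alpha), s))
assert abs(numeric - analytic) < 1e-8
```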

theorem Core.max_entropy_duality {ι : Type u_3} [Fintype ι] [Nonempty ι] (s : ι → ℝ) (c α : ℝ) (_hα : 0 < α) (h_constraint : ∑ i : ι, softmax s α i * s i = c) :

                                            Strong duality: max entropy = min free energy.

theorem Core.bayesian_maximizes {ι : Type u_3} [Fintype ι] (w : ι → ℝ) (hw_nonneg : ∀ (i : ι), 0 ≤ w i) (hC_pos : 0 < ∑ i : ι, w i) (L : ι → ℝ) (hL_pos : ∀ (i : ι), 0 < L i) (hL_sum : ∑ i : ι, L i = 1) :
∑ i : ι, w i * Real.log (L i) ≤ ∑ i : ι, w i * Real.log (w i / ∑ j : ι, w j)

                                            Bayesian optimality: The normalized posterior maximizes weighted log-likelihood on the probability simplex.

                                            For weights wᵢ ≥ 0 with C = Σwᵢ > 0, and any distribution L on the simplex (Lᵢ > 0, Σ Lᵢ = 1), the normalized posterior p*ᵢ = wᵢ/C satisfies:

                                            Σᵢ wᵢ · log(Lᵢ) ≤ Σᵢ wᵢ · log(p*ᵢ)

This is used for listener optimality: with wₘ = P(m)·S(u|m), the Bayesian listener L*(m|u) = wₘ/C maximizes Σ_m P(m)·S(u|m)·log L(m|u).
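Numerically (a Python sketch, not from the source): the normalized posterior beats random competitors on the simplex under the weighted log-likelihood.

```python
import math, random

w = [0.3, 1.2, 0.05, 0.7]          # nonnegative weights, e.g. P(m) * S(u|m)
C = sum(w)
posterior = [wi / C for wi in w]   # normalized posterior p*_i = w_i / C

def weighted_ll(L):
    """sum_i w_i * log(L_i), the objective the posterior maximizes."""
    return sum(wi * math.log(li) for wi, li in zip(w, L))

best = weighted_ll(posterior)
random.seed(1)
for _ in range(200):
    raw = [random.random() + 1e-9 for _ in w]
    L = [x / sum(raw) for x in raw]            # random competitor distribution
    assert weighted_ll(L) <= best + 1e-12
```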

                                            Uniqueness of the Ratio Scale #

                                            Theorem 4 (proved earlier as policy_eq_of_proportional) shows that proportional scores yield the same policy. The converse — that identical policies force proportional scores — completes the uniqueness characterization: same policy ↔ proportional scores, with proportionality constant k = totalScore₂/totalScore₁.

                                            Note: This is distinct from the "Independence-of-Unit Condition" in §1.F (pp. 28–33), which concerns state-dependent transformations f satisfying f(kv) = kf(v). That condition addresses how scale values transform across experimental conditions, not the uniqueness of the scale within a condition.

                                            theorem Core.RationalAction.proportional_of_policy_eq {S : Type u_3} {A : Type u_4} [Fintype A] (ra₁ ra₂ : RationalAction S A) (s : S) (h₁ : 0 < ra₁.totalScore s) (h₂ : 0 < ra₂.totalScore s) (hpol : ∀ (a : A), ra₁.policy s a = ra₂.policy s a) (a : A) :
                                            ra₂.score s a = ra₂.totalScore s / ra₁.totalScore s * ra₁.score s a

                                            Converse of Theorem 4 (uniqueness of ratio scale): If two agents with positive total scores have the same policy, then their scores are proportional with constant k = total₂/total₁.

theorem Core.RationalAction.policy_eq_iff_proportional {S : Type u_3} {A : Type u_4} [Fintype A] (ra₁ ra₂ : RationalAction S A) (s : S) (h₁ : 0 < ra₁.totalScore s) (h₂ : 0 < ra₂.totalScore s) :
(∀ (a : A), ra₁.policy s a = ra₂.policy s a) ↔ ∃ (k : ℝ), 0 < k ∧ ∀ (a : A), ra₂.score s a = k * ra₁.score s a

                                            Full uniqueness characterization (Theorem 4 and its converse): Two agents with positive total scores have the same policy if and only if their scores are proportional.
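A minimal numeric sketch of both directions (Python, not from the source): scaling scores by a positive constant leaves the policy unchanged, and the constant is recovered as total₂/total₁.

```python
score1 = [2.0, 0.5, 3.5]
k = 4.2
score2 = [k * v for v in score1]   # proportional scores

def policy(score):
    """Luce choice rule: normalize nonnegative scores to a distribution."""
    t = sum(score)
    return [v / t for v in score]

# same policy...
assert all(abs(a - b) < 1e-12 for a, b in zip(policy(score1), policy(score2)))
# ...and the proportionality constant is totalScore2 / totalScore1
assert abs(sum(score2) / sum(score1) - k) < 1e-12
```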

                                            Alternative Forms of Axiom 1 #

Appendix 1 of Luce (1959) proves three equivalent formulations of the choice axiom:

                                            (a) Ratio form: There exists a positive function v such that P(x, T) = v(x) / Σ_{y∈T} v(y) for all x ∈ T.

                                            (b) Product rule: P(x, T) = P(x, S) · P(S, T) for x ∈ S ⊆ T, where P(S, T) = Σ_{y∈S} P(y, T).

                                            (c) Pairwise independence: The odds ratio P(x,{x,y}) · P(y,T) = P(y,{x,y}) · P(x,T) — pairwise ratios are preserved in any superset.

                                            The ratio form (a) is the definition of RationalAction; (a)→(b) is product_rule and (a)→(c) is pChoice_ratio.
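Forms (b) and (c) follow from (a) by direct computation; a small numeric check (Python, not from the source; `v` is an arbitrary ratio scale):

```python
v = {"a": 1.0, "b": 2.0, "c": 4.0}   # an arbitrary positive ratio scale

def P(x, T):
    """Form (a), the ratio form: P(x, T) = v(x) / sum_{y in T} v(y)."""
    return v[x] / sum(v[y] for y in T)

T, S = {"a", "b", "c"}, {"a", "b"}
# (b) product rule: P(x, T) = P(x, S) * P(S, T) for x in S, S subset of T
p_S_T = sum(P(y, T) for y in S)
assert abs(P("a", T) - P("a", S) * p_S_T) < 1e-12
# (c) pairwise IIA: P(x, {x,y}) * P(y, T) = P(y, {x,y}) * P(x, T)
pair = {"a", "c"}
assert abs(P("a", pair) * P("c", T) - P("c", pair) * P("a", T)) < 1e-12
```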

                                            structure Core.ChoiceFn (A : Type u_4) [DecidableEq A] :
                                            Type u_4

                                            A general choice function on finite subsets: the minimal structure for stating Axiom 1 equivalences without assuming a ratio scale a priori.

• prob : Finset A → A → ℝ

                                              P(a, T): probability of choosing a from set T

• prob_nonneg (T : Finset A) (a : A) : 0 ≤ self.prob T a

                                              Probabilities are non-negative

• prob_zero_outside (T : Finset A) (a : A) : a ∉ T → self.prob T a = 0

                                              Zero probability outside the choice set

                                            Instances For

                                              Axiom 1, Form (a): ratio scale representation. There exists v > 0 such that P(x, T) = v(x) / Σ v(y).

                                              Equations
                                              Instances For

                                                Axiom 1, Form (b): product rule. P(x, T) = P(x, S) · Σ_{y∈S} P(y, T) for x ∈ S ⊆ T.

                                                Equations
                                                Instances For

                                                  Axiom 1, Form (c): pairwise independence (IIA). The odds ratio is preserved in any superset: P(x,T) · P(y,{x,y}) = P(y,T) · P(x,{x,y}).

                                                  Equations
                                                  Instances For

                                                    (a) → (b): Ratio form implies product rule (Appendix 1).

                                                    (a) → (c): Ratio form implies pairwise IIA (Appendix 1).

theorem Core.pairwiseIIA_implies_ratio {A : Type u_3} [DecidableEq A] [Inhabited A] (cf : ChoiceFn A) (hIIA : cf.hasPairwiseIIA) (hsum : ∀ (T : Finset A), T.Nonempty → ∑ a ∈ T, cf.prob T a = 1) (hpos : ∀ (T : Finset A), ∀ a ∈ T, 0 < cf.prob T a) :

                                                    (c) → (a): Pairwise IIA implies ratio form (Appendix 1). The ratio scale is constructed by fixing a reference element x₀ and setting v(x) = P(x, {x, x₀}) / P(x₀, {x, x₀}). Requires normalization (probabilities sum to 1) and strict positivity for elements in the choice set.
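The construction in this direction can be sketched numerically (Python, not the formal proof; `true_v` plays the role of the unknown underlying scale):

```python
true_v = {"a": 1.5, "b": 0.5, "c": 3.0}   # hidden scale generating the data

def P(x, T):
    """Observed choice probabilities (here generated by a ratio scale)."""
    return true_v[x] / sum(true_v[y] for y in T)

# Fix a reference element x0 and define v(x) = P(x,{x,x0}) / P(x0,{x,x0})
x0 = "a"
v = {x: 1.0 if x == x0 else P(x, {x, x0}) / P(x0, {x, x0}) for x in true_v}

# The recovered scale reproduces all choice probabilities via the ratio form
T = {"a", "b", "c"}
for x in T:
    reconstructed = v[x] / sum(v[y] for y in T)
    assert abs(reconstructed - P(x, T)) < 1e-12
```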

theorem Core.axiom1_ratio_iff_pairwiseIIA {A : Type u_3} [DecidableEq A] [Inhabited A] (cf : ChoiceFn A) (hsum : ∀ (T : Finset A), T.Nonempty → ∑ a ∈ T, cf.prob T a = 1) (hpos : ∀ (T : Finset A), ∀ a ∈ T, 0 < cf.prob T a) :

                                                    Axiom 1 equivalence (Appendix 1): Ratio form ↔ pairwise IIA (under normalization and positivity).