
Linglib.Theories.Pragmatics.RSA.Core.Softmax.Limits

Softmax Function: Limit Behavior

§1. Pointwise limits: α → 0 (uniform), α → ∞ (argmax), α → -∞ (argmin). §2. BToM observer: softmaxObserver_tendsto_one — an observer watching a softmax agent concentrates on the uniquely optimal state as α → ∞. This is the RSA–exhaustivity bridge: L1 at α → ∞ computes exh.

def Softmax.argmaxSet {ι : Type u_1} (s : ι → ℝ) :
Set ι

The set of indices achieving the maximum score.
def Softmax.argminSet {ι : Type u_1} (s : ι → ℝ) :
Set ι

The set of indices achieving the minimum score.
noncomputable def Softmax.maxScore {ι : Type u_1} [Nonempty ι] (s : ι → ℝ) :
ℝ

Maximum score value.
noncomputable def Softmax.minScore {ι : Type u_1} [Nonempty ι] (s : ι → ℝ) :
ℝ

Minimum score value.
theorem Softmax.tendsto_softmax_zero {ι : Type u_1} [Fintype ι] [DecidableEq ι] [Nonempty ι] (s : ι → ℝ) (i : ι) :
Filter.Tendsto (fun (α : ℝ) => Core.softmax s α i) (nhds 0) (nhds (1 / ↑(Fintype.card ι)))

Fact 4: As α → 0, softmax converges to the uniform distribution.
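The α → 0 limit can be checked numerically. A minimal sketch in Python; the `softmax` helper is a hypothetical stand-in for `Core.softmax`, not part of the library:

```python
import math

def softmax(s, alpha):
    # Softmax over scores s with inverse temperature alpha,
    # max-shifted for numerical stability.
    m = max(alpha * x for x in s)
    w = [math.exp(alpha * x - m) for x in s]
    z = sum(w)
    return [x / z for x in w]

scores = [0.0, 1.0, 3.0]
p = softmax(scores, 1e-6)  # alpha near 0
# every entry is close to 1/3 (uniform over 3 indices)
print(p)
```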

theorem Softmax.softmax_ratio_tendsto_zero {ι : Type u_1} [Fintype ι] [DecidableEq ι] [Nonempty ι] (s : ι → ℝ) (i j : ι) (hij : s i < s j) :
Filter.Tendsto (fun (α : ℝ) => Core.softmax s α i / Core.softmax s α j) Filter.atTop (nhds 0)

The ratio of non-max to max probability vanishes as α → ∞.

theorem Softmax.tendsto_softmax_infty_at_max {ι : Type u_1} [Fintype ι] [DecidableEq ι] [Nonempty ι] (s : ι → ℝ) (i_max : ι) (h_unique : ∀ (j : ι), j ≠ i_max → s j < s i_max) :
Filter.Tendsto (fun (α : ℝ) => Core.softmax s α i_max) Filter.atTop (nhds 1)

At the maximum, softmax → 1 as α → ∞. Helper lemma.

theorem Softmax.tendsto_softmax_infty_unique_max {ι : Type u_1} [Fintype ι] [DecidableEq ι] [Nonempty ι] (s : ι → ℝ) (i_max : ι) (h_unique : ∀ (j : ι), j ≠ i_max → s j < s i_max) (i : ι) :
Filter.Tendsto (fun (α : ℝ) => Core.softmax s α i) Filter.atTop (nhds (if i = i_max then 1 else 0))

When there's a unique maximum, softmax concentrates on it as α → ∞.

theorem Softmax.log_softmax_ratio_tendsto {ι : Type u_1} [Fintype ι] [DecidableEq ι] [Nonempty ι] (s : ι → ℝ) (i j : ι) (hij : s i < s j) :
Filter.Tendsto (fun (α : ℝ) => Real.log (Core.softmax s α i / Core.softmax s α j)) Filter.atTop Filter.atBot

Log-probability difference grows unboundedly: the log-ratio log (softmax i / softmax j) tends to −∞ as α → ∞.

theorem Softmax.tendsto_softmax_neg_infty_unique_min {ι : Type u_1} [Fintype ι] [DecidableEq ι] [Nonempty ι] (s : ι → ℝ) (i_min : ι) (h_unique : ∀ (j : ι), j ≠ i_min → s i_min < s j) (i : ι) :
Filter.Tendsto (fun (α : ℝ) => Core.softmax s α i) Filter.atBot (nhds (if i = i_min then 1 else 0))

As α → -∞, softmax concentrates on the minimum.
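Both one-sided limits can be illustrated numerically. A sketch; the `softmax` helper is a hypothetical stand-in for `Core.softmax`:

```python
import math

def softmax(s, alpha):
    # Max-shifted softmax with inverse temperature alpha.
    m = max(alpha * x for x in s)
    w = [math.exp(alpha * x - m) for x in s]
    z = sum(w)
    return [x / z for x in w]

scores = [0.0, 1.0, 3.0]  # unique max at index 2, unique min at index 0

p_hot = softmax(scores, 50.0)    # alpha -> +inf: concentrates on the max
p_cold = softmax(scores, -50.0)  # alpha -> -inf: concentrates on the min
print(p_hot)   # ~ [0, 0, 1]
print(p_cold)  # ~ [1, 0, 0]
```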

noncomputable def Softmax.hardmax {ι : Type u_1} [DecidableEq ι] [Nonempty ι] (s : ι → ℝ) (i_max : ι) (h_unique : ∀ (j : ι), j ≠ i_max → s j < s i_max) :
ι

The IBR (iterated best response) limit: the hardmax selector.
theorem Softmax.softmax_tendsto_hardmax {ι : Type u_1} [Fintype ι] [DecidableEq ι] [Nonempty ι] (s : ι → ℝ) (i_max : ι) (h_unique : ∀ (j : ι), j ≠ i_max → s j < s i_max) (i : ι) :
Filter.Tendsto (fun (α : ℝ) => Core.softmax s α i) Filter.atTop (nhds (hardmax s i_max h_unique i))

Softmax converges to hardmax as α → ∞ (when the maximum is unique).

noncomputable def Softmax.entropy {ι : Type u_1} [Fintype ι] [Nonempty ι] (p : ι → ℝ) :
ℝ

Shannon entropy of a distribution.
noncomputable def Softmax.maxEntropy (ι : Type u_2) [Fintype ι] :
ℝ

Maximum possible entropy (uniform distribution).
theorem Softmax.entropy_tendsto_max {ι : Type u_1} [Fintype ι] [DecidableEq ι] [Nonempty ι] (s : ι → ℝ) :
Filter.Tendsto (fun (α : ℝ) => entropy (Core.softmax s α)) (nhds 0) (nhds (maxEntropy ι))

As α → 0, the entropy of softmax approaches the maximum.

theorem Softmax.entropy_tendsto_zero {ι : Type u_1} [Fintype ι] [DecidableEq ι] [Nonempty ι] (s : ι → ℝ) (i_max : ι) (h_unique : ∀ (j : ι), j ≠ i_max → s j < s i_max) :
Filter.Tendsto (fun (α : ℝ) => entropy (Core.softmax s α)) Filter.atTop (nhds 0)

As α → ∞ (with a unique max), entropy approaches 0.
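Both entropy limits can be checked numerically. A sketch; `softmax` and `entropy` here are hypothetical Python analogues of the library's definitions:

```python
import math

def softmax(s, alpha):
    # Max-shifted softmax with inverse temperature alpha.
    m = max(alpha * x for x in s)
    w = [math.exp(alpha * x - m) for x in s]
    z = sum(w)
    return [x / z for x in w]

def entropy(p):
    # Shannon entropy, with the 0 * log 0 = 0 convention.
    return -sum(x * math.log(x) for x in p if x > 0)

scores = [0.0, 1.0, 3.0]
h_flat = entropy(softmax(scores, 1e-6))  # ~ log 3 (maximum entropy)
h_peak = entropy(softmax(scores, 50.0))  # ~ 0 (all mass on the max)
print(h_flat, h_peak)
```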

theorem Softmax.softmax_exponential_decay {ι : Type u_1} [Fintype ι] [DecidableEq ι] [Nonempty ι] (s : ι → ℝ) (i_max : ι) (h_max : ∀ (j : ι), s j ≤ s i_max) (i : ι) (hi : s i < s i_max) :
∃ C > 0, ∀ α > 0, Core.softmax s α i ≤ C * Real.exp (-α * (s i_max - s i))

Exponential rate of concentration.
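The bound can be spot-checked numerically: for plain softmax normalization, C = 1 already suffices, since softmax(i) = exp(α·s_i)/Z and Z ≥ exp(α·s_max). A sketch with a hypothetical `softmax` helper:

```python
import math

def softmax(s, alpha):
    # Max-shifted softmax with inverse temperature alpha.
    m = max(alpha * x for x in s)
    w = [math.exp(alpha * x - m) for x in s]
    z = sum(w)
    return [x / z for x in w]

scores = [0.0, 1.0, 3.0]  # i_max = 2
ok = True
for alpha in [0.5, 1.0, 5.0, 10.0]:
    p = softmax(scores, alpha)
    for i in range(2):  # the non-max indices
        bound = math.exp(-alpha * (scores[2] - scores[i]))
        ok = ok and p[i] <= bound
print(ok)  # True: softmax(i) <= exp(-alpha * (s_max - s_i)) with C = 1
```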

theorem Softmax.softmax_negligible {ι : Type u_1} [Fintype ι] [DecidableEq ι] [Nonempty ι] (s : ι → ℝ) (i_max : ι) (h_max : ∀ (j : ι), s j ≤ s i_max) (ε : ℝ) (hε : 0 < ε) (gap : ℝ) (hgap : 0 < gap) (h_gap_bound : ∀ (j : ι), j ≠ i_max → s i_max - s j ≥ gap) (α : ℝ) :
α > 1 / gap * |Real.log ε| → ∀ (j : ι), j ≠ i_max → Core.softmax s α j < ε

For practical computation: when is softmax close enough to hardmax?
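The threshold is directly usable in practice: pick any α above |log ε| / gap and every non-max probability drops below ε. A sketch with gap = 2 and ε = 10⁻⁶ (the `softmax` helper is hypothetical):

```python
import math

def softmax(s, alpha):
    # Max-shifted softmax with inverse temperature alpha.
    m = max(alpha * x for x in s)
    w = [math.exp(alpha * x - m) for x in s]
    z = sum(w)
    return [x / z for x in w]

scores = [0.0, 1.0, 3.0]  # i_max = 2; gap = min(3-0, 3-1) = 2
eps = 1e-6
gap = 2.0
threshold = (1 / gap) * abs(math.log(eps))  # ~ 6.91
alpha = threshold + 0.1
p = softmax(scores, alpha)
print(p[0] < eps and p[1] < eps)  # True: non-max mass is below eps
```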

theorem Softmax.tendsto_softmax_infty_not_max {ι : Type u_1} [Fintype ι] [DecidableEq ι] [Nonempty ι] (s : ι → ℝ) (i j : ι) (hij : s i < s j) :
Filter.Tendsto (fun (α : ℝ) => Core.softmax s α i) Filter.atTop (nhds 0)

If action j has a strictly higher score than i, then softmax(i, α) → 0 as α → ∞. Weaker than tendsto_softmax_infty_unique_max: it does not require a unique maximum, only that i is beaten by some j.

noncomputable def Softmax.softmaxObserver {ι : Type u_1} [Fintype ι] {σ : Type u_2} [Fintype σ] [Nonempty ι] (score : ι → σ → ℝ) (prior : σ → ℝ) (α : ℝ) (i : ι) (s : σ) :
ℝ

Observer posterior for a softmax agent: BToM inference about the state.

An observer sees a softmax agent choose action i, and infers the state: P(s | i, α) = softmax(score(·, s), α)(i) · prior(s) / Σ_{s'} ...

In RSA: the pragmatic listener L1 observes the speaker's utterance u and infers the world w. This is L1(w | u) parameterized by α for limits.
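The observer posterior can be sketched numerically. `softmax_observer` below is a hypothetical Python analogue of `softmaxObserver`, with `score` indexed as `score[state][action]`:

```python
import math

def softmax(s, alpha):
    # Max-shifted softmax with inverse temperature alpha.
    m = max(alpha * x for x in s)
    w = [math.exp(alpha * x - m) for x in s]
    z = sum(w)
    return [x / z for x in w]

def softmax_observer(score, prior, alpha, i):
    # P(state | action i) proportional to the softmax agent's
    # likelihood of choosing i in each state, times the prior.
    like = [softmax(row, alpha)[i] for row in score]
    joint = [l * p for l, p in zip(like, prior)]
    z = sum(joint)
    return [x / z for x in joint]

# Two states, two actions; action 0 is best in state 0, action 1 in state 1.
score = [[2.0, 0.0],
         [0.0, 2.0]]
post = softmax_observer(score, [0.5, 0.5], 3.0, 0)
print(post)  # posterior favors state 0 after observing action 0
```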
theorem Softmax.softmaxObserver_tendsto_one {ι : Type u_1} [Fintype ι] [DecidableEq ι] {σ : Type u_2} [Fintype σ] [DecidableEq σ] [Nonempty ι] [Nonempty σ] (score : ι → σ → ℝ) (prior : σ → ℝ) (i₀ : ι) (s₀ : σ) (h_opt : ∀ (j : ι), j ≠ i₀ → score j s₀ < score i₀ s₀) (h_nonopt : ∀ (s : σ), s ≠ s₀ → ∃ (j : ι), score i₀ s < score j s) (h_prior : 0 < prior s₀) :
Filter.Tendsto (fun (α : ℝ) => softmaxObserver score prior α i₀ s₀) Filter.atTop (nhds 1)

BToM inference about a softmax agent concentrates on the optimal state.

An observer watches a softmax agent and infers the hidden state. If action i₀ is the unique best action at state s₀, and is not the best at any other state, the observer's posterior P(s₀ | i₀, α) → 1 as α → ∞.

RSA–exhaustivity bridge: the pragmatic listener (L1) observes a softmax speaker (S1) and does Bayesian inversion. At α → ∞, L1 concentrates on worlds where the heard utterance was the speaker's uniquely optimal choice. This IS the exhaustivity operator: L1 at α → ∞ computes exh(u).

For a simple scale {some, all} with worlds {someNotAll, all}:

• "all" is more informative at the "all" world → S1 says "all" there
• "some" is the only true utterance at "someNotAll" → S1 says "some" there
• L1 hearing "some" → concentrates on "someNotAll" → exh(some) = some ∧ ¬all
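The {some, all} example above can be run numerically. A sketch where a large negative score stands in for log 0 (a false utterance); the helper names are hypothetical, not the library's:

```python
import math

def softmax(s, alpha):
    # Max-shifted softmax with inverse temperature alpha.
    m = max(alpha * x for x in s)
    w = [math.exp(alpha * x - m) for x in s]
    z = sum(w)
    return [x / z for x in w]

NEG_INF = -1e9  # stand-in for log 0: utterance false in that world
# Rows: worlds [someNotAll, all]; cols: utterances [some, all].
# score(u, w) = log L0(w | u): "some" is true in both worlds (L0 = 1/2),
# "all" is true only in the all-world (L0 = 1).
score = [[math.log(0.5), NEG_INF],
         [math.log(0.5), 0.0]]
prior = [0.5, 0.5]

def L1(alpha, u):
    # Listener posterior: invert the softmax speaker S1(u | w).
    s1 = [softmax(row, alpha)[u] for row in score]
    joint = [a * b for a, b in zip(s1, prior)]
    z = sum(joint)
    return [x / z for x in joint]

post = L1(50.0, 0)  # hear "some" at high alpha
print(post)  # mass concentrates on someNotAll: exh(some) = some and not all
```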
theorem Softmax.softmaxObserver_add_const {ι : Type u_1} [Fintype ι] [DecidableEq ι] {σ : Type u_2} [Fintype σ] [DecidableEq σ] [Nonempty ι] (score : ι → σ → ℝ) (prior c : σ → ℝ) (α : ℝ) (i : ι) (s : σ) :
softmaxObserver (fun (j : ι) (t : σ) => score j t + c t) prior α i s = softmaxObserver score prior α i s

softmaxObserver is invariant under state-dependent constant shifts.

Adding c(s) to all action scores at state s doesn't change the observer's posterior, because softmax is translation-invariant (softmax_add_const).
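The invariance can be checked numerically. A sketch using hypothetical helpers mirroring `softmaxObserver`:

```python
import math

def softmax(s, alpha):
    # Max-shifted softmax with inverse temperature alpha.
    m = max(alpha * x for x in s)
    w = [math.exp(alpha * x - m) for x in s]
    z = sum(w)
    return [x / z for x in w]

def softmax_observer(score, prior, alpha, i):
    # P(state | action i): softmax likelihood per state, times prior.
    like = [softmax(row, alpha)[i] for row in score]
    joint = [l * p for l, p in zip(like, prior)]
    z = sum(joint)
    return [x / z for x in joint]

score = [[2.0, 0.0, 1.0],
         [0.0, 2.0, 1.0]]
prior = [0.3, 0.7]
c = [5.0, -3.0]  # state-dependent constant shift
shifted = [[x + c[w] for x in row] for w, row in enumerate(score)]

a = softmax_observer(score, prior, 1.5, 0)
b = softmax_observer(shifted, prior, 1.5, 0)
print(a, b)  # identical up to floating-point error
```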