Softmax Function: Limit Behavior #
§1. Pointwise limits: α → 0 (uniform), α → ∞ (argmax), α → -∞ (argmin).
§2. BToM observer: softmaxObserver_tendsto_one — an observer watching a
softmax agent concentrates on the uniquely optimal state as α → ∞.
This is the RSA–exhaustivity bridge: L1 at α → ∞ computes exh.
As α → 0, softmax converges to the uniform distribution.
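Concretely, assuming the standard exponential form of softmax (consistent with `Core.softmax` as used below), each exponential tends to 1 as α → 0, so every coordinate tends to 1/card ι:

```latex
\lim_{\alpha \to 0} \operatorname{softmax}(f,\alpha)(i)
  = \lim_{\alpha \to 0} \frac{e^{\alpha f(i)}}{\sum_{j} e^{\alpha f(j)}}
  = \frac{1}{\operatorname{card}(\iota)}
```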
The ratio of each non-maximal softmax probability to the maximal one vanishes as α → ∞.
Helper lemma: at the maximum, the softmax probability tends to 1 as α → ∞.
When there's a unique maximum, softmax concentrates on it as α → ∞.
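A sketch of the concentration argument (assuming the exponential form of softmax): dividing through by the dominant term, the probability at the unique maximizer i* is

```latex
\operatorname{softmax}(f,\alpha)(i^{*})
  = \frac{e^{\alpha f(i^{*})}}{\sum_{j} e^{\alpha f(j)}}
  = \frac{1}{1 + \sum_{j \neq i^{*}} e^{\alpha\,(f(j) - f(i^{*}))}}
```

When the maximum is unique, every exponent f(j) − f(i*) is strictly negative, so each term in the sum tends to 0 and the whole expression tends to 1 as α → ∞.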
The log-probability difference between points with distinct scores grows unboundedly as α → ∞.
As α → -∞, softmax concentrates on the minimum.
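The α → −∞ case reduces to the α → ∞ case by negating the scores:

```latex
\operatorname{softmax}(f,-\alpha)(i)
  = \frac{e^{-\alpha f(i)}}{\sum_{j} e^{-\alpha f(j)}}
  = \operatorname{softmax}(-f,\alpha)(i)
```

so as α → −∞, mass concentrates where −f is maximized, i.e. at the unique minimum of f.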
The IBR (iterated best response) limit: the hardmax selector.
Softmax converges to hardmax as α → ∞ (when maximum is unique).
Maximum possible entropy (uniform distribution).
Equations
- Softmax.maxEntropy ι = Real.log ↑(Fintype.card ι)
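This is the Shannon entropy of the uniform distribution on n = card ι outcomes:

```latex
H\!\left(\tfrac{1}{n},\dots,\tfrac{1}{n}\right)
  = -\sum_{i=1}^{n} \frac{1}{n}\log\frac{1}{n}
  = \log n
```

which matches `Softmax.maxEntropy ι = Real.log ↑(Fintype.card ι)`.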
As α → 0, entropy of softmax approaches maximum.
As α → ∞ (with unique max), entropy approaches 0.
Exponential rate of concentration.
For practical computation: when is softmax close enough to hardmax?
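One standard bound (a sketch, not necessarily the exact bound formalized here): let g = f(i*) − max_{j ≠ i*} f(j) > 0 be the score gap at the unique maximizer i*, and n = card ι. Bounding each non-maximal probability by its ratio to the maximal term gives

```latex
1 - \operatorname{softmax}(f,\alpha)(i^{*})
  \;\le\; \sum_{j \neq i^{*}} e^{\alpha\,(f(j) - f(i^{*}))}
  \;\le\; (n-1)\, e^{-\alpha g}
```

so softmax is within ε of hardmax in total mass once α ≥ (1/g) · log((n−1)/ε).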
If action j has strictly higher score than i, softmax(i, α) → 0 as α → ∞.
Weaker than tendsto_softmax_infty_unique_max: does not require a unique
maximum, just that i is beaten by some j.
Observer posterior for a softmax agent: BToM inference about the state.
An observer sees a softmax agent choose action i, and infers the state: P(s | i, α) = softmax(score(·,s), α)(i) · prior(s) / Σ_{s'} softmax(score(·,s'), α)(i) · prior(s').
In RSA: the pragmatic listener L1 observes the speaker's utterance u and
infers the world w. This is L1(w | u) parameterized by α for limits.
Equations
- Softmax.softmaxObserver score prior α i s = Core.softmax (fun (j : ι) => score j s) α i * prior s / ∑ s' : σ, Core.softmax (fun (j : ι) => score j s') α i * prior s'
BToM inference about a softmax agent concentrates on the optimal state.
An observer watches a softmax agent and infers the hidden state. If action i₀ is the unique best action at state s₀, and is not the best at any other state, the observer's posterior P(s₀ | i₀, α) → 1 as α → ∞.
RSA–exhaustivity bridge: the pragmatic listener (L1) observes a softmax speaker (S1) and does Bayesian inversion. At α → ∞, L1 concentrates on worlds where the heard utterance was the speaker's uniquely optimal choice. This IS the exhaustivity operator: L1 at α → ∞ computes exh(u).
For a simple scale {some, all} with worlds {someNotAll, all}:
- "all" is more informative at the "all" world → S1 says "all" there
- "some" is the only true utterance at "someNotAll" → S1 says "some" there
- L1 hearing "some" → concentrates on "someNotAll" → exh(some) = some ∧ ¬all
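Numerically, under the standard RSA assumptions (uniform prior over worlds, speaker scores given by literal-listener informativity; these specifics are assumptions, not part of the formalization above): at the "all" world, "all" is strictly more informative than "some", so S1(some | all) → 0 as α → ∞, while at "someNotAll" the only true utterance is "some", so S1(some | someNotAll) stays bounded away from 0. Bayesian inversion then gives

```latex
L_1(\text{someNotAll} \mid \text{some})
  = \frac{S_1(\text{some}\mid\text{someNotAll})}
         {S_1(\text{some}\mid\text{someNotAll}) + S_1(\text{some}\mid\text{all})}
  \;\longrightarrow\; \frac{1}{1+0} = 1
  \quad (\alpha \to \infty)
```

which is exactly the exhaustified reading exh(some) = some ∧ ¬all.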
softmaxObserver is invariant under state-dependent constant shifts.
Adding c(s) to all action scores at state s doesn't change the
observer's posterior, because softmax is translation-invariant
(softmax_add_const).
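The cancellation behind `softmax_add_const`, assuming the exponential form of softmax:

```latex
\frac{e^{\alpha (f(i) + c)}}{\sum_{j} e^{\alpha (f(j) + c)}}
  = \frac{e^{\alpha c}\, e^{\alpha f(i)}}{e^{\alpha c} \sum_{j} e^{\alpha f(j)}}
  = \operatorname{softmax}(f,\alpha)(i)
```

Applied per-state with c = c(s), every factor `Core.softmax (fun j => score j s) α i` in the observer's posterior is unchanged, so the quotient defining `softmaxObserver` is unchanged as well.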