Coupled Evaluation: Softmax over Product Spaces #
When items are evaluated jointly — via systemic constraints in MaxEnt grammar or via shared latent variables in RSA — the distribution over individual items doesn't factorize. This module formalizes the shared pattern:
- Per-item scores `s(i, v)` that depend only on one item's value
- Coupling scores `C(f)` that evaluate the full assignment jointly
- Joint probability via softmax: `P(f) ∝ exp(Σᵢ s(i, f(i)) + C(f))`
- Marginal probability: `P_i(v) = Σ_{f : f(i) = v} P(f)`
The key theorem is factorization: when the coupling score is constant (no coupling), the joint factorizes and each marginal equals the independent softmax of its per-item score.
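The pattern above can be sketched as a quick numerical sanity check (Python, not part of the Lean development; the item/value sets, scores, and the toy coupling penalty below are all invented for illustration):

```python
import itertools
import math

# Toy instance: 2 items (indices), each taking one of 2 values.
indices = [0, 1]
values = [0, 1]

def component_score(i, v):
    # s(i, v): per-item score, depends only on item i's value
    return [[0.0, 1.0], [0.5, -0.5]][i][v]

def coupling_score(f):
    # C(f): evaluates the full assignment; here a toy penalty
    # for identical values (in the spirit of *HOMOPHONY)
    return -2.0 if f[0] == f[1] else 0.0

assignments = list(itertools.product(values, repeat=len(indices)))

def total_score(f):
    return sum(component_score(i, f[i]) for i in indices) + coupling_score(f)

# P(f) = softmax(totalScore, 1)(f) over all assignments
Z = sum(math.exp(total_score(f)) for f in assignments)
def joint_prob(f):
    return math.exp(total_score(f)) / Z

# P_i(v) = Σ_{f : f(i) = v} P(f)
def marginal(i, v):
    return sum(joint_prob(f) for f in assignments if f[i] == v)

# The joint and each marginal are probability distributions.
assert abs(sum(joint_prob(f) for f in assignments) - 1.0) < 1e-12
for i in indices:
    assert abs(sum(marginal(i, v) for v in values) - 1.0) < 1e-12
```

With the nonzero coupling score, the marginals here differ from the independent per-item softmaxes, which is exactly the dependency the factorization theorem rules out in the uncoupled case.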
Instances #
- MaxEnt systemic constraints (@cite{storme-2026}): items = input→output mappings, coupling = *HOMOPHONY or other systemic constraints. `componentScore i v = harmonyScore(classicalConstraints, (inputs i, v))`, `couplingScore f = -Σₖ wₖ · Sₖ(f)`.
- RSA lexical uncertainty (@cite{bergen-levy-goodman-2016}): the analog is a mixture model rather than a coupled exponential; the listener marginalizes over lexicons: `L1(u,w) ∝ prior(w) · Σ_l prior(l) · S1(l,w,u)`. The algebraic form differs (mixture vs. coupled exponential), but the phenomenon is identical: joint evaluation creates dependencies between items that would otherwise be independent, and marginalization recovers individual probabilities that reflect the coupling. See `RSAConfig.L1agent` in `Theories/Pragmatics/RSA/Core/Config.lean` for the RSA marginalization pattern.
- BToM inference (`Core/Agent/BToM.lean`): the observer marginalizes over latent mental-state variables (percept, belief, desire) to recover world posteriors: `worldMarginal(a, w) = Σ_{p,b,d,s,m} joint(a,p,b,d,s,m,w)`.
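The lexical-uncertainty marginalization can be sketched numerically; everything below (the world/lexicon sets, priors, and the `S1` table) is an invented toy instance, not values from the library:

```python
# Toy lexical-uncertainty listener for one fixed utterance u:
# L1(u, w) ∝ prior(w) · Σ_l prior(l) · S1(l, w, u)
worlds = ["w1", "w2"]
lexicons = ["l1", "l2"]
world_prior = {"w1": 0.5, "w2": 0.5}
lex_prior = {"l1": 0.7, "l2": 0.3}

# Hypothetical S1 values: speaker probability of u given lexicon and world.
S1 = {("l1", "w1"): 0.9, ("l1", "w2"): 0.1,
      ("l2", "w1"): 0.2, ("l2", "w2"): 0.8}

unnorm = {w: world_prior[w] * sum(lex_prior[l] * S1[(l, w)] for l in lexicons)
          for w in worlds}
Z = sum(unnorm.values())
L1 = {w: unnorm[w] / Z for w in worlds}

assert abs(sum(L1.values()) - 1.0) < 1e-12
```

Note how the lexicon is shared across the whole computation: even though the algebra is a mixture rather than a coupled exponential, marginalizing over `l` is what induces the dependence between interpretations.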
Factorization #
The core theorem (`marginal_eq_independent_when_uncoupled`): when the coupling
score is constant, the joint distribution factorizes as a product of
independent per-item distributions. This is the mathematical basis for:
- Storme: systemic constraint weight = 0 ⟹ marginal = classical MaxEnt
- RSA: single lexicon (|Lexicon| = 1) ⟹ L1 = standard Bayesian update
- MaxEnt↔OT: `softmax_argmax_limit` sends MaxEnt → OT; this theorem sends coupled MaxEnt → uncoupled MaxEnt when coupling vanishes
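The softmax→argmax limit can be illustrated numerically (Python; the toy scores and the explicit temperature parameter `T` are assumptions for illustration, not the library's `softmax_argmax_limit` statement):

```python
import math

scores = [1.0, 3.0, 2.0]  # toy harmony scores; the OT winner is index 1

def softmax(xs, T):
    # numerically stable softmax at temperature T
    m = max(xs)
    e = [math.exp((x - m) / T) for x in xs]
    z = sum(e)
    return [v / z for v in e]

# As T → 0, the distribution concentrates on the argmax (the OT winner).
for T in [1.0, 0.1, 0.01]:
    probs = softmax(scores, T)
    assert max(range(len(scores)), key=lambda i: probs[i]) == 1
assert softmax(scores, 0.01)[1] > 0.999999
```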
A coupled evaluation over indexed items, where the joint score may not decompose into independent per-item scores.
The joint probability of assignment f : Index → Value is:
P(f) = softmax(totalScore, 1)(f) over the space of all assignments.
When couplingScore is constant, totalScore decomposes additively
and the joint factorizes into independent per-item softmax distributions.
- componentScore : Index → Value → ℝ
Per-item score: depends only on one item's value. In MaxEnt: `harmonyScore(classicalConstraints, (input_i, v))`. In RSA: `log S1(lexicon, world, utterance)`.
- couplingScore : (Index → Value) → ℝ
Coupling score: evaluates the full assignment jointly. In MaxEnt: systemic constraint penalty (e.g., *HOMOPHONY). In RSA: log prior over shared lexicon.
Instances For
Total score of an assignment: sum of per-item scores plus coupling.
Equations
- cs.totalScore f = ∑ i : I, cs.componentScore i (f i) + cs.couplingScore f
Instances For
Joint probability: softmax over the space of all assignments I → V.
Equations
- cs.jointProb f = Core.softmax cs.totalScore 1 f
Instances For
Marginal probability: P_i(v) = Σ_{f : f(i) = v} P_joint(f). Marginalizes the joint distribution to recover the probability of a specific value at a specific index.
Instances For
Marginals sum to 1 (marginalization preserves total probability).
Proof: each assignment f contributes to exactly one value v = f(i),
so Σ_v marginal(i, v) = Σ_v Σ_{f:f(i)=v} P(f) = Σ_f P(f) = 1.
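The regrouping argument can be checked exactly on a toy distribution (Python with exact rationals; the particular weights are arbitrary):

```python
import itertools
from fractions import Fraction

I = range(2)
V = range(3)
assignments = list(itertools.product(V, repeat=len(I)))

# Any probability distribution over assignments, in exact arithmetic.
n = len(assignments)
P = {f: Fraction(k + 1, n * (n + 1) // 2) for k, f in enumerate(assignments)}
assert sum(P.values()) == 1

# Each f contributes to exactly one cell v = f(i), so
# Σ_v Σ_{f : f(i)=v} P(f) = Σ_f P(f) = 1, exactly.
i = 0
marginal = {v: sum(P[f] for f in assignments if f[i] == v) for v in V}
assert sum(marginal.values()) == 1
```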
Marginal probabilities are non-negative.
Independent per-item probability: what the marginal would be if there
were no coupling. P_indep(i, v) = softmax(componentScore(i, ·), 1)(v).
Equations
- cs.independentProb i v = Core.softmax (cs.componentScore i) 1 v
Instances For
Factorization theorem: when coupling is constant (no genuine coupling), the marginal probability equals the independent per-item softmax.
This is the shared mathematical core of:
- @cite{storme-2026}: systemic weight = 0 ⟹ marginal = classical MaxEnt
- @cite{bergen-levy-goodman-2016}: single lexicon ⟹ L1 = standard posterior
- BToM: delta-function latent prior ⟹ marginal = direct inference
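The theorem can be sanity-checked numerically on a toy instance (Python; the scores, sizes, and the constant `C` below are invented):

```python
import itertools
import math

indices = [0, 1, 2]
values = [0, 1]
score = [[0.0, 1.0], [0.5, -0.5], [2.0, 0.0]]  # componentScore i v
C = 3.7                                         # constant coupling score

assignments = list(itertools.product(values, repeat=len(indices)))

def total(f):
    return sum(score[i][f[i]] for i in indices) + C

Z = sum(math.exp(total(f)) for f in assignments)

def marginal(i, v):
    return sum(math.exp(total(f)) for f in assignments if f[i] == v) / Z

def independent(i, v):
    # softmax(componentScore(i, ·), 1)(v)
    zi = sum(math.exp(score[i][w]) for w in values)
    return math.exp(score[i][v]) / zi

# With constant coupling, marginal = independent per-item softmax.
for i in indices:
    for v in values:
        assert abs(marginal(i, v) - independent(i, v)) < 1e-12
```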
Proof #
When `couplingScore f = c` for all `f`:
1. `totalScore f = Σᵢ componentScore(i, f(i)) + c`
2. By `softmax_add_const`, adding `c` doesn't change the softmax: `jointProb f = softmax(f ↦ Σᵢ componentScore(i, f(i)), 1)(f)`
3. Since the score decomposes additively over indices, `exp(Σᵢ sᵢ) = Πᵢ exp(sᵢ)`, so the joint factorizes: `jointProb f = Πᵢ softmax(componentScore(i, ·))(f(i))`
4. Marginalizing: `marginal(i, v) = softmax(componentScore(i, ·))(v) · Π_{j≠i} Σ_{v'} softmax(componentScore(j, ·))(v') = softmax(componentScore(i, ·))(v) · 1`
Step 3 uses the finite Fubini theorem (Fintype.prod_sum):
Σ_{f : I → V} Πᵢ g(i, f(i)) = Πᵢ (Σ_v g(i, v)).
The filter [f(i) = v] is encoded into the product via a zero/one trick:
define g(j, w) = (j = i ? (w = v ? exp(s_i(v)) : 0) : exp(s_j(w))), so that
when f(i) ≠ v the product vanishes (Fubini still applies), and when
f(i) = v the product equals ∏_j exp(s_j(f(j))).
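The finite Fubini identity itself is easy to check on a small domain (Python; the table `g` is an arbitrary choice):

```python
import itertools
import math

# Σ_{f : I → V} Πᵢ g(i, f(i)) = Πᵢ (Σ_v g(i, v)) on a toy finite domain
I = range(3)
V = range(4)

def g(i, v):
    return (i + 1) * (v + 2)  # any table of values

lhs = sum(math.prod(g(i, f[i]) for i in I)
          for f in itertools.product(V, repeat=len(I)))
rhs = math.prod(sum(g(i, v) for v in V) for i in I)
assert lhs == rhs  # both sides equal 16464 for this table
```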
A coupled evaluation is genuinely coupled at index i iff there exist
two assignments agreeing at i but with different total scores.
This is the observable signature of non-factorization: the score of the
assignment at position i depends on the values at other positions.
Equations
- cs.genuinelyCoupled i = ∃ (f : I → V) (g : I → V), f i = g i ∧ cs.totalScore f ≠ cs.totalScore g
Instances For
Construct a CoupledSoftmax from MaxEnt grammar components.
- `componentScore i v = classicalScore(inputs(i), v)`
- `couplingScore f = systemicScore(f)`
This shows that the MaxEnt systemic constraint framework from
Theories.Phonology.HarmonicGrammar.MaxEnt is an instance of
the general coupled evaluation pattern. The `marginal_eq_classical_when_no_systemic`
theorem in MaxEnt.lean is a corollary of `marginal_eq_independent_when_uncoupled`.
Equations
- One or more equations did not get rendered due to their size.