@cite{goldwater-johnson-2003}: Learning OT Constraint Rankings Using a Maximum Entropy Model #

@cite{goldwater-johnson-2003}

@cite{goldwater-johnson-2003} introduce Maximum Entropy models to constraint-based phonology. Their eq (1) defines the conditional probability of output y given input x as:

Pr(y|x) = exp(Σ wᵢfᵢ(y,x)) / Z(x)

This IS softmax(harmonyScoreR constraints, 1) — the same softmax from Core.Agent.RationalAction used throughout linglib for RSA pragmatics. The phonology–pragmatics connection is structural: both are log-linear models over weighted features, differing only in what the features measure.

Key contributions formalized here #

MaxEnt = softmax (§1): MaxEntGrammar.prob is eq (1) by definition.
Log-likelihood (§2): The learning objective is log pseudo-likelihood, which decomposes as Σⱼ (H(yⱼ,xⱼ) − logSumExp(H(·,xⱼ))) — the harmony of each observed output minus the log-partition function.
Concavity (§2): Log-likelihood is concave in each weight (concavity via logConditional_concaveOn). The decomposition log P(y|x;w) = affine(wⱼ) − logSumExpOffset(s,r,wⱼ) makes this affine minus convex = concave, guaranteeing a unique global maximum.
Learning gradient = E[feature] (§3): The derivative of log-likelihood w.r.t. each weight equals observed minus expected feature value (gradient via hasDerivAt_logConditional). At the optimum, observed feature counts equal expected feature counts.
Wolof data (§4): The learned weights for a categorical grammar are exponentially separated (ExponentiallySeparated wolofWeights 1), confirming that MaxEnt reproduces OT for categorical data — the empirical counterpart of maxent_ot_limit.

source

theorem Phenomena.PhonologicalAlternation.Studies.GoldwaterJohnson2003.eq1_is_softmax {I O : Type} [Fintype O] [Nonempty O] (g : Theories.Phonology.HarmonicGrammar.MaxEntGrammar I O) (i : I) (o : O) :

g.prob i o = Core.softmax (fun (o' : O) => Theories.Phonology.HarmonicGrammar.harmonyScoreR g.constraints (i, o')) 1 o

@cite{goldwater-johnson-2003} eq (1) is MaxEntGrammar.prob by definition — both are softmax(harmonyScoreR, 1).

The same softmax function powers RSA pragmatic reasoning (Core.Agent.RationalAction): both phonological grammar and pragmatic inference are log-linear models over weighted features.

source

noncomputable def Phenomena.PhonologicalAlternation.Studies.GoldwaterJohnson2003.logPseudoLikelihood {I O : Type} [Fintype O] [Nonempty O] (g : Theories.Phonology.HarmonicGrammar.MaxEntGrammar I O) (data : List (I × O)) :

ℝ

Log pseudo-likelihood of training data under a MaxEnt grammar (eq (2)).

logPL(w) = Σⱼ log P_w(yⱼ | xⱼ)

Each term decomposes as H(yⱼ, xⱼ) − logSumExp(H(·, xⱼ), 1) via softmax_exponential_family.

Equations

One or more equations did not get rendered due to their size.

Instances For

source

noncomputable def Phenomena.PhonologicalAlternation.Studies.GoldwaterJohnson2003.regularizedObjective {I O : Type} [Fintype O] [Nonempty O] (g : Theories.Phonology.HarmonicGrammar.MaxEntGrammar I O) (data : List (I × O)) (σ : ℝ) :

ℝ

Regularized objective (eq (3)): log-likelihood minus L2 penalty.

J(w) = logPL(w) − Σᵢ (wᵢ − μᵢ)² / (2σᵢ²)

@cite{goldwater-johnson-2003} use μᵢ = 0, σᵢ = σ for all constraints.

Equations

One or more equations did not get rendered due to their size.

Instances For

source

theorem Phenomena.PhonologicalAlternation.Studies.GoldwaterJohnson2003.concavity {ι : Type u_1} [Fintype ι] [Nonempty ι] (s r : ι → ℝ) (y : ι) :

ConcaveOn ℝ Set.univ fun (wⱼ : ℝ) => wⱼ * s y + r y - Core.logSumExpOffset s r wⱼ

Concavity of log-likelihood (fn. 4): the log conditional likelihood of a single observation is concave in each weight, guaranteeing a unique global maximum.

When all weights except wⱼ are held fixed, log P(y|x;w) decomposes as (wⱼ · sᵧ + rᵧ) − logSumExpOffset(s, r, wⱼ) where sᵢ = −cⱼ(yᵢ,x) and rᵢ = Σₖ≠ⱼ wₖ(−cₖ(yᵢ,x)). The first term is affine (hence concave); the second is convex (logSumExpOffset_convex). Concave − convex = concave.

This is logConditional_concaveOn from Core.Agent.RationalAction.

source

theorem Phenomena.PhonologicalAlternation.Studies.GoldwaterJohnson2003.gradient {ι : Type u_1} [Fintype ι] [Nonempty ι] (s r : ι → ℝ) (y : ι) (wⱼ : ℝ) :

HasDerivAt (fun (w : ℝ) => w * s y + r y - Core.logSumExpOffset s r w) (s y - ∑ i : ι, Core.softmaxOffset s r wⱼ i * s i) wⱼ

Learning gradient = observed − expected (§2, §4.2):

∂/∂wⱼ log P(y|x) = sᵧ − Σᵢ softmaxOffset(s,r,wⱼ)ᵢ · sᵢ

where sᵢ = −cⱼ(yᵢ,x) (negated violation count of constraint j on candidate i) and rᵢ = Σₖ≠ⱼ wₖ(−cₖ(yᵢ,x)) (contribution of other constraints, constant w.r.t. wⱼ).

At the maximum, this derivative is zero, so sᵧ = E_P[s]: the observed feature value equals the expected feature value.

This is hasDerivAt_logConditional from Core.Agent.RationalAction.

source

def Phenomena.PhonologicalAlternation.Studies.GoldwaterJohnson2003.wolofWeights :

Fin 5 → ℚ

Wolof tongue-root harmony constraints (§3.1, from Boersma 1999).

The five constraints in ranked order, with learned MaxEnt weights from Table 1 (nσ² ≈ 1,200,000).

Equations

Instances For

source

theorem Phenomena.PhonologicalAlternation.Studies.GoldwaterJohnson2003.wolof_pos (i : Fin 5) :

0 < wolofWeights i

All Wolof weights are positive.

source

theorem Phenomena.PhonologicalAlternation.Studies.GoldwaterJohnson2003.wolof_separated :

Theories.Phonology.HarmonicGrammar.ExponentiallySeparated wolofWeights 1

The learned Wolof weights are exponentially separated (M = 1): each weight exceeds the sum of all lower-ranked weights.

This is the empirical counterpart of maxent_ot_limit: for a categorical grammar (no free variation), MaxEnt learning produces weights that satisfy ExponentiallySeparated, recovering OT's strict ranking. The theoretical direction is:

ExponentiallySeparated ⟹ lex_imp_lower_violations ⟹ HG = OT

Here we verify the converse empirically: OT-like data ⟹ learning produces ExponentiallySeparated weights.

Documentation

Linglib.Phenomena.PhonologicalAlternation.Studies.GoldwaterJohnson2003

@cite{goldwater-johnson-2003}: Learning OT Constraint Rankings Using a Maximum Entropy Model #

Key contributions formalized here #