@cite{flemming-2021}: Comparing MaxEnt and Noisy Harmonic Grammar #
@cite{flemming-2021} compares three stochastic Harmonic Grammar variants — MaxEnt, Noisy HG (NHG), and Normal MaxEnt — identifying logit uniformity as the diagnostic that distinguishes them.
The three models as Random Utility Models #
All three HG variants are Random Utility Models (RUMs) differing only in the noise distribution added to the deterministic harmony scores:
| Model | Noise target | Distribution | Binary P | Reference |
|---|---|---|---|---|
| MaxEnt | candidates | Gumbel | logistic(H−H') | maxent_eq_gumbelRUM |
| NHG | weights | Gaussian | Φ((H−H')/σ_d) | nhg_choiceProb_eq |
| Normal MaxEnt | candidates | Gaussian | Φ((H−H')/(ε√2)) | normalMaxEnt_choiceProb_eq |
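The "Binary P" column can be sketched numerically. The following is a minimal illustration, not the formalization itself: the function names (`maxent_binary`, `nhg_binary`, `normal_maxent_binary`, `std_normal_cdf`) are hypothetical, and Φ is written with `math.erf`.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Standard normal CDF Φ(x), expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def maxent_binary(h_a: float, h_b: float) -> float:
    """MaxEnt: logistic function of the harmony difference H − H'."""
    return 1.0 / (1.0 + math.exp(-(h_a - h_b)))


def nhg_binary(h_a, h_b, viol_a, viol_b, sigma=1.0):
    """NHG: Φ((H − H') / σ_d), with σ_d = σ · sqrt(Σ (c_j(a) − c_j(b))²).

    Assumes the two violation profiles differ somewhere (σ_d > 0).
    """
    sq_sum = sum((ca - cb) ** 2 for ca, cb in zip(viol_a, viol_b))
    return std_normal_cdf((h_a - h_b) / (sigma * math.sqrt(sq_sum)))


def normal_maxent_binary(h_a, h_b, eps=1.0):
    """Normal MaxEnt: Φ((H − H') / (ε√2)), a context-independent scale."""
    return std_normal_cdf((h_a - h_b) / (eps * math.sqrt(2.0)))
```

With σ = ε = 1 and a squared violation difference sum of 2, the NHG and Normal MaxEnt values coincide; they come apart as soon as the violation difference profile changes, which is the contrast the next section turns on.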
Key diagnostic: logit uniformity #
MaxEnt exhibits logit uniformity (eq (10)): adding one violation of
constraint j changes the logit by exactly −wⱼ, regardless of the tableau
context. This follows from the log-odds identity (logit_uniformity):
log(P(a)/P(b)) = H(a) − H(b)
NHG violates logit uniformity because its noise standard deviation
σ_d = σ · √(Σ(cⱼ(a)−cⱼ(b))²) (nhgSigmaD) depends on the violation
difference profile. The same harmony difference ΔH produces different
probits ΔH/σ_d in different contexts.
Normal MaxEnt has probit uniformity (constant σ_d = ε√2) rather than logit uniformity, leading to probit (Φ) rather than logistic probability functions — an empirically distinguishable prediction.
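The contrast can be checked on a toy example. In this sketch the weights, the two contexts, and all function names are hypothetical; harmony is taken to be the negated weighted violation sum, matching the −wⱼ change stated above.

```python
import math


def harmony(weights, violations):
    """Harmony as the negated weighted sum of violations."""
    return -sum(w * c for w, c in zip(weights, violations))


def maxent_logit(weights, viol_a, viol_b):
    """MaxEnt log-odds: log(P(a)/P(b)) = H(a) − H(b)."""
    return harmony(weights, viol_a) - harmony(weights, viol_b)


def nhg_sigma_d(viol_a, viol_b, sigma=1.0):
    """σ_d = σ · sqrt(Σ (c_j(a) − c_j(b))²)."""
    return sigma * math.sqrt(sum((ca - cb) ** 2 for ca, cb in zip(viol_a, viol_b)))


def nhg_probit(weights, viol_a, viol_b, sigma=1.0):
    """NHG probit: ΔH / σ_d."""
    return maxent_logit(weights, viol_a, viol_b) / nhg_sigma_d(viol_a, viol_b, sigma)


def add_violation(violations, j):
    """Return a copy of the profile with one extra violation of constraint j."""
    out = list(violations)
    out[j] += 1
    return out


weights = [2.0, 1.0, 0.5]           # hypothetical weights
j = 2                               # constraint receiving the extra violation
contexts = [                        # hypothetical (a, b) violation profiles
    ([1, 0, 0], [0, 1, 0]),
    ([0, 0, 0], [1, 1, 0]),
]

logit_changes, probit_changes = [], []
for viol_a, viol_b in contexts:
    viol_a_plus = add_violation(viol_a, j)
    logit_changes.append(
        maxent_logit(weights, viol_a_plus, viol_b) - maxent_logit(weights, viol_a, viol_b)
    )
    probit_changes.append(
        nhg_probit(weights, viol_a_plus, viol_b) - nhg_probit(weights, viol_a, viol_b)
    )
```

In both contexts the MaxEnt logit moves by −w₂, while the two NHG probit changes differ because the same ΔH is divided by a σ_d that has itself changed.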
French schwa data #
Flemming tests logit uniformity on French schwa deletion across 8 phonological contexts with 6 constraints (Table (35)). Contexts that share the same *Clash violation difference should show the same logit difference under MaxEnt. We encode this data and verify:
- logit_uniformity_clash: the *Clash contribution to the harmony difference is identical across all four paired contexts (MaxEnt prediction)
- nhg_sigmaD_sq_varies: the NHG noise variance σ_d² differs between paired contexts, violating probit uniformity (NHG prediction)
MaxEnt = Gumbel RUM (@cite{flemming-2021} §4/§10): MaxEnt probability is exactly the McFadden integral with Gumbel scale β = 1.
This formalizes the RUM connection: MaxEnt adds i.i.d. Gumbel noise to
candidate harmonies, and by McFadden's theorem
(mcfaddenIntegral_eq_softmax), the resulting choice probability is
softmax — i.e., the standard MaxEnt formula.
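The Gumbel connection can be illustrated by simulation. This is a Monte Carlo sketch under stated assumptions, not a proof: the harmony values are hypothetical, the noise is standard Gumbel (scale 1) drawn by inverse-CDF sampling, and the helper names are invented for the example.

```python
import math
import random


def softmax(scores):
    """Softmax with the usual max-shift for numerical stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_sample(rng):
    """Standard Gumbel draw via inverse CDF: −log(−log(U)), U ~ Uniform(0, 1)."""
    u = rng.random()
    while u == 0.0:  # avoid log(0); random() never returns 1.0
        u = rng.random()
    return -math.log(-math.log(u))


def simulate_gumbel_rum(scores, n_draws, seed=0):
    """Fraction of draws in which each candidate has the highest noisy score."""
    rng = random.Random(seed)
    wins = [0] * len(scores)
    for _ in range(n_draws):
        noisy = [s + gumbel_sample(rng) for s in scores]
        wins[noisy.index(max(noisy))] += 1
    return [w / n_draws for w in wins]


harmonies = [-1.0, -2.0, -3.5]  # hypothetical harmony scores
exact = softmax(harmonies)
approx = simulate_gumbel_rum(harmonies, n_draws=100_000)
```

The simulated win frequencies in `approx` should approach the softmax values in `exact` as the number of draws grows.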
Flemming's eq (10): logit(P_a) = h_a − h_b.
The MaxEnt logit-harmony identity. Alias for maxent_logit_harmony.
MaxEnt ratio independence (IIA): P(a)/P(b) = exp(H(a) − H(b)).
The probability ratio depends only on the candidates' own scores,
not on any other candidates. Corollary of softmax_odds with α = 1.
MaxEnt binary logistic (@cite{flemming-2021} eq (9)/(11)): with two candidates, MaxEnt probability is the logistic function of the harmony difference.
P(0) = 1 / (1 + e^{-(H(0) − H(1))}) = logistic(H(0) − H(1))
Corollary of softmax_binary with α = 1.
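Both corollaries are easy to check numerically. The sketch below uses hypothetical harmony values and a hand-written `softmax` with scale parameter `alpha`, mirroring the α = 1 case named above.

```python
import math


def softmax(scores, alpha=1.0):
    """softmax(s, α, i) = exp(α·s(i)) / Σ exp(α·s(k))."""
    m = max(scores)
    exps = [math.exp(alpha * (s - m)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


# Ratio independence: adding a fourth candidate leaves P(a)/P(b) unchanged.
three = softmax([-1.0, -2.0, -4.0])
four = softmax([-1.0, -2.0, -4.0, -0.5])
ratio_three = three[0] / three[1]
ratio_four = four[0] / four[1]

# Binary case: softmax over two candidates is the logistic of H(0) − H(1).
binary = softmax([-1.0, -2.0])
```

Here `ratio_three` and `ratio_four` both equal exp(H(a) − H(b)) = e, and `binary[0]` equals `logistic(1.0)`, up to floating-point error.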
Violation difference matrix: ə candidate minus ∅ candidate. Rows = 8 contexts, columns = 6 constraints. Constraint order: 0=NoSchwa, 1=*CCC, 2=*Clash, 3=Max, 4=Dep, 5=*Cluster. Table (35) from @cite{flemming-2021}, data from @cite{smith-pater-2020}.
The four *Clash pairs: contexts that differ only in *Clash (index 2). Each pair is (without *Clash, with *Clash).
Equations
- Phenomena.PhonologicalAlternation.Studies.Flemming2021.clashPairs 0 = (0, 1)
- Phenomena.PhonologicalAlternation.Studies.Flemming2021.clashPairs 1 = (2, 3)
- Phenomena.PhonologicalAlternation.Studies.Flemming2021.clashPairs 2 = (4, 5)
- Phenomena.PhonologicalAlternation.Studies.Flemming2021.clashPairs 3 = (6, 7)
*Clash pairs differ only in the *Clash column (index 2): for each pair, all non-*Clash violations are identical.
The *Clash violation difference is exactly 1 for all pairs.
Logit uniformity for *Clash (@cite{flemming-2021} §7.1): the *Clash contribution to the harmony difference is the same across all four paired contexts.
For any weights w, the harmony difference change between paired
contexts = −w₂ (*Clash weight), independent of context. This follows
from clash_pairs_identical_except_clash: since non-*Clash violations
are identical in each pair, their weighted contributions cancel,
leaving only −w₂ · 1 = −w₂.
This is a special case of me_predicts_hz (Separability.lean):
the *Clash violation differences are column-insensitive (constant
across paired contexts), so the weighted sum satisfies the
constant-difference identity.
Observed probability of schwa realization across 8 contexts. Data from @cite{smith-pater-2020} (Table 2 of @cite{flemming-2021}).
Values are approximate proportions (hundredths). The key pattern: within each *Clash pair, the +*Clash context always has higher P(schwa), consistent with the *Clash constraint favoring schwa insertion.
Equations
- Phenomena.PhonologicalAlternation.Studies.Flemming2021.observedP 0 = 9 / 100
- Phenomena.PhonologicalAlternation.Studies.Flemming2021.observedP 1 = 12 / 100
- Phenomena.PhonologicalAlternation.Studies.Flemming2021.observedP 2 = 68 / 100
- Phenomena.PhonologicalAlternation.Studies.Flemming2021.observedP 3 = 83 / 100
- Phenomena.PhonologicalAlternation.Studies.Flemming2021.observedP 4 = 56 / 100
- Phenomena.PhonologicalAlternation.Studies.Flemming2021.observedP 5 = 65 / 100
- Phenomena.PhonologicalAlternation.Studies.Flemming2021.observedP 6 = 91 / 100
- Phenomena.PhonologicalAlternation.Studies.Flemming2021.observedP 7 = 94 / 100
Adding a *Clash violation increases P(schwa) in every paired context.
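The observed proportions can be put on the logit and probit scales to see how large the *Clash effect is within each pair. This sketch uses only the `observedP` and `clashPairs` values listed above; the Python names are hypothetical, and since these are raw proportions rather than fitted model predictions, exact uniformity on either scale should not be expected.

```python
import math
from statistics import NormalDist

# Observed P(schwa) for contexts 0..7, as listed above.
observed_p = [0.09, 0.12, 0.68, 0.83, 0.56, 0.65, 0.91, 0.94]
# Each pair is (without *Clash, with *Clash).
clash_pairs = [(0, 1), (2, 3), (4, 5), (6, 7)]


def logit(p):
    return math.log(p / (1.0 - p))


def probit(p):
    """Inverse standard normal CDF Φ⁻¹(p)."""
    return NormalDist().inv_cdf(p)


logit_diffs = [logit(observed_p[w]) - logit(observed_p[wo]) for wo, w in clash_pairs]
probit_diffs = [probit(observed_p[w]) - probit(observed_p[wo]) for wo, w in clash_pairs]

for (wo, w), dl, dp in zip(clash_pairs, logit_diffs, probit_diffs):
    print(f"pair ({wo},{w}): logit change {dl:+.3f}, probit change {dp:+.3f}")
```

Every difference is positive on both scales, which restates the monotonicity claim; how close the four differences are to one another is the quantity the uniformity diagnostics care about.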
Sum of squared violation differences for a context.
This is the study-local analogue of violationDiffSqSumQ from
NoisyHG.lean: both compute Σⱼ (cⱼ(ə) − cⱼ(∅))², but schwaSqSum
operates on the pre-computed difference matrix schwaDiff (Table (35))
rather than a WeightedConstraint list.
NHG noise variance σ_d² is context-dependent: without *Clash, the squared violation sum is 3; with *Clash, it is 4. The same *Clash violation change therefore produces different σ_d values in different tableaux: with σ = 1, σ_d = √3 versus σ_d = 2 (Table 3 of @cite{flemming-2021}).
NHG probit change when moving from one context to another.
The NHG probit is Φ⁻¹(P) = h / σ_d, so when both the harmony difference and σ_d change, the probit changes by (h_init + Δh) / σ_d' − h_init / σ_d.
h_init = initial harmony difference, Δh = harmony change (e.g., −w_Clash),
σ_d and σ_d' = noise s.d. before and after the change.
Probit non-uniformity (@cite{flemming-2021} §7.2): when σ_d ≠ σ_d',
the NHG probit change depends on the initial harmony difference h_init.
Two contexts with different initial harmonies h₁ ≠ h₂ but the same
*Clash change Δh produce different probit changes. This is because
the denominator shift (σ_d → σ_d') rescales the existing harmony
difference differently depending on its magnitude.
Concretely, for French schwa with σ = 1 (@cite{flemming-2021} §7.2): adding a *Clash violation changes σ_d from √3 to 2 in all pairs, but the initial harmony difference h_ə − h_∅ differs between pairs (e.g., −2.2 for pair (0,1) vs 0.01 for pair (4,5)), so the probit changes differ despite the same *Clash change.
Probit change decomposition (@cite{flemming-2021} eq (38b)): the NHG probit change decomposes into a context-dependent term (proportional to initial harmony difference) and a uniform term.
Δprobit = h · (σ_d − σ_d') / (σ_d · σ_d') + Δh / σ_d'
The first term is why NHG violates probit uniformity: it depends on
h_init, which varies across contexts.
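The decomposition can be verified against the direct difference of probits. In this sketch, σ_d = √3 and σ_d' = 2 and the two initial harmony differences (−2.2 and 0.01) are the values quoted above for σ = 1; the harmony change `delta_h` is a hypothetical number chosen only for illustration, and the function names are invented.

```python
import math


def probit_change(h_init, delta_h, sigma_d, sigma_d_new):
    """Direct form: (h_init + Δh)/σ_d' − h_init/σ_d."""
    return (h_init + delta_h) / sigma_d_new - h_init / sigma_d


def probit_change_decomposed(h_init, delta_h, sigma_d, sigma_d_new):
    """Decomposed form: h·(σ_d − σ_d')/(σ_d·σ_d') + Δh/σ_d'."""
    context_term = h_init * (sigma_d - sigma_d_new) / (sigma_d * sigma_d_new)
    uniform_term = delta_h / sigma_d_new
    return context_term + uniform_term


sigma_d = math.sqrt(3.0)      # squared violation sum 3 (without *Clash), σ = 1
sigma_d_new = math.sqrt(4.0)  # squared violation sum 4 (with *Clash), σ = 1
delta_h = 0.5                 # hypothetical harmony change, illustration only
h_inits = [-2.2, 0.01]        # initial harmony differences quoted in the text

changes = [probit_change(h, delta_h, sigma_d, sigma_d_new) for h in h_inits]
decomposed = [probit_change_decomposed(h, delta_h, sigma_d, sigma_d_new) for h in h_inits]
```

The two forms agree for each h_init, and the two entries of `changes` differ from each other even though `delta_h`, σ_d, and σ_d' are shared, because the first term scales with h_init.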
Candidates b and c have equal harmony: H(b) = H(c) = −16.
NHG noise variances differ: σ²_d(b−a) = 5 ≠ 3 = σ²_d(c−a). Equal-harmony candidates can have different NHG probabilities.
In MaxEnt, equal harmony implies equal probability: since
softmax(s, α, b) = exp(α·s(b)) / Σ exp(α·s(i)), candidates with
the same score get the same numerator and hence the same probability.
This is the MaxEnt half of the §9 contrast: MaxEnt assigns
P(b) = P(c) (both have H = −16), while NHG assigns P(b) ≠ P(c)
because their noise variances differ (table45_nhg_variance_differs).
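A small numeric illustration of the contrast follows. H(b) = H(c) = −16 and the squared violation sums 5 and 3 are taken from the text; the harmony of candidate a is not given here, so `h_a` is a hypothetical value. The NHG figures are pairwise comparisons against a only, since full three-way NHG probabilities would need the joint noise distribution.

```python
import math


def softmax(scores, alpha=1.0):
    """softmax(s, α, i) = exp(α·s(i)) / Σ exp(α·s(k))."""
    m = max(scores)
    exps = [math.exp(alpha * (s - m)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


h_a = -15.0          # hypothetical: harmony of a is not stated in the text
h_b = h_c = -16.0    # equal harmonies, as stated
sigma = 1.0

# MaxEnt: equal scores give equal numerators, hence equal probabilities.
maxent_probs = softmax([h_a, h_b, h_c])

# NHG pairwise comparisons against a, with σ_d² = 5σ² for b and 3σ² for c.
nhg_b_over_a = std_normal_cdf((h_b - h_a) / (sigma * math.sqrt(5.0)))
nhg_c_over_a = std_normal_cdf((h_c - h_a) / (sigma * math.sqrt(3.0)))
```

`maxent_probs[1]` and `maxent_probs[2]` are identical, whereas `nhg_b_over_a` and `nhg_c_over_a` differ because the same harmony gap is scaled by different σ_d values.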
NHG noise covariance value: Cov(ε_b−ε_a, ε_c−ε_a) = 3σ².
The paper (@cite{flemming-2021} §9, p. 37) computes Cov(ε_a−ε_b, ε_c−ε_b) = 2σ²
using candidate b as reference. Our formalization uses candidate a as
reference, giving 3σ² — a different but equally valid demonstration
that the covariance matrix is non-diagonal.
NHG noise covariance is non-zero: Cov(ε_b−ε_a, ε_c−ε_a) ≠ 0. The multivariate normal over score differences has a non-diagonal covariance matrix, so binary comparisons don't determine the joint distribution — NHG violates IIA (@cite{flemming-2021} §9).
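Under the assumption that NHG adds independent N(0, σ²) noise to each constraint weight, the covariance of two noise differences reduces to σ² times the dot product of the corresponding violation difference vectors. The sketch below checks this by simulation. The vectors `diff_b` and `diff_c` are hypothetical: they are chosen only to reproduce the squared sums 5 and 3 and the covariance 3σ² reported above, and are not the actual tableau entries.

```python
import random


def noise_covariance(diff_b, diff_c, sigma=1.0):
    """Analytic covariance: σ² · Σ_j (c_j(b) − c_j(a)) · (c_j(c) − c_j(a))."""
    return sigma ** 2 * sum(x * y for x, y in zip(diff_b, diff_c))


def simulated_covariance(diff_b, diff_c, sigma=1.0, n_draws=50_000, seed=0):
    """Sample covariance of (ε_b − ε_a, ε_c − ε_a) under i.i.d. weight noise."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n_draws):
        noise = [rng.gauss(0.0, sigma) for _ in diff_b]
        # Harmony noise is the negated noise-weighted violation difference.
        xs.append(-sum(n * d for n, d in zip(noise, diff_b)))
        ys.append(-sum(n * d for n, d in zip(noise, diff_c)))
    mean_x = sum(xs) / n_draws
    mean_y = sum(ys) / n_draws
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n_draws - 1)


diff_b = [2, 1, 0]  # hypothetical c(b) − c(a): squared sum 5
diff_c = [1, 1, 1]  # hypothetical c(c) − c(a): squared sum 3

analytic = noise_covariance(diff_b, diff_c)
simulated = simulated_covariance(diff_b, diff_c)
```

Because the two differences share the same underlying weight noise, the off-diagonal entry is non-zero whenever the difference vectors are not orthogonal.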
The /∅/ square: contexts 0–3 (underlying /∅/, varying onset × stress).
Equations
- Phenomena.PhonologicalAlternation.Studies.Flemming2021.schwaSquareNull = { tl := 0, tr := 1, bl := 2, br := 3 }
The /ə/ square: contexts 4–7 (underlying /ə/, varying onset × stress).
Equations
- Phenomena.PhonologicalAlternation.Studies.Flemming2021.schwaSquareSchwa = { tl := 4, tr := 5, bl := 6, br := 7 }
Violation differences satisfy independence on the /∅/ square: each of the 6 constraints is insensitive to either onset (row) or stress (column).
Violation differences satisfy independence on the /ə/ square.
HZ's generalization for French schwa (/∅/ square):
for any MaxEnt weights, the logit-rate difference across onset
types is constant across stress contexts. Derived from
me_predicts_hz + schwaNull_independence.
HZ's generalization for French schwa (/ə/ square).
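The step from independence to a constant logit difference can be illustrated on a toy square. Everything in this sketch is hypothetical: the violation difference rows are invented so that each constraint is insensitive to either the row or the column of the square, and they are not the rows of Table (35). The square uses the same `tl`, `tr`, `bl`, `br` corner layout as above.

```python
import random

square = {"tl": 0, "tr": 1, "bl": 2, "br": 3}

# Hypothetical violation differences (one row per context, one column per constraint).
diffs = [
    [1, 0, 1],  # tl
    [1, 1, 1],  # tr
    [0, 0, 1],  # bl
    [0, 1, 1],  # br
]


def is_insensitive(diffs, sq, j):
    """True if constraint j ignores the row dimension or the column dimension."""
    same_across_rows = (
        diffs[sq["tl"]][j] == diffs[sq["bl"]][j] and diffs[sq["tr"]][j] == diffs[sq["br"]][j]
    )
    same_across_cols = (
        diffs[sq["tl"]][j] == diffs[sq["tr"]][j] and diffs[sq["bl"]][j] == diffs[sq["br"]][j]
    )
    return same_across_rows or same_across_cols


def maxent_logit(weights, diff_row):
    """MaxEnt logit of a context: negated weighted sum of violation differences."""
    return -sum(w * d for w, d in zip(weights, diff_row))


def row_gap_mismatch(weights, diffs, sq):
    """(logit(tl) − logit(bl)) − (logit(tr) − logit(br)); zero under independence."""
    left = maxent_logit(weights, diffs[sq["tl"]]) - maxent_logit(weights, diffs[sq["bl"]])
    right = maxent_logit(weights, diffs[sq["tr"]]) - maxent_logit(weights, diffs[sq["br"]])
    return left - right


rng = random.Random(0)
mismatches = [
    row_gap_mismatch([rng.uniform(0.0, 5.0) for _ in range(3)], diffs, square)
    for _ in range(100)
]
```

Each constraint contributes either a row-only or a column-only term to the logit, so the logit is additive in row and column and the row gap is the same in both columns, whatever the weights.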