Linglib.Phenomena.PhonologicalAlternation.Studies.Flemming2021

@cite{flemming-2021}: Comparing MaxEnt and Noisy Harmonic Grammar #

@cite{flemming-2021} compares three stochastic Harmonic Grammar variants — MaxEnt, Noisy HG (NHG), and Normal MaxEnt — identifying logit uniformity as the diagnostic that distinguishes them.

The three models as Random Utility Models #

All three HG variants are Random Utility Models (RUMs) differing only in the noise distribution added to the deterministic harmony scores:

| Model | Noise target | Distribution | Binary P | Reference |
|---|---|---|---|---|
| MaxEnt | candidates | Gumbel | logistic(H−H') | maxent_eq_gumbelRUM |
| NHG | weights | Gaussian | Φ((H−H')/σ_d) | nhg_choiceProb_eq |
| Normal MaxEnt | candidates | Gaussian | Φ((H−H')/(ε√2)) | normalMaxEnt_choiceProb_eq |

Key diagnostic: logit uniformity #

MaxEnt exhibits logit uniformity (eq (10)): adding one violation of constraint j changes the logit by exactly −wⱼ, regardless of the tableau context. This follows from the log-odds identity (logit_uniformity):

log(P(a)/P(b)) = H(a) − H(b)
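
The identity can be checked numerically. The following is a minimal Python sketch, not part of the formalization: it assumes the sign convention H = −Σ wⱼ·cⱼ, and the weights and violation counts are hypothetical values chosen for illustration. It adds one violation of constraint j to candidate a in two unrelated tableaux and records the resulting logit shift.

```python
import math

def harmony(weights, violations):
    """Harmony H = -sum_j w_j * c_j (assumed sign convention)."""
    return -sum(w * c for w, c in zip(weights, violations))

def maxent_probs(weights, tableau):
    """Softmax over the candidate harmonies of one tableau."""
    exps = [math.exp(harmony(weights, v)) for v in tableau]
    total = sum(exps)
    return [e / total for e in exps]

def logit(weights, tableau):
    """log(P(a)/P(b)) for the first two candidates of a tableau."""
    p = maxent_probs(weights, tableau)
    return math.log(p[0] / p[1])

def add_violation(tableau, j):
    """Copy of the tableau with one extra violation of constraint j on candidate a."""
    a = list(tableau[0])
    a[j] += 1
    return [a] + [list(v) for v in tableau[1:]]

# Hypothetical weights and two hypothetical two-candidate tableaux.
weights = [1.5, 0.7, 2.0]
context_1 = [[0, 1, 0], [1, 0, 0]]
context_2 = [[2, 0, 1], [0, 3, 0]]

j = 2
shift_1 = logit(weights, add_violation(context_1, j)) - logit(weights, context_1)
shift_2 = logit(weights, add_violation(context_2, j)) - logit(weights, context_2)
```

Both `shift_1` and `shift_2` equal −`weights[j]` up to floating-point error, even though the two contexts start from different baseline logits.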

NHG violates logit uniformity because its noise standard deviation σ_d = σ · √(Σ(cⱼ(a)−cⱼ(b))²) (nhgSigmaD) depends on the violation difference profile. The same harmony difference ΔH produces different probits ΔH/σ_d in different contexts.

Normal MaxEnt has probit uniformity (constant σ_d = ε√2) rather than logit uniformity, leading to probit (Φ) rather than logistic probability functions — an empirically distinguishable prediction.
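
The three binary choice functions can be put side by side. This Python sketch is an illustration only: the function names are hypothetical, and the violation-difference profiles passed to `nhg_binary` are made-up values whose squared sums are 3 and 4.

```python
import math

def logistic(x):
    """Logistic function 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def normal_cdf(x):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def maxent_binary(dh):
    """MaxEnt: P = logistic(H - H')."""
    return logistic(dh)

def nhg_sigma_d(sigma, diff):
    """NHG: sigma_d = sigma * sqrt(sum_j (c_j(a) - c_j(b))^2)."""
    return sigma * math.sqrt(sum(d * d for d in diff))

def nhg_binary(dh, sigma, diff):
    """NHG: P = Phi((H - H') / sigma_d), with a profile-dependent sigma_d."""
    return normal_cdf(dh / nhg_sigma_d(sigma, diff))

def normal_maxent_binary(dh, eps):
    """Normal MaxEnt: P = Phi((H - H') / (eps * sqrt(2))), with a constant scale."""
    return normal_cdf(dh / (eps * math.sqrt(2.0)))

# Same harmony difference, two hypothetical difference profiles.
dh = 1.0
p_nhg_three = nhg_binary(dh, 1.0, [1, -1, 1])      # squared sum 3
p_nhg_four = nhg_binary(dh, 1.0, [1, -1, 1, 1])    # squared sum 4
p_maxent = maxent_binary(dh)
p_normal = normal_maxent_binary(dh, 1.0)
```

With the harmony difference held fixed, only the NHG probability moves when the difference profile changes. The MaxEnt and Normal MaxEnt values depend on `dh` alone.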

French schwa data #

Flemming tests logit uniformity on French schwa deletion across 8 phonological contexts with 6 constraints (Table (35)). Under MaxEnt, contexts that share the same *Clash violation difference should show the same logit difference. The declarations below encode the data and verify this prediction.

MaxEnt = Gumbel RUM (@cite{flemming-2021} §4/§10): MaxEnt probability is exactly the McFadden integral with Gumbel scale β = 1.

This formalizes the RUM connection: MaxEnt adds i.i.d. Gumbel noise to candidate harmonies, and by McFadden's theorem (mcfaddenIntegral_eq_softmax), the resulting choice probability is softmax — i.e., the standard MaxEnt formula.
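
The equivalence can also be seen by simulation. The sketch below is a Monte Carlo illustration in Python, not the proof: the harmony values are hypothetical, and the seed and trial count are arbitrary choices. It adds independent standard Gumbel noise to each harmony, picks the candidate with the highest noisy score, and compares the win frequencies with softmax.

```python
import math
import random

def softmax(scores):
    """Softmax with the usual max-shift for numerical stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gumbel_sample(rng):
    """Standard Gumbel draw (location 0, scale 1) via inverse-CDF sampling."""
    u = rng.random()
    while u == 0.0:  # avoid log(0); rng.random() lies in [0, 1)
        u = rng.random()
    return -math.log(-math.log(u))

def gumbel_rum_frequencies(scores, n_trials, seed=0):
    """Fraction of trials in which each candidate has the highest noisy score."""
    rng = random.Random(seed)
    wins = [0] * len(scores)
    for _ in range(n_trials):
        noisy = [s + gumbel_sample(rng) for s in scores]
        wins[noisy.index(max(noisy))] += 1
    return [w / n_trials for w in wins]

# Hypothetical harmonies for three candidates.
harmonies = [-1.0, -2.0, -3.5]
expected = softmax(harmonies)
observed = gumbel_rum_frequencies(harmonies, 100_000)
```

With 100,000 trials the simulated frequencies in `observed` should agree with `expected` to roughly two decimal places.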

MaxEnt ratio independence (IIA): P(a)/P(b) = exp(H(a) − H(b)). The probability ratio depends only on the candidates' own scores, not on any other candidates. Corollary of softmax_odds with α = 1.

MaxEnt binary logistic (@cite{flemming-2021} eq (9)/(11)): with two candidates, MaxEnt probability is the logistic function of the harmony difference.

P(0) = 1 / (1 + e^{-(H(0) − H(1))}) = logistic(H(0) − H(1))

Corollary of softmax_binary with α = 1.

Violation difference matrix: ə candidate minus ∅ candidate. Rows = 8 contexts, columns = 6 constraints. Constraint order: 0=NoSchwa, 1=*CCC, 2=*Clash, 3=Max, 4=Dep, 5=*Cluster. Table (35) from @cite{flemming-2021}, data from @cite{smith-pater-2020}.

    *Clash pairs differ only in the *Clash column (index 2): for each pair, all non-*Clash violations are identical.

    The *Clash violation difference is exactly 1 for all pairs.

    theorem Phenomena.PhonologicalAlternation.Studies.Flemming2021.logit_uniformity_clash (w : Fin 6) (pair : Fin 4) :
    ∑ j : Fin 6, w j * (schwaDiff (clashPairs pair).2 j) - ∑ j : Fin 6, w j * (schwaDiff (clashPairs pair).1 j) = w 2

    Logit uniformity for *Clash (@cite{flemming-2021} §7.1): the *Clash contribution to the harmony difference is the same across all four paired contexts.

    For any weights w, the harmony difference change between paired contexts = −w₂ (*Clash weight), independent of context. This follows from clash_pairs_identical_except_clash: since non-*Clash violations are identical in each pair, their weighted contributions cancel, leaving only −w₂ · 1 = −w₂.
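
The cancellation argument can be illustrated with a small Python sketch. The two difference rows below are hypothetical and are NOT the values from Table (35). They are only built to be identical outside the *Clash column (index 2), where they differ by 1.

```python
def weighted_sum(weights, diff_row):
    """Weighted violation difference: sum_j w_j * d_j."""
    return sum(w * d for w, d in zip(weights, diff_row))

CLASH = 2  # column index of *Clash in the constraint order

# Hypothetical paired rows: equal everywhere except the *Clash column.
without_clash = [1, 0, 0, -1, 0, 1]
with_clash = [1, 0, 1, -1, 0, 1]

def pair_gap(weights):
    """Weighted sum for the +*Clash row minus the weighted sum for its partner."""
    return weighted_sum(weights, with_clash) - weighted_sum(weights, without_clash)

# Two arbitrary weight vectors: the gap is always the *Clash weight.
weights_a = [0.5, 1.25, 3.0, 2.0, 0.75, 1.5]
weights_b = [4.0, 0.1, 0.6, 9.0, 2.2, 0.3]
gap_a = pair_gap(weights_a)
gap_b = pair_gap(weights_b)
```

The gap in weighted violations equals `weights[CLASH]` for any weights, matching the `= w 2` in the theorem statement. Under the assumed convention H = −Σ wⱼ·cⱼ, the corresponding harmony change is −w₂.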

    This is a special case of me_predicts_hz (Separability.lean): the *Clash violation differences are column-insensitive (constant across paired contexts), so the weighted sum satisfies the constant-difference identity.

    Observed probability of schwa realization across 8 contexts. Data from @cite{smith-pater-2020} (Table 2 of @cite{flemming-2021}).

    Values are approximate proportions (hundredths). The key pattern: within each *Clash pair, the +*Clash context always has higher P(schwa), consistent with the *Clash constraint favoring schwa insertion.

      Adding a *Clash violation increases P(schwa) in every paired context.

      Sum of squared violation differences for a context.

      This is the study-local analogue of violationDiffSqSumQ from NoisyHG.lean: both compute Σⱼ (cⱼ(ə) − cⱼ(∅))², but schwaSqSum operates on the pre-computed difference matrix schwaDiff (Table (35)) rather than a WeightedConstraint list.

        NHG noise variance σ_d² is context-dependent: without *Clash, the squared violation sum is 3; with *Clash, it is 4. The same *Clash violation change produces different σ_d values in different tableaux — σ_d = √3 vs σ_d = 2 (Table 3 of @cite{flemming-2021}).

        NHG probit change when moving from one context to another: the change in the probit Φ⁻¹(P) = h / σ_d when the harmony difference h goes from h_init to h_init + Δh and the noise s.d. goes from σ_d to σ_d'.

        h_init = initial harmony difference, Δh = harmony change (e.g., −w_Clash), σ_d / σ_d' = noise s.d. before/after the change.

          theorem Phenomena.PhonologicalAlternation.Studies.Flemming2021.nhg_probit_change_depends_on_h_init (Δh σ_d σ_d' h₁ h₂ : ) ( : σ_d ≠ σ_d') (hσ_pos : 0 < σ_d) (hσ'_pos : 0 < σ_d') (hh : h₁ ≠ h₂) :
          nhgProbitChange h₁ Δh σ_d σ_d' ≠ nhgProbitChange h₂ Δh σ_d σ_d'

          Probit non-uniformity (@cite{flemming-2021} §7.2): when σ_d ≠ σ_d', the NHG probit change depends on the initial harmony difference h_init.

          Two contexts with different initial harmonies h₁ ≠ h₂ but the same *Clash change Δh produce different probit changes. This is because the denominator shift (σ_d → σ_d') rescales the existing harmony difference differently depending on its magnitude.

          Concretely, for French schwa with σ = 1 (@cite{flemming-2021} §7.2): adding a *Clash violation changes σ_d from √3 to 2 in all pairs, but the initial harmony difference h_ə − h_∅ differs between pairs (e.g., −2.2 for pair (0,1) vs 0.01 for pair (4,5)), so the probit changes differ despite the same *Clash change.

          theorem Phenomena.PhonologicalAlternation.Studies.Flemming2021.nhgProbitChange_decomp (h_init Δh σ_d σ_d' : ) (hσ_pos : 0 < σ_d) (hσ'_pos : 0 < σ_d') :
          nhgProbitChange h_init Δh σ_d σ_d' = h_init * (σ_d - σ_d') / (σ_d * σ_d') + Δh / σ_d'

          Probit change decomposition (@cite{flemming-2021} eq (38b)): the NHG probit change decomposes into a context-dependent term (proportional to initial harmony difference) and a uniform term.

          Δprobit = h · (σ_d − σ_d') / (σ_d · σ_d') + Δh / σ_d'

          The first term is why NHG violates probit uniformity: it depends on h_init, which varies across contexts.
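
A numerical check of the decomposition, as a Python sketch. It assumes that the probit change is defined as (h_init + Δh)/σ_d' − h_init/σ_d, which is the definition the decomposition implies but is not shown on this page. The values σ_d = √3, σ_d' = 2 and the initial harmony differences −2.2 and 0.01 are the ones quoted above for σ = 1. The value of Δh is hypothetical.

```python
import math

def nhg_probit_change(h_init, dh, sigma_d, sigma_d_new):
    """Probit after the change minus probit before it (assumed definition)."""
    return (h_init + dh) / sigma_d_new - h_init / sigma_d

def decomposed(h_init, dh, sigma_d, sigma_d_new):
    """Context-dependent term plus uniform term, as in eq (38b)."""
    context_term = h_init * (sigma_d - sigma_d_new) / (sigma_d * sigma_d_new)
    uniform_term = dh / sigma_d_new
    return context_term + uniform_term

sigma_d = math.sqrt(3.0)   # before adding the *Clash violation
sigma_d_new = 2.0          # after adding it
dh = -1.0                  # hypothetical harmony change (stands in for -w_Clash)

change_pair_01 = nhg_probit_change(-2.2, dh, sigma_d, sigma_d_new)
change_pair_45 = nhg_probit_change(0.01, dh, sigma_d, sigma_d_new)
```

The two changes differ although Δh, σ_d and σ_d' are the same, because the context-dependent term scales with the initial harmony difference.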

          In MaxEnt, equal harmony implies equal probability: since softmax(s, α, b) = exp(α·s(b)) / Σ exp(α·s(i)), candidates with the same score get the same numerator and hence the same probability.

          This is the MaxEnt half of the §9 contrast: MaxEnt assigns P(b) = P(c) (both have H = −16), while NHG assigns P(b) ≠ P(c) because their noise variances differ (table45_nhg_variance_differs).

          NHG noise covariance value: Cov(ε_b−ε_a, ε_c−ε_a) = 3σ².

          The paper (@cite{flemming-2021} §9, p. 37) computes Cov(ε_a−ε_b, ε_c−ε_b) = 2σ² using candidate b as reference. Our formalization uses candidate a as reference, giving 3σ² — a different but equally valid demonstration that the covariance matrix is non-diagonal.

          NHG noise covariance is non-zero: Cov(ε_b−ε_a, ε_c−ε_a) ≠ 0. The multivariate normal over score differences has a non-diagonal covariance matrix, so binary comparisons don't determine the joint distribution — NHG violates IIA (@cite{flemming-2021} §9).
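
How such a covariance arises can be sketched in Python. Assume each candidate's noise is ε_x = Σⱼ nⱼ·cⱼ(x) with independent weight noise nⱼ ~ N(0, σ²). Then Cov(ε_b−ε_a, ε_c−ε_a) = σ² · Σⱼ (cⱼ(b)−cⱼ(a))·(cⱼ(c)−cⱼ(a)). The violation profiles below are hypothetical and are not the tableau from §9, so the resulting value is not the 3σ² stated above.

```python
def nhg_noise_cov(sigma, a, b, c):
    """Cov(eps_b - eps_a, eps_c - eps_a) for eps_x = sum_j n_j * c_j(x),
    with n_j independent N(0, sigma^2) weight noise (assumed setup)."""
    return sigma ** 2 * sum((bj - aj) * (cj - aj) for aj, bj, cj in zip(a, b, c))

# Hypothetical violation profiles for three candidates over three constraints.
cand_a = [0, 0, 1]
cand_b = [1, 0, 0]
cand_c = [0, 2, 0]

cov_bc = nhg_noise_cov(1.0, cand_a, cand_b, cand_c)
```

Here the two difference vectors are (1, 0, −1) and (0, 2, −1), whose dot product is 1, so `cov_bc` is σ² = 1 for this made-up tableau.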

          The /∅/ square: contexts 0–3 (underlying /∅/, varying onset × stress).

            The /ə/ square: contexts 4–7 (underlying /ə/, varying onset × stress).

              Violation differences satisfy independence on the /∅/ square: each of the 6 constraints is insensitive to either onset (row) or stress (column).

              Violation differences satisfy independence on the /ə/ square.

              HZ's generalization for French schwa (/∅/ square): for any MaxEnt weights, the logit-rate difference across onset types is constant across stress contexts. Derived from me_predicts_hz + schwaNull_independence.

              HZ's generalization for French schwa (/ə/ square).
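
The step from independence to a constant logit difference can be illustrated on a toy square. The Python sketch below uses a hypothetical 2×2 grid (rows = onset type, columns = stress context) with four made-up constraints, each sensitive to the row only or to the column only. It does not use the French schwa data.

```python
# Hypothetical violation differences: each constraint depends on the row only
# or on the column only (the independence condition).
ROW_ONLY = {0: [1, 0], 2: [0, 2]}   # constraint index -> value per row
COL_ONLY = {1: [0, 1], 3: [1, 3]}   # constraint index -> value per column

def violation_diff(j, row, col):
    """Violation difference of constraint j in cell (row, col)."""
    if j in ROW_ONLY:
        return ROW_ONLY[j][row]
    return COL_ONLY[j][col]

def logit(weights, row, col):
    """MaxEnt binary logit = harmony difference = -sum_j w_j * d_j(row, col)."""
    return -sum(w * violation_diff(j, row, col) for j, w in enumerate(weights))

def row_gap(weights, col):
    """Logit difference between the two rows within one column."""
    return logit(weights, 0, col) - logit(weights, 1, col)

weights = [1.2, 0.4, 2.5, 0.9]  # hypothetical MaxEnt weights
gap_col0 = row_gap(weights, 0)
gap_col1 = row_gap(weights, 1)
```

The column-only constraints contribute the same amount to both rows and cancel in `row_gap`, so `gap_col0` and `gap_col1` agree for any choice of weights.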