Linglib.Theories.Phonology.HarmonicGrammar.NoisyHG

Noisy Harmonic Grammar and Normal MaxEnt #

@cite{boersma-pater-2016} @cite{flemming-2021}

Noisy HG (@cite{boersma-pater-2016}) adds i.i.d. Gaussian noise N(0,σ²) to each constraint weight before evaluation. For binary choice between candidates a and b, the harmony score difference H(a)−H(b) becomes a Gaussian random variable with variance σ² · Σⱼ(cⱼ(a)−cⱼ(b))² — the noise is context-dependent, scaling with squared violation differences.

Normal MaxEnt (@cite{flemming-2021}) instead adds i.i.d. Gaussian noise N(0,ε²) directly to candidate scores, giving a constant noise standard deviation σ_d = ε√2 for binary choice.

Both are instances of Thurstone Case V (Core.Thurstone).
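As a quick numeric illustration of the contrast (plain Python with made-up weights and violation counts, independent of the Lean development): both models predict P(a ≻ b) = Φ(d / σ_d) for the harmony difference d, but NHG's σ_d scales with the violation differences while Normal MaxEnt's is the constant ε√2.

```python
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Toy grammar (hypothetical numbers): (weight, violations of a, violations of b).
constraints = [(3.0, 1, 0), (2.0, 0, 2)]

# H(x) = -sum_j w_j * c_j(x), so d = H(a) - H(b) = -sum_j w_j * (c_j(a) - c_j(b)).
d = -sum(w * (ca - cb) for w, ca, cb in constraints)

# Noisy HG: sigma_d scales with the squared violation differences.
sigma = 1.0
sigma_d_nhg = sigma * math.sqrt(sum((ca - cb) ** 2 for _, ca, cb in constraints))

# Normal MaxEnt: sigma_d is the constant epsilon * sqrt(2).
epsilon = 1.0
sigma_d_nme = epsilon * math.sqrt(2.0)

p_nhg = phi(d / sigma_d_nhg)  # P(a > b) under Noisy HG
p_nme = phi(d / sigma_d_nme)  # P(a > b) under Normal MaxEnt
```

With the same harmony difference, the two models assign different choice probabilities whenever the violation-difference profile makes σ_d ≠ ε√2.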

MaxEnt logit-harmony identity #

For classical MaxEnt (Gumbel noise → softmax), the log-odds ratio between any two candidates equals their harmony score difference:

log(P(a)/P(b)) = H(a) − H(b)

This implies logit uniformity (@cite{flemming-2021} §5.1, eq (10)): adding one violation of constraint j changes the logit by exactly −wⱼ, regardless of the violation profile elsewhere. NHG lacks this property because its noise standard deviation σ_d depends on the violation profile.
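Logit uniformity can be checked numerically (a sketch in plain Python with invented harmony scores, not tied to the Lean formalization): in two contexts with completely different score profiles, one extra violation of a constraint with weight w shifts the log-odds by exactly −w.

```python
import math

def softmax_log_odds(scores, a, b):
    # log(P(a)/P(b)) under softmax with alpha = 1; the normalizer cancels.
    total = sum(math.exp(s) for s in scores.values())
    pa = math.exp(scores[a]) / total
    pb = math.exp(scores[b]) / total
    return math.log(pa / pb)

w_j = 2.0
# Two contexts differing elsewhere; in each, candidate a2 has one extra
# violation of constraint j relative to a1 (harmony penalty w_j).
h1 = {"a1": -1.0, "a2": -1.0 - w_j, "b": -4.0}
h2 = {"a1": -7.5, "a2": -7.5 - w_j, "b": -0.5}

shift1 = softmax_log_odds(h1, "a2", "b") - softmax_log_odds(h1, "a1", "b")
shift2 = softmax_log_odds(h2, "a2", "b") - softmax_log_odds(h2, "a1", "b")
# Both shifts equal -w_j exactly, regardless of the rest of the profile.
```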

noncomputable def Theories.Phonology.HarmonicGrammar.violationDiffSqSum {C : Type} (constraints : List (WeightedConstraint C)) (a b : C) : ℝ

Sum of squared violation differences between two candidates. This determines the NHG noise variance: σ_d² = σ² · violationDiffSqSum.

Equations
  • One or more equations did not get rendered due to their size.
Instances For
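The quantity is a one-line fold over the constraint list; a minimal sketch with hypothetical violation vectors (plain Python, one entry per constraint in matching order):

```python
# Violations per constraint j, in the same order for both candidates (made up).
viol_a = [1, 0, 2]
viol_b = [0, 2, 2]

diff_sq_sum = sum((ca - cb) ** 2 for ca, cb in zip(viol_a, viol_b))
# NHG noise variance for this candidate pair: sigma**2 * diff_sq_sum
```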

    Sum of squared violation differences (ℚ, computable). Use this for concrete examples with native_decide.

    Equations
    • One or more equations did not get rendered due to their size.
    Instances For
      noncomputable def Theories.Phonology.HarmonicGrammar.nhgSigmaD {C : Type} (constraints : List (WeightedConstraint C)) (sigma : ℝ) (a b : C) : ℝ

      NHG noise standard deviation for binary choice: σ_d = σ · √(Σⱼ (cⱼ(a) − cⱼ(b))²).

      The noise is context-dependent: it scales with the violation difference profile, not just the per-weight noise σ.

      Equations
      Instances For
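The context dependence is visible numerically (a sketch with invented weights, outside the Lean development): two candidate pairs with the same harmony difference d but different violation-difference profiles get different σ_d, hence different choice probabilities.

```python
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def nhg_prob(weights, diffs, sigma):
    """P(a > b) under NHG from per-constraint weights and
    violation differences c_j(a) - c_j(b)."""
    d = -sum(w * c for w, c in zip(weights, diffs))
    sigma_d = sigma * math.sqrt(sum(c * c for c in diffs))
    return phi(d / sigma_d)

# Both pairs have harmony difference d = 1, but different profiles:
p1 = nhg_prob([1.0], [-1], 1.0)                   # sqsum = 1, sigma_d = 1
p2 = nhg_prob([1.0, 1.0, 1.0], [-1, -1, 1], 1.0)  # sqsum = 3, sigma_d = sqrt(3)
```

Under MaxEnt (logit uniformity) the two pairs would be equally probable; under NHG the noisier profile pulls p2 toward chance.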
        noncomputable def Theories.Phonology.HarmonicGrammar.nhgAsThurstoneV {C : Type} (constraints : List (WeightedConstraint C)) (sigma : ℝ) (hsigma : 0 < sigma) (a b : C) (h_diff : 0 < violationDiffSqSum constraints a b) :

        NHG binary choice as a Thurstone Case V model.

        • Scale values are harmony scores: scale(0) = H(a), scale(1) = H(b)
        • Thurstone σ = σ_d / √2, so that Thurstone's Φ(d/(σ_T·√2)) equals NHG's Φ((H(a)−H(b))/σ_d).

        Requires non-zero violation differences (otherwise σ_d = 0 and the choice is deterministic).

        Equations
        • One or more equations did not get rendered due to their size.
        Instances For
          theorem Theories.Phonology.HarmonicGrammar.nhg_choiceProb_eq {C : Type} (constraints : List (WeightedConstraint C)) (sigma : ℝ) (hsigma : 0 < sigma) (a b : C) (h_diff : 0 < violationDiffSqSum constraints a b) :
          (nhgAsThurstoneV constraints sigma hsigma a b h_diff).choiceProb 0 1 = Core.normalCDF ((harmonyScoreR constraints a - harmonyScoreR constraints b) / nhgSigmaD constraints sigma a b)

          NHG binary choice probability (@cite{flemming-2021} eq (15)): P(a ≻ b) = Φ((H(a) − H(b)) / σ_d).

          The Thurstone σ is set to σ_d / √2 so that Thurstone's Φ(d / (σ_T · √2)) reduces to Φ(d / σ_d).

          Normal MaxEnt noise standard deviation: σ_d = ε√2 (constant).

          When noise is added i.i.d. N(0,ε²) to each candidate score, the difference of two candidates is N(H(a)−H(b), 2ε²), giving σ_d = ε√2 regardless of the violation profile. This is the key distinction from NHG, where σ_d varies by context.

          Equations
          Instances For
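The ε√2 factor is just the standard deviation of a difference of two independent N(0, ε²) draws; a seeded Monte Carlo sketch (plain Python, hypothetical ε) recovers it:

```python
import math
import random

random.seed(0)
epsilon = 0.5
n = 200_000

# Difference of two i.i.d. N(0, eps^2) noise draws, one per candidate.
diffs = [random.gauss(0, epsilon) - random.gauss(0, epsilon) for _ in range(n)]
mean = sum(diffs) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in diffs) / n)
# sd is close to epsilon * sqrt(2), independent of any violation profile.
```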
            noncomputable def Theories.Phonology.HarmonicGrammar.normalMaxEntAsThurstoneV {C : Type} (constraints : List (WeightedConstraint C)) (epsilon : ℝ) (heps : 0 < epsilon) (a b : C) :

            Normal MaxEnt binary choice as a Thurstone Case V model.

            Same as NHG but with constant σ_d = ε√2, so Thurstone σ = ε.

            Equations
            • One or more equations did not get rendered due to their size.
            Instances For
              theorem Theories.Phonology.HarmonicGrammar.normalMaxEnt_choiceProb_eq {C : Type} (constraints : List (WeightedConstraint C)) (epsilon : ℝ) (heps : 0 < epsilon) (a b : C) :
              (normalMaxEntAsThurstoneV constraints epsilon heps a b).choiceProb 0 1 = Core.normalCDF ((harmonyScoreR constraints a - harmonyScoreR constraints b) / normalMaxEntSigmaD epsilon)

              Normal MaxEnt binary choice probability (@cite{flemming-2021} eq (17)): P(a ≻ b) = Φ((H(a) − H(b)) / (ε√2)).

              Since the Thurstone σ is ε and normalMaxEntSigmaD ε = ε√2, Thurstone's Φ(d / (ε · √2)) directly gives the Normal MaxEnt formula.

              theorem Theories.Phonology.HarmonicGrammar.logit_uniformity {ι : Type} [Fintype ι] [Nonempty ι] (s : ι) (a b : ι) :
              Real.log (Core.softmax s 1 a / Core.softmax s 1 b) = s a - s b

              Logit uniformity (MaxEnt diagnostic; @cite{flemming-2021} §5.1): for softmax with α = 1, the log-odds between any two alternatives equals their score difference. Hence changing one score by −w changes the log-odds by exactly −w, regardless of context.

              This property characterizes MaxEnt among stochastic HG variants. NHG lacks it because its noise standard deviation σ_d depends on the violation profile (see nhgSigmaD).

              See also maxent_logit_as_finsum (Separability.lean) for the Fin-indexed decomposition, and me_predicts_hz for the consequence that independent violation differences yield HZ's generalization.

              theorem Theories.Phonology.HarmonicGrammar.maxent_logit_harmony {C : Type} [Fintype C] [Nonempty C] (constraints : List (WeightedConstraint C)) (a b : C) :
              Real.log (Core.softmax (harmonyScoreR constraints) 1 a / Core.softmax (harmonyScoreR constraints) 1 b) = harmonyScoreR constraints a - harmonyScoreR constraints b

              MaxEnt logit-harmony identity: the log-odds ratio between two candidates equals their harmony score difference.

              log(P(a)/P(b)) = H(a) − H(b)

              Instantiation of logit_uniformity with harmony scores.

              theorem Theories.Phonology.HarmonicGrammar.maxent_iia {C : Type} [Fintype C] [Nonempty C] (constraints : List (WeightedConstraint C)) (a b : C) :
              Core.softmax (harmonyScoreR constraints) 1 a / Core.softmax (harmonyScoreR constraints) 1 b = Real.exp (harmonyScoreR constraints a - harmonyScoreR constraints b)

              Ratio independence (IIA for MaxEnt): the probability ratio between two candidates depends only on their own harmony scores.

              P(a)/P(b) = exp(H(a) − H(b))

              Adding or removing other candidates from the competition doesn't change the ratio. Corollary of softmax_odds with α = 1.
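Ratio independence can be demonstrated numerically (a sketch with invented harmony scores, plain Python): adding a strong third competitor lowers both probabilities but leaves their ratio at exp(H(a) − H(b)).

```python
import math

def softmax_probs(scores):
    zs = {k: math.exp(v) for k, v in scores.items()}
    total = sum(zs.values())
    return {k: z / total for k, z in zs.items()}

h = {"a": -1.0, "b": -3.0}
two_way = softmax_probs(h)
three_way = softmax_probs({**h, "c": -0.5})  # add a strong competitor

ratio2 = two_way["a"] / two_way["b"]
ratio3 = three_way["a"] / three_way["b"]
# Both ratios equal exp(H(a) - H(b)) = exp(2); candidate c is irrelevant.
```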

              theorem Theories.Phonology.HarmonicGrammar.harmonyScore_diff {C : Type} (constraints : List (WeightedConstraint C)) (a b : C) :
              harmonyScore constraints a - harmonyScore constraints b = -(List.map (fun (con : WeightedConstraint C) => con.weight * ((con.eval a) - (con.eval b))) constraints).sum

              Harmony difference decomposition: the harmony score difference equals the negated weighted sum of violation differences.

              H(a) − H(b) = −Σⱼ wⱼ · (cⱼ(a) − cⱼ(b))

              This is the bridge between abstract harmony scores and the constraint violation patterns used in empirical analyses (e.g., French schwa).
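A minimal arithmetic check of the decomposition (plain Python, made-up weights and violation counts):

```python
weights = [3.0, 2.0, 1.0]
viol_a = [1, 0, 2]
viol_b = [0, 2, 2]

def harmony(viols):
    # H(x) = -sum_j w_j * c_j(x)
    return -sum(w * c for w, c in zip(weights, viols))

lhs = harmony(viol_a) - harmony(viol_b)
rhs = -sum(w * (ca - cb) for w, ca, cb in zip(weights, viol_a, viol_b))
# lhs == rhs: H(a) - H(b) = -sum_j w_j * (c_j(a) - c_j(b))
```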

              theorem Theories.Phonology.HarmonicGrammar.harmonyScoreR_diff {C : Type} (constraints : List (WeightedConstraint C)) (a b : C) :
              harmonyScoreR constraints a - harmonyScoreR constraints b = -(List.map (fun (con : WeightedConstraint C) => con.weight * ((con.eval a) - (con.eval b))) constraints).sum

              Harmony difference decomposition in ℝ.

              Censored weight: noise is clamped so weights never go negative.

              In censored NHG, the noisy weight is max(0, w + n) where n ~ N(0,σ²). This prevents negative weights (which would reverse constraint preferences) and makes the effective noise variance depend on the weight magnitude: constraints with larger weights are less affected by censoring.

              Equations
              Instances For
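The weight-magnitude effect can be seen in a seeded Monte Carlo sketch (plain Python, hypothetical weights and σ): a small weight is clamped at zero often, shrinking its effective noise, while a large weight is almost never clamped.

```python
import math
import random

def censored_weight(w, n):
    # max(0, w + n): noise is clamped so the effective weight never goes negative.
    return max(0.0, w + n)

random.seed(1)
sigma = 1.0
draws = [random.gauss(0, sigma) for _ in range(100_000)]

def effective_sd(w):
    vals = [censored_weight(w, n) for n in draws]
    m = sum(vals) / len(vals)
    return math.sqrt(sum((v - m) ** 2 for v in vals) / len(vals))

sd_small = effective_sd(0.5)  # censored ~31% of the time: sd well below sigma
sd_large = effective_sd(5.0)  # censoring negligible: sd stays near sigma
```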

                Censored weights are non-negative by construction.

                Censored weights are monotone in the underlying weight.

                theorem Theories.Phonology.HarmonicGrammar.censored_nhg_weight_sensitivity (w₁ w₂ : ℝ) (hw : w₁ < w₂) :
                ∃ (n : ℝ), censoredWeight w₁ n ≠ censoredWeight w₂ n

                Weight sensitivity (@cite{flemming-2021} §7.3): censoring is non-trivial — different weights produce different censored values for some noise realization.

                The witness is n = -w₁: this zeroes out w₁ but leaves w₂ > 0.
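The witness is easy to confirm concretely (plain Python, hypothetical weights):

```python
def censored_weight(w, n):
    return max(0.0, w + n)

w1, w2 = 1.0, 3.0
n = -w1  # the witness: zeroes out w1 but leaves w2 + n = w2 - w1 > 0
v1 = censored_weight(w1, n)
v2 = censored_weight(w2, n)
```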

                noncomputable def Theories.Phonology.HarmonicGrammar.nhgCovariance {C : Type} (constraints : List (WeightedConstraint C)) (sigma : ℝ) (a b c : C) : ℝ

                NHG noise covariance between two score differences relative to a reference candidate a:

                Cov(ε_b − ε_a, ε_c − ε_a) = σ² · Σₖ (cₖ(b) − cₖ(a)) · (cₖ(c) − cₖ(a))

                When this is non-zero, the score differences are correlated, and the joint distribution over 3+ candidates is multivariate normal with non-diagonal covariance — not reducible to independent binary comparisons. This is why NHG violates IIA for 3+ candidates (@cite{flemming-2021} §9).

                Equations
                • One or more equations did not get rendered due to their size.
                Instances For
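A seeded Monte Carlo sketch (plain Python, invented violation profiles) recovers the covariance formula: sampling per-constraint weight noise and forming the two score differences relative to a gives a sample covariance close to σ² · Σₖ (cₖ(b) − cₖ(a)) · (cₖ(c) − cₖ(a)).

```python
import random

random.seed(2)
sigma = 1.0
viol = {"a": [1, 0, 2], "b": [0, 2, 2], "c": [0, 1, 3]}  # hypothetical

def analytic_cov():
    return sigma ** 2 * sum(
        (vb - va) * (vc - va)
        for va, vb, vc in zip(viol["a"], viol["b"], viol["c"]))

def sample_diffs():
    # One N(0, sigma^2) noise draw per constraint weight; the induced score
    # noise for candidate x is sum_k n_k * c_k(x) (up to an irrelevant sign).
    noise = [random.gauss(0, sigma) for _ in viol["a"]]
    eps = {x: sum(n * v for n, v in zip(noise, viol[x])) for x in viol}
    return eps["b"] - eps["a"], eps["c"] - eps["a"]

pairs = [sample_diffs() for _ in range(200_000)]
mx = sum(x for x, _ in pairs) / len(pairs)
my = sum(y for _, y in pairs) / len(pairs)
cov = sum((x - mx) * (y - my) for x, y in pairs) / len(pairs)
# cov is close to analytic_cov(); a non-zero value means the two binary
# comparisons against a are correlated, so the 3-candidate case is not
# a product of independent binary choices.
```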

                  NHG covariance (ℚ, computable).

                  Equations
                  • One or more equations did not get rendered due to their size.
                  Instances For
                    theorem Theories.Phonology.HarmonicGrammar.nhgCovariance_self {C : Type} (constraints : List (WeightedConstraint C)) (sigma : ℝ) (a b : C) :
                    nhgCovariance constraints sigma a b b = sigma ^ 2 * violationDiffSqSum constraints b a

                    The NHG self-covariance Cov(ε_b − ε_a, ε_b − ε_a) equals the variance σ² · violationDiffSqSum, recovering the binary case.
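A direct arithmetic check of the self-covariance identity (plain Python, hypothetical profiles): taking c = b in the covariance formula reproduces the binary-case variance.

```python
sigma = 2.0
viol_a = [1, 0, 2]
viol_b = [0, 2, 2]

diffs = [vb - va for va, vb in zip(viol_a, viol_b)]
cov_bb = sigma ** 2 * sum(d1 * d2 for d1, d2 in zip(diffs, diffs))
var = sigma ** 2 * sum(d * d for d in diffs)
# cov_bb == var == sigma^2 * violationDiffSqSum
```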