Documentation

Linglib.Theories.Phonology.HarmonicGrammar.Separability

Separable Harmonies and HZ's Generalization @cite{magri-2025} #

@cite{magri-2025} "Constraint Interaction in Probabilistic Phonology: Deducing Maximum Entropy Grammars from Hayes and Zuraw's Shifted Sigmoids Generalization" (Linguistic Inquiry, Early Access).

Overview #

Hayes and Zuraw (@cite{zuraw-hayes-2017}; @cite{hayes-2022}) observe that the rates of application of variable phonological processes governed by independent factors can be fit onto shifted sigmoids at shared abscissas. @cite{magri-2025} reformulates this as a constant logit-rate difference condition and proves a biconditional: within harmony-based probabilistic phonology, a harmony function predicts HZ's generalization if and only if it is separable — it decomposes into a product of powers of unary functions, each fed by a single constraint.

Key definitions #

Connection to existing infrastructure #

The forward direction leverages logit_uniformity (in NoisyHG.lean) and maxent_logit_harmony, which already prove that MaxEnt log-odds equal harmony score differences. @cite{magri-2025}'s contribution is showing this is the only mode of constraint interaction with this property (the backward direction).

Bridge to List-based API (§11) #

The bridge theorems connect the Fin-indexed separability theory to the List-based MaxEnt API (harmonyScore/harmonyScoreR from Basic.lean):

These enable applying separability results (independence → HZ, rescaling) to any WeightedConstraint list without re-proving in Fin-indexed form.

A square of four underlying forms indexed by two binary factors (row = top/bottom, column = left/right). This is's eq. (12): the four forms x^{TL}, x^{TR}, x^{BL}, x^{BR} arranged so that rows and columns correspond to independent phonological dimensions (e.g., prefix identity × stem-initial obstruent quality).

Row and Col are the two binary factors.

  • tl : X

    Top-left form (e.g., /maŋ+b/).

  • tr : X

    Top-right form (e.g., /maŋ+k/).

  • bl : X

    Bottom-left form (e.g., /paŋ+b/).

  • br : X

    Bottom-right form (e.g., /paŋ+k/).

Instances For

    A constraint C is insensitive to the row dimension of a square: it assigns the same violation count to forms that share a column. Cf. Figure 4a.

    Equations
    Instances For

      A constraint C is insensitive to the column dimension of a square: it assigns the same violation count to forms that share a row. Cf. Figure 4b.

      Equations
      Instances For
        def Theories.Phonology.HarmonicGrammar.ConstraintIndependence {n : } {X : Type} (constraints : Fin nX) (sq : Square X) :

        Constraint independence (§2.4): the rows and columns of the square are independent dimensions relative to a constraint set — each constraint is insensitive to at least one dimension (row or column). No constraint can encode an interaction between the two dimensions.

        Equations
        • One or more equations did not get rendered due to their size.
        Instances For

          HZ's generalization (eq. 13): the difference between logit rates of application for two underlying forms in the same row (or column) does not depend on the choice of row (or column).

          Equivalently: for a function LR giving logit rates, LR(x^TL) − LR(x^TR) = LR(x^BL) − LR(x^BR).

          Equations
          Instances For

            Independence of violation differences: if a constraint is insensitive to a dimension, its violation differences are too.

            The violation difference Δₖ(x) = Cₖ(x, NO) − Cₖ(x, YES) inherits insensitivity from the raw constraint, because both YES and NO get the same violation count for forms in the same row (or column).

            Equations
            Instances For
              theorem Theories.Phonology.HarmonicGrammar.me_predicts_hz {n : } {X : Type} (w : Fin n) (Δ : Fin nX) (sq : Square X) (hind : ViolDiffIndependence Δ sq) :
              ConstantLogitDiff (fun (x : X) => k : Fin n, w k * Δ k x) sq

              ME predicts HZ (§3.6, eq. 22): when the violation differences satisfy independence (inherited from constraint independence), the weighted sum of violation differences — which equals the ME logit probability by maxent_logit_harmony — satisfies the constant-difference identity.

              The proof follows eq. (18): split the sum by constraint, and for each k, independence ensures the k-th term contributes equally to both sides of the identity.

              An n-ary harmony function H is separable if it decomposes into a product of powers of unary functions, each attending to a single constraint (eq. 30):

              H(C₁(x,y), …, Cₙ(x,y)) = ∏ₖ (hₖ(Cₖ(x,y)))^{wₖ}

              Each hₖ must be positive, normalized (hₖ(0) = 1), and decreasing (more violations → lower harmony).

              • w : Fin n

                Constraint weights.

              • h : Fin n

                Per-constraint rescaling functions.

              • h_pos (k : Fin n) (v : ) : 0 < self.h k v

                Each hₖ is positive.

              • h_norm (k : Fin n) : self.h k 0 = 1

                Normalized: hₖ(0) = 1.

              • h_decreasing (k : Fin n) (a b : ) : a < bself.h k b < self.h k a

                Decreasing: more violations → lower score.

              Instances For

                Evaluate a separable harmony on a violation profile.

                Equations
                Instances For

                  The separable harmony at the zero profile is 1 (normalization).

                  The ME separable harmony: each hₖ(x) = exp(−x) (the exponential-of-opposite function from Figure 5a). This gives H_ME(v) = ∏ₖ (exp(−vₖ))^{wₖ} = exp(−Σ wₖvₖ).

                  Equations
                  Instances For
                    theorem Theories.Phonology.HarmonicGrammar.me_separable_eval {n : } (w : Fin n) (v : Fin n) :
                    (meSeparable n w).eval v = Real.exp (-k : Fin n, w k * (v k))

                    The ME separable harmony agrees with the standard ME harmony score (up to positive scaling by exp).

                    meSeparable.eval(v) = exp(−Σₖ wₖ · vₖ)

                    Proof: ∏ₖ (exp(−vₖ))^{wₖ} = ∏ₖ exp(−wₖvₖ) = exp(−Σₖ wₖvₖ), using rpow = exp(w · log(exp(−v))) = exp(−wv).

                    Constraint rescaling (eq. 33): given a separable harmony with unary functions hₖ, the rescaled constraint is Ĉₖ = −log(hₖ(Cₖ)).

                    The rescaled constraints are nonneg (since hₖ ∈ (0,1] for v ≥ 0) and preserve the ordering on violation profiles.

                    Equations
                    Instances For

                      hₖ(v) ≤ 1 for all v: follows from decreasingness and hₖ(0) = 1.

                      Rescaled constraints are nonneg: since hₖ(v) ∈ (0, 1], −log(hₖ(v)) ≥ 0.

                      Rescaled constraints at 0 are 0: Ĉₖ(0) = −log(1) = 0.

                      theorem Theories.Phonology.HarmonicGrammar.meSeparable_rescale {n : } (w : Fin n) (k : Fin n) (v : ) :
                      (meSeparable n w).rescale k v = v

                      ME rescaling is the identity: since hₖ = exp(−·), Ĉₖ(v) = −log(exp(−v)) = v.

                      theorem Theories.Phonology.HarmonicGrammar.separable_eq_me_rescaled {n : } (H : SeparableHarmony n) (v : Fin n) :
                      H.eval v = Real.exp (-k : Fin n, H.w k * H.rescale k (v k))

                      Any separable harmony is ME under rescaling ( eq. 34): H(C₁, …, Cₙ) = H_ME(Ĉ₁, …, Ĉₙ) where Ĉₖ = −log hₖ(Cₖ).

                      This is the key insight: the choice of hₖ only affects how constraints are rescaled, not how they interact. All separable harmonies have the same mode of constraint interaction as ME.

                      theorem Theories.Phonology.HarmonicGrammar.separable_predicts_hz {n : } {X : Type} (H : SeparableHarmony n) (C_yes C_no : Fin nX) (sq : Square X) (hind : ViolDiffIndependence (fun (k : Fin n) (x : X) => H.rescale k (C_no k x) - H.rescale k (C_yes k x)) sq) :
                      ConstantLogitDiff (fun (x : X) => Real.log ((H.eval fun (k : Fin n) => C_yes k x) / H.eval fun (k : Fin n) => C_no k x)) sq

                      Separable harmonies predict HZ (§5.4): for any separable harmony H, if the rescaled violation differences Δ̂ₖ(x) = Ĉₖ(Cₖ(x,NO)) − Ĉₖ(Cₖ(x,YES)) satisfy independence on a square, then the logit rate log(H(v_YES)/H(v_NO)) satisfies HZ's constant-difference identity.

                      The proof composes two results:

                      1. separable_eq_me_rescaled: H(v) = exp(−Σ wₖĈₖ(vₖ))
                      2. me_predicts_hz: weighted sums with independent differences satisfy constant logit-rate differences

                      Since log(exp(a)/exp(b)) = a − b, the logit rate is a weighted sum of rescaled violation differences, and me_predicts_hz applies.

                      The inverse function h(x) = 1/(1+x) used in the non-separable harmony H(v) = 1 / (1 + Σ wₖvₖ) (eq. 27). Like ME's exp(−x), it is positive, normalized, and decreasing — but the resulting harmony is not separable.

                      Equations
                      Instances For

                        The inverse function is positive for nonneg arguments.

                        The inverse function is normalized: h(0) = 1.

                        The inverse function is strictly decreasing on [0, ∞).

                        noncomputable def Theories.Phonology.HarmonicGrammar.nonSeparableInverseHarmony {n : } (w : Fin n) (v : Fin n) :

                        The non-separable harmony using the inverse function: H(v) = 1 / (1 + Σₖ wₖ · vₖ) (eq. 27).

                        This has the form H = h(Σ wₖCₖ) (eq. 26) with h = inverseFunction, which is not separable because the single h sees the sum of all weighted violations, not each constraint individually.

                        Equations
                        Instances For

                          Counterexample (§4.4, online appendix D.1): the non-separable inverse harmony does not predict HZ's generalization in general.

                          For the inverse harmony H(v) = 1/(1 + Σ wₖvₖ), the logit probability is log((1 + S_NO(x)) / (1 + S_YES(x))) where S_y(x) = Σₖ wₖ Cₖ(x,y). HZ holds iff the cross-product (1+S_NO(tl))(1+S_YES(tr))(1+S_YES(bl))(1+S_NO(br)) equals (1+S_YES(tl))(1+S_NO(tr))(1+S_NO(bl))(1+S_YES(br)).

                          With Tagalog constraints and w₅ ≠ w₆ (weights 1,1,1,1,1,2), the cross-products are 72 and 60, disproving HZ.

                          theorem Theories.Phonology.HarmonicGrammar.harmonyScoreR_as_finsum {C : Type} (constraints : List (WeightedConstraint C)) (c : C) :
                          harmonyScoreR constraints c = -i : Fin constraints.length, (constraints.get i).weight * ((constraints.get i).eval c)

                          harmonyScoreR as a Fin-indexed weighted sum: the List-based harmony score equals a negated Finset.sum over Fin-indexed constraint weights and violations.

                          This is the key List→Fin conversion for applying separability theory to concrete WeightedConstraint lists.

                          theorem Theories.Phonology.HarmonicGrammar.exp_harmonyScoreR_eq_me_separable {C : Type} (constraints : List (WeightedConstraint C)) (c : C) :
                          Real.exp (harmonyScoreR constraints c) = (meSeparable constraints.length fun (i : Fin constraints.length) => (constraints.get i).weight).eval fun (i : Fin constraints.length) => (constraints.get i).eval c

                          MaxEnt unnormalized probability is separable harmony: exp(harmonyScoreR constraints c) = meSeparable.eval v where weights and violations are drawn from the constraint list.

                          Since meSeparable.eval v = exp(-∑ wₖvₖ) (me_separable_eval) and harmonyScoreR c = -∑ wₖvₖ (harmonyScoreR_as_finsum), the exponential of the List-based harmony is exactly the Fin-indexed separable harmony evaluation.

                          This makes all separability theory — constraint independence → HZ (separable_predicts_hz), constraint rescaling (separable_eq_me_rescaled) — directly applicable to any WeightedConstraint list.

                          theorem Theories.Phonology.HarmonicGrammar.maxent_logit_as_finsum {C : Type} [Fintype C] [Nonempty C] (constraints : List (WeightedConstraint C)) (a b : C) :
                          Real.log (Core.softmax (harmonyScoreR constraints) 1 a / Core.softmax (harmonyScoreR constraints) 1 b) = -i : Fin constraints.length, (constraints.get i).weight * (((constraints.get i).eval a) - ((constraints.get i).eval b))

                          MaxEnt logit rate as Fin-indexed weighted violation differences: the logit of MaxEnt probabilities is a negated weighted sum of violation differences, expressed as a Finset.sum over Fin indices.

                          This bridges logit_uniformity (NoisyHG.lean) with me_predicts_hz: since the logit is a weighted sum, it satisfies HZ whenever the violation differences satisfy ViolDiffIndependence.

                          Composition: logit_uniformityharmonyScoreR_as_finsum → algebra.

                          theorem Theories.Phonology.HarmonicGrammar.constantLogitDiff_mono_consistent {X : Type} (d : X) (f : ) (hf : StrictMono f) (sq : Square X) (hcld : ConstantLogitDiff d sq) (hne : d sq.tl d sq.bl) :
                          (f (d sq.tl) - f (d sq.bl)) * (f (d sq.tr) - f (d sq.br)) > 0

                          Consistent ordering from constant logit-rate differences: if a score function d satisfies ConstantLogitDiff and f is strictly monotone, then the f-transformed scores exhibit across-the-board consistency — the product of same-column differences is positive.

                          This is the mathematical core of why HG produces sigmoid families (@cite{zuraw-hayes-2017} §2.5): any strictly monotone probability transform (MaxEnt's softmax, NHG's normal CDF, Normal MaxEnt's probit) applied to scores with constant logit-rate differences preserves the "across-the-board" ordering pattern. Differences may compress at floor and ceiling (producing sigmoids rather than claws), but they never change sign.