Separable Harmonies and HZ's Generalization @cite{magri-2025} #
@cite{magri-2025} "Constraint Interaction in Probabilistic Phonology: Deducing Maximum Entropy Grammars from Hayes and Zuraw's Shifted Sigmoids Generalization" (Linguistic Inquiry, Early Access).
Overview #
Hayes and Zuraw (@cite{zuraw-hayes-2017}; @cite{hayes-2022}) observe that the rates of application of variable phonological processes governed by independent factors can be fit onto shifted sigmoids at shared abscissas. @cite{magri-2025} reformulates this as a constant logit-rate difference condition and proves a biconditional: within harmony-based probabilistic phonology, a harmony function predicts HZ's generalization if and only if it is separable — it decomposes into a product of powers of unary functions, each fed by a single constraint.
Key definitions #
ConstraintIndependence: the formal constraint condition (§2.4) — each constraint is insensitive to at least one of the two dimensions of a 2×2 "square" of underlying formsConstantLogitDiff: HZ's generalization restated as constant differences of logit rates along each dimension (eq. 13)SeparableHarmony: a harmony function that decomposes asH(v) = ∏ₖ (hₖ(vₖ))^{wₖ}(eq. 30)meSeparable: ME harmony is separable (eq. 29), a corollary ofexp(sum) = prod(exp)separable_predicts_hz: the forward direction — separable harmonies predict HZ's generalization. Follows fromlogit_uniformity+ constraint rescalingseparable_eq_me_rescaled: any separable harmony is ME under constraint rescalingĈₖ = −log hₖ(Cₖ)(eq. 34)
Connection to existing infrastructure #
The forward direction leverages logit_uniformity (in NoisyHG.lean)
and maxent_logit_harmony, which already prove that MaxEnt log-odds
equal harmony score differences. @cite{magri-2025}'s contribution is
showing this is the only mode of constraint interaction with this
property (the backward direction).
Bridge to List-based API (§11) #
The bridge theorems connect the Fin-indexed separability theory to the
List-based MaxEnt API (harmonyScore/harmonyScoreR from Basic.lean):
harmonyScoreR_as_finsum: converts List-based harmony to Fin-indexed sumexp_harmonyScoreR_eq_me_separable:exp(harmonyScoreR) = meSeparable.evalmaxent_logit_as_finsum: MaxEnt logit = weighted violation difference sum
These enable applying separability results (independence → HZ, rescaling)
to any WeightedConstraint list without re-proving in Fin-indexed form.
A square of four underlying forms indexed by two binary factors
(row = top/bottom, column = left/right). This is's
eq. (12): the four forms x^{TL}, x^{TR}, x^{BL}, x^{BR} arranged
so that rows and columns correspond to independent phonological
dimensions (e.g., prefix identity × stem-initial obstruent quality).
Row and Col are the two binary factors.
- tl : X
Top-left form (e.g., /maŋ+b/).
- tr : X
Top-right form (e.g., /maŋ+k/).
- bl : X
Bottom-left form (e.g., /paŋ+b/).
- br : X
Bottom-right form (e.g., /paŋ+k/).
Instances For
A constraint C is insensitive to the row dimension of a square:
it assigns the same violation count to forms that share a column.
Cf. Figure 4a.
Equations
Instances For
A constraint C is insensitive to the column dimension of a square:
it assigns the same violation count to forms that share a row.
Cf. Figure 4b.
Equations
Instances For
Constraint independence (§2.4): the rows and columns of the square are independent dimensions relative to a constraint set — each constraint is insensitive to at least one dimension (row or column). No constraint can encode an interaction between the two dimensions.
Equations
- One or more equations did not get rendered due to their size.
Instances For
HZ's generalization (eq. 13): the difference between logit rates of application for two underlying forms in the same row (or column) does not depend on the choice of row (or column).
Equivalently: for a function LR giving logit rates,
LR(x^TL) − LR(x^TR) = LR(x^BL) − LR(x^BR).
Equations
Instances For
Independence of violation differences: if a constraint is insensitive to a dimension, its violation differences are too.
The violation difference Δₖ(x) = Cₖ(x, NO) − Cₖ(x, YES) inherits
insensitivity from the raw constraint, because both YES and NO get
the same violation count for forms in the same row (or column).
Equations
Instances For
ME predicts HZ (§3.6, eq. 22): when the
violation differences satisfy independence (inherited from constraint
independence), the weighted sum of violation differences — which
equals the ME logit probability by maxent_logit_harmony —
satisfies the constant-difference identity.
The proof follows eq. (18): split the sum by constraint, and for each k, independence ensures the k-th term contributes equally to both sides of the identity.
An n-ary harmony function H is separable if it decomposes
into a product of powers of unary functions, each attending to a
single constraint (eq. 30):
H(C₁(x,y), …, Cₙ(x,y)) = ∏ₖ (hₖ(Cₖ(x,y)))^{wₖ}
Each hₖ must be positive, normalized (hₖ(0) = 1), and decreasing
(more violations → lower harmony).
Constraint weights.
Per-constraint rescaling functions.
Each hₖ is positive.
Normalized: hₖ(0) = 1.
Decreasing: more violations → lower score.
Instances For
The separable harmony at the zero profile is 1 (normalization).
The ME separable harmony: each hₖ(x) = exp(−x) (the
exponential-of-opposite function from Figure 5a).
This gives H_ME(v) = ∏ₖ (exp(−vₖ))^{wₖ} = exp(−Σ wₖvₖ).
Equations
Instances For
The ME separable harmony agrees with the standard ME harmony score
(up to positive scaling by exp).
meSeparable.eval(v) = exp(−Σₖ wₖ · vₖ)
Proof: ∏ₖ (exp(−vₖ))^{wₖ} = ∏ₖ exp(−wₖvₖ) = exp(−Σₖ wₖvₖ),
using rpow = exp(w · log(exp(−v))) = exp(−wv).
Constraint rescaling (eq. 33): given a separable
harmony with unary functions hₖ, the rescaled constraint is
Ĉₖ = −log(hₖ(Cₖ)).
The rescaled constraints are nonneg (since hₖ ∈ (0,1] for v ≥ 0) and preserve the ordering on violation profiles.
Instances For
hₖ(v) ≤ 1 for all v: follows from decreasingness and hₖ(0) = 1.
Rescaled constraints are nonneg: since hₖ(v) ∈ (0, 1],
−log(hₖ(v)) ≥ 0.
Rescaled constraints at 0 are 0: Ĉₖ(0) = −log(1) = 0.
Any separable harmony is ME under rescaling (
eq. 34): H(C₁, …, Cₙ) = H_ME(Ĉ₁, …, Ĉₙ) where Ĉₖ = −log hₖ(Cₖ).
This is the key insight: the choice of hₖ only affects how
constraints are rescaled, not how they interact. All separable
harmonies have the same mode of constraint interaction as ME.
Separable harmonies predict HZ (§5.4):
for any separable harmony H, if the rescaled violation differences
Δ̂ₖ(x) = Ĉₖ(Cₖ(x,NO)) − Ĉₖ(Cₖ(x,YES)) satisfy independence on a
square, then the logit rate log(H(v_YES)/H(v_NO)) satisfies HZ's
constant-difference identity.
The proof composes two results:
separable_eq_me_rescaled:H(v) = exp(−Σ wₖĈₖ(vₖ))me_predicts_hz: weighted sums with independent differences satisfy constant logit-rate differences
Since log(exp(a)/exp(b)) = a − b, the logit rate is a weighted sum
of rescaled violation differences, and me_predicts_hz applies.
The inverse function h(x) = 1/(1+x) used in the non-separable
harmony H(v) = 1 / (1 + Σ wₖvₖ) (eq. 27).
Like ME's exp(−x), it is positive, normalized, and decreasing —
but the resulting harmony is not separable.
Equations
Instances For
The inverse function is positive for nonneg arguments.
The inverse function is normalized: h(0) = 1.
The inverse function is strictly decreasing on [0, ∞).
The non-separable harmony using the inverse function:
H(v) = 1 / (1 + Σₖ wₖ · vₖ) (eq. 27).
This has the form H = h(Σ wₖCₖ) (eq. 26) with h = inverseFunction,
which is not separable because the single h sees the sum of all
weighted violations, not each constraint individually.
Equations
- Theories.Phonology.HarmonicGrammar.nonSeparableInverseHarmony w v = Theories.Phonology.HarmonicGrammar.inverseFunction (∑ k : Fin n, w k * ↑(v k))
Instances For
Counterexample (§4.4, online appendix D.1): the non-separable inverse harmony does not predict HZ's generalization in general.
For the inverse harmony H(v) = 1/(1 + Σ wₖvₖ), the logit
probability is log((1 + S_NO(x)) / (1 + S_YES(x))) where
S_y(x) = Σₖ wₖ Cₖ(x,y). HZ holds iff the cross-product
(1+S_NO(tl))(1+S_YES(tr))(1+S_YES(bl))(1+S_NO(br))
equals (1+S_YES(tl))(1+S_NO(tr))(1+S_NO(bl))(1+S_YES(br)).
With Tagalog constraints and w₅ ≠ w₆ (weights 1,1,1,1,1,2), the cross-products are 72 and 60, disproving HZ.
harmonyScoreR as a Fin-indexed weighted sum:
the List-based harmony score equals a negated Finset.sum over
Fin-indexed constraint weights and violations.
This is the key List→Fin conversion for applying separability
theory to concrete WeightedConstraint lists.
MaxEnt unnormalized probability is separable harmony:
exp(harmonyScoreR constraints c) = meSeparable.eval v where
weights and violations are drawn from the constraint list.
Since meSeparable.eval v = exp(-∑ wₖvₖ) (me_separable_eval)
and harmonyScoreR c = -∑ wₖvₖ (harmonyScoreR_as_finsum),
the exponential of the List-based harmony is exactly the
Fin-indexed separable harmony evaluation.
This makes all separability theory — constraint independence → HZ
(separable_predicts_hz), constraint rescaling
(separable_eq_me_rescaled) — directly applicable to any
WeightedConstraint list.
MaxEnt logit rate as Fin-indexed weighted violation differences: the logit of MaxEnt probabilities is a negated weighted sum of violation differences, expressed as a Finset.sum over Fin indices.
This bridges logit_uniformity (NoisyHG.lean) with me_predicts_hz:
since the logit is a weighted sum, it satisfies HZ whenever the
violation differences satisfy ViolDiffIndependence.
Composition: logit_uniformity → harmonyScoreR_as_finsum → algebra.
Consistent ordering from constant logit-rate differences: if a score
function d satisfies ConstantLogitDiff and f is strictly monotone,
then the f-transformed scores exhibit across-the-board consistency —
the product of same-column differences is positive.
This is the mathematical core of why HG produces sigmoid families (@cite{zuraw-hayes-2017} §2.5): any strictly monotone probability transform (MaxEnt's softmax, NHG's normal CDF, Normal MaxEnt's probit) applied to scores with constant logit-rate differences preserves the "across-the-board" ordering pattern. Differences may compress at floor and ceiling (producing sigmoids rather than claws), but they never change sign.