The Low Conditional Entropy Conjecture @cite{ackerman-malouf-2013} #
Ackerman, F. & Malouf, R. (2013). Morphological Organization: The Low Conditional Entropy Conjecture. Language 89(3), 429–464.
E-complexity vs. I-complexity #
Languages differ dramatically in their enumerative complexity (E-complexity): how many inflection classes, allomorphic variants, and paradigm cells they have. But this apparent complexity is misleading. The key question is integrative complexity (I-complexity): given that a speaker knows some forms of a lexeme, how hard is it to predict the rest?
The LCEC #
The Low Conditional Entropy Conjecture states that the average conditional entropy of paradigm cells — how uncertain you are about one cell given another — is uniformly low across typologically diverse languages, regardless of E-complexity. Formally:
I-complexity(L) = (1 / n(n-1)) · Σᵢ≠ⱼ H(Cᵢ | Cⱼ)
is low for all natural languages L, where Cᵢ ranges over paradigm cells and H(Cᵢ | Cⱼ) is the conditional entropy of cell i given cell j.
Implicative structure #
The LCEC holds because morphological systems are organized by implicative relations: knowing one form of a lexeme typically narrows down (or fully determines) the others. These relations form a network whose density keeps I-complexity low even when E-complexity is high.
Main definitions #
- InflectionClass: a paradigm row (cell index → surface form)
- ParadigmSystem: a collection of inflection classes with frequencies
- cellEntropy: H(Cᵢ), the entropy of a single paradigm cell
- conditionalCellEntropy: H(Cᵢ | Cⱼ), the conditional entropy of a cell pair
- iComplexity: average conditional entropy across all cell pairs
- eComplexity: number of distinct inflection classes
- LCECHolds: predicate asserting that I-complexity is below a threshold
An inflection class: a function from cell index to surface realization.
Two lexemes belong to the same inflection class iff they have identical paradigm structure (same mapping from cells to exponents, ignoring the stem). In practice, classes are identified by their pattern of allomorphic alternations, not by absolute forms.
Realization of each cell (indexed 0..numCells-1)
Equations
- Morphology.WP.instBEqInflectionClass = { beq := fun (a b : Morphology.WP.InflectionClass n) => (List.finRange n).all fun (i : Fin n) => a.realize i == b.realize i }
A paradigm system: the full inventory of inflection classes in a language, each paired with its frequency (proportion of lexemes in that class).
- numCells: number of paradigm cells (e.g., 4 for a 4-cell verb system)
- entries: inflection classes paired with their frequencies (should sum to 1)
- entries : List (InflectionClass numCells × ℚ)
Inflection classes paired with their lexicon frequencies
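For orientation, the structure above can be mirrored in a few lines of Python. This is only a sketch under assumptions: the names `ParadigmSystem`, `is_normalized`, and the toy suffixes are illustrative and are not part of the Lean development; `Fraction` stands in for ℚ.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class ParadigmSystem:
    """Hypothetical Python analogue of the Lean structure."""
    num_cells: int
    # Each entry pairs a tuple of surface forms (one per cell) with a frequency.
    entries: tuple

    def is_normalized(self) -> bool:
        """Check that every class fills every cell and frequencies sum to 1."""
        right_width = all(len(forms) == self.num_cells for forms, _ in self.entries)
        return right_width and sum(freq for _, freq in self.entries) == 1


# Toy 3-cell system with three inflection classes (invented data).
toy = ParadigmSystem(3, (
    (("-a", "-i", "-u"), Fraction(1, 2)),
    (("-a", "-e", "-u"), Fraction(1, 4)),
    (("-o", "-e", "-u"), Fraction(1, 4)),
))
```

The Lean structure only documents that frequencies should sum to 1; the `is_normalized` check is an addition of this sketch.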
Group a tagged list by key, summing associated ℚ values.
This is the core operation for extracting marginal and joint distributions from paradigm data: multiple inflection classes may share the same realization in a cell, so their frequencies need to be summed.
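Since the equation for this helper is not rendered, here is a minimal Python sketch of the grouping step as described. The name `group_sum` and the first-occurrence ordering of the output are assumptions of the sketch, not facts about the Lean definition.

```python
from fractions import Fraction


def group_sum(tagged):
    """Group (key, weight) pairs by key and sum the weights per key.

    Keys are returned in order of first occurrence.
    """
    totals = {}
    for key, weight in tagged:
        totals[key] = totals.get(key, Fraction(0)) + weight
    return list(totals.items())


# Two classes share the realization "-a", so their frequencies are merged.
merged = group_sum([
    ("-a", Fraction(1, 2)),
    ("-a", Fraction(1, 4)),
    ("-o", Fraction(1, 4)),
])
```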
Extract the marginal distribution of realizations in a single cell.
Groups inflection classes by their realization in cell c and sums
their frequencies. Returns a distribution over surface forms.
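A minimal Python sketch of this extraction, assuming entries are `(forms, frequency)` pairs with `forms` a tuple indexed by cell. The name `marginal` and the toy data are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

# Invented 3-cell system: (forms per cell, frequency of the class).
TOY = [
    (("-a", "-i", "-u"), Fraction(1, 2)),
    (("-a", "-e", "-u"), Fraction(1, 4)),
    (("-o", "-e", "-u"), Fraction(1, 4)),
]


def marginal(entries, c):
    """Distribution over surface forms in cell c, summing shared realizations."""
    dist = defaultdict(Fraction)  # Fraction() is 0
    for forms, freq in entries:
        dist[forms[c]] += freq
    return dict(dist)
```

On the toy data, cell 0 has two realizations with probabilities 3/4 and 1/4, while cell 2 has a single realization with probability 1.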
Extract the joint distribution of realizations in two cells.
Returns pairs ((formᵢ, formⱼ), frequency) for all inflection classes, with shared patterns summed.
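The same idea extends to pairs of cells. The sketch below is illustrative Python, not the Lean code; `joint` and the toy entries are assumed names and data.

```python
from collections import defaultdict
from fractions import Fraction

# Invented 3-cell system: (forms per cell, frequency of the class).
TOY = [
    (("-a", "-i", "-u"), Fraction(1, 2)),
    (("-a", "-e", "-u"), Fraction(1, 4)),
    (("-o", "-e", "-u"), Fraction(1, 4)),
]


def joint(entries, ci, cj):
    """Distribution over (form in cell ci, form in cell cj) pairs."""
    dist = defaultdict(Fraction)
    for forms, freq in entries:
        dist[(forms[ci], forms[cj])] += freq
    return dict(dist)
```

For cells 0 and 2, the first two toy classes share the pattern `("-a", "-u")`, so their frequencies are summed to 3/4.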
E-complexity: the number of distinct inflection classes.
This is the "enumerative" complexity that varies wildly across languages (e.g., Kwerba: 2 classes, Chiquihuitlán Mazatec: 109).
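As a rough illustration, E-complexity can be computed by counting distinct form tuples. The sketch assumes that two entries with identical realizations in every cell count as one class; whether the Lean definition deduplicates this way is not visible in the rendered documentation, and `e_complexity` is a hypothetical name.

```python
from fractions import Fraction


def e_complexity(entries):
    """Number of distinct inflection classes (distinct tuples of forms)."""
    return len({tuple(forms) for forms, _ in entries})


# Invented data: the first pattern is listed twice and counted once.
entries = [
    (("-a", "-i"), Fraction(1, 4)),
    (("-a", "-i"), Fraction(1, 4)),
    (("-o", "-e"), Fraction(1, 2)),
]
n_classes = e_complexity(entries)
```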
H(Cᵢ): Shannon entropy of a single paradigm cell.
Measures how unpredictable the realization of cell c is when you
know nothing about the lexeme. High entropy = many equiprobable
realizations; low entropy = one dominant form.
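A minimal Python sketch of the standard Shannon entropy in bits, applied to a marginal distribution given as a mapping from forms to probabilities. The function name `entropy` and the example distributions are assumptions; the Lean definition may differ in detail.

```python
import math
from fractions import Fraction


def entropy(dist):
    """Shannon entropy in bits: sum of p * log2(1/p) over nonzero p."""
    total = 0.0
    for p in dist.values():
        p = float(p)
        if p > 0:
            total += p * math.log2(1 / p)
    return total


# Two equiprobable realizations: maximally unpredictable for two outcomes.
h_even = entropy({"-i": Fraction(1, 2), "-e": Fraction(1, 2)})
# One dominant form: lower entropy.
h_skewed = entropy({"-a": Fraction(3, 4), "-o": Fraction(1, 4)})
# A single realization: no uncertainty at all.
h_fixed = entropy({"-u": Fraction(1)})
```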
H(Cᵢ | Cⱼ): conditional entropy of cell ci given cell cj.
Measures how uncertain you are about cell ci's realization after
learning cell cj's realization. This is the core quantity in the LCEC:
low H(Cᵢ | Cⱼ) means knowing form j strongly constrains form i.
When H(Cᵢ | Cⱼ) = 0, cell j fully determines cell i — an implicative relation.
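One way to compute this quantity is the standard identity H(Cᵢ | Cⱼ) = Σ p(x, y) · log₂(p(y) / p(x, y)), summed over joint realizations (x in cell i, y in cell j). The Python below is a sketch under that assumption; it is not claimed to match how the Lean definition is written, and `conditional_entropy` and the toy data are hypothetical.

```python
import math
from collections import defaultdict
from fractions import Fraction

# Invented 3-cell system: (forms per cell, frequency of the class).
TOY = [
    (("-a", "-i", "-u"), Fraction(1, 2)),
    (("-a", "-e", "-u"), Fraction(1, 4)),
    (("-o", "-e", "-u"), Fraction(1, 4)),
]


def conditional_entropy(entries, ci, cj):
    """H(C_ci | C_cj) in bits, from joint and conditioning-cell distributions."""
    joint = defaultdict(Fraction)   # p(x, y)
    given = defaultdict(Fraction)   # p(y), the marginal of cell cj
    for forms, freq in entries:
        joint[(forms[ci], forms[cj])] += freq
        given[forms[cj]] += freq
    total = 0.0
    for (_, y), p in joint.items():
        if p > 0:
            # p(y) / p(x, y) >= 1, so every term is non-negative.
            total += float(p) * math.log2(float(given[y] / p))
    return total
```

On the toy data, cell 2 is constant, so it is fully determined by any other cell (conditional entropy 0), while knowing cell 2 tells you nothing about cell 1 (conditional entropy 1 bit).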
I-complexity: average conditional entropy across all directed cell pairs.
I-complexity = (1 / n(n-1)) · Σᵢ≠ⱼ H(Cᵢ | Cⱼ)
This is the central measure of @cite{ackerman-malouf-2013}. It quantifies how hard it is, on average, to predict one paradigm cell from another. The LCEC asserts this is uniformly low across languages.
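The averaging formula can be sketched in Python as follows. This is an illustration on invented data, not the Lean implementation; the names are hypothetical, and returning 0 for systems with fewer than two cells is an assumption of the sketch.

```python
import math
from collections import defaultdict
from fractions import Fraction
from itertools import permutations

# Invented 3-cell system: (forms per cell, frequency of the class).
TOY = [
    (("-a", "-i", "-u"), Fraction(1, 2)),
    (("-a", "-e", "-u"), Fraction(1, 4)),
    (("-o", "-e", "-u"), Fraction(1, 4)),
]


def conditional_entropy(entries, ci, cj):
    """H(C_ci | C_cj) in bits."""
    joint, given = defaultdict(Fraction), defaultdict(Fraction)
    for forms, freq in entries:
        joint[(forms[ci], forms[cj])] += freq
        given[forms[cj]] += freq
    return sum(float(p) * math.log2(float(given[y] / p))
               for (_, y), p in joint.items() if p > 0)


def i_complexity(entries, n):
    """Average of H(C_i | C_j) over all n(n-1) ordered pairs with i != j."""
    if n < 2:
        return 0.0  # assumption: no cell pairs means nothing to predict
    pairs = list(permutations(range(n), 2))
    return sum(conditional_entropy(entries, i, j) for i, j in pairs) / len(pairs)


toy_i_complexity = i_complexity(TOY, 3)
```

For the toy system the six directed conditional entropies sum to 3 bits, giving an I-complexity of 0.5 bits.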
The LCEC holds for a paradigm system if its I-complexity is below a threshold (Ackerman & Malouf use ~1.5 bits as the empirical bound across their 10-language sample; the highest observed value is Chiquihuitlán Mazatec at 0.709 bits).
Equations
- Morphology.WP.LCECHolds ps threshold = (Morphology.WP.iComplexity ps ≤ threshold)
An implicative relation between two cells: knowing cell cj fully
determines cell ci (conditional entropy = 0).
These are the building blocks of paradigm organization. A rich network of implicative relations is what keeps I-complexity low.
Equations
- Morphology.WP.isImplicative ps ci cj = (Morphology.WP.conditionalCellEntropy ps ci cj = 0)
A paradigm system has transparent structure if every cell pair is implicative — knowing any one cell fully determines all others.
This is the strongest form of low I-complexity (I-complexity = 0). It corresponds to @cite{carstairs-mccarthy-2010}'s No Blur Principle / synonymy avoidance, which the LCEC subsumes as a special case.
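Both predicates can be checked without any floating-point arithmetic, using the fact that H(Cᵢ | Cⱼ) = 0 exactly when every realization of cell j (among classes with positive frequency) co-occurs with a single realization of cell i. The Python sketch below relies on that equivalence rather than on the Lean definitions; function names and data are hypothetical.

```python
from fractions import Fraction


def is_implicative(entries, ci, cj):
    """True if the form in cell cj uniquely determines the form in cell ci."""
    determined = {}
    for forms, freq in entries:
        if freq <= 0:
            continue  # classes with no lexemes carry no probability mass
        if determined.setdefault(forms[cj], forms[ci]) != forms[ci]:
            return False
    return True


def is_transparent(entries, n):
    """True if every ordered pair of distinct cells is implicative."""
    return all(is_implicative(entries, ci, cj)
               for ci in range(n) for cj in range(n) if ci != cj)


# Invented data: in BLURRED, "-e" in cell 1 is compatible with two cell-0 forms.
BLURRED = [
    (("-a", "-i"), Fraction(1, 2)),
    (("-a", "-e"), Fraction(1, 4)),
    (("-o", "-e"), Fraction(1, 4)),
]
CLEAN = [
    (("-a", "-i"), Fraction(1, 2)),
    (("-o", "-e"), Fraction(1, 2)),
]
```

Here `CLEAN` is transparent, while `BLURRED` is not: neither of its cells fully determines the other.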
Equations
- Morphology.WP.isTransparent ps = ∀ (ci cj : Fin n), ci ≠ cj → Morphology.WP.isImplicative ps ci cj
Extract a ParadigmSystem from a list of Stems.
Each unique paradigm pattern (sequence of inflected forms) becomes an
inflection class. The number of cells is paradigm.length + 1 (base
form + one per inflectional rule). This bridges Core.MorphRule's
rule-based view to the W&P observed-paradigm view.