Linglib.Theories.Morphology.WP.LCEC

The Low Conditional Entropy Conjecture (Ackerman & Malouf 2013)

Ackerman, F. & Malouf, R. (2013). Morphological Organization: The Low Conditional Entropy Conjecture. Language 89(3), 429–464.

E-complexity vs. I-complexity

Languages differ dramatically in their enumerative complexity (E-complexity): how many inflection classes, allomorphic variants, and paradigm cells they have. But this apparent complexity is misleading. The key question is integrative complexity (I-complexity): given that a speaker knows some forms of a lexeme, how hard is it to predict the rest?

The LCEC

The Low Conditional Entropy Conjecture states that the average conditional entropy of paradigm cells — how uncertain you are about one cell given another — is uniformly low across typologically diverse languages, regardless of E-complexity. Formally:

I-complexity(L) = (1 / n(n-1)) · Σᵢ≠ⱼ H(Cᵢ | Cⱼ)

is low for all natural languages L, where Cᵢ ranges over paradigm cells and H(Cᵢ | Cⱼ) is the conditional entropy of cell i given cell j.

Implicative structure

The LCEC holds because morphological systems are organized by implicative relations: knowing one form of a lexeme typically narrows down (or fully determines) the others. These relations form a network whose density keeps I-complexity low even when E-complexity is high.

Main definitions

InflectionClass: a paradigm row (cell index → surface form)
ParadigmSystem: a collection of inflection classes with frequencies
cellEntropy: H(Cᵢ) — entropy of a single paradigm cell
conditionalCellEntropy: H(Cᵢ | Cⱼ) — conditional entropy of cell pair
iComplexity: average conditional entropy across all cell pairs
eComplexity: number of distinct inflection classes
LCECHolds: predicate asserting I-complexity is at or below a threshold

structure Morphology.WP.InflectionClass (numCells : ℕ) :

An inflection class: a function from cell index to surface realization.

Two lexemes belong to the same inflection class iff they have identical paradigm structure (same mapping from cells to exponents, ignoring the stem). In practice, classes are identified by their pattern of allomorphic alternations, not by absolute forms.

realize : Fin numCells → String
Realization of each cell (indexed 0 to numCells - 1)

instance Morphology.WP.instBEqInflectionClass {n : ℕ} :

BEq (InflectionClass n)

Equations

Morphology.WP.instBEqInflectionClass = { beq := fun (a b : Morphology.WP.InflectionClass n) => (List.finRange n).all fun (i : Fin n) => a.realize i == b.realize i }
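A minimal sketch of the cell-by-cell comparison described by this instance, assuming a stand-in structure so that it runs in core Lean without Mathlib. The names `InflectionClassSketch`, `classA`, and `classB` are hypothetical, and the comparison iterates over `List.range` with a bounds check rather than the `List.finRange` used above.

```lean
-- Sketch only: a stand-in structure, not Linglib's own declaration.
structure InflectionClassSketch (numCells : Nat) where
  realize : Fin numCells → String

-- Two classes are equal when every cell has the same realization.
instance {n : Nat} : BEq (InflectionClassSketch n) where
  beq a b := (List.range n).all fun i =>
    if h : i < n then a.realize ⟨i, h⟩ == b.realize ⟨i, h⟩ else true

-- Two hypothetical 2-cell classes: same form in cell 0, different in cell 1.
def classA : InflectionClassSketch 2 := ⟨fun i => if i.val = 0 then "-a" else "-i"⟩
def classB : InflectionClassSketch 2 := ⟨fun i => if i.val = 0 then "-a" else "-u"⟩

#eval classA == classA  -- true
#eval classA == classB  -- false
```

Because the comparison looks only at realizations, two classes that agree in every cell count as the same class regardless of how their `realize` functions were written.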

structure Morphology.WP.ParadigmSystem (numCells : ℕ) :

A paradigm system: the full inventory of inflection classes in a language, each paired with its frequency (proportion of lexemes in that class).

numCells: number of paradigm cells (e.g., 4 for a 4-cell verb system)
entries: inflection classes paired with their frequencies (should sum to 1)

entries : List (InflectionClass numCells × ℚ)
Inflection classes paired with their lexicon frequencies

def Morphology.WP.groupBySum {α : Type} [BEq α] (tagged : List (α × ℚ)) :

List (α × ℚ)

Group a tagged list by key, summing associated ℚ values.

This is the core operation for extracting marginal and joint distributions from paradigm data: multiple inflection classes may share the same realization in a cell, so their frequencies need to be summed.
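A minimal sketch of the grouping step, assuming natural-number counts in place of ℚ frequencies so that it runs in core Lean without Mathlib. The names `addWeight` and `groupBySumSketch` are hypothetical, and this illustrates the idea rather than reproducing the library's definition.

```lean
-- Add weight `v` to the entry for key `k`, appending a new entry if `k` is absent.
def addWeight {α : Type} [BEq α] (k : α) (v : Nat) : List (α × Nat) → List (α × Nat)
  | [] => [(k, v)]
  | (k', v') :: rest =>
    if k' == k then (k', v' + v) :: rest else (k', v') :: addWeight k v rest

-- Fold the tagged list into an association list with one entry per distinct key.
def groupBySumSketch {α : Type} [BEq α] (tagged : List (α × Nat)) : List (α × Nat) :=
  tagged.foldl (fun acc (k, v) => addWeight k v acc) []

#eval groupBySumSketch [("-a", 3), ("-i", 1), ("-a", 2)]
-- [("-a", 5), ("-i", 1)]
```

Keys keep the order of their first occurrence, which makes the result easy to compare by eye against the input.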

Equations

One or more equations did not get rendered due to their size.

def Morphology.WP.cellDistribution {n : ℕ} (ps : ParadigmSystem n) (c : Fin n) :

List (String × ℚ)

Extract the marginal distribution of realizations in a single cell.

Groups inflection classes by their realization in cell c and sums their frequencies. Returns a distribution over surface forms.

Equations

One or more equations did not get rendered due to their size.

def Morphology.WP.jointCellDistribution {n : ℕ} (ps : ParadigmSystem n) (ci cj : Fin n) :

List ((String × String) × ℚ)

Extract the joint distribution of realizations in two cells.

Returns pairs ((formᵢ, formⱼ), frequency) for all inflection classes, with shared patterns summed.
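A small sketch of how marginal and joint distributions might be read off paradigm data, assuming each class is written as a list of forms with a natural-number lexeme count (instead of `Fin`-indexed functions and ℚ frequencies). All names here (`bump`, `group`, `rows`, `marginal`, `joint`) and the three-class toy system are hypothetical.

```lean
-- Merge `(k, v)` into an association list, summing weights of equal keys.
def bump {α : Type} [BEq α] (k : α) (v : Nat) : List (α × Nat) → List (α × Nat)
  | [] => [(k, v)]
  | (k', v') :: rest =>
    if k' == k then (k', v' + v) :: rest else (k', v') :: bump k v rest

def group {α : Type} [BEq α] (xs : List (α × Nat)) : List (α × Nat) :=
  xs.foldl (fun acc (k, v) => bump k v acc) []

-- Hypothetical 2-cell system: (forms per cell, number of lexemes in the class).
def rows : List (List String × Nat) :=
  [(["-a", "-i"], 2), (["-a", "-u"], 1), (["-o", "-u"], 1)]

-- Marginal: project every class onto cell `c`, then sum shared forms.
def marginal (c : Nat) : List (String × Nat) :=
  group (rows.map fun (forms, w) => (forms.getD c "", w))

-- Joint: project every class onto the pair of cells `(ci, cj)`.
def joint (ci cj : Nat) : List ((String × String) × Nat) :=
  group (rows.map fun (forms, w) => ((forms.getD ci "", forms.getD cj ""), w))

#eval marginal 0   -- [("-a", 3), ("-o", 1)]
#eval joint 0 1    -- [(("-a", "-i"), 2), (("-a", "-u"), 1), (("-o", "-u"), 1)]
```

In cell 0 the first two classes share the form "-a", so their counts merge in the marginal, while the joint keeps them apart because they differ in cell 1.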

Equations

One or more equations did not get rendered due to their size.

def Morphology.WP.eComplexity {n : ℕ} (ps : ParadigmSystem n) :

ℕ

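E-complexity: the number of inflection classes in the system, computed as the length of entries. The count equals the number of distinct classes only when entries lists each class once.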

This is the "enumerative" complexity that varies wildly across languages (e.g., Kwerba: 2 classes, Chiquihuitlán Mazatec: 109).

Equations

Morphology.WP.eComplexity ps = ps.entries.length

def Morphology.WP.cellEntropy {n : ℕ} (ps : ParadigmSystem n) (c : Fin n) :

H(Cᵢ): Shannon entropy of a single paradigm cell.

Measures how unpredictable the realization of cell c is when you know nothing about the lexeme. High entropy = many equiprobable realizations; low entropy = one dominant form.
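A minimal sketch of Shannon entropy in bits, assuming `Float` probabilities rather than the ℚ-valued distributions used by this module. The name `entropyBits` is hypothetical, and this does not reproduce `Core.InformationTheory.entropy`.

```lean
-- H = -Σ p · log₂ p, skipping zero-probability outcomes.
def entropyBits (probs : List Float) : Float :=
  probs.foldl (fun acc p => if p > 0 then acc - p * Float.log2 p else acc) (0 : Float)

#eval entropyBits [0.5, 0.5]    -- two equiprobable realizations: 1 bit
#eval entropyBits [1.0]         -- a single realization: 0 bits
#eval entropyBits [0.75, 0.25]  -- one dominant form: about 0.811 bits
```

The three calls line up with the description above: entropy is highest for equiprobable forms and falls as one form comes to dominate.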

Equations

Morphology.WP.cellEntropy ps c = Core.InformationTheory.entropy (Morphology.WP.cellDistribution ps c)

def Morphology.WP.conditionalCellEntropy {n : ℕ} (ps : ParadigmSystem n) (ci cj : Fin n) :

H(Cᵢ | Cⱼ): conditional entropy of cell ci given cell cj.

Measures how uncertain you are about cell ci's realization after learning cell cj's realization. This is the core quantity in the LCEC: low H(Cᵢ | Cⱼ) means knowing form j strongly constrains form i.

When H(Cᵢ | Cⱼ) = 0, cell j fully determines cell i — an implicative relation.
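As a worked illustration, conditional entropy can be obtained from the chain rule H(Cᵢ | Cⱼ) = H(Cᵢ, Cⱼ) − H(Cⱼ). The sketch below assumes `Float` probabilities and a hypothetical 2-cell system with three classes: ("-a", "-i") at 0.5, ("-a", "-u") at 0.25, and ("-o", "-u") at 0.25. Whether the library computes the quantity this way is not shown here, and all names are illustrative.

```lean
-- Shannon entropy in bits, skipping zero-probability outcomes.
def entropyBits (probs : List Float) : Float :=
  probs.foldl (fun acc p => if p > 0 then acc - p * Float.log2 p else acc) (0 : Float)

-- Joint probabilities of (cell 0, cell 1) for the three hypothetical classes.
def jointP : List Float := [0.5, 0.25, 0.25]
-- Marginal of cell 0: "-a" = 0.5 + 0.25, "-o" = 0.25.
def marg0 : List Float := [0.75, 0.25]
-- Marginal of cell 1: "-i" = 0.5, "-u" = 0.25 + 0.25.
def marg1 : List Float := [0.5, 0.5]

-- H(C₀ | C₁) = 1.5 − 1 = 0.5 bits
#eval entropyBits jointP - entropyBits marg1
-- H(C₁ | C₀) = 1.5 − 0.811… ≈ 0.689 bits
#eval entropyBits jointP - entropyBits marg0
```

The two directions differ: seeing "-i" in cell 1 pins cell 0 down completely, while seeing "-a" in cell 0 still leaves two candidates for cell 1. Neither value is 0, so neither direction is an implicative relation in this toy system.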

Equations

Morphology.WP.conditionalCellEntropy ps ci cj = Core.InformationTheory.conditionalEntropy (Morphology.WP.jointCellDistribution ps ci cj) (Morphology.WP.cellDistribution ps cj)

def Morphology.WP.iComplexity {n : ℕ} (ps : ParadigmSystem n) :

I-complexity: average conditional entropy across all directed cell pairs.

I-complexity = (1 / n(n-1)) · Σᵢ≠ⱼ H(Cᵢ | Cⱼ)

This is Ackerman & Malouf's (2013) central measure. It quantifies how hard it is, on average, to predict one paradigm cell from another. The LCEC asserts this is uniformly low across languages.
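The averaging step can be sketched as follows, assuming the conditional entropies are supplied as a `Float`-valued function on cell indices. The name `iComplexitySketch`, the choice to return 0 when there are fewer than two cells, and the toy values are assumptions for illustration only.

```lean
-- Average of condH i j over all ordered pairs (i, j) with i ≠ j.
def iComplexitySketch (n : Nat) (condH : Nat → Nat → Float) : Float :=
  let idx := List.range n
  let total : Float := idx.foldl (fun acc i =>
    idx.foldl (fun acc' j => if i == j then acc' else acc' + condH i j) acc) (0 : Float)
  -- Assumption: with fewer than two cells there are no pairs, so report 0.
  if n < 2 then 0 else total / (n * (n - 1)).toFloat

-- Hypothetical conditional entropies for a 2-cell system.
def toyCondH : Nat → Nat → Float
  | 0, 1 => 0.5
  | 1, 0 => 0.6887
  | _, _ => 0

#eval iComplexitySketch 2 toyCondH  -- (0.5 + 0.6887) / 2 ≈ 0.594
```

Both directions of each pair are counted, which is why the divisor is n(n-1) rather than n(n-1)/2.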

Equations

One or more equations did not get rendered due to their size.

def Morphology.WP.LCECHolds {n : ℕ} (ps : ParadigmSystem n) (threshold : ℚ) :

Prop

The LCEC holds for a paradigm system if its I-complexity is at or below a threshold (Ackerman & Malouf use ~1.5 bits as the empirical bound across their 10-language sample; the highest observed value is Chiquihuitlán Mazatec at 0.709 bits).

Equations

Morphology.WP.LCECHolds ps threshold = (Morphology.WP.iComplexity ps ≤ threshold)

def Morphology.WP.isImplicative {n : ℕ} (ps : ParadigmSystem n) (ci cj : Fin n) :

Prop

An implicative relation between two cells: knowing cell cj fully determines cell ci (conditional entropy = 0).

These are the building blocks of paradigm organization. A rich network of implicative relations is what keeps I-complexity low.
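Zero conditional entropy is equivalent to functional dependence: whenever two classes with nonzero frequency share a form in cell cj, they must also share a form in cell ci. The sketch below checks that condition directly on lists of forms, assuming every listed class has positive frequency. The names `determines`, `blurred`, and `clean` are hypothetical.

```lean
-- True when the form in cell `cj` fixes the form in cell `ci` across all rows.
def determines (rows : List (List String)) (cj ci : Nat) : Bool :=
  rows.all fun r => rows.all fun s =>
    r.getD cj "" != s.getD cj "" || r.getD ci "" == s.getD ci ""

-- Two classes share "-u" in cell 1 but differ in cell 0.
def blurred : List (List String) := [["-a", "-i"], ["-a", "-u"], ["-o", "-u"]]
-- Every form occurs in exactly one class.
def clean : List (List String) := [["-a", "-i"], ["-o", "-u"]]

#eval determines blurred 1 0  -- false
#eval determines clean 1 0    -- true
#eval determines clean 0 1    -- true
```

In `clean` the dependence holds in both directions, so any one form identifies its whole class.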

Equations

Morphology.WP.isImplicative ps ci cj = (Morphology.WP.conditionalCellEntropy ps ci cj = 0)

def Morphology.WP.isTransparent {n : ℕ} (ps : ParadigmSystem n) :

Prop

A paradigm system has transparent structure if every cell pair is implicative — knowing any one cell fully determines all others.

This is the strongest form of low I-complexity (I-complexity = 0). It corresponds to Carstairs-McCarthy's (2010) No Blur Principle / synonymy avoidance, which the LCEC subsumes as a special case.

Equations

Morphology.WP.isTransparent ps = ∀ (ci cj : Fin n), ci ≠ cj → Morphology.WP.isImplicative ps ci cj

theorem Morphology.WP.transparent_iComplexity_zero {n : ℕ} (ps : ParadigmSystem n) (h : isTransparent ps) :

iComplexity ps = 0

def Morphology.WP.fromStems {σ : Type} (stems : List (Core.Morphology.Stem σ)) (baseMeaning : σ) (numCells : ℕ) (cellExtractor : List (String × Features × σ) → Fin numCells → String) :

ParadigmSystem numCells

Extract a ParadigmSystem from a list of Stems.

Each unique paradigm pattern (sequence of inflected forms) becomes an inflection class. The number of cells is paradigm.length + 1 (base form + one per inflectional rule). This bridges Core.MorphRule's rule-based view to the Word-and-Paradigm (W&P) observed-paradigm view.

Equations

One or more equations did not get rendered due to their size.