
Linglib.Theories.Morphology.WP.LCEC

The Low Conditional Entropy Conjecture @cite{ackerman-malouf-2013}

Ackerman, F. & Malouf, R. (2013). Morphological Organization: The Low Conditional Entropy Conjecture. Language 89(3), 429–464.

E-complexity vs. I-complexity

Languages differ dramatically in their enumerative complexity (E-complexity): how many inflection classes, allomorphic variants, and paradigm cells they have. But this apparent complexity is misleading. The key question is integrative complexity (I-complexity): given that a speaker knows some forms of a lexeme, how hard is it to predict the rest?

The LCEC

The Low Conditional Entropy Conjecture states that the average conditional entropy of paradigm cells — how uncertain you are about one cell given another — is uniformly low across typologically diverse languages, regardless of E-complexity. Formally:

I-complexity(L) = (1 / n(n-1)) · Σᵢ≠ⱼ H(Cᵢ | Cⱼ)

is low for all natural languages L, where n is the number of paradigm cells, Cᵢ ranges over those cells, and H(Cᵢ | Cⱼ) is the conditional entropy of cell i given cell j. The sum runs over all n(n-1) ordered pairs of distinct cells.
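As a worked illustration of the formula, consider an invented two-cell system (not taken from the paper) with three inflection classes of frequency 0.5, 0.25 and 0.25, whose realizations are (a, i), (a, u) and (o, u). The conditional entropies follow from the chain rule H(Cᵢ | Cⱼ) = H(Cᵢ, Cⱼ) - H(Cⱼ):

```latex
\begin{align*}
H(C_1) &= -\bigl(0.75\log_2 0.75 + 0.25\log_2 0.25\bigr) \approx 0.811 \\
H(C_2) &= -\bigl(0.5\log_2 0.5 + 0.5\log_2 0.5\bigr) = 1 \\
H(C_1, C_2) &= -\bigl(0.5\log_2 0.5 + 2 \cdot 0.25\log_2 0.25\bigr) = 1.5 \\
H(C_1 \mid C_2) &= H(C_1, C_2) - H(C_2) = 0.5 \\
H(C_2 \mid C_1) &= H(C_1, C_2) - H(C_1) \approx 0.689 \\
\text{I-complexity} &= \tfrac{1}{2 \cdot 1}\bigl(0.5 + 0.689\bigr) \approx 0.594 \text{ bits}
\end{align*}
```

With n = 2 there are only two ordered pairs, so the average is taken over two conditional entropies.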

Implicative structure

The LCEC holds because morphological systems are organized by implicative relations: knowing one form of a lexeme typically narrows down (or fully determines) the others. These relations form a network whose density keeps I-complexity low even when E-complexity is high.

Main definitions

structure Morphology.WP.InflectionClass (numCells : ℕ) :

An inflection class: a function from cell index to surface realization.

Two lexemes belong to the same inflection class iff they have identical paradigm structure (same mapping from cells to exponents, ignoring the stem). In practice, classes are identified by their pattern of allomorphic alternations, not by absolute forms.

  • realize : Fin numCells → String

    Realization of each cell (indexed 0..numCells-1)
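A minimal sketch of how such a structure might be declared and used, written against core Lean 4 only. The namespace `LCECSketch`, the value `classA`, and the exponents "-a" and "-i" are invented for illustration; the library's actual declaration may differ in detail.

```lean
namespace LCECSketch

/-- Hypothetical mirror of the documented structure (core Lean 4, no Mathlib). -/
structure InflectionClass (numCells : Nat) where
  realize : Fin numCells → String

/-- A toy two-cell class; the exponents are invented. -/
def classA : InflectionClass 2 where
  realize := fun c => if c.val = 0 then "-a" else "-i"

#eval classA.realize ⟨0, by decide⟩  -- realization of cell 0
#eval classA.realize ⟨1, by decide⟩  -- realization of cell 1

end LCECSketch
```

Indexing by `Fin numCells` means a class cannot be queried for a cell that does not exist in the paradigm.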

    structure Morphology.WP.ParadigmSystem (numCells : ℕ) :

    A paradigm system: the full inventory of inflection classes in a language, each paired with its frequency (proportion of lexemes in that class).

    • numCells: number of paradigm cells (e.g., 4 for a 4-cell verb system)
    • entries: inflection classes paired with their frequencies (should sum to 1)
      def Morphology.WP.groupBySum {α : Type} [BEq α] (tagged : List (α × ℚ)) :
      List (α × ℚ)

      Group a tagged list by key, summing associated ℚ values.

      This is the core operation for extracting marginal and joint distributions from paradigm data: multiple inflection classes may share the same realization in a cell, so their frequencies need to be summed.
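The grouping step can be sketched as a left fold. This is an illustrative reimplementation, not the library's definition (whose equations are not shown on this page); it uses `Float` instead of ℚ so that it runs on core Lean 4 alone, and it keeps keys in first-seen order as an arbitrary choice.

```lean
namespace LCECSketch

/-- Sum the values that share a key, keeping keys in first-seen order.
    `Float` is an assumption of this sketch; the documented function sums ℚ values. -/
def groupBySum {α : Type} [BEq α] (tagged : List (α × Float)) : List (α × Float) :=
  tagged.foldl
    (fun (acc : List (α × Float)) (kv : α × Float) =>
      if acc.any (fun p => p.1 == kv.1) then
        -- Key already present: add the new value to its running total.
        acc.map (fun p => if p.1 == kv.1 then (p.1, p.2 + kv.2) else p)
      else
        -- First occurrence of this key.
        acc ++ [kv])
    []

-- Two classes share "-a" in this cell, so their frequencies merge into 0.75.
#eval groupBySum [("-a", 0.5), ("-a", 0.25), ("-o", 0.25)]

end LCECSketch
```

The quadratic lookup is acceptable here because the number of distinct realizations per cell is small compared with the lexicon.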


        Extract the marginal distribution of realizations in a single cell.

        Groups inflection classes by their realization in cell c and sums their frequencies. Returns a distribution over surface forms.


          Extract the joint distribution of realizations in two cells.

          Returns pairs ((formᵢ, formⱼ), frequency) for all inflection classes, with shared patterns summed.
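The marginal and joint extractions described above can be sketched on top of a grouping helper. All names below (`cellDist`, `jointDist`, `toy`) are hypothetical, classes are simplified to arrays of strings indexed by `Nat`, and `Float` stands in for ℚ; the toy frequencies and exponents are invented.

```lean
namespace LCECSketch

/-- Sum values that share a key (illustrative helper, `Float` instead of ℚ). -/
def groupBySum {α : Type} [BEq α] (tagged : List (α × Float)) : List (α × Float) :=
  tagged.foldl
    (fun (acc : List (α × Float)) (kv : α × Float) =>
      if acc.any (fun p => p.1 == kv.1) then
        acc.map (fun p => if p.1 == kv.1 then (p.1, p.2 + kv.2) else p)
      else acc ++ [kv])
    []

/-- Marginal distribution of cell `c`: each form with its summed class frequency. -/
def cellDist (es : List (Array String × Float)) (c : Nat) : List (String × Float) :=
  groupBySum (es.map fun e => (e.1.getD c "", e.2))

/-- Joint distribution of cells `ci` and `cj` over pairs of forms. -/
def jointDist (es : List (Array String × Float)) (ci cj : Nat) :
    List ((String × String) × Float) :=
  groupBySum (es.map fun e => ((e.1.getD ci "", e.1.getD cj ""), e.2))

/-- Three invented classes: (realizations per cell, frequency). -/
def toy : List (Array String × Float) :=
  [(#["-a", "-i"], 0.5), (#["-a", "-u"], 0.25), (#["-o", "-u"], 0.25)]

#eval cellDist toy 0     -- "-a" carries 0.75, "-o" carries 0.25
#eval jointDist toy 0 1  -- three distinct pairs, nothing to merge

end LCECSketch
```

In the toy data the first two classes share "-a" in cell 0, so the marginal has two entries while the joint keeps all three.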


            E-complexity: the number of distinct inflection classes.

            This is the "enumerative" complexity that varies wildly across languages (e.g., Kwerba: 2 classes, Chiquihuitlán Mazatec: 109).

              def Morphology.WP.cellEntropy {n : ℕ} (ps : ParadigmSystem n) (c : Fin n) :

              H(Cᵢ): Shannon entropy of a single paradigm cell.

              Measures how unpredictable the realization of cell c is when you know nothing about the lexeme. High entropy = many equiprobable realizations; low entropy = one dominant form.


                H(Cᵢ | Cⱼ): conditional entropy of cell ci given cell cj.

                Measures how uncertain you are about cell ci's realization after learning cell cj's realization. This is the core quantity in the LCEC: low H(Cᵢ | Cⱼ) means knowing form j strongly constrains form i.

                When H(Cᵢ | Cⱼ) = 0, cell j fully determines cell i — an implicative relation.
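Both entropy quantities can be sketched over plain lists of probabilities. The version below is an assumption-laden illustration: it uses `Float` and `Float.log2` from core Lean 4, computes H(Cᵢ | Cⱼ) through the chain rule H(Cᵢ, Cⱼ) - H(Cⱼ), and takes the distributions as ready-made lists rather than extracting them from a paradigm system. The names `entropy` and `condEntropy` are hypothetical.

```lean
namespace LCECSketch

/-- Shannon entropy in bits of a list of probabilities; zero entries contribute nothing. -/
def entropy (probs : List Float) : Float :=
  probs.foldl
    (fun (acc : Float) (p : Float) => if 0.0 < p then acc - p * Float.log2 p else acc)
    0.0

/-- H(Cᵢ | Cⱼ) via the chain rule: joint entropy minus the entropy of the known cell. -/
def condEntropy (joint : List Float) (givenCell : List Float) : Float :=
  entropy joint - entropy givenCell

-- One dominant form (0.75 vs 0.25): below the 1-bit maximum for two outcomes.
#eval entropy [0.75, 0.25]

-- Joint (0.5, 0.25, 0.25) has 1.5 bits, the known cell (0.5, 0.5) has 1 bit: 0.5 bits remain.
#eval condEntropy [0.5, 0.25, 0.25] [0.5, 0.5]

end LCECSketch
```

A result of zero from `condEntropy` corresponds to the implicative case, where the known cell leaves no residual uncertainty.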


                  I-complexity: average conditional entropy across all directed cell pairs.

                  I-complexity = (1 / n(n-1)) · Σᵢ≠ⱼ H(Cᵢ | Cⱼ)

                  This is the central measure of @cite{ackerman-malouf-2013}. It quantifies how hard it is, on average, to predict one paradigm cell from another. The LCEC asserts that this value is uniformly low across languages.
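The averaging over directed cell pairs can be sketched end to end on a toy system. Everything here is hypothetical scaffolding rather than the library's code: classes are arrays of strings indexed by `Nat`, `Float` replaces ℚ, and the three classes and their frequencies are invented.

```lean
namespace LCECSketch

/-- Sum values that share a key (illustrative helper). -/
def groupBySum {α : Type} [BEq α] (tagged : List (α × Float)) : List (α × Float) :=
  tagged.foldl
    (fun (acc : List (α × Float)) (kv : α × Float) =>
      if acc.any (fun p => p.1 == kv.1) then
        acc.map (fun p => if p.1 == kv.1 then (p.1, p.2 + kv.2) else p)
      else acc ++ [kv])
    []

/-- Shannon entropy in bits. -/
def entropy (probs : List Float) : Float :=
  probs.foldl
    (fun (acc : Float) (p : Float) => if 0.0 < p then acc - p * Float.log2 p else acc)
    0.0

def cellProbs (es : List (Array String × Float)) (c : Nat) : List Float :=
  (groupBySum (es.map fun e => (e.1.getD c "", e.2))).map (fun p => p.2)

def jointProbs (es : List (Array String × Float)) (ci cj : Nat) : List Float :=
  (groupBySum (es.map fun e => ((e.1.getD ci "", e.1.getD cj ""), e.2))).map (fun p => p.2)

/-- H(Cᵢ | Cⱼ) = H(Cᵢ, Cⱼ) - H(Cⱼ). -/
def condEntropy (es : List (Array String × Float)) (ci cj : Nat) : Float :=
  entropy (jointProbs es ci cj) - entropy (cellProbs es cj)

/-- Average of H(Cᵢ | Cⱼ) over all ordered pairs i ≠ j of `n` cells.
    For n < 2 there are no pairs and the division is not meaningful. -/
def iComplexity (es : List (Array String × Float)) (n : Nat) : Float :=
  let cells := List.range n
  let total := cells.foldl
    (fun (acc : Float) (i : Nat) =>
      cells.foldl
        (fun (acc' : Float) (j : Nat) =>
          if i == j then acc' else acc' + condEntropy es i j)
        acc)
    0.0
  total / (n * (n - 1)).toFloat

def toy : List (Array String × Float) :=
  [(#["-a", "-i"], 0.5), (#["-a", "-u"], 0.25), (#["-o", "-u"], 0.25)]

#eval iComplexity toy 2  -- roughly 0.594 bits: the mean of 0.5 and about 0.689

end LCECSketch
```

Comparing the result against a chosen threshold then gives a check in the spirit of `LCECHolds`.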

                    def Morphology.WP.LCECHolds {n : ℕ} (ps : ParadigmSystem n) (threshold : ) :

                    The LCEC holds for a paradigm system if its I-complexity is below a threshold (Ackerman & Malouf use ~1.5 bits as the empirical bound across their 10-language sample; the highest observed value is Chiquihuitlán Mazatec at 0.709 bits).

                      def Morphology.WP.isImplicative {n : ℕ} (ps : ParadigmSystem n) (ci cj : Fin n) :

                      An implicative relation between two cells: knowing cell cj fully determines cell ci (conditional entropy = 0).

                      These are the building blocks of paradigm organization. A rich network of implicative relations is what keeps I-complexity low.
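Zero conditional entropy is equivalent to a functional dependency: among classes with positive frequency, no two agree in cell cj while differing in cell ci. The sketch below checks that condition directly instead of comparing an entropy value with zero, which is a swapped-in technique and not necessarily how `isImplicative` is defined. The name `determines` and the toy data are invented, and classes are simplified to arrays indexed by `Nat`.

```lean
namespace LCECSketch

/-- `cj` determines `ci` when no two positive-frequency classes share a form
    in cell `cj` but differ in cell `ci`. -/
def determines (es : List (Array String × Float)) (ci cj : Nat) : Bool :=
  let live := es.filter (fun e => 0.0 < e.2)
  live.all (fun a => live.all (fun b =>
    a.1.getD cj "" != b.1.getD cj "" || a.1.getD ci "" == b.1.getD ci ""))

/-- Invented toy system: cell 0 is fully distinctive, cell 1 is not. -/
def toy : List (Array String × Float) :=
  [(#["-a", "-i"], 0.5), (#["-o", "-i"], 0.25), (#["-e", "-u"], 0.25)]

#eval determines toy 1 0  -- true: each cell-0 form fixes the cell-1 form
#eval determines toy 0 1  -- false: "-i" in cell 1 is compatible with "-a" and "-o"

end LCECSketch
```

The example also shows that implicative relations are directional: cell 0 predicts cell 1 here, but not the reverse.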


                        A paradigm system has transparent structure if every cell pair is implicative — knowing any one cell fully determines all others.

                        This is the strongest form of low I-complexity (I-complexity = 0). It corresponds to @cite{carstairs-mccarthy-2010}'s No Blur Principle / synonymy avoidance, which the LCEC subsumes as a special case.

                          def Morphology.WP.fromStems {σ : Type} (stems : List (Core.Morphology.Stem σ)) (baseMeaning : σ) (numCells : ℕ) (cellExtractor : List (String × Features × σ) → Fin numCells → String) :

                          Extract a ParadigmSystem from a list of Stems.

                          Each unique paradigm pattern (sequence of inflected forms) becomes an inflection class. The number of cells is paradigm.length + 1 (base form + one per inflectional rule). This bridges Core.MorphRule's rule-based view to the W&P observed-paradigm view.
