Morphological Typology: Paradigm Complexity #
@cite{ackerman-malouf-2013}
Cross-linguistic typological data on morphological paradigm complexity.
LanguageData #
LanguageData records summary statistics about a language's inflectional
paradigm system: the number of inflection classes (E-complexity), the
number of paradigm cells, and information-theoretic measures of paradigm
predictability (I-complexity).
@cite{ackerman-malouf-2013} Sample #
Ten typologically diverse languages from @cite{ackerman-malouf-2013}, spanning five language families (Nilo-Saharan, Trans-New Guinea, Oto-Manguean, Uralic, Indo-European), four macro-areas, and a range from 2 to 109 inflection classes. The central empirical finding: despite widely varying E-complexity, I-complexity (average conditional entropy) is uniformly low across all ten languages.
MorphProfile Sample #
Eighteen typologically diverse languages with morphological profiles
derived from WALS data. Types and WALS lookup helpers are defined in
Core.Morphology.MorphProfile; per-language profiles live in
Fragments.{Language}.Morph.
Summary statistics for a language's morphological paradigm system, as reported in published studies.
Fields correspond to Tables 2--3 of @cite{ackerman-malouf-2013}.
- name : String
Language name
- family : String
Language family
- numClasses : ℕ
Number of inflection classes (E-complexity)
- numCells : ℕ
Number of paradigm cells
- avgCondEntropy : ℚ
Average conditional entropy H(Ci|Cj) in bits (I-complexity)
- maxCellEntropy : ℚ
Maximum cell entropy max H(Ci) in bits
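The two entropy fields can be read against the usual information-theoretic definitions. The block below is a sketch under that assumption: the source only names the quantities, so the exact averaging convention (here, over ordered pairs of distinct cells, with n standing for numCells) should be checked against @cite{ackerman-malouf-2013}.

```latex
% Entropy of a single paradigm cell C_i (in bits)
H(C_i) = -\sum_{c_i} P(c_i)\,\log_2 P(c_i)

% Conditional entropy of cell C_i given cell C_j
H(C_i \mid C_j) = -\sum_{c_i}\sum_{c_j} P(c_i, c_j)\,\log_2 P(c_i \mid c_j)

% Assumed reading of the two summary fields, with n = numCells
\mathrm{avgCondEntropy} = \frac{1}{n(n-1)} \sum_{i \neq j} H(C_i \mid C_j)
\qquad
\mathrm{maxCellEntropy} = \max_{i} H(C_i)
```

Since conditioning cannot increase entropy, H(C_i | C_j) ≤ H(C_i), so avgCondEntropy can never exceed maxCellEntropy; every record below respects that bound.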
Nilo-Saharan languages #
Fur (Nilo-Saharan, Fur; Sudan). 4 classes, 2 cells.
Equations
- Phenomena.Morphology.Typology.fur = { name := "Fur", family := "Nilo-Saharan", numClasses := 4, numCells := 2, avgCondEntropy := 489 / 1000, maxCellEntropy := 1334 / 1000 }
Ngiti (Nilo-Saharan, Central Sudanic; DRC). 8 classes, 2 cells.
Equations
- Phenomena.Morphology.Typology.ngiti = { name := "Ngiti", family := "Nilo-Saharan", numClasses := 8, numCells := 2, avgCondEntropy := 380 / 1000, maxCellEntropy := 1741 / 1000 }
Nuer (Nilo-Saharan, Nilotic; Sudan/South Sudan). 31 classes, 4 cells.
Equations
- Phenomena.Morphology.Typology.nuer = { name := "Nuer", family := "Nilo-Saharan", numClasses := 31, numCells := 4, avgCondEntropy := 513 / 1000, maxCellEntropy := 3224 / 1000 }
Trans-New Guinea languages #
Kwerba (Trans-New Guinea; Papua, Indonesia). 2 classes, 2 cells.
Equations
- Phenomena.Morphology.Typology.kwerba = { name := "Kwerba", family := "Trans-New Guinea", numClasses := 2, numCells := 2, avgCondEntropy := 469 / 1000, maxCellEntropy := 529 / 1000 }
Oto-Manguean languages #
Chinantec (Oto-Manguean; Oaxaca, Mexico). 62 classes, 4 cells. Comaltepec Chinantec tonal verb paradigms.
Equations
- Phenomena.Morphology.Typology.chinantec = { name := "Chinantec", family := "Oto-Manguean", numClasses := 62, numCells := 4, avgCondEntropy := 426 / 1000, maxCellEntropy := 4266 / 1000 }
Chiquihuitlan Mazatec (Oto-Manguean; Oaxaca, Mexico). 109 classes, 4 cells. The paper's primary case study (section 4).
Uralic languages #
Finnish (Uralic, Finnic). 51 classes, 8 cells.
Equations
- Phenomena.Morphology.Typology.finnish = { name := "Finnish", family := "Uralic", numClasses := 51, numCells := 8, avgCondEntropy := 209 / 1000, maxCellEntropy := 3803 / 1000 }
Indo-European languages #
German (Indo-European, Germanic). 7 classes, 8 cells.
Equations
- Phenomena.Morphology.Typology.german = { name := "German", family := "Indo-European", numClasses := 7, numCells := 8, avgCondEntropy := 45 / 1000, maxCellEntropy := 1906 / 1000 }
Russian (Indo-European, Slavic). 8 classes, 8 cells.
Equations
- Phenomena.Morphology.Typology.russian = { name := "Russian", family := "Indo-European", numClasses := 8, numCells := 8, avgCondEntropy := 89 / 1000, maxCellEntropy := 2170 / 1000 }
Spanish (Indo-European, Romance). 3 classes, 57 cells.
Equations
- Phenomena.Morphology.Typology.spanish = { name := "Spanish", family := "Indo-European", numClasses := 3, numCells := 57, avgCondEntropy := 3 / 1000, maxCellEntropy := 1522 / 1000 }
All 10 languages in the @cite{ackerman-malouf-2013} sample (Table 3).
The LCEC threshold: all 10 languages fall below 1 bit of average conditional entropy. Even the most complex system (Mazatec, 109 classes) has I-complexity < 1 bit.
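The threshold claim can be checked mechanically. The following is a minimal Lean sketch, not the library's own definition: it stores the average conditional entropies in millibits as natural numbers (to avoid depending on a rational-number type), copies the nine values from the records in this section, and uses 709 for Mazatec as quoted in the text. The names `avgCondEntropyMilli`, `thresholdMilli`, and `allBelowThreshold` are hypothetical.

```lean
namespace LcecSketch

/-- (language, average conditional entropy in millibits).
    Values are copied from the records in this section; Mazatec uses the
    0.709 bits quoted in the text. -/
def avgCondEntropyMilli : List (String × Nat) :=
  [("Fur", 489), ("Ngiti", 380), ("Nuer", 513), ("Kwerba", 469),
   ("Chinantec", 426), ("Mazatec", 709), ("Finnish", 209),
   ("German", 45), ("Russian", 89), ("Spanish", 3)]

/-- Hypothetical encoding of the threshold: 1 bit = 1000 millibits. -/
def thresholdMilli : Nat := 1000

/-- True when every listed value is strictly below the threshold. -/
def allBelowThreshold (xs : List (String × Nat)) : Bool :=
  xs.all (fun p => p.2 < thresholdMilli)

-- The sample has ten languages, and all of them fall below 1 bit.
example : avgCondEntropyMilli.length = 10 := by decide
example : allBelowThreshold avgCondEntropyMilli = true := by decide

end LcecSketch
```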
Expected I-complexity under random class assignment for Mazatec (Monte Carlo baseline). The paper reports the mean of 1000 random permutations as ~1.1 bits, well above the observed 0.709 bits.
Morphological Mechanisms: WALS Chapters 20--29 #
Chapters 20--29 of WALS cover the fundamental mechanisms of inflectional morphology: how formatives are put together (fusion), how many categories a single formative expresses (exponence), how synthetic verb morphology is (inflectional synthesis), where grammatical relations are marked (locus of marking), whether affixes go before or after stems (prefixing vs suffixing), and whether the language has productive reduplication.
Typological classification types are defined in Core.Morphology.MorphProfile.
Sources:
- @cite{bickel-nichols-2013a} (Ch 20, Fusion)
- @cite{bickel-nichols-2013b} (Ch 21, Exponence)
- @cite{bickel-nichols-2013c} (Ch 22, Inflectional Synthesis)
- @cite{nichols-bickel-2013b} (Ch 23, Locus of Marking in the Clause)
- @cite{nichols-bickel-2013c} (Ch 24, Locus of Marking in Possessive NPs)
- @cite{nichols-bickel-2013a} (Ch 25, Locus of Marking: Whole-language Typology)
- @cite{nichols-bickel-2013d} (Ch 25B, Zero Marking of A and P Arguments)
- @cite{dryer-2013-wals} (Ch 26, Prefixing vs. Suffixing)
- @cite{rubino-2013} (Ch 27, Reduplication)
- @cite{baerman-brown-2013} (Ch 28, Case Syncretism)
- @cite{baerman-brown-2013a} (Ch 29, Syncretism in Verbal Person/Number Marking)
Map WALS 20A fine-grained fusion types to coarse categories. Mixed types with a tonal or ablaut component map to nonlinear; the isolating/concatenative mix maps to concatenative.
Equations
- Phenomena.Morphology.Typology.toFusion Core.WALS.F20A.FusionType.exclusivelyConcatenative = Core.Morphology.Fusion.concatenative
- Phenomena.Morphology.Typology.toFusion Core.WALS.F20A.FusionType.exclusivelyIsolating = Core.Morphology.Fusion.isolating
- Phenomena.Morphology.Typology.toFusion Core.WALS.F20A.FusionType.exclusivelyTonal = Core.Morphology.Fusion.nonlinear
- Phenomena.Morphology.Typology.toFusion Core.WALS.F20A.FusionType.tonalIsolating = Core.Morphology.Fusion.nonlinear
- Phenomena.Morphology.Typology.toFusion Core.WALS.F20A.FusionType.tonalConcatenative = Core.Morphology.Fusion.nonlinear
- Phenomena.Morphology.Typology.toFusion Core.WALS.F20A.FusionType.ablautConcatenative = Core.Morphology.Fusion.nonlinear
- Phenomena.Morphology.Typology.toFusion Core.WALS.F20A.FusionType.isolatingConcatenative = Core.Morphology.Fusion.concatenative
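The equations above amount to a total pattern match over the seven WALS 20A values. A self-contained sketch follows; `FusionType` and `Fusion` here are local stand-ins for `Core.WALS.F20A.FusionType` and `Core.Morphology.Fusion`, whose real definitions may carry more structure than is assumed here.

```lean
namespace FusionSketch

/-- Local stand-in for the WALS 20A value type (constructor names as in the equations). -/
inductive FusionType where
  | exclusivelyConcatenative | exclusivelyIsolating | exclusivelyTonal
  | tonalIsolating | tonalConcatenative | ablautConcatenative
  | isolatingConcatenative
  deriving DecidableEq, Repr

/-- Local stand-in for the coarse classification (assumed to have three values). -/
inductive Fusion where
  | concatenative | isolating | nonlinear
  deriving DecidableEq, Repr

/-- Coarsening map mirroring the seven equations listed above. -/
def toFusion : FusionType → Fusion
  | .exclusivelyConcatenative => .concatenative
  | .exclusivelyIsolating     => .isolating
  | .exclusivelyTonal         => .nonlinear
  | .tonalIsolating           => .nonlinear
  | .tonalConcatenative       => .nonlinear
  | .ablautConcatenative      => .nonlinear
  | .isolatingConcatenative   => .concatenative

-- Spot checks on two of the mixed types.
example : toFusion FusionType.ablautConcatenative = Fusion.nonlinear := rfl
example : toFusion FusionType.isolatingConcatenative = Fusion.concatenative := rfl

end FusionSketch
```

Writing the map as an exhaustive match means that adding a new WALS value to the stand-in type makes the definition fail to compile until the new case is classified.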
A single row in a WALS distribution table: a label and a language count.
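As an illustration of how such rows can back the per-chapter totals reported below, here is a minimal sketch. The structure name `Row`, the function `total`, and the toy counts are all hypothetical; the counts are placeholders, not WALS figures.

```lean
namespace DistributionSketch

/-- Hypothetical row type: a value label and the number of languages with that value. -/
structure Row where
  label : String
  count : Nat
  deriving Repr

/-- Sum of the language counts over all rows of a distribution table. -/
def total (rows : List Row) : Nat :=
  rows.foldl (fun acc r => acc + r.count) 0

/-- Toy table with placeholder counts (not taken from WALS). -/
def toyRows : List Row :=
  [⟨"type A", 5⟩, ⟨"type B", 3⟩, ⟨"type C", 2⟩]

-- A "total" statement is then a closed arithmetic fact that `decide` can check.
example : total toyRows = 10 := by decide

end DistributionSketch
```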
WALS Chapter 20 distribution, derived from F20A data (@cite{bickel-nichols-2013a}).
Ch 20 total: 165 languages (derived from F20A data).
Map WALS 21A fine-grained exponence types to coarse categories.
All polyexponential subtypes (case+number, case+referentiality,
case+TAM) map to .polyexponential.
Equations
- Phenomena.Morphology.Typology.toExponence Core.WALS.F21A.ExponenceType.monoexponentialCase = Core.Morphology.Exponence.monoexponential
- Phenomena.Morphology.Typology.toExponence Core.WALS.F21A.ExponenceType.caseNumber = Core.Morphology.Exponence.polyexponential
- Phenomena.Morphology.Typology.toExponence Core.WALS.F21A.ExponenceType.caseReferentiality = Core.Morphology.Exponence.polyexponential
- Phenomena.Morphology.Typology.toExponence Core.WALS.F21A.ExponenceType.caseTam = Core.Morphology.Exponence.polyexponential
- Phenomena.Morphology.Typology.toExponence Core.WALS.F21A.ExponenceType.noCase = Core.Morphology.Exponence.noCase
WALS Chapter 21 distribution, derived from F21A data (@cite{bickel-nichols-2013b}).
Ch 21 total: 162 languages (derived from F21A data).
Map WALS 22A fine-grained categories to coarse synthesis levels. Boundaries align with WALS bin edges to avoid splitting categories.
Equations
- Phenomena.Morphology.Typology.toVerbSynthesis Core.WALS.F22A.InflectionalSynthesis.categoryPerWord0_1 = Core.Morphology.VerbSynthesis.low
- Phenomena.Morphology.Typology.toVerbSynthesis Core.WALS.F22A.InflectionalSynthesis.categoriesPerWord2_3 = Core.Morphology.VerbSynthesis.low
- Phenomena.Morphology.Typology.toVerbSynthesis Core.WALS.F22A.InflectionalSynthesis.categoriesPerWord4_5 = Core.Morphology.VerbSynthesis.moderate
- Phenomena.Morphology.Typology.toVerbSynthesis Core.WALS.F22A.InflectionalSynthesis.categoriesPerWord6_7 = Core.Morphology.VerbSynthesis.moderate
- Phenomena.Morphology.Typology.toVerbSynthesis Core.WALS.F22A.InflectionalSynthesis.categoriesPerWord8_9 = Core.Morphology.VerbSynthesis.high
- Phenomena.Morphology.Typology.toVerbSynthesis Core.WALS.F22A.InflectionalSynthesis.categoriesPerWord10_11 = Core.Morphology.VerbSynthesis.high
- Phenomena.Morphology.Typology.toVerbSynthesis Core.WALS.F22A.InflectionalSynthesis.categoriesPerWord12_13 = Core.Morphology.VerbSynthesis.high
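Read as thresholds on a raw category count, the equations above put the cut points at 3 and 7, which coincide with the upper edges of the WALS bins 2-3 and 6-7. The sketch below states that reading directly; `levelOfCount` is a hypothetical helper, and `VerbSynthesis` is a local stand-in for `Core.Morphology.VerbSynthesis`.

```lean
namespace SynthesisSketch

/-- Local stand-in for the coarse synthesis levels. -/
inductive VerbSynthesis where
  | low | moderate | high
  deriving DecidableEq, Repr

/-- Coarse level from a raw count of categories per verb word.
    Cut points (3 and 7) sit on WALS bin edges, so no bin is split. -/
def levelOfCount (n : Nat) : VerbSynthesis :=
  if n ≤ 3 then .low
  else if n ≤ 7 then .moderate
  else .high

-- Both ends of each WALS bin land in the same coarse level.
example : levelOfCount 2 = levelOfCount 3 := by decide
example : levelOfCount 4 = levelOfCount 7 := by decide
example : levelOfCount 8 = levelOfCount 13 := by decide

-- Adjacent bins on either side of a cut point land in different levels.
example : levelOfCount 3 = VerbSynthesis.low := by decide
example : levelOfCount 4 = VerbSynthesis.moderate := by decide
example : levelOfCount 8 = VerbSynthesis.high := by decide

end SynthesisSketch
```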
WALS Chapter 22 distribution, derived from F22A data (@cite{bickel-nichols-2013c}).
Ch 22 total: 145 languages (derived from F22A data).
Map WALS 25A values to the 5-way LocusOfMarking classification.
Equations
- Phenomena.Morphology.Typology.toLocusOfMarking Core.WALS.F25A.LocusOfMarkingWholeLanguageTypology.headMarking = Core.Morphology.LocusOfMarking.headMarking
- Phenomena.Morphology.Typology.toLocusOfMarking Core.WALS.F25A.LocusOfMarkingWholeLanguageTypology.dependentMarking = Core.Morphology.LocusOfMarking.dependentMarking
- Phenomena.Morphology.Typology.toLocusOfMarking Core.WALS.F25A.LocusOfMarkingWholeLanguageTypology.doubleMarking = Core.Morphology.LocusOfMarking.doubleMarking
- Phenomena.Morphology.Typology.toLocusOfMarking Core.WALS.F25A.LocusOfMarkingWholeLanguageTypology.zeroMarking = Core.Morphology.LocusOfMarking.zeroMarking
- Phenomena.Morphology.Typology.toLocusOfMarking Core.WALS.F25A.LocusOfMarkingWholeLanguageTypology.inconsistentOrOther = Core.Morphology.LocusOfMarking.inconsistentOrOther
WALS Chapter 26 distribution, derived from F26A data (@cite{dryer-haspelmath-2013}).
Ch 26 total: 969 languages (derived from F26A data).
WALS Chapter 27 distribution, derived from F27A data (@cite{rubino-2013}).
Ch 27 total: 368 languages (derived from F27A data).
WALS Chapter 23 distribution, derived from F23A data (N = 236).
Ch 23 total: 236 languages (derived from F23A data).
WALS Chapter 24 distribution, derived from F24A data (N = 236).
Ch 24 total: 236 languages (derived from F24A data).
WALS Chapter 25A distribution, derived from F25A data (N = 236).
Ch 25A total: 236 languages (derived from F25A data).
WALS Chapter 25B distribution, derived from F25B data (N = 235).
Ch 25B total: 235 languages (derived from F25B data).
WALS Chapter 28 distribution, derived from F28A data (N = 198).
Ch 28 total: 198 languages (derived from F28A data).
WALS Chapter 29 distribution, derived from F29A data (N = 198).
Ch 29 total: 198 languages (derived from F29A data).
WALS Chapter 21B distribution, derived from F21B data (N = 160).
Ch 21B total: 160 languages (derived from F21B data).
WALS Chapter 62A distribution, derived from F62A data (N = 168).
Ch 62 total: 168 languages (derived from F62A data).
WALS Chapter 79A distribution, derived from F79A data (N = 193).
Ch 79A total: 193 languages (derived from F79A data).
WALS Chapter 79B distribution, derived from F79B data (N = 193).
Ch 79B total: 193 languages (derived from F79B data).
WALS Chapter 80A distribution, derived from F80A data (N = 193).
Ch 80 total: 193 languages (derived from F80A data).
Fragment profiles are defined in Fragments.{Language}.Morph with values
derived from WALS data via Core.Morphology.wals* lookup helpers.
All 18 morphological mechanism profiles.
Equations
- Phenomena.Morphology.Typology.countByFusion langs f = (List.filter (fun (p : Core.Morphology.MorphProfile) => p.fusion == f) langs).length
Equations
- Phenomena.Morphology.Typology.countByExponence langs e = (List.filter (fun (p : Core.Morphology.MorphProfile) => p.exponence == e) langs).length
Equations
- Phenomena.Morphology.Typology.countByLocus langs l = (List.filter (fun (p : Core.Morphology.MorphProfile) => p.locus == l) langs).length
Equations
- Phenomena.Morphology.Typology.countBySynthesis langs s = (List.filter (fun (p : Core.Morphology.MorphProfile) => p.verbSynthesis == s) langs).length
Count values are determined by WALS-derived Fragment profiles. Values differ from the pre-refactor hand-specified profiles because WALS evaluates specific formatives, not overall typological tradition.
Fusion type distribution in our sample.
Exponence distribution in our sample.
Verb synthesis distribution in our sample.
Locus of marking distribution in our sample.
Fusion counts sum to total.
Locus counts sum to total.
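The "counts sum to total" statements hold because the coarse categories partition the sample: each profile has exactly one fusion value, so the per-value filter counts add up to the list length. A small sketch on toy data, assuming a cut-down `Profile` record (the real `Core.Morphology.MorphProfile` has more fields) and a three-valued `Fusion` type:

```lean
namespace CountSketch

/-- Local stand-in for the coarse fusion classification. -/
inductive Fusion where
  | concatenative | isolating | nonlinear
  deriving DecidableEq, Repr

/-- Cut-down, hypothetical profile record with only the field needed here. -/
structure Profile where
  label : String
  fusion : Fusion
  deriving Repr

/-- Same shape as `countByFusion` in the equations above: filter, then take the length. -/
def countByFusion (langs : List Profile) (f : Fusion) : Nat :=
  (langs.filter (fun p => p.fusion == f)).length

/-- Toy sample with placeholder labels (not the 18-language sample). -/
def toy : List Profile :=
  [⟨"toy1", .concatenative⟩, ⟨"toy2", .isolating⟩,
   ⟨"toy3", .concatenative⟩, ⟨"toy4", .nonlinear⟩]

example : countByFusion toy Fusion.concatenative = 2 := by decide

-- The per-category counts sum to the size of the sample.
example :
    countByFusion toy Fusion.concatenative
      + countByFusion toy Fusion.isolating
      + countByFusion toy Fusion.nonlinear = toy.length := by decide

end CountSketch
```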
Generalization 1: Suffixing strongly dominates prefixing worldwide. #
Greenberg's Universal 27: "If a language is exclusively suffixing, it is postpositional; if it is exclusively prefixing, it is prepositional." More broadly, suffixing is far more common than prefixing.
Generalization 2: Concatenative morphology is the most common fusion type. #
In the WALS Ch 20 data, exclusively concatenative languages form the largest single category. In our sample, concatenative languages also form the plurality.
Generalization 3: Dependent-marking is the most common locus type in our sample. #
Generalization 4: Productive reduplication is present in a majority of languages in the WALS Ch 27 data. #
Generalization 7: Concatenative languages are predominantly monoexponential. #
This is the defining correlation of agglutination: one-to-one mapping between form and meaning in the morphology.
Generalization 8: Head-marking languages tend to have high verb synthesis. #
If grammatical relations are marked on the head (verb), then the verb carries agreement morphology for multiple arguments, driving up synthesis.
Generalization: All languages with high verb synthesis have either concatenative or nonlinear fusion (never isolating). #
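An exceptionless generalization of this kind can be phrased as a Boolean check over the list of profiles and discharged by `decide`. The sketch below uses toy data and local stand-in types; `highSynthesisNeverIsolating` is a hypothetical name, and the library's actual theorem may be stated differently.

```lean
namespace GeneralizationSketch

inductive Fusion where
  | concatenative | isolating | nonlinear
  deriving DecidableEq, Repr

inductive VerbSynthesis where
  | low | moderate | high
  deriving DecidableEq, Repr

/-- Cut-down, hypothetical profile with only the two fields involved. -/
structure Profile where
  fusion : Fusion
  verbSynthesis : VerbSynthesis

/-- "High synthesis implies non-isolating", written as: not high, or not isolating. -/
def highSynthesisNeverIsolating (langs : List Profile) : Bool :=
  langs.all (fun p =>
    p.verbSynthesis != VerbSynthesis.high || p.fusion != Fusion.isolating)

/-- Toy sample (not the 18-language sample). -/
def toy : List Profile :=
  [⟨.concatenative, .high⟩, ⟨.isolating, .low⟩, ⟨.nonlinear, .high⟩]

example : highSynthesisNeverIsolating toy = true := by decide

-- A single isolating, high-synthesis profile falsifies the check.
example :
    highSynthesisNeverIsolating
      (Profile.mk Fusion.isolating VerbSynthesis.high :: toy) = false := by decide

end GeneralizationSketch
```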
Ch 20: Concatenative types (exclusively + strongly + weakly) account for the majority of the sample.
Ch 22: Most languages have 2--7 categories per verb word. The extremes (0--1 and 12--13) are rare.
All ISO codes in the morphological profiles are 3 characters.
No duplicate ISO codes in the morphological profiles.
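Both sanity checks are simple list predicates. The sketch below is one way to write them; `allThreeChars`, `allDistinct`, and the four example codes are illustrative assumptions and are not claimed to match the codes stored in the Fragment profiles.

```lean
namespace IsoSketch

/-- Every code has exactly three characters. -/
def allThreeChars (codes : List String) : Bool :=
  codes.all (fun c => c.length == 3)

/-- No code occurs twice: each head must be absent from its tail. -/
def allDistinct : List String → Bool
  | [] => true
  | c :: cs => !(cs.contains c) && allDistinct cs

/-- Illustrative codes only. -/
def toyCodes : List String := ["fin", "deu", "rus", "spa"]

#eval allThreeChars toyCodes            -- true
#eval allDistinct toyCodes              -- true
#eval allDistinct ("fin" :: toyCodes)   -- false: "fin" now occurs twice

end IsoSketch
```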
WALS Grounding #
Per-language grounding theorems verify that Fragment MorphProfile values are
consistent with WALS generated data. Since profiles are now WALS-derived by
construction (via wals* helpers), these theorems serve as regression tests:
if WALS data changes, the corresponding theorem breaks.