Linglib.Phenomena.Morphology.Typology

Morphological Typology: Paradigm Complexity #

@cite{ackerman-malouf-2013}

Cross-linguistic typological data on morphological paradigm complexity.

LanguageData #

LanguageData records summary statistics about a language's inflectional paradigm system: the number of inflection classes (E-complexity), the number of paradigm cells, and information-theoretic measures of paradigm predictability (I-complexity).

@cite{ackerman-malouf-2013} Sample #

Ten typologically diverse languages from @cite{ackerman-malouf-2013}, spanning five phyla, four macro-areas, and a range of 2 to 109 inflection classes. The central empirical finding is that, despite widely varying E-complexity, I-complexity (average conditional entropy) is uniformly low, below 1 bit, across all ten languages.

MorphProfile Sample #

Eighteen typologically diverse languages with morphological profiles derived from WALS data. Types and WALS lookup helpers are defined in Core.Morphology.MorphProfile; per-language profiles live in Fragments.{Language}.Morph.

Summary statistics for a language's morphological paradigm system, as reported in published studies.

Fields correspond to Tables 2--3 of @cite{ackerman-malouf-2013}.

  • name : String

    Language name

  • family : String

    Language family

  • numClasses :

    Number of inflection classes (E-complexity)

  • numCells :

    Number of paradigm cells

  • avgCondEntropy :

    Average conditional entropy H(Ci|Cj) in bits (I-complexity)

  • maxCellEntropy :

    Maximum cell entropy max H(Ci) in bits
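
As a rough illustration of the record described above, the fields could be mirrored in a small Lean structure. This is only a sketch: the names `LanguageDataSketch` and `mazatecSketch`, the `Nat` field types, and the choice to store entropy in thousandths of a bit are assumptions made for the example, not the library's actual definition. `maxCellEntropy` is left out because no value for it is quoted on this page.

```lean
/-- Hypothetical mirror of the record described above; not the library's definition. -/
structure LanguageDataSketch where
  name : String
  family : String
  /-- Number of inflection classes (E-complexity). -/
  numClasses : Nat
  /-- Number of paradigm cells. -/
  numCells : Nat
  /-- Average conditional entropy in thousandths of a bit (assumed encoding). -/
  avgCondEntropyMilli : Nat
  deriving Repr

/-- Chiquihuitlan Mazatec, using the figures quoted on this page (0.709 bits). -/
def mazatecSketch : LanguageDataSketch where
  name := "Chiquihuitlan Mazatec"
  family := "Oto-Manguean"
  numClasses := 109
  numCells := 4
  avgCondEntropyMilli := 709

-- Field access reduces definitionally, so `rfl` suffices.
example : mazatecSketch.numClasses = 109 := rfl
```

Storing entropy as a scaled natural number keeps comparisons decidable without extra dependencies; a rational or floating-point field would be an equally plausible choice.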

      Nilo-Saharan languages #

      Fur (Nilo-Saharan, Fur; Sudan). 4 classes, 2 cells.

        Ngiti (Nilo-Saharan, Central Sudanic; DRC). 8 classes, 2 cells.

          Nuer (Nilo-Saharan, Nilotic; Sudan/South Sudan). 31 classes, 4 cells.

            Trans-New Guinea languages #

            Kwerba (Trans-New Guinea; Papua, Indonesia). 2 classes, 2 cells.

              Oto-Manguean languages #

              Chinantec (Oto-Manguean; Oaxaca, Mexico). 62 classes, 4 cells. Comaltepec Chinantec tonal verb paradigms.

                Chiquihuitlan Mazatec (Oto-Manguean; Oaxaca, Mexico). 109 classes, 4 cells. The paper's primary case study (section 4).

                  Uralic languages #

                  Finnish (Uralic, Finnic). 51 classes, 8 cells.

                    Indo-European languages #

                    German (Indo-European, Germanic). 7 classes, 8 cells.

                      Russian (Indo-European, Slavic). 8 classes, 8 cells.

                        Spanish (Indo-European, Romance). 3 classes, 57 cells.

                          All 10 languages in the @cite{ackerman-malouf-2013} sample (Table 3).

                            The Low Conditional Entropy Conjecture (LCEC) threshold: all 10 languages fall below 1 bit of average conditional entropy. Even the system with the most inflection classes (Mazatec, 109 classes) has I-complexity < 1 bit.

                              Expected I-complexity under random class assignment for Mazatec (Monte Carlo baseline). The paper reports the mean of 1000 random permutations as ~5.25 bits, far above the observed 0.709 bits.
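
The comparison described above can be checked mechanically. The sketch below is an illustration only: the definition names and the encoding in thousandths of a bit are assumptions, and the three numbers are simply the figures quoted in the text (0.709 bits observed, a 1-bit threshold, and a baseline of roughly 5.25 bits).

```lean
/-- 1 bit, expressed in thousandths of a bit (assumed unit). -/
def thresholdMilli : Nat := 1000

/-- Observed Mazatec average conditional entropy: 0.709 bits. -/
def observedMilli : Nat := 709

/-- Reported random-assignment baseline: roughly 5.25 bits. -/
def baselineMilli : Nat := 5250

-- The observed value sits below the threshold, and the baseline sits above it.
example : observedMilli < thresholdMilli ∧ thresholdMilli < baselineMilli := by
  decide
```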

                                Morphological Mechanisms: WALS Chapters 20--29 #

                                Chapters 20--29 of WALS cover the fundamental mechanisms of inflectional morphology: how formatives are put together (fusion), how many categories a single formative expresses (exponence), how synthetic verb morphology is (inflectional synthesis), where grammatical relations are marked (locus of marking), whether affixes precede or follow stems (prefixing vs. suffixing), whether the language has productive reduplication, and where distinct forms collapse (syncretism in case and in verbal person/number marking).

                                Typological classification types are defined in Core.Morphology.MorphProfile.

                                Sources:

                                  • @cite{bickel-nichols-2013a} (Ch 20, Fusion)
                                  • @cite{bickel-nichols-2013b} (Ch 21, Exponence)
                                  • @cite{bickel-nichols-2013c} (Ch 22, Inflectional Synthesis)
                                  • @cite{nichols-bickel-2013b} (Ch 23, Locus of Marking in the Clause)
                                  • @cite{nichols-bickel-2013c} (Ch 24, Locus of Marking in Possessive NPs)
                                  • @cite{nichols-bickel-2013a} (Ch 25, Locus of Marking: Whole-language Typology)
                                  • @cite{nichols-bickel-2013d} (Ch 25B, Zero Marking of A and P Arguments)
                                  • @cite{dryer-2013-wals} (Ch 26, Prefixing vs. Suffixing)
                                  • @cite{rubino-2013} (Ch 27, Reduplication)
                                  • @cite{baerman-brown-2013} (Ch 28, Case Syncretism)
                                  • @cite{baerman-brown-2013a} (Ch 29, Syncretism in Verbal Person/Number Marking)

                                theorem Phenomena.Morphology.Typology.vietnamese_isolating :
                                Core.WALS.F20A.lookup "vie" = some { walsCode := "vie", language := "Vietnamese", iso := "vie", value := Core.WALS.F20A.FusionType.exclusivelyIsolating }
                                theorem Phenomena.Morphology.Typology.indonesian_isolating :
                                Core.WALS.F20A.lookup "ind" = some { walsCode := "ind", language := "Indonesian", iso := "ind", value := Core.WALS.F20A.FusionType.exclusivelyIsolating }
                                theorem Phenomena.Morphology.Typology.arabic_ablaut :
                                Core.WALS.F20A.lookup "aeg" = some { walsCode := "aeg", language := "Arabic (Egyptian)", iso := "arz", value := Core.WALS.F20A.FusionType.ablautConcatenative }
                                theorem Phenomena.Morphology.Typology.hebrew_ablaut :
                                Core.WALS.F20A.lookup "heb" = some { walsCode := "heb", language := "Hebrew (Modern)", iso := "heb", value := Core.WALS.F20A.FusionType.ablautConcatenative }
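
To make the shape of these lookup theorems concrete, here is a self-contained toy version. Everything in it is hypothetical: `FusionSketch`, `DatapointSketch`, `tableSketch`, and `lookupSketch` are stand-ins invented for the example and do not reproduce `Core.WALS.F20A`. Only the four rows are taken from the theorem statements above.

```lean
/-- Hypothetical stand-in for a fusion-type enumeration. -/
inductive FusionSketch where
  | exclusivelyIsolating
  | ablautConcatenative
  deriving Repr, DecidableEq

/-- Hypothetical stand-in for a WALS datapoint record. -/
structure DatapointSketch where
  walsCode : String
  language : String
  iso : String
  value : FusionSketch
  deriving Repr

/-- Toy table holding the four languages named in the theorems above. -/
def tableSketch : List DatapointSketch :=
  [ { walsCode := "vie", language := "Vietnamese", iso := "vie",
      value := .exclusivelyIsolating },
    { walsCode := "ind", language := "Indonesian", iso := "ind",
      value := .exclusivelyIsolating },
    { walsCode := "aeg", language := "Arabic (Egyptian)", iso := "arz",
      value := .ablautConcatenative },
    { walsCode := "heb", language := "Hebrew (Modern)", iso := "heb",
      value := .ablautConcatenative } ]

/-- Return the first datapoint whose WALS code matches, if any. -/
def lookupSketch (code : String) : Option DatapointSketch :=
  tableSketch.find? (fun d => d.walsCode == code)

-- Look up Egyptian Arabic by WALS code and project its ISO code.
#eval (lookupSketch "aeg").map (·.iso)

-- A code that is absent from the toy table yields no result.
#eval (lookupSketch "xxx").isSome
```

Note that the WALS code and the ISO code can differ, as with `"aeg"` and `"arz"`, which is why the record keeps both.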

                                A single row in a WALS distribution table: a label and a language count.
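
A row of this kind, and the totals reported for each chapter below, can be sketched as follows. The names `DistRowSketch`, `totalCount`, and `ch27Sketch` are assumptions made for the example; the three counts are the Ch 27 figures quoted on this page, and the row labels are placeholders because the page does not say how the two reduplication counts are labelled.

```lean
/-- Hypothetical row type: a category label and a language count. -/
structure DistRowSketch where
  label : String
  count : Nat
  deriving Repr

/-- Sum the language counts of all rows in a table. -/
def totalCount (rows : List DistRowSketch) : Nat :=
  rows.foldl (fun acc r => acc + r.count) 0

/-- Toy table with the Ch 27 counts quoted on this page; labels are placeholders. -/
def ch27Sketch : List DistRowSketch :=
  [ { label := "with reduplication (a)", count := 147 },
    { label := "with reduplication (b)", count := 35 },
    { label := "without reduplication", count := 186 } ]

-- The rows add up to the reported chapter total.
example : totalCount ch27Sketch = 368 := by decide
```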

                                      WALS Chapter 20 distribution, derived from F20A data (@cite{bickel-nichols-2013a}).

                                        Ch 20 total: 165 languages (derived from F20A data).

                                        theorem Phenomena.Morphology.Typology.turkish_monoexp :
                                        Core.WALS.F21A.lookup "tur" = some { walsCode := "tur", language := "Turkish", iso := "tur", value := Core.WALS.F21A.ExponenceType.monoexponentialCase }
                                        theorem Phenomena.Morphology.Typology.finnish_caseNumber :
                                        Core.WALS.F21A.lookup "fin" = some { walsCode := "fin", language := "Finnish", iso := "fin", value := Core.WALS.F21A.ExponenceType.caseNumber }
                                        theorem Phenomena.Morphology.Typology.german_caseNumber :
                                        Core.WALS.F21A.lookup "ger" = some { walsCode := "ger", language := "German", iso := "deu", value := Core.WALS.F21A.ExponenceType.caseNumber }
                                        theorem Phenomena.Morphology.Typology.russian_caseNumber :
                                        Core.WALS.F21A.lookup "rus" = some { walsCode := "rus", language := "Russian", iso := "rus", value := Core.WALS.F21A.ExponenceType.caseNumber }
                                        theorem Phenomena.Morphology.Typology.english_noCase :
                                        Core.WALS.F21A.lookup "eng" = some { walsCode := "eng", language := "English", iso := "eng", value := Core.WALS.F21A.ExponenceType.noCase }
                                        theorem Phenomena.Morphology.Typology.kayardild_caseTam :
                                        Core.WALS.F21A.lookup "kay" = some { walsCode := "kay", language := "Kayardild", iso := "gyd", value := Core.WALS.F21A.ExponenceType.caseTam }

                                        WALS Chapter 21 distribution, derived from F21A data (@cite{bickel-nichols-2013b}).

                                          Ch 21 total: 162 languages (derived from F21A data).

                                          WALS Chapter 22 distribution, derived from F22A data (@cite{bickel-nichols-2013c}).

                                            Ch 22 total: 145 languages (derived from F22A data).

                                            theorem Phenomena.Morphology.Typology.arabic_eg_weakSuffix :
                                            Core.WALS.F26A.lookup "aeg" = some { walsCode := "aeg", language := "Arabic (Egyptian)", iso := "arz", value := Core.WALS.F26A.PrefixSuffixPreference.weaklySuffixing }

                                            WALS Chapter 26 distribution, derived from F26A data (@cite{dryer-haspelmath-2013}).

                                              Ch 26 total: 969 languages (derived from F26A data).

                                              WALS Chapter 27 distribution, derived from F27A data (@cite{rubino-2013}).

                                                Ch 27 total: 368 languages (derived from F27A data).

                                                WALS Chapter 23 distribution, derived from F23A data (N = 236).

                                                  Ch 23 total: 236 languages (derived from F23A data).

                                                  WALS Chapter 24 distribution, derived from F24A data (N = 236).

                                                    Ch 24 total: 236 languages (derived from F24A data).

                                                    WALS Chapter 25A distribution, derived from F25A data (N = 236).

                                                      Ch 25A total: 236 languages (derived from F25A data).

                                                      WALS Chapter 25B distribution, derived from F25B data (N = 235).

                                                        Ch 25B total: 235 languages (derived from F25B data).

                                                        WALS Chapter 28 distribution, derived from F28A data (N = 198).

                                                          Ch 28 total: 198 languages (derived from F28A data).

                                                          WALS Chapter 29 distribution, derived from F29A data (N = 198).

                                                            Ch 29 total: 198 languages (derived from F29A data).

                                                            WALS Chapter 21B distribution, derived from F21B data (N = 160).

                                                              Ch 21B total: 160 languages (derived from F21B data).

                                                              WALS Chapter 62A distribution, derived from F62A data (N = 168).

                                                                Ch 62A total: 168 languages (derived from F62A data).

                                                                WALS Chapter 79A distribution, derived from F79A data (N = 193).

                                                                  Ch 79A total: 193 languages (derived from F79A data).

                                                                  WALS Chapter 79B distribution, derived from F79B data (N = 193).

                                                                    Ch 79B total: 193 languages (derived from F79B data).

                                                                    WALS Chapter 80A distribution, derived from F80A data (N = 193).

                                                                      Ch 80A total: 193 languages (derived from F80A data).

                                                                      Fragment profiles are defined in Fragments.{Language}.Morph with values derived from WALS data via Core.Morphology.wals* lookup helpers.

                                                                      All 18 morphological mechanism profiles.

                                                                        Count values are determined by the WALS-derived Fragment profiles. They differ from the pre-refactor, hand-specified profiles because WALS codes specific formatives rather than following a language's overall typological tradition.

                                                                        Generalization 1: Suffixing strongly dominates prefixing worldwide. #

                                                                        Greenberg's Universal 27: "If a language is exclusively suffixing, it is postpositional; if it is exclusively prefixing, it is prepositional." More broadly, suffixing is far more common than prefixing.

                                                                        Generalization 2: Concatenative morphology is the most common fusion type. #

                                                                        In the WALS Ch 20 data, exclusively concatenative languages form the largest single category. In our sample, concatenative languages also form the plurality.

                                                                        Generalization 3: Dependent-marking is the most common locus type in our sample. #

                                                                        Generalization 4: Productive reduplication is present in a majority of languages in the WALS Ch 27 data. #

                                                                        Generalization 7: Concatenative languages are predominantly monoexponential. #

                                                                        This is the defining correlation of agglutination: one-to-one mapping between form and meaning in the morphology.

                                                                        Generalization 8: Head-marking languages tend to have high verb synthesis. #

                                                                        If grammatical relations are marked on the head (verb), then the verb carries agreement morphology for multiple arguments, driving up synthesis.

                                                                        Generalization: All languages with high verb synthesis have either concatenative or nonlinear fusion (never isolating). #

                                                                        Ch 26: Suffixing (strongly + weakly) accounts for over half of languages with affixation (529/816 = 65%).
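
The arithmetic behind this claim can be checked directly. The sketch below uses only the two counts quoted above; the definition names are assumptions introduced for the example.

```lean
/-- Languages coded as strongly or weakly suffixing, as quoted above. -/
def suffixing : Nat := 529

/-- Languages with affixation, as quoted above. -/
def withAffixation : Nat := 816

-- "Over half": twice the suffixing count exceeds the affixing total.
example : 2 * suffixing > withAffixation := by decide

-- Rounded percentage in integer arithmetic: add half the divisor before dividing.
example : (suffixing * 100 + withAffixation / 2) / withAffixation = 65 := by decide
```

Stating "over half" as `2 * a > b` avoids division altogether, which keeps the proposition within plain natural-number arithmetic.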

                                                                        Ch 20: Concatenative types (exclusively + strongly + weakly) account for the majority of the sample.

                                                                        Ch 22: Most languages have 2--7 categories per verb word. The extremes (0--1 and 12--13) are rare.

                                                                        theorem Phenomena.Morphology.Typology.ch27_reduplication_split :
                                                                        have withRedup := 147 + 35; have withoutRedup := 186; withRedup > withoutRedup / 2 ∧ withoutRedup > withRedup / 2

                                                                        Ch 27: Languages split roughly evenly between having productive reduplication and not.

                                                                        All ISO codes in the morphological profiles are 3 characters.
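
A check of this kind might look like the following sketch. The names `isoCodesSketch` and `allThreeChars` are hypothetical, and the list holds only the ten ISO codes that appear in theorem statements on this page, not the full set of 18 profiles.

```lean
/-- ISO codes that appear in the lookup theorems on this page (not the full sample). -/
def isoCodesSketch : List String :=
  ["vie", "ind", "arz", "heb", "tur", "fin", "deu", "rus", "eng", "gyd"]

/-- True when every code in the list is exactly three characters long. -/
def allThreeChars (codes : List String) : Bool :=
  codes.all (fun c => c.length == 3)

#eval allThreeChars isoCodesSketch  -- true
```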

                                                                        WALS Grounding #

                                                                        Per-language grounding theorems verify that Fragment MorphProfile values are consistent with the WALS-generated data. Since profiles are now WALS-derived by construction (via the wals* helpers), these theorems serve as regression tests: if the WALS data changes, the corresponding theorem breaks.