Linglib.Phenomena.Gender.Typology

Gender and Noun Class Typology (WALS Chapters 30--32) #

@cite{corbett-1991} @cite{dryer-haspelmath-2013} @cite{corbett-2013} @cite{dixon-1972}

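Formalizes three chapters from the World Atlas of Language Structures (WALS) on the typology of gender and noun class systems, all authored by @cite{corbett-2013}: the number of genders (Ch 30), sex-based versus non-sex-based systems (Ch 31), and systems of gender assignment (Ch 32).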

Key Concepts #

The boundary between "gender" (2--3 categories, typically sex-based) and "noun class" (5+ categories, often with semantic and formal assignment) is gradient rather than categorical. @cite{corbett-1991} treats them as a single phenomenon at different scales.

Corbett's Agreement Hierarchy governs where gender agreement surfaces: attributive adjective > predicate adjective > relative pronoun > personal pronoun > verb target. If a language shows gender agreement on a lower target, it shows agreement on all higher targets.

Number of gender/noun class distinctions in a language (WALS Ch 30).

The five values form a scale from no gender system at all through increasingly fine-grained classification. Systems with 5 or more categories are often called "noun class" systems rather than "gender" systems, but the boundary is conventional, not categorical.
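
A minimal Lean sketch of how such a five-valued scale could be encoded, together with a helper that bins a raw category count. The names `GenderBin` and `GenderBin.ofCount` are hypothetical and not taken from the module, and mapping a count of 1 to "no gender" is an assumption of this sketch.

```lean
/-- Hypothetical stand-in for the five WALS Ch 30 values. -/
inductive GenderBin where
  | noGender
  | two
  | three
  | four
  | fiveOrMore
  deriving DecidableEq, Repr

/-- Bin a raw category count. Mapping 0 and 1 to `noGender` is an
    assumption of this sketch, not something the source states. -/
def GenderBin.ofCount : Nat → GenderBin
  | 0 => .noGender
  | 1 => .noGender
  | 2 => .two
  | 3 => .three
  | 4 => .four
  | _ => .fiveOrMore

-- German (3 genders) and Fula (about 20 classes), as described in this module.
example : GenderBin.ofCount 3 = GenderBin.three := by decide
example : GenderBin.ofCount 20 = GenderBin.fiveOrMore := by decide
```

Keeping the raw count separate from the bin, as the profile structure does, lets a consistency check compare a stored bin against the bin recomputed from the count.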

      Chapter 30 total sample size (from generated data).

      Whether a gender system is based on biological sex (WALS Ch 31).

      Sex-based systems organize genders around a masculine/feminine distinction (possibly with additional genders like neuter). Non-sex-based systems use other semantic criteria: animate/inanimate, human/non-human, shape-based, etc. Many Bantu noun class systems are non-sex-based, organized instead around semantic categories like human, plant, artifact, abstract, liquid, etc.

          Chapter 31 total sample size (from generated data).

          How nouns are assigned to their gender categories (WALS Ch 32).

          Semantic assignment means gender is determined by meaning: male referents are masculine, female referents are feminine, etc. Formal assignment means gender is determined by the phonological or morphological shape of the noun: e.g., in Italian, nouns ending in -o tend to be masculine, nouns ending in -a tend to be feminine.

          A key typological finding is that no language has a purely formal assignment system: formal rules always supplement a semantic core. Languages either use semantic assignment alone or combine semantic and formal criteria.

              Ch 32: No purely formal assignment system is attested. This is Corbett's key generalization: formal assignment always supplements a semantic core, never replaces it. WALS F32A has only three categories: noGender, semantic, and semanticAndFormal — no "formal only" category exists.
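
The "true by construction" character of this generalization can be sketched in Lean: if the assignment type has no formal-only constructor, every gender-bearing value has a semantic core. The names `Assignment`, `hasSemanticCore`, and `semantic_core_of_gender` are hypothetical; only the three category labels come from the text above.

```lean
/-- Hypothetical mirror of the three WALS F32A values named above. -/
inductive Assignment where
  | noGender
  | semantic
  | semanticAndFormal
  deriving DecidableEq, Repr

/-- Does the system assign gender at least partly by meaning? -/
def Assignment.hasSemanticCore : Assignment → Bool
  | .noGender => false
  | .semantic => true
  | .semanticAndFormal => true

/-- Any value other than `noGender` has a semantic core. -/
theorem semantic_core_of_gender (a : Assignment)
    (h : a ≠ Assignment.noGender) : a.hasSemanticCore = true := by
  cases a
  · exact absurd rfl h   -- `noGender` contradicts the hypothesis
  · rfl                  -- `semantic`
  · rfl                  -- `semanticAndFormal`
```

The proof is a case split with no real content, which is the point: the empirical claim is carried by the shape of the type, so it holds for any data entered, not because the data was checked.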

              Semantic dimensions that can underlie gender/noun class assignment.

              Different languages organize their noun classification around different semantic properties. Sex is the most common, but animacy, humanness, and shape/size are also attested as organizing principles.

                  A language's gender profile combining classifications from WALS Chapters 30, 31, and 32, plus additional typological information.

                  The rawGenderCount field stores the actual number of gender/noun class categories (not just the WALS bin), enabling finer-grained comparisons. The agreementTargets list records where gender agreement surfaces. The semanticBases list records the semantic dimensions organizing the system.

                          Whether the raw gender count is consistent with the WALS bin.

                            Whether the profile is internally consistent across chapters: no-gender in Ch 30 should align with noGender in Ch 31 and Ch 32.
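
One way the profile record and the cross-chapter check could be written in Lean. This is a sketch under assumptions: every name except `rawGenderCount` is hypothetical, and the real structure carries more fields (for example `agreementTargets` and `semanticBases`).

```lean
inductive GenderBin where
  | noGender | two | three | four | fiveOrMore
  deriving DecidableEq, Repr

inductive SexBasis where
  | noGender | sexBased | nonSexBased
  deriving DecidableEq, Repr

inductive Assignment where
  | noGender | semantic | semanticAndFormal
  deriving DecidableEq, Repr

/-- Simplified, hypothetical profile: one field per WALS chapter plus the raw count. -/
structure Profile where
  name : String
  bin : GenderBin
  basis : SexBasis
  assignment : Assignment
  rawGenderCount : Nat
  deriving Repr

/-- "No gender" must be declared in all three chapters or in none of them. -/
def Profile.crossChapterConsistent (p : Profile) : Bool :=
  match p.bin, p.basis, p.assignment with
  | .noGender, .noGender, .noGender => true
  | .noGender, _, _ => false
  | _, .noGender, _ => false
  | _, _, .noGender => false
  | _, _, _ => true

def english : Profile :=
  { name := "English", bin := .noGender, basis := .noGender,
    assignment := .noGender, rawGenderCount := 0 }

def german : Profile :=
  { name := "German", bin := .three, basis := .sexBased,
    assignment := .semanticAndFormal, rawGenderCount := 3 }

/-- Deliberately inconsistent: no gender in Ch 30 but sex-based in Ch 31. -/
def broken : Profile := { english with basis := .sexBased }

example : english.crossChapterConsistent = true := by decide
example : german.crossChapterConsistent = true := by decide
example : broken.crossChapterConsistent = false := by decide
```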

                              Whether the language qualifies as a "noun class" system (5+ categories).

                                Whether the language has any gender agreement.

                                  Lowest agreement target (most marked in Corbett's hierarchy).

                                    English: no grammatical gender on nouns or adjectives; only natural gender in 3sg pronouns (he/she/it). No gender agreement on adjectives or verbs.

                                      Mandarin Chinese: no gender system whatsoever. No morphological gender marking on nouns, adjectives, or verbs. Third-person pronouns were historically undifferentiated; the written distinction 他/她 (both pronounced tā) is a 20th-century orthographic innovation.

                                        Japanese: no gender system. Classifier system organizes nouns but does not trigger agreement. No gender marking on adjectives, verbs, or pronouns.

                                          Turkish: no gender system. No morphological gender distinctions on nouns, adjectives, verbs, or pronouns. The third-person pronoun o is used for all referents regardless of sex or animacy.

                                            Finnish: no gender system. No grammatical gender on nouns, adjectives, or verbs. The third-person pronoun hän is used for human referents regardless of sex.

                                              Korean: no gender system. No morphological gender marking. Third-person reference uses demonstratives rather than gendered pronouns.

                                                Quechua (Cusco): no gender system. Agglutinative morphology with rich case and number but no gender distinctions.

                                                  French: 2 genders (masculine/feminine). Sex-based with considerable formal reinforcement (nouns ending in -tion, -ence tend feminine; -ment, -age tend masculine). Agreement on determiners, adjectives (attributive and predicate), past participles. Assignment is semantic + formal.

                                                    Spanish: 2 genders (masculine/feminine). Sex-based with strong formal correlates (-o masculine, -a feminine, with many exceptions). Agreement on determiners, adjectives, past participles, and some verb forms.

                                                      Hindi-Urdu: 2 genders (masculine/feminine). Sex-based with strong formal assignment (nouns ending in -aa tend masculine, -ii feminine). Agreement on adjectives, verbs (in perfective aspect), and auxiliaries. One of the clearest cases of verb agreement for gender.

                                                        Irish: 2 genders (masculine/feminine). Sex-based with formal assignment playing a role (initial consonant mutations triggered by gender). Agreement on determiners, attributive adjectives, and pronouns.

                                                          Hebrew (Modern): 2 genders (masculine/feminine). Sex-based with formal correlates (-a, -et endings often feminine). Agreement on adjectives, verbs (all tenses), demonstratives, and numerals.

                                                            German: 3 genders (masculine/feminine/neuter). Sex-based with extensive formal assignment (suffixes like -ung, -heit, -keit are feminine; -chen, -lein diminutives are neuter; Ge- collectives are neuter). Agreement on determiners, adjectives (attributive and predicate), relative pronouns, and personal pronouns.

                                                              Russian: 3 genders (masculine/feminine/neuter). Sex-based with strong formal correlates (consonant-final stems tend masculine, -a final feminine, -o or -e final neuter). Agreement on adjectives, verbs (past tense), demonstratives, relative pronouns, and personal pronouns.

                                                                Latin: 3 genders (masculine/feminine/neuter). Sex-based with formal assignment via declension classes (-us, -er 2nd decl. mostly masculine, -a 1st decl. mostly feminine, -um 2nd decl. neuter). Agreement on adjectives and relative/demonstrative pronouns.

                                                                  Romanian: 3 genders (masculine/feminine/neuter -- the neuter behaves as masculine in the singular and feminine in the plural, sometimes analyzed as "ambigeneric"). Sex-based with formal assignment. Agreement on determiners (enclitic definite article), adjectives, and pronouns.

                                                                    Dyirbal (Australia): 4 genders. I: males, most animate beings; II: females, water, fire, fighting; III: edible plants and fruit; IV: residual (everything else). Non-sex-based in the sense that the system includes categories organized around edibility and natural forces, not just biological sex. Semantic assignment only (no formal correlates). Agreement on determiners/classifiers.

                                                                      Georgian: 4 categories in traditional analysis -- rational (humans) vs non-rational, cross-cut by an older masculine/feminine trace in pronouns. More recent analyses posit a simpler animacy-based split. Non-sex-based: the primary division is rational/non-rational. Semantic assignment. Agreement on verbs (verb agreement cross-references subjects and objects, tracking the rational/non-rational distinction).

                                                                        Swahili (Bantu): approximately 15--18 noun classes (paired into 9 singular/plural pairings + locative classes). The system is organized around semantic dimensions (human, plant, artifact, abstract, liquid, augmentative, diminutive) but with extensive formal assignment via prefixes (m-, wa- for Class 1/2 humans; ki-, vi- for Class 7/8 tools; etc.). Agreement permeates: determiners, adjectives, verbs, pronouns, numerals, and possessives all agree in noun class.

                                                                          Zulu (Bantu): approximately 15 noun classes, closely related to the Swahili system (both are Bantu). Organized by similar semantic dimensions with prefix-based formal assignment (umu-, aba- for Class 1/2 humans, etc.). Full agreement across all targets.

                                                                            Fula (Atlantic, Niger-Congo): approximately 20+ noun classes, one of the richest class systems in Africa. Classes encode semantic distinctions including human, animal, plant, liquid, diminutive, augmentative, and pejorative. Formal assignment via class suffixes (-o, -ɓe for Class 1/2 humans; -nde, -de for trees, etc.). Agreement on determiners, adjectives, verbs, pronouns.

                                                                              All language profiles in our sample.

                                                                                Every language's raw gender count falls within its declared WALS category.

                                                                                All raw gender counts are consistent with their WALS bins.

                                                                                Cross-chapter consistency: no-gender in Ch 30 aligns with noGender in Ch 31 and Ch 32; gender-bearing languages do not have noGender in Ch 31 or Ch 32.

                                                                                All profiles are cross-chapter consistent.

                                                                                Spot-checks that each language has the expected WALS category values.

                                                                                Generalization 1: No gender is the most common state. #

                                                                                The majority of the world's languages have no grammatical gender system at all (145 out of 257 in the WALS Ch 30 sample). Our sample reflects this only as a plurality: 7 of 21 languages have no gender, more than in any other single category.

                                                                                Generalization 2: Among gender systems, 2-gender is most common. #

                                                                                Of the 112 languages with gender in the WALS Ch 30 sample, 50 have exactly 2 genders -- the single most common count among gender-bearing languages.
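
The arithmetic behind Generalizations 1 and 2 can be checked directly in Lean. This sketch uses only the counts quoted in the text (257, 145, 50) and does not reproduce the module's own theorems.

```lean
-- Counts quoted in the text: 257 languages sampled, 145 without gender,
-- 50 with exactly two genders.
example : 257 - 145 = 112 := by decide   -- number of gender-bearing languages
example : 2 * 145 > 257 := by decide     -- no-gender languages are an absolute majority
example : 2 * 50 < 112 := by decide      -- two-gender systems are under half of the 112
```

The last line makes explicit that "most common" for 2-gender systems means the largest single count among gender-bearing languages, not a majority of them.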

                                                                                Generalization 3: Sex-based systems dominate over non-sex-based. #

                                                                                Among languages with gender, sex-based systems (84) far outnumber non-sex-based ones (28) in the WALS Ch 31 sample. In our sample, all 2- and 3-gender systems are sex-based; non-sex-based systems appear only with 4+ categories.

                                                                                Generalization 4: No purely formal assignment exists. #

                                                                                Corbett's key finding: no language assigns gender on a purely formal basis without any semantic core. In our sample, every language with gender has semantic assignment (either alone or combined with formal).

                                                                                Generalization 5: Formal assignment correlates with more genders. #

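                                                                                Languages that combine semantic and formal assignment tend to have richer gender systems. In our sample, all 3-gender languages use semantic + formal assignment, as do the 5+ noun class systems (Swahili, Zulu, Fula); semantic-only assignment is attested only in the two 4-gender systems (Dyirbal and Georgian), so the correlation is a tendency rather than a strict ordering.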

                                                                                Generalization 6: Agreement targets respect Corbett's hierarchy. #

                                                                                In our sample, every language that shows gender agreement on verbs also shows it on attributive adjectives (i.e., no language agrees only on verbs without agreeing higher on the hierarchy).

                                                                                Generalization 7: Noun class systems (5+) have the richest agreement. #

                                                                                Languages with 5+ genders show agreement on more targets than languages with fewer genders. In our sample, all noun class systems (Swahili, Zulu, Fula) agree on at least 4 out of 5 target types.

                                                                                Generalization 8: Non-sex-based systems tend to have more genders. #

                                                                                In our sample, non-sex-based systems have 4+ genders, while sex-based systems have 2--3. This reflects the cross-linguistic pattern: when gender is not organized around sex, the system tends to proliferate into a richer noun class system.

                                                                                Generalization 9: All GenderCount bins are attested in the sample. #

                                                                                Our 21-language sample covers every WALS Chapter 30 category, from no gender through 5+ genders.

                                                                                Generalization 10: Gender-bearing languages always show some agreement. #

                                                                                A gender system without any agreement is not a gender system -- genders are precisely the categories that trigger agreement. In our sample, every language with gender has at least one agreement target, and every language without gender has none.

                                                                                Whether a gender system is "canonical" in Corbett's sense: sex-based, with 2 or 3 genders, and semantic + formal assignment.
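
The three conditions in this definition translate into a small Boolean predicate. The following Lean sketch assumes hypothetical names (`SexBasis`, `Assignment`, `isCanonical`) and takes the three relevant values as separate arguments rather than as a full profile.

```lean
inductive SexBasis where
  | noGender | sexBased | nonSexBased
  deriving DecidableEq, Repr

inductive Assignment where
  | noGender | semantic | semanticAndFormal
  deriving DecidableEq, Repr

/-- Canonical in the sense used here: sex-based, 2 or 3 genders,
    semantic + formal assignment. -/
def isCanonical (basis : SexBasis) (rawCount : Nat) (a : Assignment) : Bool :=
  match basis, rawCount, a with
  | .sexBased, 2, .semanticAndFormal => true
  | .sexBased, 3, .semanticAndFormal => true
  | _, _, _ => false

-- Hindi-Urdu as described above: 2 genders, sex-based, semantic + formal.
example : isCanonical .sexBased 2 .semanticAndFormal = true := by decide
-- Dyirbal as described above: 4 genders, non-sex-based, semantic only.
example : isCanonical .nonSexBased 4 .semantic = false := by decide
```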

                                                                                  Hindi-Urdu also has a canonical gender system.

                                                                                  Non-canonical gender systems (noun class, non-sex-based) are all non-European in our sample.

                                                                                  The gender-number scale: languages can be ordered from no gender (0 categories) through systems with progressively more categories. In our sample, the scale spans from 0 (English) to 20 (Fula).

                                                                                  The maximum raw gender count in our sample.

                                                                                  All ISO 639-3 codes are non-empty.

                                                                                  All ISO 639-3 codes are exactly 3 characters (standard length).

                                                                                  No duplicate ISO codes (each language appears once).
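
The three ISO sanity checks above can be illustrated with Boolean functions over a list of codes. This Lean sketch uses an illustrative subset of codes and hypothetical function names; it evaluates the checks with `#eval` rather than proving them, and is not the module's own formulation.

```lean
/-- A few ISO 639-3 codes for languages in the sample (illustrative subset). -/
def sampleCodes : List String := ["eng", "cmn", "jpn", "tur", "fin", "kor"]

/-- Every code is exactly three characters long (and hence non-empty). -/
def allWellFormed (codes : List String) : Bool :=
  codes.all (fun c => c.length == 3)

/-- No code occurs twice: removing duplicates leaves the length unchanged. -/
def noDuplicates (codes : List String) : Bool :=
  codes.eraseDups.length == codes.length

#eval allWellFormed sampleCodes
#eval noDuplicates sampleCodes
```

The length-three check subsumes the non-emptiness check, so keeping both as separate statements mainly helps localize a failure.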

                                                                                  Number of no-gender languages in our sample.

                                                                                    Number of sex-based gender languages in our sample.

                                                                                      Number of non-sex-based gender languages in our sample.

                                                                                        Number of semantic-only assignment languages.

                                                                                          Number of semantic-and-formal assignment languages.

                                                                                            Corbett's Agreement Hierarchy predicts: if a language shows gender agreement on a lower target, it agrees on all higher targets.

                                                                                            We check a weaker, necessary condition rather than the full implication over every pair of targets in a language's agreement list.

                                                                                            Specifically: verb agreement (rank 0) implies at least one higher target (pronoun, predicate, or attributive) is also present.
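
A Lean sketch of this weak check. Only "verb = rank 0" is taken from the text; the `Target` type, the remaining rank values (read off the hierarchy ordering stated in the Key Concepts section), and the name `respectsHierarchy` are assumptions made for illustration.

```lean
inductive Target where
  | verb | personalPronoun | relativePronoun | predicateAdj | attributiveAdj
  deriving DecidableEq, Repr

/-- Position on the hierarchy: verb is lowest (rank 0), attributive adjective highest. -/
def Target.rank : Target → Nat
  | .verb => 0
  | .personalPronoun => 1
  | .relativePronoun => 2
  | .predicateAdj => 3
  | .attributiveAdj => 4

/-- Weak check: if verbs agree, at least one higher-ranked target agrees too.
    Languages without verb agreement pass trivially. -/
def respectsHierarchy (ts : List Target) : Bool :=
  if ts.contains Target.verb then
    ts.any (fun t => decide (0 < t.rank))
  else
    true

-- Verb agreement alongside attributive adjectives passes the check.
example : respectsHierarchy [Target.attributiveAdj, Target.verb] = true := by decide
-- Verb agreement alone would violate it.
example : respectsHierarchy [Target.verb] = false := by decide
-- An empty target list (a genderless language) passes trivially.
example : respectsHierarchy [] = true := by decide
```

A check of the full implication would instead require every target ranked above the lowest attested one to be present, which is a stricter condition than the one stated here.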

                                                                                            Semantic-only assignment is restricted to languages with non-sex-based systems in our sample. This makes sense: sex-based systems typically have formal correlates (declension class, phonological patterns) that supplement the semantic core.

                                                                                            The sample spans at least 5 distinct ISO codes in each category (gender-bearing vs genderless).

                                                                                            Per-language grounding theorems verify that each profile's field values agree with the corresponding WALS chapter data via the converter functions.

                                                                                            Languages with known discrepancies between our profile classifications and WALS are skipped on the discrepant chapter: