Gender and Noun Class Typology (WALS Chapters 30--32) #
@cite{corbett-1991} @cite{dryer-haspelmath-2013} @cite{corbett-2013} @cite{dixon-1972}
Formalizes three chapters from the World Atlas of Language Structures (WALS) covering the typology of gender and noun class systems, all authored by @cite{corbett-2013}:
- Chapter 30: Number of Genders -- how many gender/noun class distinctions a language makes. Values range from none (no gender system) through 2, 3, 4, to 5 or more. Most of the world's languages have no gender system at all; among those that do, 2 and 3 genders are most common. Systems with 5 or more categories are typically called "noun class" systems rather than "gender" systems (the Bantu languages being the canonical example).
- Chapter 31: Sex-based and Non-sex-based Gender Systems -- whether the gender system is organized around biological sex (masculine/feminine, with or without neuter) or around other semantic/formal criteria such as animacy, shape, or size. The distinction cross-cuts the number of genders: a 2-gender system may be masculine/feminine (French) or animate/inanimate (Ojibwe).
- Chapter 32: Systems of Gender Assignment -- how nouns are assigned to their gender categories. Assignment may be purely semantic (based on the meaning of the noun, e.g. male referents are masculine), purely formal (based on phonological or morphological properties), or a combination. A key typological finding is that no language has a purely formal assignment system: formal rules always supplement a semantic core.
Key Concepts #
The boundary between "gender" (typically 2--3 categories, usually sex-based) and "noun class" (typically 5 or more categories, often with both semantic and formal assignment) is gradient rather than categorical. @cite{corbett-1991} treats them as a single phenomenon at different scales.
Corbett's Agreement Hierarchy governs where gender agreement surfaces: attributive adjective > predicate adjective > relative pronoun > personal pronoun > verb. Targets further left are higher: if a language shows gender agreement on a lower target, it shows agreement on all higher targets.
Number of gender/noun class distinctions in a language (WALS Ch 30).
The five values form a scale from no gender system at all through increasingly fine-grained classification. Systems with 5 or more categories are often called "noun class" systems rather than "gender" systems, but the boundary is conventional, not categorical.
- none : GenderCount
- two : GenderCount
- three : GenderCount
- four : GenderCount
- fivePlus : GenderCount
Equations
- Phenomena.Gender.Typology.instBEqGenderCount.beq x✝ y✝ = (x✝.ctorIdx == y✝.ctorIdx)
Numeric lower bound for each GenderCount category.
Whether a raw gender count falls in a given GenderCount category.
Equations
- Phenomena.Gender.Typology.GenderCount.none.contains n = (n == 0)
- Phenomena.Gender.Typology.GenderCount.two.contains n = (n == 2)
- Phenomena.Gender.Typology.GenderCount.three.contains n = (n == 3)
- Phenomena.Gender.Typology.GenderCount.four.contains n = (n == 4)
- Phenomena.Gender.Typology.GenderCount.fivePlus.contains n = decide (n ≥ 5)
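The binning above can be sketched in a few lines of Lean. This is a minimal, self-contained approximation and not the module's actual source: the `contains` cases follow the equations listed above, while the `lowerBound` values are an assumption (its equations are not rendered on this page), and the `GenderCountSketch` namespace is hypothetical.

```lean
namespace GenderCountSketch

/-- WALS Ch 30 bins, mirroring the constructor list above. -/
inductive GenderCount where
  | none | two | three | four | fivePlus
  deriving DecidableEq, Repr

/-- ASSUMPTION: lower bound of each bin; the real equations are not shown. -/
def GenderCount.lowerBound : GenderCount → Nat
  | .none => 0
  | .two => 2
  | .three => 3
  | .four => 4
  | .fivePlus => 5

/-- Bin membership, following the `contains` equations listed above. -/
def GenderCount.contains : GenderCount → Nat → Bool
  | .none, n => n == 0
  | .two, n => n == 2
  | .three, n => n == 3
  | .four, n => n == 4
  | .fivePlus, n => decide (n ≥ 5)

-- A raw count of 1 belongs to no bin; 20 falls in the open-ended top bin.
example : GenderCount.two.contains 1 = false := by decide
example : GenderCount.fivePlus.contains 20 = true := by decide
-- Under the assumed `lowerBound`, each bin contains its own lower bound.
example : GenderCount.four.contains GenderCount.four.lowerBound = true := by decide

end GenderCountSketch
```

Note that a raw count of 1 is not covered by any bin, which matches the WALS value set (none, 2, 3, 4, 5 or more).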
Chapter 30 total sample size (from generated data).
Whether a gender system is based on biological sex (WALS Ch 31).
Sex-based systems organize genders around a masculine/feminine distinction (possibly with additional genders like neuter). Non-sex-based systems use other semantic criteria: animate/inanimate, human/non-human, shape-based, etc. Many Bantu noun class systems are non-sex-based, organized instead around semantic categories like human, plant, artifact, abstract, liquid, etc.
- noGender : GenderBasis
- sexBased : GenderBasis
- nonSexBased : GenderBasis
Equations
- Phenomena.Gender.Typology.instBEqGenderBasis.beq x✝ y✝ = (x✝.ctorIdx == y✝.ctorIdx)
Chapter 31 total sample size (from generated data).
How nouns are assigned to their gender categories (WALS Ch 32).
Semantic assignment means gender is determined by meaning: male referents are masculine, female referents are feminine, etc. Formal assignment means gender is determined by the phonological or morphological shape of the noun: e.g., in Italian, nouns ending in -o tend to be masculine, nouns ending in -a tend to be feminine.
A key typological finding is that no language has a purely formal assignment system: formal rules always supplement a semantic core. Languages either use semantic assignment alone or combine semantic and formal criteria.
- noGender : AssignmentSystem
- semanticOnly : AssignmentSystem
- semanticAndFormal : AssignmentSystem
Equations
- Phenomena.Gender.Typology.instBEqAssignmentSystem.beq x✝ y✝ = (x✝.ctorIdx == y✝.ctorIdx)
Chapter 32 total sample size.
Ch 30: Languages with no gender are the modal category.
Ch 30: Among gender-bearing languages, 2-gender systems are most common.
Ch 31: Sex-based systems far outnumber non-sex-based ones.
Ch 32: Semantic-and-formal assignment slightly outnumbers semantic-only.
Ch 32: No purely formal assignment system is attested. This is Corbett's key generalization: formal assignment always supplements a semantic core, never replaces it. WALS F32A has only three categories: noGender, semantic, and semanticAndFormal — no "formal only" category exists.
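Because the type has no "formal only" constructor, the generalization holds by construction in this encoding; it is a modelling decision that mirrors the WALS value set, not an independent empirical proof. A minimal sketch, with a hypothetical namespace and theorem name:

```lean
namespace AssignmentSketch

/-- WALS F32A categories, mirroring the constructor list above. -/
inductive AssignmentSystem where
  | noGender | semanticOnly | semanticAndFormal
  deriving DecidableEq, Repr

/-- Every value is one of the three listed categories, so no profile
    can be classified as "formal only". -/
theorem assignment_exhaustive (a : AssignmentSystem) :
    a = AssignmentSystem.noGender ∨
    a = AssignmentSystem.semanticOnly ∨
    a = AssignmentSystem.semanticAndFormal := by
  cases a
  · exact Or.inl rfl
  · exact Or.inr (Or.inl rfl)
  · exact Or.inr (Or.inr rfl)

end AssignmentSketch
```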
Semantic dimensions that can underlie gender/noun class assignment.
Different languages organize their noun classification around different semantic properties. Sex is the most common, but animacy, humanness, and shape/size are also attested as organizing principles.
- sex : SemanticBasis
- animacy : SemanticBasis
- humanness : SemanticBasis
- shape : SemanticBasis
- rationality : SemanticBasis
Equations
- Phenomena.Gender.Typology.instBEqSemanticBasis.beq x✝ y✝ = (x✝.ctorIdx == y✝.ctorIdx)
A language's gender profile combining classifications from WALS Chapters 30, 31, and 32, plus additional typological information.
The rawGenderCount field stores the actual number of gender/noun class
categories (not just the WALS bin), enabling finer-grained comparisons.
The agreementTargets list records where gender agreement surfaces.
The semanticBases list records the semantic dimensions organizing
the system.
- name : String
Language name
- iso639 : String
ISO 639-3 code
- genderCount : GenderCount
Ch 30: Number of genders (WALS category)
- rawGenderCount : Nat
Actual number of gender/noun class categories
- basis : GenderBasis
Ch 31: Sex-based or non-sex-based
- assignment : AssignmentSystem
Ch 32: Assignment system
- agreementTargets : List AgreementTarget
Where gender agreement surfaces
- semanticBases : List SemanticBasis
Semantic dimensions organizing the system
Equations
- One or more equations did not get rendered due to their size.
- Phenomena.Gender.Typology.instBEqGenderProfile.beq x✝¹ x✝ = false
Whether the raw gender count is consistent with the WALS bin.
Whether the profile is internally consistent across chapters: no-gender in Ch 30 should align with noGender in Ch 31 and Ch 32.
Whether the language qualifies as a "noun class" system (5+ categories).
Equations
- p.isNounClassSystem = decide (p.rawGenderCount ≥ 5)
Whether the language has any gender agreement.
Equations
- p.hasAgreement = decide (p.agreementTargets.length > 0)
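The profile predicates above can be exercised on a small example. The sketch below is a reduced, self-contained approximation: `AgreementTarget` stands in for `Core.AgreementTarget` (its constructor names are assumptions), the `name`, `iso639`, and `semanticBases` fields are left out, and `countConsistent` is one plausible reading of the docstring, since its equations are not rendered here. The example profile is hypothetical, loosely modelled on a 3-gender language.

```lean
namespace ProfileSketch

inductive GenderCount where
  | none | two | three | four | fivePlus
  deriving DecidableEq, Repr

inductive GenderBasis where
  | noGender | sexBased | nonSexBased
  deriving DecidableEq, Repr

inductive AssignmentSystem where
  | noGender | semanticOnly | semanticAndFormal
  deriving DecidableEq, Repr

/-- ASSUMPTION: stand-in for `Core.AgreementTarget`; names are hypothetical. -/
inductive AgreementTarget where
  | attributive | predicate | relativePronoun | personalPronoun | verb
  deriving DecidableEq, Repr

/-- Reduced profile: string and semantic-basis fields omitted for brevity. -/
structure GenderProfile where
  genderCount : GenderCount
  rawGenderCount : Nat
  basis : GenderBasis
  assignment : AssignmentSystem
  agreementTargets : List AgreementTarget
  deriving Repr

def GenderCount.contains : GenderCount → Nat → Bool
  | .none, n => n == 0
  | .two, n => n == 2
  | .three, n => n == 3
  | .four, n => n == 4
  | .fivePlus, n => decide (n ≥ 5)

/-- ASSUMPTION: the raw count must fall in the declared WALS bin. -/
def GenderProfile.countConsistent (p : GenderProfile) : Bool :=
  p.genderCount.contains p.rawGenderCount

/-- Matches the equation shown above. -/
def GenderProfile.isNounClassSystem (p : GenderProfile) : Bool :=
  decide (p.rawGenderCount ≥ 5)

/-- Matches the equation shown above. -/
def GenderProfile.hasAgreement (p : GenderProfile) : Bool :=
  decide (p.agreementTargets.length > 0)

/-- Hypothetical 3-gender, sex-based profile used only for illustration. -/
def exampleThreeGender : GenderProfile :=
  { genderCount := .three, rawGenderCount := 3, basis := .sexBased,
    assignment := .semanticAndFormal,
    agreementTargets := [.attributive, .relativePronoun, .personalPronoun] }

example : exampleThreeGender.countConsistent = true := by decide
example : exampleThreeGender.isNounClassSystem = false := by decide
example : exampleThreeGender.hasAgreement = true := by decide

end ProfileSketch
```

Keeping the predicates `Bool`-valued means each sample-level claim can be closed by `decide` on closed terms.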
Lowest agreement target (most marked in Corbett's hierarchy).
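Since the equations for this definition are not rendered, the following is only a sketch of how a lowest-target lookup could work. The numeric ranks are an assumption (verb lowest at rank 0, attributive highest), as are the `AgreementTarget` constructor names and the `lowestTarget` name itself.

```lean
namespace LowestTargetSketch

/-- ASSUMPTION: stand-in for `Core.AgreementTarget`; names are hypothetical. -/
inductive AgreementTarget where
  | attributive | predicate | relativePronoun | personalPronoun | verb
  deriving DecidableEq, Repr

/-- ASSUMPTION: numeric ranks, with the verb lowest. -/
def AgreementTarget.rank : AgreementTarget → Nat
  | .verb => 0
  | .personalPronoun => 1
  | .relativePronoun => 2
  | .predicate => 3
  | .attributive => 4

/-- Lowest-ranked target in the list, or `none` if the list is empty. -/
def lowestTarget (ts : List AgreementTarget) : Option AgreementTarget :=
  ts.foldl
    (fun (acc : Option AgreementTarget) (t : AgreementTarget) =>
      match acc with
      | none => some t
      | some best => if t.rank < best.rank then some t else some best)
    none

example : lowestTarget [AgreementTarget.attributive, AgreementTarget.personalPronoun,
    AgreementTarget.verb] = some AgreementTarget.verb := by decide
example : lowestTarget [AgreementTarget.attributive, AgreementTarget.predicate]
    = some AgreementTarget.predicate := by decide
example : lowestTarget ([] : List AgreementTarget) = none := by decide

end LowestTargetSketch
```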
English: no grammatical gender on nouns or adjectives; only natural gender in 3sg pronouns (he/she/it). No gender agreement on adjectives or verbs.
Mandarin Chinese: no gender system whatsoever. No morphological gender marking on nouns, adjectives, or verbs. Third-person pronouns were historically undifferentiated; the written distinction between 他 'he' and 她 'she' (both pronounced tā) is a 20th-century orthographic innovation.
Japanese: no gender system. Classifier system organizes nouns but does not trigger agreement. No gender marking on adjectives, verbs, or pronouns.
Turkish: no gender system. No morphological gender distinctions on nouns, adjectives, verbs, or pronouns. The third-person pronoun o is used for all referents regardless of sex or animacy.
Finnish: no gender system. No grammatical gender on nouns, adjectives, or verbs. The third-person pronoun hän is used for human referents regardless of sex.
Korean: no gender system. No morphological gender marking. Third-person reference uses demonstratives rather than gendered pronouns.
Quechua (Cusco): no gender system. Agglutinative morphology with rich case and number but no gender distinctions.
French: 2 genders (masculine/feminine). Sex-based with considerable formal reinforcement (nouns ending in -tion, -ence tend feminine; -ment, -age tend masculine). Agreement on determiners, adjectives (attributive and predicate), past participles. Assignment is semantic + formal.
Spanish: 2 genders (masculine/feminine). Sex-based with strong formal correlates (-o masculine, -a feminine, with many exceptions). Agreement on determiners, adjectives, past participles, and some verb forms.
Hindi-Urdu: 2 genders (masculine/feminine). Sex-based with strong formal assignment (nouns ending in -aa tend masculine, -ii feminine). Agreement on adjectives, verbs (in perfective aspect), and auxiliaries. One of the clearest cases of verb agreement for gender.
Irish: 2 genders (masculine/feminine). Sex-based with formal assignment playing a role (initial consonant mutations triggered by gender). Agreement on determiners, attributive adjectives, and pronouns.
Hebrew (Modern): 2 genders (masculine/feminine). Sex-based with formal correlates (-a, -et endings often feminine). Agreement on adjectives, verbs (all tenses), demonstratives, and numerals.
German: 3 genders (masculine/feminine/neuter). Sex-based with extensive formal assignment (suffixes like -ung, -heit, -keit are feminine; -chen, -lein diminutives are neuter; Ge- collectives are neuter). Agreement on determiners, attributive adjectives (predicate adjectives are uninflected), relative pronouns, and personal pronouns.
Russian: 3 genders (masculine/feminine/neuter). Sex-based with strong formal correlates (consonant-final stems tend masculine, -a final feminine, -o or -e final neuter). Agreement on adjectives, verbs (past tense), demonstratives, relative pronouns, and personal pronouns.
Latin: 3 genders (masculine/feminine/neuter). Sex-based with formal assignment via declension classes (-us, -er 2nd decl. mostly masculine, -a 1st decl. mostly feminine, -um 2nd decl. neuter). Agreement on adjectives and relative/demonstrative pronouns.
Romanian: 3 genders (masculine/feminine/neuter -- the neuter behaves as masculine in the singular and feminine in the plural, sometimes analyzed as "ambigeneric"). Sex-based with formal assignment. Agreement on determiners (enclitic definite article), adjectives, and pronouns.
Dyirbal (Australia): 4 genders. I: males, most animate beings; II: females, water, fire, fighting; III: edible plants and fruit; IV: residual (everything else). Non-sex-based in the sense that the system includes categories organized around edibility and natural forces, not just biological sex. Semantic assignment only (no formal correlates). Agreement on determiners/classifiers.
Georgian: 4 categories in traditional analysis -- rational (humans) vs non-rational, cross-cut by an older masculine/feminine trace in pronouns. More recent analyses posit a simpler animacy-based split. Non-sex-based: the primary division is rational/non-rational. Semantic assignment. Agreement on verbs (verb agreement cross-references subjects and objects, tracking the rational/non-rational distinction).
Swahili (Bantu): approximately 15--18 noun classes (paired into 9 singular/plural pairings + locative classes). The system is organized around semantic dimensions (human, plant, artifact, abstract, liquid, augmentative, diminutive) but with extensive formal assignment via prefixes (m-, wa- for Class 1/2 humans; ki-, vi- for Class 7/8 tools; etc.). Agreement permeates: determiners, adjectives, verbs, pronouns, numerals, and possessives all agree in noun class.
Zulu (Bantu): approximately 15 noun classes, in a system closely related to that of Swahili. Organized by similar semantic dimensions with prefix-based formal assignment (umu-, aba- for Class 1/2 humans, etc.). Full agreement across all targets.
Fula (Atlantic, Niger-Congo): roughly 20 or more noun classes, one of the richest class systems in Africa. Classes encode semantic distinctions including human, animal, plant, liquid, diminutive, augmentative, and pejorative. Formal assignment via class suffixes (-o, -ɓe for Class 1/2 humans; -nde, -de for trees, etc.). Agreement on determiners, adjectives, verbs, pronouns.
All language profiles in our sample.
Every language's raw gender count falls within its declared WALS category.
All raw gender counts are consistent with their WALS bins.
Cross-chapter consistency: no-gender in Ch 30 aligns with noGender in Ch 31 and Ch 32; gender-bearing languages do not have noGender in Ch 31 or Ch 32.
All profiles are cross-chapter consistent.
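The cross-chapter check can be phrased as a `Bool`-valued predicate and discharged over a whole list with `decide`. The sketch below is an assumption about the shape of that check (the actual equations are not rendered), uses a reduced hypothetical `Profile` structure, and runs on three made-up profiles instead of the real 21-language sample.

```lean
namespace ConsistencySketch

inductive GenderCount where
  | none | two | three | four | fivePlus
  deriving DecidableEq, Repr

inductive GenderBasis where
  | noGender | sexBased | nonSexBased
  deriving DecidableEq, Repr

inductive AssignmentSystem where
  | noGender | semanticOnly | semanticAndFormal
  deriving DecidableEq, Repr

/-- Reduced, hypothetical profile with only the three WALS classifications. -/
structure Profile where
  genderCount : GenderCount
  basis : GenderBasis
  assignment : AssignmentSystem
  deriving Repr

/-- ASSUMPTION: genderless in Ch 30 iff `noGender` in Ch 31 and Ch 32. -/
def Profile.crossChapterConsistent (p : Profile) : Bool :=
  match p.genderCount with
  | GenderCount.none =>
      p.basis == GenderBasis.noGender && p.assignment == AssignmentSystem.noGender
  | _ =>
      p.basis != GenderBasis.noGender && p.assignment != AssignmentSystem.noGender

/-- Three illustrative profiles (not the module's data). -/
def sampleProfiles : List Profile :=
  [ { genderCount := .none, basis := .noGender, assignment := .noGender },
    { genderCount := .two, basis := .sexBased, assignment := .semanticAndFormal },
    { genderCount := .four, basis := .nonSexBased, assignment := .semanticOnly } ]

example : sampleProfiles.all (fun p => p.crossChapterConsistent) = true := by decide

-- A genderless count paired with a sex-based basis is rejected.
example :
    ({ genderCount := .none, basis := .sexBased, assignment := .noGender } :
      Profile).crossChapterConsistent = false := by decide

end ConsistencySketch
```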
Spot-checks that each language has the expected WALS category values.
Generalization 1: No gender is the most common state. #
The majority of the languages in the WALS Ch 30 sample have no grammatical gender system at all (145 out of 257). Our sample reflects this only in part: no-gender is the largest single category (7 of 21 languages), though not a majority.
Generalization 2: Among gender systems, 2-gender is most common. #
Of the 112 languages with gender in the WALS Ch 30 sample, 50 have exactly 2 genders -- the single most common count among gender-bearing languages.
Generalization 3: Sex-based systems dominate over non-sex-based. #
Among languages with gender, sex-based systems (84) far outnumber non-sex-based ones (28) in the WALS Ch 31 sample. In our sample, all 2- and 3-gender systems are sex-based; non-sex-based systems appear only with 4+ categories.
Generalization 4: No purely formal assignment exists. #
Corbett's key finding: no language assigns gender on a purely formal basis without any semantic core. In our sample, every language with gender has semantic assignment (either alone or combined with formal).
Generalization 5: Formal assignment correlates with more genders. #
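Languages that combine semantic and formal assignment tend to have richer gender systems. In our sample, all 3-gender languages use semantic + formal assignment, and so do the 5+ noun class systems (Swahili, Zulu, Fula); semantic-only assignment is attested only in the two 4-gender systems (Dyirbal, Georgian). The correlation is therefore a tendency, not a strict ordering by number of genders.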
Generalization 6: Agreement targets respect Corbett's hierarchy. #
In our sample, every language that shows gender agreement on verbs also shows it on attributive adjectives (i.e., no language agrees only on verbs without agreeing higher on the hierarchy).
Equations
- Phenomena.Gender.Typology.hasTarget targets t = targets.any fun (x : Core.AgreementTarget) => x == t
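The `hasTarget` helper makes the verb-implies-attributive check a one-line Boolean implication. A minimal sketch under stated assumptions: `AgreementTarget` stands in for `Core.AgreementTarget` with hypothetical constructor names, `verbImpliesAttributive` is an illustrative name, and the target lists are made up for this example.

```lean
namespace HierarchySketch

/-- ASSUMPTION: stand-in for `Core.AgreementTarget`; names are hypothetical. -/
inductive AgreementTarget where
  | attributive | predicate | relativePronoun | personalPronoun | verb
  deriving DecidableEq, Repr

/-- Same shape as the `hasTarget` equation shown above. -/
def hasTarget (targets : List AgreementTarget) (t : AgreementTarget) : Bool :=
  targets.any fun x => x == t

/-- Boolean implication: verb agreement implies attributive agreement. -/
def verbImpliesAttributive (targets : List AgreementTarget) : Bool :=
  !(hasTarget targets .verb) || hasTarget targets .attributive

/-- Illustrative target lists (not the module's data). -/
def sampleTargets : List (List AgreementTarget) :=
  [ [.attributive, .predicate, .verb],   -- verb agreement plus higher targets
    [.attributive, .personalPronoun],    -- no verb agreement: vacuously fine
    [] ]                                 -- genderless language

example : sampleTargets.all verbImpliesAttributive = true := by decide

-- A language agreeing on verbs alone would violate the generalization.
example : verbImpliesAttributive [.verb] = false := by decide

end HierarchySketch
```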
Generalization 7: Noun class systems (5+) have the richest agreement. #
Languages with 5+ genders show agreement on more targets than languages with fewer genders. In our sample, all noun class systems (Swahili, Zulu, Fula) agree on at least 4 out of 5 target types.
Generalization 8: Non-sex-based systems tend to have more genders. #
In our sample, non-sex-based systems have 4+ genders, while sex-based systems have 2--3. This reflects the cross-linguistic pattern: when gender is not organized around sex, the system tends to proliferate into a richer noun class system.
Generalization 9: All GenderCount bins are attested in the sample. #
Our 21-language sample covers every WALS Chapter 30 category, from no gender through 5+ genders.
Generalization 10: Gender-bearing languages always show some agreement. #
A gender system without any agreement is not a gender system -- genders are precisely the categories that trigger agreement. In our sample, every language with gender has at least one agreement target, and every language without gender has none.
Whether a gender system is "canonical" in Corbett's sense: sex-based, with 2 or 3 genders, and semantic + formal assignment.
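One way to encode this definition is as a conjunction of three `Bool` tests. The equations for the real definition are not rendered, so the sketch below is an assumption built only from the prose description, using a reduced hypothetical `Profile` structure.

```lean
namespace CanonicalSketch

inductive GenderCount where
  | none | two | three | four | fivePlus
  deriving DecidableEq, Repr

inductive GenderBasis where
  | noGender | sexBased | nonSexBased
  deriving DecidableEq, Repr

inductive AssignmentSystem where
  | noGender | semanticOnly | semanticAndFormal
  deriving DecidableEq, Repr

/-- Reduced, hypothetical profile with only the three WALS classifications. -/
structure Profile where
  genderCount : GenderCount
  basis : GenderBasis
  assignment : AssignmentSystem
  deriving Repr

/-- ASSUMPTION: sex-based, 2 or 3 genders, semantic + formal assignment. -/
def Profile.isCanonical (p : Profile) : Bool :=
  p.basis == GenderBasis.sexBased
    && (p.genderCount == GenderCount.two || p.genderCount == GenderCount.three)
    && p.assignment == AssignmentSystem.semanticAndFormal

-- A 2-gender, sex-based, semantic + formal profile counts as canonical.
example :
    ({ genderCount := .two, basis := .sexBased,
       assignment := .semanticAndFormal } : Profile).isCanonical = true := by decide

-- A 4-gender, non-sex-based, semantic-only profile does not.
example :
    ({ genderCount := .four, basis := .nonSexBased,
       assignment := .semanticOnly } : Profile).isCanonical = false := by decide

end CanonicalSketch
```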
The European languages with gender in our sample all have canonical gender systems.
Hindi-Urdu also has a canonical gender system.
Non-canonical gender systems (noun class, non-sex-based) are all non-European in our sample.
The gender-number scale: languages can be ordered from no gender (0 categories) through systems with progressively more categories. In our sample, the scale spans from 0 (English) to 20 (Fula).
The maximum raw gender count in our sample.
All ISO 639-3 codes are non-empty.
All ISO 639-3 codes are exactly 3 characters (standard length).
No duplicate ISO codes (each language appears once).
Number of no-gender languages in our sample.
Number of sex-based gender languages in our sample.
Number of non-sex-based gender languages in our sample.
Number of semantic-only assignment languages.
Number of semantic-and-formal assignment languages.
Number of noun class systems (5+).
Corbett's Agreement Hierarchy predicts: if a language shows gender agreement on a lower target, it agrees on all higher targets.
The check here is weaker than the full implication: rather than requiring every higher-ranked target to be present whenever a lower-ranked one is, it requires only that a low-ranked target never occurs on its own.
Specifically: verb agreement (rank 0) implies at least one higher target (pronoun, predicate, or attributive) is also present.
No language in our sample agrees only on verbs.
No language in our sample has gender agreement only on pronouns: pronoun agreement always co-occurs with at least one other target (attributive, predicate, or verb).
Semantic-only assignment is restricted to languages with non-sex-based systems in our sample. This makes sense: sex-based systems typically have formal correlates (declension class, phonological patterns) that supplement the semantic core.
All sex-based systems in our sample use semantic + formal assignment.
All three GenderBasis values are attested.
All three AssignmentSystem values are attested.
The sample spans at least 5 distinct ISO codes in each category (gender-bearing vs genderless).
Per-language grounding theorems verify that each profile's field values agree with the corresponding WALS chapter data via the converter functions.
Languages with known discrepancies between our profile classifications and WALS are skipped on the discrepant chapter:
- English: WALS F30A records 3 genders (pronominal he/she/it), but our profile treats English as genderless (no gender agreement on nouns/adjectives).
- Georgian: WALS F30A records no gender, but our profile follows analyses that posit 4 categories (rational/non-rational × older masc/fem trace).
- Dyirbal: WALS F31A records sex-based, but our profile classifies the system as non-sex-based (the 4-way system includes edibility/natural-force categories beyond biological sex).