Case Typology (WALS Chapters 49--52) #
@cite{dryer-haspelmath-2013} @cite{iggesen-2013} @cite{stolz-veselinova-2013}
Formalizes four chapters from the World Atlas of Language Structures (WALS) covering the typology of case systems:
- Chapter 49: Number of Cases -- how many morphological cases a language has, from zero to ten or more.
- Chapter 50: Asymmetrical Case-Marking -- whether case marking is conditioned by NP properties (animacy, definiteness, pronoun status). Also known as Differential Case Marking.
- Chapter 51: Position of Case Affixes -- whether case markers are suffixes, prefixes, tonal, or mixed.
- Chapter 52: Comitatives and Instrumentals -- whether comitative ('with X') and instrumental ('by means of X') are marked identically or distinctly.
Each chapter is encoded as an inductive type with distributions
derived from generated WALS data (Ch 49--51) or hand-coded counts (Ch 52).
Language profiles combine all four dimensions, and typological
generalizations are verified over the sample by native_decide.
Number-of-cases categories (WALS Ch. 49, @cite{iggesen-2013}).
Languages are classified by the number of morphological case distinctions in their nominal paradigm. "No morphological case-marking" means the language has no affixal or clitic case at all (e.g., Mandarin, Thai).
- none : CaseCount
- two : CaseCount
- threeFour : CaseCount
- fiveSeven : CaseCount
- eightNine : CaseCount
- tenPlus : CaseCount
Equations
- Phenomena.Case.Typology.instBEqCaseCount.beq x✝ y✝ = (x✝.ctorIdx == y✝.ctorIdx)
Numeric lower bound for each CaseCount category.
Equations
- Phenomena.Case.Typology.CaseCount.none.lowerBound = 0
- Phenomena.Case.Typology.CaseCount.two.lowerBound = 2
- Phenomena.Case.Typology.CaseCount.threeFour.lowerBound = 3
- Phenomena.Case.Typology.CaseCount.fiveSeven.lowerBound = 5
- Phenomena.Case.Typology.CaseCount.eightNine.lowerBound = 8
- Phenomena.Case.Typology.CaseCount.tenPlus.lowerBound = 10
Whether a raw case count falls in a given CaseCount category.
Equations
- Phenomena.Case.Typology.CaseCount.none.contains n = (n == 0)
- Phenomena.Case.Typology.CaseCount.two.contains n = (n == 2)
- Phenomena.Case.Typology.CaseCount.threeFour.contains n = (decide (n ≥ 3) && decide (n ≤ 4))
- Phenomena.Case.Typology.CaseCount.fiveSeven.contains n = (decide (n ≥ 5) && decide (n ≤ 7))
- Phenomena.Case.Typology.CaseCount.eightNine.contains n = (decide (n ≥ 8) && decide (n ≤ 9))
- Phenomena.Case.Typology.CaseCount.tenPlus.contains n = decide (n ≥ 10)
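As a cross-check on the two sets of equations above, here is a minimal, self-contained Lean 4 sketch that re-declares `CaseCount` and transcribes `lowerBound` and `contains` from the rendered equations. The namespace `CaseCountSketch`, the list `allBins`, and the lemma `contains_lowerBound` are hypothetical names introduced for illustration; they are not part of the library.

```lean
namespace CaseCountSketch

/-- WALS Ch. 49 bins, as listed above. -/
inductive CaseCount where
  | none | two | threeFour | fiveSeven | eightNine | tenPlus
  deriving DecidableEq, Repr

/-- Smallest raw count in each bin (mirrors the rendered equations). -/
def CaseCount.lowerBound : CaseCount → Nat
  | .none => 0
  | .two => 2
  | .threeFour => 3
  | .fiveSeven => 5
  | .eightNine => 8
  | .tenPlus => 10

/-- Whether a raw count falls in a bin (mirrors the rendered equations). -/
def CaseCount.contains : CaseCount → Nat → Bool
  | .none, n => n == 0
  | .two, n => n == 2
  | .threeFour, n => decide (n ≥ 3) && decide (n ≤ 4)
  | .fiveSeven, n => decide (n ≥ 5) && decide (n ≤ 7)
  | .eightNine, n => decide (n ≥ 8) && decide (n ≤ 9)
  | .tenPlus, n => decide (n ≥ 10)

/-- Hypothetical helper: all six bins. -/
def allBins : List CaseCount :=
  [.none, .two, .threeFour, .fiveSeven, .eightNine, .tenPlus]

/-- Each bin contains its own lower bound. -/
theorem contains_lowerBound (c : CaseCount) : c.contains c.lowerBound = true := by
  cases c <;> decide

-- No bin accepts a raw count of 1.
example : allBins.all (fun c => !c.contains 1) = true := by decide

end CaseCountSketch
```

The last example makes explicit that, under these equations, a one-case system falls into no bin.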
Chapter 49 total sample size (from generated data).
Asymmetrical (differential) case-marking types (WALS Ch. 50, @cite{iggesen-2013}).
Differential case marking (DCM) means that case marking on a noun phrase depends on properties of that NP -- its animacy, definiteness, or whether it is a full noun vs. a pronoun. For example, in Hindi-Urdu the accusative marker -ko appears on animate/definite objects but not inanimate/indefinite ones.
- noCase : AsymmetricalCaseMarking
- borderlineOnly : AsymmetricalCaseMarking
- noAsymmetry : AsymmetricalCaseMarking
- animacyOnly : AsymmetricalCaseMarking
- definitenessOnly : AsymmetricalCaseMarking
- pronounOnly : AsymmetricalCaseMarking
- twoConditions : AsymmetricalCaseMarking
- threeConditions : AsymmetricalCaseMarking
Whether this type involves any differential case marking.
Number of conditioning factors (0--3).
Equations
- Phenomena.Case.Typology.AsymmetricalCaseMarking.noCase.conditionCount = 0
- Phenomena.Case.Typology.AsymmetricalCaseMarking.borderlineOnly.conditionCount = 0
- Phenomena.Case.Typology.AsymmetricalCaseMarking.noAsymmetry.conditionCount = 0
- Phenomena.Case.Typology.AsymmetricalCaseMarking.animacyOnly.conditionCount = 1
- Phenomena.Case.Typology.AsymmetricalCaseMarking.definitenessOnly.conditionCount = 1
- Phenomena.Case.Typology.AsymmetricalCaseMarking.pronounOnly.conditionCount = 1
- Phenomena.Case.Typology.AsymmetricalCaseMarking.twoConditions.conditionCount = 2
- Phenomena.Case.Typology.AsymmetricalCaseMarking.threeConditions.conditionCount = 3
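The condition counts above can be transcribed the same way. This Lean 4 sketch re-declares the type locally. `isDifferential` is a hypothetical reconstruction of the predicate described earlier (its equations are not rendered on this page), assumed here to hold exactly when `conditionCount > 0`.

```lean
namespace AsymmetrySketch

inductive AsymmetricalCaseMarking where
  | noCase | borderlineOnly | noAsymmetry
  | animacyOnly | definitenessOnly | pronounOnly
  | twoConditions | threeConditions
  deriving DecidableEq, Repr

/-- Number of conditioning factors (mirrors the rendered equations). -/
def AsymmetricalCaseMarking.conditionCount : AsymmetricalCaseMarking → Nat
  | .noCase | .borderlineOnly | .noAsymmetry => 0
  | .animacyOnly | .definitenessOnly | .pronounOnly => 1
  | .twoConditions => 2
  | .threeConditions => 3

/-- Assumed reconstruction: differential iff at least one factor conditions marking. -/
def AsymmetricalCaseMarking.isDifferential (t : AsymmetricalCaseMarking) : Bool :=
  decide (t.conditionCount > 0)

/-- The count stays within the documented range 0 to 3. -/
theorem conditionCount_le_three (t : AsymmetricalCaseMarking) :
    t.conditionCount ≤ 3 := by
  cases t <;> decide

-- One conditioning factor counts as differential; borderline-only does not.
example : AsymmetricalCaseMarking.pronounOnly.isDifferential = true := by decide
example : AsymmetricalCaseMarking.borderlineOnly.isDifferential = false := by decide

end AsymmetrySketch
```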
Chapter 50 total sample size (from generated data).
Position of case affixes (WALS Ch. 51, @cite{dryer-haspelmath-2013}).
Classifies where the case morpheme sits relative to the nominal stem. Languages with no case affixes at all (either no case or case expressed only by adpositions) are distinguished from those with suffixes, prefixes, tonal marking, or mixed strategies.
- noAffixes : CaseAffixPosition
- suffixesOnly : CaseAffixPosition
- prefixesOnly : CaseAffixPosition
- toneOnly : CaseAffixPosition
- bothSuffixPrefix : CaseAffixPosition
Equations
- Phenomena.Case.Typology.instBEqCaseAffixPosition.beq x✝ y✝ = (x✝.ctorIdx == y✝.ctorIdx)
Whether this position type involves bound morphology.
Whether suffixal case marking is present.
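The equations for the two predicates above are not rendered, so this Lean 4 sketch assumes one plausible reading: every category except `noAffixes` counts as bound (including tonal marking), and suffixes are present in `suffixesOnly` and `bothSuffixPrefix`. The namespace `AffixSketch` and the lemma `hasSuffix_isBound` are hypothetical.

```lean
namespace AffixSketch

inductive CaseAffixPosition where
  | noAffixes | suffixesOnly | prefixesOnly | toneOnly | bothSuffixPrefix
  deriving DecidableEq, Repr

/-- Assumed: every category except `noAffixes` involves bound morphology. -/
def CaseAffixPosition.isBound : CaseAffixPosition → Bool
  | .noAffixes => false
  | _ => true

/-- Assumed: suffixes are present exactly in the two suffix-bearing categories. -/
def CaseAffixPosition.hasSuffix : CaseAffixPosition → Bool
  | .suffixesOnly | .bothSuffixPrefix => true
  | _ => false

/-- Under these definitions, suffixal marking is always bound marking. -/
theorem hasSuffix_isBound (p : CaseAffixPosition) :
    p.hasSuffix = true → p.isBound = true := by
  cases p <;> decide

end AffixSketch
```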
Chapter 51 total sample size (from generated data).
Comitative-instrumental syncretism (WALS Ch. 52, @cite{stolz-veselinova-2013}).
In many languages the marker for 'with X' (comitative: accompaniment) and 'by means of X' (instrumental: means/instrument) is the same morpheme. For example, English uses with in both "I went with John" and "I cut it with a knife", and Hungarian uses the suffix -val/-vel for both functions. Other languages distinguish them, e.g. Japanese -to (comitative) vs. -de (instrumental).
- identity : ComitativeInstrumental
- differentiation : ComitativeInstrumental
- mixed : ComitativeInstrumental
Whether the language uses the same morpheme for both functions.
Chapter 52 total sample size.
Ch 49: Languages with no case are the modal category.
Ch 49: Case-bearing languages (2+ cases) outnumber caseless ones.
Ch 50: Languages with some asymmetrical case-marking outnumber those with purely symmetrical case.
Ch 51: Suffixal case is the dominant strategy among case-marking languages.
Ch 52: Differentiation is the majority pattern.
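Claims such as "no case is the modal category" reduce to comparing per-bin counts. A minimal Lean 4 sketch, assuming a distribution is stored as a list of (bin, count) pairs. `Distribution`, `countOf`, `isModal` and `toyDist` are hypothetical names, and the toy counts are invented for illustration; they are not WALS figures.

```lean
namespace DistributionSketch

inductive CaseCount where
  | none | two | threeFour | fiveSeven | eightNine | tenPlus
  deriving DecidableEq, Repr

/-- Hypothetical encoding: number of sampled languages per bin. -/
abbrev Distribution := List (CaseCount × Nat)

/-- Count recorded for bin `c` (0 if the bin is missing). -/
def Distribution.countOf (d : Distribution) (c : CaseCount) : Nat :=
  match d.find? (fun e => e.1 == c) with
  | some e => e.2
  | _ => 0

/-- Bin `c` is modal if no bin has a strictly larger count. -/
def Distribution.isModal (d : Distribution) (c : CaseCount) : Bool :=
  d.all fun e => decide (e.2 ≤ d.countOf c)

/-- Toy counts for illustration only; NOT the WALS Ch. 49 figures. -/
def toyDist : Distribution :=
  [(CaseCount.none, 40), (CaseCount.two, 10), (CaseCount.threeFour, 8),
   (CaseCount.fiveSeven, 20), (CaseCount.eightNine, 9), (CaseCount.tenPlus, 11)]

-- With these toy counts, `none` is modal and `fiveSeven` is not.
example : toyDist.isModal CaseCount.none = true := by decide
example : toyDist.isModal CaseCount.fiveSeven = false := by decide

end DistributionSketch
```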
A language's case profile, combining classifications from all four WALS case chapters.
This structure records a single language's position in each of the four
typological dimensions. The rawCaseCount field stores the actual number
of morphological cases (not just the WALS bin), enabling finer-grained
generalizations.
- name : String
Language name
- iso639 : String
ISO 639-3 code
- caseCount : CaseCount
Ch 49: Number of cases (WALS category)
- rawCaseCount : Nat
Actual number of morphological cases
- asymmetry : AsymmetricalCaseMarking
Ch 50: Asymmetrical case-marking type
- affixPosition : CaseAffixPosition
Ch 51: Position of case affixes
- comitativeInstr : ComitativeInstrumental
Ch 52: Comitative-instrumental relation
Equations
- Phenomena.Case.Typology.instBEqCaseProfile.beq x✝¹ x✝ = false
Whether the raw case count is consistent with the WALS bin.
Whether the profile is internally consistent across chapters: no-case in Ch 49 should align with no-case in Ch 50 and no-affixes in Ch 51.
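A minimal Lean 4 sketch of the profile structure and the two consistency checks, with the enumerations re-declared so the block stands alone. `rawConsistent` and `crossConsistent` are hypothetical names, and the cross-chapter test is one reading of the condition just stated (caseless languages are `noCase` and `noAffixes`; case-bearing languages are not `noCase`). The Finnish and Mandarin values follow the language notes in this module; the ISO codes are assumed.

```lean
namespace ProfileSketch

inductive CaseCount where
  | none | two | threeFour | fiveSeven | eightNine | tenPlus
  deriving DecidableEq, Repr

inductive AsymmetricalCaseMarking where
  | noCase | borderlineOnly | noAsymmetry | animacyOnly | definitenessOnly
  | pronounOnly | twoConditions | threeConditions
  deriving DecidableEq, Repr

inductive CaseAffixPosition where
  | noAffixes | suffixesOnly | prefixesOnly | toneOnly | bothSuffixPrefix
  deriving DecidableEq, Repr

inductive ComitativeInstrumental where
  | identity | differentiation | mixed
  deriving DecidableEq, Repr

/-- Bin membership, as in the rendered `CaseCount.contains` equations. -/
def CaseCount.contains : CaseCount → Nat → Bool
  | .none, n => n == 0
  | .two, n => n == 2
  | .threeFour, n => decide (n ≥ 3) && decide (n ≤ 4)
  | .fiveSeven, n => decide (n ≥ 5) && decide (n ≤ 7)
  | .eightNine, n => decide (n ≥ 8) && decide (n ≤ 9)
  | .tenPlus, n => decide (n ≥ 10)

structure CaseProfile where
  name : String
  iso639 : String
  caseCount : CaseCount
  rawCaseCount : Nat
  asymmetry : AsymmetricalCaseMarking
  affixPosition : CaseAffixPosition
  comitativeInstr : ComitativeInstrumental
  deriving Repr

/-- Raw count agrees with the WALS bin. -/
def CaseProfile.rawConsistent (p : CaseProfile) : Bool :=
  p.caseCount.contains p.rawCaseCount

/-- Assumed reading of the cross-chapter condition. -/
def CaseProfile.crossConsistent (p : CaseProfile) : Bool :=
  if p.caseCount == CaseCount.none then
    p.asymmetry == AsymmetricalCaseMarking.noCase &&
      p.affixPosition == CaseAffixPosition.noAffixes
  else
    p.asymmetry != AsymmetricalCaseMarking.noCase

def finnish : CaseProfile :=
  { name := "Finnish", iso639 := "fin", caseCount := .tenPlus, rawCaseCount := 15,
    asymmetry := .noAsymmetry, affixPosition := .suffixesOnly,
    comitativeInstr := .differentiation }

def mandarin : CaseProfile :=
  { name := "Mandarin Chinese", iso639 := "cmn", caseCount := .none, rawCaseCount := 0,
    asymmetry := .noCase, affixPosition := .noAffixes,
    comitativeInstr := .differentiation }

-- Both profiles pass both checks.
example : [finnish, mandarin].all (fun p => p.rawConsistent && p.crossConsistent) = true := by
  decide

-- Giving a caseless language suffixal case breaks cross-chapter consistency.
example : CaseProfile.crossConsistent { mandarin with affixPosition := .suffixesOnly } = false := by
  decide

end ProfileSketch
```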
Finnish: 15 morphological cases (nom, gen, acc, part, iness, elat, illat, adess, ablat, allat, ess, transl, instruct, comit, abes). Suffixal. No DCM. Comitative and instrumental are distinct cases.
Hungarian: 18 morphological cases (nom, acc, dat, instrum, causal-final, translative, terminative, essive-formal, essive-modal, inessive, elative, illative, superessive, delative, sublative, adessive, ablative, allative). Suffixal agglutinative. Comitative (-val, -vel) = instrumental.
Turkish: 6 cases (nom, acc, gen, dat, loc, abl). Suffixal agglutinative. Differential object marking: definite objects take -I, indefinite do not.
Latin: 6 cases (nom, acc, gen, dat, abl, voc; locative vestigial). Suffixal fusional. No asymmetrical case-marking. Comitative (cum + abl) vs. instrumental (plain abl) are technically distinct strategies.
Russian: 6 cases (nom, acc, gen, dat, instrum, prep/loc). Suffixal fusional. Differential accusative: animate nouns take genitive form in accusative, inanimates keep nominative form.
German: 4 cases (nom, acc, gen, dat). Suffixal fusional with articles carrying most case marking. No systematic DCM. Comitative (mit + dat) and instrumental (mit + dat) use the same marker.
Japanese: case particles (ga, o, ni, no, de, e, to, kara, made,...). Postpositional clitics rather than affixes in WALS's classification. Differential object marking with -o conditioned by specificity/topicality. Comitative -to vs. instrumental -de are distinct.
English: 2-case system surviving only in pronouns (nom/acc: I/me, he/him, she/her, we/us, they/them). No case affixes on nouns. Comitative 'with' and instrumental 'with' are identical.
Korean: case particles (-i/ga nom, -(r)eul acc, -ui gen, -e dat/loc, -eseo loc/source, -(eu)ro instr/dir, -wa/gwa comit). Particles are postpositional clitics. Optional object marking conditioned by definiteness/topicality. Comitative -wa and instrumental -(eu)ro are distinct.
Mandarin Chinese: no morphological case. Fixed SVO word order encodes grammatical relations. No case markers, no DCM, comitative and instrumental expressed by distinct prepositions (he 'with-COM' vs. yong 'with-INSTR').
Hindi-Urdu: 3 cases (direct, oblique, vocative). Postpositional system with -ne (ergative), -ko (accusative/dative), -se (instrumental/ablative), -me (locative). Differential object marking: -ko appears on animate/specific objects. Comitative -ke saath vs. instrumental -se are distinct.
Arabic (Modern Standard): 3 cases (nom -u, acc -a, gen -i). Suffixal. Indefinite nouns add nunation (tanwin: -un, -an, -in), while definite nouns take the bare vowel. Spoken varieties have largely lost case inflection, but MSA maintains it. Comitative (maʕa) and instrumental (bi-) are distinct.
Georgian: 7 cases (nom, erg, dat, gen, instrum, adverbial, vocative). Suffixal agglutinative. Split-ergative system conditioned by tense/aspect (not NP properties), so no DCM in the WALS sense. Instrumental -it and comitative -tan are distinct.
Quechua (Cusco): 12+ cases (nom, acc -ta, gen -pa or -q, dat -man, loc -pi, abl -manta, instrum -wan, comit -wan, limit -kama, causal -rayku, benef -paq,...). Suffixal agglutinative. Comitative and instrumental both use -wan (identity).
Basque: ergative-absolutive system with 11+ cases (abs, erg, dat, gen, comit -ekin, instrum -z, iness, allat, ablat, destinat, motivat). Suffixal. Differential ergative marking in some analyses. Comitative -ekin and instrumental -z are distinct.
Tamil: 8 cases (nom, acc, dat, gen, instrum, loc, ablat, sociative/comitative). Suffixal agglutinative. Differential object marking: accusative -ai on animate/definite objects. Comitative -ootu and instrumental -aal are distinct.
All language profiles in our sample.
Every language's raw case count falls within its declared WALS category.
All raw case counts are consistent with their WALS bins.
Cross-chapter consistency: no-case in Ch 49 aligns with noCase in Ch 50 and noAffixes in Ch 51; case-bearing languages do not have noCase in Ch 50.
All profiles are cross-chapter consistent.
Generalization 1: Case-rich languages are overwhelmingly suffixal. #
Among the world's languages, suffixal case marking is far more common than prefixal. In our sample, every language with case affixes uses suffixes (either exclusively or in combination with prefixes). This reflects the strong universal preference documented by @cite{hawkins-1983} and @cite{dryer-1992}.
Generalization 2: No prefixal-only case in our sample. #
No language in our sample uses exclusively prefixal case marking. Cross-linguistically, prefixal-only case is very rare (WALS Ch 51 reports only 7 out of 261 languages).
Generalization 3: DCM is conditioned by animacy or definiteness. #
Among languages with differential case marking in our sample, the conditioning factors are animacy, definiteness, or pronoun status -- never some other property like gender or number alone.
Generalization 4: Comitative-instrumental identity is common but not universal. #
Identity (syncretism) and differentiation both occur across language families.
Generalization 5: No-case languages have no asymmetrical marking. #
By definition, if there is no morphological case, there can be no asymmetrical (differential) case marking.
Generalization 6: No-case languages have no case affixes. #
Again by definition: without morphological case, there are no case affixes to position.
Generalization 7: 10+-case languages all have suffixal case. #
Highly agglutinative case-rich systems (Finnish, Hungarian, Quechua, Basque) uniformly use suffixes. No case-rich language uses prefixes only or tone only.
Generalization 8: Languages with 2 cases tend toward asymmetrical marking. #
When a language has only two cases, case marking often applies differentially (to pronouns only, or conditioned by definiteness). English is the classic example: only pronouns show nominative/accusative.
Generalization 9: Comitative-instrumental identity correlates with case-rich systems. #
Among our case-rich languages (5+ cases), those with identity include Hungarian, Russian, Turkish, Quechua -- all agglutinative or fusional languages where an instrumental case doubles for comitative.
Generalization 10: All CaseCount bins are attested in the sample. #
Our 16-language sample covers every CaseCount bin, i.e. every Chapter 49 category as binned here.
Spot-checks that each language has the expected WALS category values.
Raw case counts are ordered as expected: Finnish < Hungarian at the top, Mandarin and English at the bottom.
Number of caseless languages in our sample.
Number of DCM languages in our sample.
Number of suffixal-case languages in our sample.
Number of comitative-instrumental identity languages.
All ISO 639-3 codes are non-empty.
All ISO 639-3 codes are exactly 3 characters (standard length).
No duplicate ISO codes (each language appears once).
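The two string checks above can be sketched in Lean 4 as follows. The codes are assumed ISO 639-3 values for six of the sample languages (the library's actual `iso639` fields are not rendered here), and `sampleCodes`, `allThreeChars` and `noDuplicates` are hypothetical helpers.

```lean
namespace IsoSketch

/-- Assumed ISO 639-3 codes for six sample languages. -/
def sampleCodes : List String := ["fin", "hun", "tur", "lat", "rus", "deu"]

/-- Every code is exactly three characters long (which also rules out empty codes). -/
def allThreeChars (codes : List String) : Bool :=
  codes.all fun c => c.length == 3

/-- No code occurs more than once. -/
def noDuplicates : List String → Bool
  | [] => true
  | c :: rest => (!rest.contains c) && noDuplicates rest

#eval allThreeChars sampleCodes      -- true
#eval noDuplicates sampleCodes       -- true
#eval noDuplicates ["fin", "fin"]    -- false

end IsoSketch
```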
Languages with DCM (Ch 50) all have at least 2 cases (Ch 49).
Languages with case affixes (Ch 51) all have at least 2 cases (Ch 49).
Three-way conjunction across chapters: in our sample, no language with 10+ cases combines comitative-instrumental identity, absence of DCM, and suffixal case.
Ch 49 and Ch 50 share the same 261-language sample.
@cite{aissen-2003} DOM Hierarchy #
Formalizes the bidimensional DOM predictions from:
- Aissen, J. (2003). Differential Object Marking: Iconicity vs. Economy. Natural Language & Linguistic Theory 21(3): 435--483.
The prominence scales (AnimacyLevel, DefinitenessLevel) and their
orderings are defined in Core.Prominence and re-exported here.
DOM is the P-flagging specialization of the general differential marking
framework.
A DOM (Differential Object Marking) profile: a DifferentialMarkingProfile
specialized to role P + channel flagging.
Each cell (a, d) records whether an object with animacy level a
and definiteness level d obligatorily receives an overt DOM marker
(e.g., Spanish a, Turkish -(y)I, Hindi -ko).
DOM is the P-flagging instance of @cite{just-2024}'s general differential
marking framework. Monotonicity (isMonotone), isAnimacyOnly, and
isDefinitenessOnly are all inherited from DifferentialMarkingProfile.
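The upper-set condition behind isMonotone can be made concrete on the 3 × 5 grid. This is a hedged Lean 4 sketch, not the Core.Prominence definitions: the scales are re-declared with numeric ranks (larger means more prominent), and `DomProfile`, `isMonotone`, `spanish`, `turkish` and the unattested `inverted` pattern are hypothetical names.

```lean
namespace DomSketch

inductive Animacy where
  | human | animate | inanimate
  deriving DecidableEq, Repr

inductive Definiteness where
  | pronoun | name | definite | indefSpecific | nonSpecific
  deriving DecidableEq, Repr

/-- Prominence ranks; larger means more prominent. -/
def Animacy.rank : Animacy → Nat
  | .human => 2
  | .animate => 1
  | .inanimate => 0

def Definiteness.rank : Definiteness → Nat
  | .pronoun => 4
  | .name => 3
  | .definite => 2
  | .indefSpecific => 1
  | .nonSpecific => 0

def allAnimacy : List Animacy := [.human, .animate, .inanimate]
def allDefiniteness : List Definiteness :=
  [.pronoun, .name, .definite, .indefSpecific, .nonSpecific]

/-- `m a d = true`: an object in cell (a, d) obligatorily takes the marker. -/
abbrev DomProfile := Animacy → Definiteness → Bool

/-- Upper-set check: if a cell is marked, every cell at least as prominent
on both scales is marked too. -/
def DomProfile.isMonotone (m : DomProfile) : Bool :=
  allAnimacy.all fun a => allDefiniteness.all fun d =>
    allAnimacy.all fun a' => allDefiniteness.all fun d' =>
      !(m a d && decide (a.rank ≤ a'.rank) && decide (d.rank ≤ d'.rank)) || m a' d'

/-- Spanish-style cutoff: human objects only. -/
def spanish : DomProfile := fun a _ => a == Animacy.human

/-- Turkish-style cutoff: definite or more prominent, any animacy. -/
def turkish : DomProfile := fun _ d => decide (d.rank ≥ 2)

/-- Unattested inverted pattern: only the least prominent cell is marked. -/
def inverted : DomProfile := fun a d =>
  a == Animacy.inanimate && d == Definiteness.nonSpecific

example : DomProfile.isMonotone spanish = true := by decide
example : DomProfile.isMonotone turkish = true := by decide
example : DomProfile.isMonotone inverted = false := by decide

end DomSketch
```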
Spanish: a-marking for human direct objects regardless of definiteness.
One-dimensional (animacy-based), cutoff between human and animate.
Russian: animate accusative (genitive form used as accusative for animate nouns). One-dimensional (animacy-based), cutoff between animate and inanimate.
Turkish: -(y)I marking for definite direct objects regardless of
animacy. One-dimensional (definiteness-based), cutoff between definite
and indefinite specific.
Hebrew: ʔet marking for definite direct objects regardless of
animacy. Same one-dimensional definiteness cutoff as Turkish.
Persian: -rā marking for definite direct objects. One-dimensional
(definiteness-based) for obligatory marking; optional extension to
specific indefinite animates. Modeled here with the
definiteness-based obligatory core.
Catalan: a-marking restricted to personal pronouns. The most
restrictive DOM pattern attested: only the highest level of the
definiteness scale (pronouns) receives marking, at every animacy level.
Hindi-Urdu: -ko marking conditioned by BOTH animacy and definiteness.
Two-dimensional DOM with a staircase cutoff:
- Human objects: marked when indefinite specific or more prominent
- Animate objects: marked when definite or more prominent
- Inanimate objects: not obligatorily marked
This captures the obligatory marking core. Optional/variable marking extends further down the staircase at the boundary cells.
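The obligatory core just described can be encoded and checked mechanically. A minimal Lean 4 sketch, assuming both scales are re-declared locally with numeric ranks (larger means more prominent); `DomProfile`, `isMonotone`, `markedCells` and `hindi` are hypothetical names. It confirms 4 human + 3 animate = 7 marked cells and that the staircase forms an upper set.

```lean
namespace HindiSketch

inductive Animacy where
  | human | animate | inanimate
  deriving DecidableEq, Repr

inductive Definiteness where
  | pronoun | name | definite | indefSpecific | nonSpecific
  deriving DecidableEq, Repr

def Animacy.rank : Animacy → Nat
  | .human => 2
  | .animate => 1
  | .inanimate => 0

def Definiteness.rank : Definiteness → Nat
  | .pronoun => 4
  | .name => 3
  | .definite => 2
  | .indefSpecific => 1
  | .nonSpecific => 0

def allAnimacy : List Animacy := [.human, .animate, .inanimate]
def allDefiniteness : List Definiteness :=
  [.pronoun, .name, .definite, .indefSpecific, .nonSpecific]

abbrev DomProfile := Animacy → Definiteness → Bool

/-- Upper-set check over the whole grid. -/
def DomProfile.isMonotone (m : DomProfile) : Bool :=
  allAnimacy.all fun a => allDefiniteness.all fun d =>
    allAnimacy.all fun a' => allDefiniteness.all fun d' =>
      !(m a d && decide (a.rank ≤ a'.rank) && decide (d.rank ≤ d'.rank)) || m a' d'

/-- Number of marked cells in the 3 × 5 grid. -/
def DomProfile.markedCells (m : DomProfile) : Nat :=
  (allAnimacy.map fun a => (allDefiniteness.filter (m a)).length).foldl (· + ·) 0

/-- Staircase: human from indefinite specific up, animate from definite up. -/
def hindi : DomProfile := fun a d =>
  match a with
  | .human => decide (d.rank ≥ 1)
  | .animate => decide (d.rank ≥ 2)
  | .inanimate => false

example : DomProfile.markedCells hindi = 7 := by decide  -- 4 human + 3 animate
example : DomProfile.isMonotone hindi = true := by decide

end HindiSketch
```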
No DOM: no differential marking (either no case at all, or uniform case on all objects). Trivially monotone.
All DOM profiles in the sample.
Each language's DOM pattern forms an upper set in the bidimensional animacy × definiteness grid — Aissen's central prediction.
Aissen's DOM monotonicity universal: all attested DOM patterns in the sample form upper sets in the bidimensional animacy × definiteness grid. No language marks a less-prominent object while leaving a more-prominent one unmarked.
Verify that the one-dimensional profiles are indeed one-dimensional, and that Hindi is genuinely two-dimensional.
Hindi DOM depends on both animacy and definiteness — it cannot be reduced to a single scale.
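One way to test "depends on both scales" is to look for two cells that differ on one dimension only and disagree in marking. In this Lean 4 sketch, `dependsOnAnimacy` and `dependsOnDefiniteness` are hypothetical stand-ins for the library's isAnimacyOnly / isDefinitenessOnly, and the Spanish and Hindi patterns are transcribed from the descriptions above.

```lean
namespace DimensionSketch

inductive Animacy where
  | human | animate | inanimate
  deriving DecidableEq, Repr

inductive Definiteness where
  | pronoun | name | definite | indefSpecific | nonSpecific
  deriving DecidableEq, Repr

/-- Definiteness prominence rank; larger means more prominent. -/
def Definiteness.rank : Definiteness → Nat
  | .pronoun => 4
  | .name => 3
  | .definite => 2
  | .indefSpecific => 1
  | .nonSpecific => 0

def allAnimacy : List Animacy := [.human, .animate, .inanimate]
def allDefiniteness : List Definiteness :=
  [.pronoun, .name, .definite, .indefSpecific, .nonSpecific]

abbrev DomProfile := Animacy → Definiteness → Bool

/-- Some definiteness level at which changing animacy changes the marking. -/
def DomProfile.dependsOnAnimacy (m : DomProfile) : Bool :=
  allDefiniteness.any fun d => allAnimacy.any fun a => allAnimacy.any fun a' =>
    m a d != m a' d

/-- Some animacy level at which changing definiteness changes the marking. -/
def DomProfile.dependsOnDefiniteness (m : DomProfile) : Bool :=
  allAnimacy.any fun a => allDefiniteness.any fun d => allDefiniteness.any fun d' =>
    m a d != m a d'

def spanish : DomProfile := fun a _ => a == Animacy.human

def hindi : DomProfile := fun a d =>
  match a with
  | .human => decide (d.rank ≥ 1)
  | .animate => decide (d.rank ≥ 2)
  | .inanimate => false

-- Spanish is animacy-only; Hindi depends on both scales.
example : DomProfile.dependsOnAnimacy spanish = true := by decide
example : DomProfile.dependsOnDefiniteness spanish = false := by decide
example : DomProfile.dependsOnAnimacy hindi = true := by decide
example : DomProfile.dependsOnDefiniteness hindi = true := by decide

end DimensionSketch
```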
Consequences of monotonicity: higher prominence on one dimension implies at least as much marking, holding the other dimension constant.
In all sample languages, human objects are never less marked than animate objects at the same definiteness level.
In all sample languages, animate objects are never less marked than inanimate objects at the same definiteness level.
In all sample languages, pronouns are never less marked than proper names at the same animacy level.
In all sample languages, definite NPs are never less marked than indefinite specific NPs at the same animacy level.
The most prominent cell (human, pronoun) is always marked when any DOM exists; the least prominent cell (inanimate, non-specific) is never marked in our sample.
If any cell is marked, the most prominent cell (human, pronoun) is also marked.
The least prominent cell (inanimate, non-specific) is unmarked in all DOM languages in the sample.
The bidimensional grid has 3 × 5 = 15 cells per language.
Total marked cells across all sample languages.
Marked cells: Spanish (5) + Russian (10) + Turkish (9) + Hebrew (9) + Persian (9) + Catalan (3) + Hindi (7) + NoDOM (0) = 52.
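The total can be recomputed from the cutoffs described for each language. In this Lean 4 sketch Turkish, Hebrew and Persian share one hypothetical `definiteCutoff` profile, since the descriptions above give them the same cutoff; all names are illustrative, not the library's.

```lean
namespace CellCountSketch

inductive Animacy where
  | human | animate | inanimate
  deriving DecidableEq, Repr

inductive Definiteness where
  | pronoun | name | definite | indefSpecific | nonSpecific
  deriving DecidableEq, Repr

def Definiteness.rank : Definiteness → Nat
  | .pronoun => 4
  | .name => 3
  | .definite => 2
  | .indefSpecific => 1
  | .nonSpecific => 0

def allAnimacy : List Animacy := [.human, .animate, .inanimate]
def allDefiniteness : List Definiteness :=
  [.pronoun, .name, .definite, .indefSpecific, .nonSpecific]

abbrev DomProfile := Animacy → Definiteness → Bool

/-- Number of marked cells in the 3 × 5 grid. -/
def DomProfile.markedCells (m : DomProfile) : Nat :=
  (allAnimacy.map fun a => (allDefiniteness.filter (m a)).length).foldl (· + ·) 0

def spanish : DomProfile := fun a _ => a == Animacy.human
def russian : DomProfile := fun a _ => a != Animacy.inanimate
/-- Shared by Turkish, Hebrew and Persian in this model: definite or above. -/
def definiteCutoff : DomProfile := fun _ d => decide (d.rank ≥ 2)
def catalan : DomProfile := fun _ d => d == Definiteness.pronoun
def hindi : DomProfile := fun a d =>
  match a with
  | .human => decide (d.rank ≥ 1)
  | .animate => decide (d.rank ≥ 2)
  | .inanimate => false
def noDom : DomProfile := fun _ _ => false

/-- Order: Spanish, Russian, Turkish, Hebrew, Persian, Catalan, Hindi, NoDOM. -/
def sample : List DomProfile :=
  [spanish, russian, definiteCutoff, definiteCutoff, definiteCutoff, catalan, hindi, noDom]

example : sample.map DomProfile.markedCells = [5, 10, 9, 9, 9, 3, 7, 0] := by decide
example : (sample.map DomProfile.markedCells).foldl (· + ·) 0 = 52 := by decide

end CellCountSketch
```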