Cross-Linguistic Typology of Articles and Demonstratives (WALS)
@cite{bhat-2013} @cite{greenberg-1978} @cite{himmelmann-1997} @cite{dryer-haspelmath-2013}
Typological data on definiteness marking, indefinite articles, and demonstrative systems across languages, drawn from five chapters of the World Atlas of Language Structures:
- Ch 37 (Dryer): Definite articles -- whether a language has a definite article as a word distinct from demonstratives, an affix, a demonstrative used for definiteness, or no definite marking at all.
- Ch 38 (Dryer): Indefinite articles -- whether a language has an indefinite article distinct from the numeral 'one', uses 'one' as an indefinite article, has an indefinite affix, or lacks indefinite articles entirely.
- Ch 41 (Diessel): Distance contrasts in demonstratives -- the number of distance distinctions encoded in adnominal demonstratives (1 through 5+).
- Ch 42 (Diessel): Pronominal and adnominal demonstratives -- whether the two demonstrative uses have the same form, different stems, or different inflectional features.
- Ch 43 (Bhat): Third-person pronouns and demonstratives -- whether 3rd-person pronouns are identical to, derived from, or unrelated to demonstratives.
Key Generalizations
- Two-way demonstrative distance systems (proximal/distal) are the most common cross-linguistically (54.3%), followed by three-way systems (37.6%).
- An indefinite article tends to imply a definite article more than the reverse: 81 languages have a definite article but no indefinite article, whereas only 41 languages have an indefinite article with no definite article.
- In a majority of languages (125 of 225), third-person pronouns show some relationship to demonstratives -- the diachronic pathway demonstrative -> 3rd-person pronoun is well attested.
- The grammaticalization cline demonstrative -> definite article -> definite affix is a well-established diachronic pathway.
Definite article type (WALS Ch 37, @cite{dryer-haspelmath-2013}).
Classifies languages by how (or whether) they mark definiteness on nouns. Three of the categories (demonstrativeUsed, definiteWord, definiteAffix) correspond to successive stages of the grammaticalization cline demonstrative -> definite word -> definite affix.
- definiteWord : DefiniteArticleType
Definite word distinct from demonstratives (e.g. English "the").
- definiteAffix : DefiniteArticleType
Definite affix on the noun (e.g. Danish "-en", Arabic "al-").
- demonstrativeUsed : DefiniteArticleType
No dedicated definite article; a demonstrative is used for definiteness (e.g. Ojibwa, Swahili).
- noDefButIndef : DefiniteArticleType
No definite article, but language has an indefinite article.
- noArticle : DefiniteArticleType
Neither definite nor indefinite article.
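The constructor list above can be read as a plain enumeration. The following is a minimal sketch of such a declaration, not the module's actual source: the `Sketch` namespace, the `deriving` clause, and the helper `hasDefiniteMarking` are assumptions introduced for illustration.

```lean
namespace Sketch

/-- Ch 37 categories, with constructor names taken from the list above. -/
inductive DefiniteArticleType where
  | definiteWord
  | definiteAffix
  | demonstrativeUsed
  | noDefButIndef
  | noArticle
  deriving DecidableEq, Repr

/-- Hypothetical helper: does the category involve any overt definite marking
    (word, affix, or demonstrative)? -/
def DefiniteArticleType.hasDefiniteMarking : DefiniteArticleType → Bool
  | .definiteWord | .definiteAffix | .demonstrativeUsed => true
  | .noDefButIndef | .noArticle => false

-- A demonstrative used for definiteness counts as definite marking here.
example : DefiniteArticleType.demonstrativeUsed.hasDefiniteMarking = true := rfl
-- A language with only an indefinite article does not.
example : DefiniteArticleType.noDefButIndef.hasDefiniteMarking = false := rfl

end Sketch
```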
Indefinite article type (WALS Ch 38, @cite{dryer-haspelmath-2013}).
Languages either have a dedicated indefinite word (distinct from 'one'), use the numeral 'one' as an indefinite marker (the most common grammaticalization path), have an indefinite affix, or lack indefinite articles entirely.
- indefiniteWord : IndefiniteArticleType
Indefinite word distinct from the numeral 'one' (e.g. English "a").
- numeralOne : IndefiniteArticleType
Numeral 'one' used as indefinite article (e.g. German "ein").
- indefiniteAffix : IndefiniteArticleType
Indefinite affix on noun.
- noIndefButDef : IndefiniteArticleType
No indefinite article, but language has a definite article.
- noArticle : IndefiniteArticleType
Neither indefinite nor definite article.
Number of distance contrasts in adnominal demonstratives (WALS Ch 41, @cite{diessel-2013}).
Two-way systems (proximal/distal) are by far the most common (54.3%), followed by three-way systems (37.6%). Systems with four or more distinctions are rare (< 6%).
Three-way systems subdivide into distance-oriented (proximal/medial/distal, about 2/3) and person-oriented (near speaker / near hearer / away from both, about 1/3). Japanese ko/so/a is the canonical person-oriented example.
- noContrast : DemDistanceSystem
No distance contrast; demonstratives are distance-neutral (e.g. Modern German "dieser").
- twoWay : DemDistanceSystem
Two-way contrast: proximal vs distal (e.g. English "this"/"that").
- threeWay : DemDistanceSystem
Three-way contrast (e.g. Japanese ko/so/a, Latin hic/iste/ille).
- fourWay : DemDistanceSystem
Four-way contrast (e.g. Hausa).
- fiveOrMore : DemDistanceSystem
Five or more distance contrasts.
Whether a three-way demonstrative system is distance-oriented or person-oriented.
In a distance-oriented system (e.g. Hunzib), all three terms indicate relative distance from the speaker. In a person-oriented system (e.g. Japanese), one term specifically denotes proximity to the hearer.
@cite{diessel-2013} notes that about 2/3 of three-way systems are distance-oriented and about 1/3 are person-oriented.
- distanceOriented : DemOrientationType
All terms encode distance from speaker (proximal/medial/distal).
- personOriented : DemOrientationType
One term encodes proximity to the hearer (near-speaker/near-hearer/distal).
- notApplicable : DemOrientationType
Not applicable (system is not three-way).
Relationship between pronominal and adnominal demonstratives (WALS Ch 42, @cite{diessel-2013}).
English uses the same forms ("this book" / "I like this"); French uses different stems (adnominal "ce"/"cette" vs pronominal "celui"/"celle"); Turkish uses the same stems but different inflectional features.
- sameForms : DemFormRelation
Same forms for pronominal and adnominal use (e.g. English).
- differentStems : DemFormRelation
Different stems (e.g. French ce/celui, Korean i + defective noun).
- differentInflection : DemFormRelation
Same stems but different inflectional features (e.g. Turkish).
Equations
- Phenomena.Reference.Typology.instBEqDemFormRelation.beq x✝ y✝ = (x✝.ctorIdx == y✝.ctorIdx)
Relationship between third-person pronouns and demonstratives (WALS Ch 43, @cite{bhat-2013}).
In "two-person languages" (Bhat's term), 3rd-person pronouns are related to demonstratives -- the pronoun is either identical to a demonstrative or derived from one. In "three-person languages", 3rd-person pronouns form part of the person paradigm and are unrelated to demonstratives.
The majority of the world's languages (125/225 = 55.6%) show some relationship, supporting the diachronic pathway demonstrative -> 3rd-person pronoun.
Whether 3rd-person pronouns show ANY relationship to demonstratives (Bhat's "two-person" vs "three-person" distinction).
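As an illustration, the "any relationship" test can be written as a single pattern match. This is a sketch under assumptions: the constructor names below are inferred from the field names of the Ch 43 count record (`unrelated`, `relatedAll`, and so on), since the constructor list is not rendered on this page, and the `Sketch` namespace is hypothetical.

```lean
namespace Sketch

/-- Ch 43 categories; constructor names are assumed from the count record's fields. -/
inductive PronounDemRelation where
  | unrelated
  | relatedAll
  | relatedRemote
  | relatedNonRemote
  | relatedGender
  | relatedNonhuman
  deriving DecidableEq, Repr

/-- Every category except `unrelated` counts as some relationship
    (Bhat's "two-person" languages). -/
def PronounDemRelation.isRelated : PronounDemRelation → Bool
  | .unrelated => false
  | _ => true

example : PronounDemRelation.relatedGender.isRelated = true := rfl
example : PronounDemRelation.unrelated.isRelated = false := rfl

end Sketch
```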
A language's article and demonstrative profile across all five WALS chapters.
Not all chapters have data for every language (WALS samples vary by chapter), so each field is optional.
- language : String
- family : String
- iso : String
ISO 639-3 code
- defArticle : Option DefiniteArticleType
Ch 37: Definite article type
- indefArticle : Option IndefiniteArticleType
Ch 38: Indefinite article type
- demDistance : Option DemDistanceSystem
Ch 41: Distance contrasts in demonstratives
- demOrientation : Option DemOrientationType
Ch 41 subtype: distance-oriented vs person-oriented (for 3-way systems)
- demFormType : Option DemFormRelation
Ch 42: Pronominal vs adnominal demonstrative form
- pronDemRelation : Option PronounDemRelation
Ch 43: 3rd-person pronoun ~ demonstrative relationship
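The optional fields can be illustrated with a reduced record. The sketch below keeps only two of the chapter fields; the names `MiniProfile`, `exampleLang`, and `hasDefiniteMarking`, the default values, and the placeholder ISO code are all assumptions for illustration. Treating a missing value as `false` mirrors the `none => false` convention visible in `hasPronDemRelated`, but whether the module's own definite-marking helper does the same is not shown here.

```lean
namespace Sketch

inductive DefiniteArticleType where
  | definiteWord | definiteAffix | demonstrativeUsed | noDefButIndef | noArticle
  deriving DecidableEq, Repr

inductive DemDistanceSystem where
  | noContrast | twoWay | threeWay | fourWay | fiveOrMore
  deriving DecidableEq, Repr

/-- Hypothetical reduced profile: each chapter field is optional. -/
structure MiniProfile where
  language : String
  iso : String
  defArticle : Option DefiniteArticleType := none
  demDistance : Option DemDistanceSystem := none
  deriving Repr

/-- Turkish as described on this page: indefinite article only, two-way contrast. -/
def turkish : MiniProfile :=
  { language := "Turkish", iso := "tur",
    defArticle := some .noDefButIndef, demDistance := some .twoWay }

/-- A made-up entry with no Ch 41 data: the field simply stays `none`. -/
def exampleLang : MiniProfile :=
  { language := "Example", iso := "xxx", defArticle := some .noArticle }

/-- Missing data is treated as "no definite marking" (an assumption). -/
def hasDefiniteMarking (p : MiniProfile) : Bool :=
  match p.defArticle with
  | some .definiteWord | some .definiteAffix | some .demonstrativeUsed => true
  | _ => false

example : hasDefiniteMarking turkish = false := rfl
example : exampleLang.demDistance = none := rfl

end Sketch
```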
English (Indo-European, Germanic). Definite article "the" distinct from demonstratives "this"/"that". Indefinite article "a/an" distinct from numeral "one". Two-way demonstrative distance: this (proximal) vs that (distal). Same forms for pronominal and adnominal demonstratives. 3rd-person pronouns ("he/she/it") unrelated to demonstratives.
French (Indo-European, Romance). Definite article "le/la/les" distinct from demonstratives. Indefinite article "un/une" historically from numeral 'one' but now distinct (WALS codes French as indefinite word distinct from 'one'). Two-way demonstrative distance: ce N-ci (proximal) vs ce N-la (distal), though adnominal "ce" alone is distance-neutral. Different stems for pronominal ("celui/celle") vs adnominal ("ce/cette"). 3rd-person pronouns ("il/elle") unrelated to demonstratives.
German (Indo-European, Germanic). Definite article "der/die/das" distinct from demonstratives. Indefinite article "ein" = numeral 'one' (phonological reduction in speech). Distance-neutral adnominal demonstratives: "dieser" and stressed "der/die/das" are deictically noncontrastive; distance expressed by adding adverbial "hier"/"da". Classified as no-contrast in WALS Ch 41. Different inflectional features: pronominal demonstratives inflect for case while adnominal demonstratives co-occur with inflected nouns. 3rd-person pronouns ("er/sie/es") unrelated to demonstratives.
Japanese (Japonic). No definite or indefinite articles. Three-way person-oriented demonstrative system: ko- (near speaker), so- (near hearer), a- (away from both). The canonical person-oriented system. Different stems: adnominal kono/sono/ano vs pronominal kore/sore/are. 3rd-person pronouns ("kare/kanojo") unrelated to demonstratives (borrowed from Classical Chinese / relatively recent innovations).
Mandarin Chinese (Sino-Tibetan). No definite or indefinite articles (bare nouns are ambiguous for definiteness). Two-way demonstrative distance: zhe (proximal) vs na (distal). Same forms for pronominal and adnominal demonstratives (with optional classifier in adnominal use). 3rd-person pronoun "ta" unrelated to demonstratives.
Turkish (Turkic). No definite article; indefinite article "bir" = numeral 'one' (different NP position when used as article vs numeral, per @cite{kornfilt-1997}). Two-way demonstrative distance: bu (proximal) vs o (distal), with su as a restricted medial form. WALS codes as two-way. Different inflectional features: pronominal demonstratives inflect for case and number, adnominal demonstratives are uninflected particles. 3rd-person pronoun "o" identical to distal demonstrative.
Arabic (Egyptian) (Afro-Asiatic, Semitic). Definite prefix "al-" on nouns (definite affix). No indefinite article (unmarked nouns are indefinite, though tanwin in Standard Arabic marks indefiniteness). Two-way demonstrative distance: hada (proximal) vs daak (distal). Same forms for pronominal and adnominal demonstratives. 3rd-person pronoun "huwa/hiya" unrelated to demonstratives.
Finnish (Uralic). No definite or indefinite articles. Two-way demonstrative distance: tama (proximal) vs tuo/se (distal). Same forms for pronominal and adnominal demonstratives. 3rd-person pronoun "han" (human) / "se" (nonhuman) -- "se" is identical to the distal demonstrative.
Hungarian (Uralic). Definite article "a/az" distinct from demonstratives. Indefinite article "egy" = numeral 'one'. Two-way demonstrative distance: ez (proximal) vs az (distal). Same forms for pronominal and adnominal demonstratives. 3rd-person pronoun "o" unrelated to demonstratives.
Russian (Indo-European, Slavic). No definite or indefinite articles. Two-way demonstrative distance: etot (proximal) vs tot (distal). Same forms for pronominal and adnominal demonstratives. 3rd-person pronouns ("on/ona/ono") unrelated to demonstratives.
Swahili (Niger-Congo, Bantu). Demonstrative used as definite marker (precedes noun for definiteness, follows noun for deictic use; WALS Ch 37 value 2). No indefinite article. Three-way demonstrative distance: h- (proximal) / h-o (medial) / -le (distal). Same forms for pronominal and adnominal demonstratives. 3rd-person pronouns related via shared noun-class agreement prefixes (gender-marker relationship).
Tagalog (Austronesian, Philippine). Definite/indefinite distinction via case markers: "ang" (definite-like topic marker) vs "ng" (indefinite-like). WALS codes as definite word distinct from demonstratives. Demonstratives: ito (proximal), iyan (medial), iyon (distal); analyses that set aside "iyan" yield a two-way contrast, but WALS codes the system as three-way. Same forms for pronominal and adnominal demonstratives. 3rd-person pronoun "siya" unrelated to demonstratives.
Latin (Indo-European, Italic). No definite or indefinite articles (bare nouns are ambiguous). Three-way distance-oriented demonstrative system: hic (proximal), iste (medial), ille (distal). This is the textbook three-way distance-oriented system. Same forms for pronominal and adnominal demonstratives (though with full case/gender/number inflection in both uses). 3rd-person pronoun "is/ea/id" distinct from but historically related to demonstratives (related to all demonstratives via shared inflection patterns).
Korean (Koreanic). No definite or indefinite articles (topic marker "-un"/"-nun" sometimes conveys definiteness pragmatically). Three-way person-oriented demonstrative system: i (near speaker), ku (near hearer), ce (away from both). Different stems: pronominal demonstratives formed by combining i/ku/ce with a "defective noun" like "il" (thing/fact), giving "i-il", "ku-il", etc. (@cite{sohn-1994}: 295, @cite{diessel-2013}). 3rd-person pronoun "ku" related to the near-hearer demonstrative "ku".
Danish (Indo-European, Germanic). Definite suffix "-en"/"-et" on nouns (definite affix); separate definite article "den/det" used when an adjective is present. Indefinite article "en/et" = numeral 'one'. Two-way demonstrative distance: denne (proximal) vs den (distal). Different inflectional features between pronominal and adnominal uses. 3rd-person pronouns ("han/hun/den/det") -- "den/det" related to demonstratives.
Hausa (Afro-Asiatic, Chadic). No definite article; no standard indefinite article (WALS Ch 37 value 5). Four-way person-oriented demonstrative system: nan (near speaker), nan (near hearer, differing from the first only in tone), can (away from both), can (far away, again differing only in tone); tone marks are omitted here. Hausa is a key example of a four-way system. Same forms for pronominal and adnominal demonstratives. 3rd-person pronouns related to demonstratives.
Basque (language isolate). Definite suffix "-a"/"-ak" (definite affix). No indefinite article (bare nouns are indefinite). Three-way demonstrative distance: hau (proximal), hori (medial), hura (distal). Same forms for pronominal and adnominal demonstratives. 3rd-person pronouns: hau/hori/hura function as both demonstratives and 3rd-person pronouns (@cite{saltarelli-etal-1988}: 213).
All 17 language profiles defined above.
Equations
- c.total = c.definiteWord + c.demonstrativeUsed + c.definiteAffix + c.noDefButIndef + c.noArticle
WALS Ch 37 distribution (@cite{dryer-haspelmath-2013}, n = 566).
Equations
- Phenomena.Reference.Typology.walsDefiniteArticle = { definiteWord := 197, demonstrativeUsed := 56, definiteAffix := 84, noDefButIndef := 41, noArticle := 188 }
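The stated sample size can be checked directly from the five counts. The sketch below assumes a record type named `DefArticleCounts` (the real type name is not shown on this page) and uses the `total` equation given above; the bounds on the share of languages with some definite marking are written as integer inequalities to avoid rounding.

```lean
namespace Sketch

/-- Hypothetical name for the Ch 37 count record; field names follow the page. -/
structure DefArticleCounts where
  definiteWord : Nat
  demonstrativeUsed : Nat
  definiteAffix : Nat
  noDefButIndef : Nat
  noArticle : Nat

def DefArticleCounts.total (c : DefArticleCounts) : Nat :=
  c.definiteWord + c.demonstrativeUsed + c.definiteAffix + c.noDefButIndef + c.noArticle

def walsDefiniteArticle : DefArticleCounts :=
  { definiteWord := 197, demonstrativeUsed := 56, definiteAffix := 84,
    noDefButIndef := 41, noArticle := 188 }

-- The five categories sum to the stated sample size.
example : walsDefiniteArticle.total = 566 := rfl

-- Word + affix + demonstrative = 337 languages with some definite marking.
example : 197 + 84 + 56 = 337 := rfl

-- 337 / 566 lies between 59% and 60%.
example : 337 * 100 > 59 * 566 := by decide
example : 337 * 100 < 60 * 566 := by decide

end Sketch
```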
Equations
- c.total = c.indefiniteWord + c.numeralOne + c.indefiniteAffix + c.noIndefButDef + c.noArticle
WALS Ch 38 distribution (@cite{dryer-haspelmath-2013}, n = 473).
Equations
- Phenomena.Reference.Typology.walsIndefiniteArticle = { indefiniteWord := 91, numeralOne := 90, indefiniteAffix := 23, noIndefButDef := 81, noArticle := 188 }
WALS Ch 41 distribution (@cite{diessel-2013}, n = 234).
Equations
- Phenomena.Reference.Typology.walsDemDistance = { noContrast := 7, twoWay := 127, threeWay := 88, fourWay := 8, fiveOrMore := 4 }
Equations
- c.total = c.sameForms + c.differentStems + c.differentInflection
WALS Ch 42 distribution (@cite{diessel-2013}, n = 201).
Equations
- Phenomena.Reference.Typology.walsDemForm = { sameForms := 143, differentStems := 37, differentInflection := 21 }
WALS Ch 43: Third-person pronoun ~ demonstrative relationship across 225 languages.
Equations
- c.total = c.unrelated + c.relatedAll + c.relatedRemote + c.relatedNonRemote + c.relatedGender + c.relatedNonhuman
Total count of languages where 3rd-person pronouns show any relationship to demonstratives.
Equations
- c.totalRelated = c.relatedAll + c.relatedRemote + c.relatedNonRemote + c.relatedGender + c.relatedNonhuman
WALS Ch 43 distribution (@cite{bhat-2013}, n = 225).
Equations
- Phenomena.Reference.Typology.walsPronounDem = { unrelated := 100, relatedAll := 52, relatedRemote := 18, relatedNonRemote := 14, relatedGender := 24, relatedNonhuman := 17 }
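The 125-of-225 figure follows from these counts. Here is a small sketch that recomputes it, assuming a record type named `PronounDemCounts` (a hypothetical name) with the `total` and `totalRelated` equations shown above.

```lean
namespace Sketch

/-- Hypothetical name for the Ch 43 count record; field names follow the page. -/
structure PronounDemCounts where
  unrelated : Nat
  relatedAll : Nat
  relatedRemote : Nat
  relatedNonRemote : Nat
  relatedGender : Nat
  relatedNonhuman : Nat

def PronounDemCounts.totalRelated (c : PronounDemCounts) : Nat :=
  c.relatedAll + c.relatedRemote + c.relatedNonRemote + c.relatedGender + c.relatedNonhuman

def PronounDemCounts.total (c : PronounDemCounts) : Nat :=
  c.unrelated + c.totalRelated

def walsPronounDem : PronounDemCounts :=
  { unrelated := 100, relatedAll := 52, relatedRemote := 18,
    relatedNonRemote := 14, relatedGender := 24, relatedNonhuman := 17 }

example : walsPronounDem.total = 225 := rfl
example : walsPronounDem.totalRelated = 125 := rfl

-- "Majority": the related languages are more than half of the sample.
example : walsPronounDem.totalRelated * 2 > walsPronounDem.total := by decide

end Sketch
```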
Ch 37 distribution counts derived from WALS generated data (n = 620).
Ch 38 distribution counts derived from WALS generated data (n = 534).
Ch 41 distribution counts derived from WALS generated data (n = 234).
Ch 42 distribution counts derived from WALS generated data (n = 201).
Ch 43 distribution counts derived from WALS generated data (n = 225).
Ch 42 generated counts match hand-coded exactly.
Ch 43 generated counts match hand-coded exactly.
Two-way demonstrative systems (proximal/distal) are the most common type, accounting for 127 of 234 languages in the WALS sample (54.3%).
@cite{diessel-2013}: "The vast majority of the world's languages employ two or three distance-marked demonstratives: 54.3 per cent of all languages shown on the map have adnominal demonstratives that express a two-way contrast."
Two-way and three-way systems together account for over 90% of languages. One-way, four-way, and five-or-more-way systems together are under 10%.
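The percentage claims about distance systems reduce to integer inequalities over the Ch 41 counts. The following sketch assumes a record type named `DemDistanceCounts` and a `total` function summing its five fields; neither is rendered on this page, so both are illustrative.

```lean
namespace Sketch

/-- Hypothetical name for the Ch 41 count record; field names follow the page. -/
structure DemDistanceCounts where
  noContrast : Nat
  twoWay : Nat
  threeWay : Nat
  fourWay : Nat
  fiveOrMore : Nat

/-- Assumed definition of the sample size: the sum of all five fields. -/
def DemDistanceCounts.total (c : DemDistanceCounts) : Nat :=
  c.noContrast + c.twoWay + c.threeWay + c.fourWay + c.fiveOrMore

def walsDemDistance : DemDistanceCounts :=
  { noContrast := 7, twoWay := 127, threeWay := 88, fourWay := 8, fiveOrMore := 4 }

example : walsDemDistance.total = 234 := rfl

-- Two-way systems alone are more than half of the sample.
example : walsDemDistance.twoWay * 2 > walsDemDistance.total := by decide

-- Two-way plus three-way systems exceed 90% of the sample.
example : (walsDemDistance.twoWay + walsDemDistance.threeWay) * 10
    > 9 * walsDemDistance.total := by decide

-- All remaining systems together stay under 10%.
example : (walsDemDistance.noContrast + walsDemDistance.fourWay
    + walsDemDistance.fiveOrMore) * 10 < walsDemDistance.total := by decide

end Sketch
```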
Languages with indefinite articles tend to also have definite articles, more so than the reverse. The evidence: 81 languages have a definite article but no indefinite article (Ch 38, value 4), compared to 41 languages that have an indefinite article but no definite article (Ch 37, value 4). The two counts come from different chapter samples, so the comparison is indicative rather than exact.
This asymmetry suggests that definiteness marking is the typologically "prior" or more basic category: languages are more likely to grammaticalize definiteness without indefiniteness than vice versa.
Note: This is a tendency, not an absolute universal. The 41 exceptions (indefinite without definite) are concentrated in Asia (Turkey to Iran) and New Guinea.
Languages with some form of definite marking (word, affix, or demonstrative) outnumber those without. 337 of 566 languages (59.5%) have definite marking.
In most languages (143 of 201 = 71.1%), pronominal and adnominal demonstratives have the same forms (@cite{diessel-2013}, Ch 42). Languages where adnominal demonstratives have different stems (37) or different inflectional features (21) are the minority.
Languages with same-form demonstratives are especially prevalent in Australia (where no language in the sample differentiates the two uses) and North America (except Pacific Northwest).
The grammaticalization cline: demonstrative -> definite article -> definite affix.
This is supported by the WALS data:
- 56 languages use demonstratives as definite markers (the earliest stage of the cline)
- 197 languages have definite words distinct from demonstratives (a stage where the article has diverged from the source demonstrative)
- 84 languages have definite affixes (end of cline)
The 56 "demonstrative used as definite marker" languages represent the transitional stage where this grammaticalization is actively underway.
See @cite{greenberg-1978}, @cite{himmelmann-1997} for theoretical discussion.
Among languages without a dedicated definite article, a substantial proportion (56 of 285 = 19.6%) use demonstratives to mark definiteness. The denominator combines the three Ch 37 categories that lack a dedicated definite article (56 + 41 + 188 = 285); the 56 are the "demonstrative used as definite article" category.
This is consistent with the typological observation that demonstratives are the natural source for definiteness marking: roughly one in five languages lacking a grammaticalized definite article nonetheless use a demonstrative in definite contexts.
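The 19.6% figure can be re-derived from the Ch 37 counts with plain arithmetic. The name `withoutDedicatedDefArticle` is introduced here purely for illustration.

```lean
namespace Sketch

/-- Ch 37 languages lacking a dedicated definite article:
    demonstrative used (56) + indefinite only (41) + no article (188). -/
def withoutDedicatedDefArticle : Nat := 56 + 41 + 188

example : withoutDedicatedDefArticle = 285 := rfl

-- 56 / 285 lies between 19.6% and 19.7%.
example : 56 * 1000 > 196 * withoutDedicatedDefArticle := by decide
example : 56 * 1000 < 197 * withoutDedicatedDefArticle := by decide

end Sketch
```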
Helper: does a profile have any definite marking?
Helper: does a profile have any indefinite article?
In our 17-language sample, all but one language with an indefinite article also has some form of definite marking (article, affix, or demonstrative used). The exception is Turkish, which has the numeral "bir" as an indefinite article but no dedicated definite article (WALS Ch 37, value 4).
This is consistent with the WALS aggregate data, which shows 41 languages with indefinite but no definite articles, concentrated in the Turkey-to-Iran belt and New Guinea.
Turkish is the one exception: indefinite article but no definite marking.
In our sample, languages with three-way or larger demonstrative systems tend to lack articles entirely. 5 of the 7 languages with 3+ distance contrasts have no definite or indefinite articles.
Count of languages in our sample with three-way-or-more demonstratives.
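A count of this kind can be sketched as a filter over a list. The example below works on a three-element toy list rather than the full sample, and the names `isThreeWayOrMore`, `miniSample`, and `countThreeWayOrMore` are hypothetical.

```lean
namespace Sketch

inductive DemDistanceSystem where
  | noContrast | twoWay | threeWay | fourWay | fiveOrMore
  deriving DecidableEq, Repr

/-- Hypothetical predicate: three or more distance contrasts. -/
def DemDistanceSystem.isThreeWayOrMore : DemDistanceSystem → Bool
  | .threeWay | .fourWay | .fiveOrMore => true
  | .noContrast | .twoWay => false

/-- Toy list: Japanese (three-way), Mandarin (two-way), Hausa (four-way),
    as these languages are described on this page. -/
def miniSample : List DemDistanceSystem := [.threeWay, .twoWay, .fourWay]

/-- Count the systems that satisfy the predicate. -/
def countThreeWayOrMore (xs : List DemDistanceSystem) : Nat :=
  (xs.filter DemDistanceSystem.isThreeWayOrMore).length

example : countThreeWayOrMore miniSample = 2 := rfl

end Sketch
```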
8 of our 17 sample languages show some relationship between 3rd-person pronouns and demonstratives (Turkish, Finnish, Swahili, Latin, Korean, Danish, Hausa, Basque), consistent with Bhat's finding that a majority of languages worldwide (125 of 225) are "two-person languages."
Equations
- Phenomena.Reference.Typology.hasPronDemRelated p = match p.pronDemRelation with | some r => r.isRelated | none => false
The grammaticalization hierarchy for definiteness marking, attested cross-linguistically:
- Stage 0: No definiteness marking (bare nouns, e.g. Mandarin, Russian)
- Stage 1: Demonstrative used for definiteness (e.g. Swahili, Ojibwa)
- Stage 2: Definite word distinct from demonstrative (e.g. English, French)
- Stage 3: Definite affix (e.g. Danish, Arabic, Basque)
Each stage represents a further degree of grammaticalization: phonological reduction, semantic bleaching (loss of deictic content), and increased obligatoriness.
- noMarking : GrammaticalizationStage
- demonstrative : GrammaticalizationStage
- definiteWord : GrammaticalizationStage
- definiteAffix : GrammaticalizationStage
Map a DefiniteArticleType to its grammaticalization stage.
Equations
- Phenomena.Reference.Typology.DefiniteArticleType.noArticle.stage = Phenomena.Reference.Typology.GrammaticalizationStage.noMarking
- Phenomena.Reference.Typology.DefiniteArticleType.noDefButIndef.stage = Phenomena.Reference.Typology.GrammaticalizationStage.noMarking
- Phenomena.Reference.Typology.DefiniteArticleType.demonstrativeUsed.stage = Phenomena.Reference.Typology.GrammaticalizationStage.demonstrative
- Phenomena.Reference.Typology.DefiniteArticleType.definiteWord.stage = Phenomena.Reference.Typology.GrammaticalizationStage.definiteWord
- Phenomena.Reference.Typology.DefiniteArticleType.definiteAffix.stage = Phenomena.Reference.Typology.GrammaticalizationStage.definiteAffix
The stages form a total order (higher = more grammaticalized).
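One way to obtain such an order is to rank the stages numerically and compare ranks. This is a sketch, not necessarily how the module defines the order: the `rank` function, the `LE` instance, and the `le_total` lemma are assumptions made for illustration.

```lean
namespace Sketch

inductive GrammaticalizationStage where
  | noMarking | demonstrative | definiteWord | definiteAffix
  deriving DecidableEq, Repr

/-- Hypothetical numeric rank: higher means more grammaticalized. -/
def GrammaticalizationStage.rank : GrammaticalizationStage → Nat
  | .noMarking => 0
  | .demonstrative => 1
  | .definiteWord => 2
  | .definiteAffix => 3

/-- Compare stages by comparing their ranks. -/
instance : LE GrammaticalizationStage := ⟨fun a b => a.rank ≤ b.rank⟩

instance (a b : GrammaticalizationStage) : Decidable (a ≤ b) :=
  inferInstanceAs (Decidable (a.rank ≤ b.rank))

example : GrammaticalizationStage.demonstrative ≤ GrammaticalizationStage.definiteAffix := by
  decide

/-- Totality is inherited from the order on `Nat`. -/
theorem GrammaticalizationStage.le_total (a b : GrammaticalizationStage) :
    a ≤ b ∨ b ≤ a :=
  Nat.le_total a.rank b.rank

end Sketch
```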
All four stages of the grammaticalization cline are attested in our 17-language sample.