
Linglib.Phenomena.Reference.Typology

Cross-Linguistic Typology of Articles and Demonstratives (WALS)

@cite{bhat-2013} @cite{greenberg-1978} @cite{himmelmann-1997} @cite{dryer-haspelmath-2013}

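Typological data on definiteness marking, indefinite articles, and demonstrative systems across languages, drawn from five chapters of the World Atlas of Language Structures (WALS): Ch 37 (definite articles), Ch 38 (indefinite articles), Ch 41 (distance contrasts in demonstratives), Ch 42 (pronominal and adnominal demonstratives), and Ch 43 (third-person pronouns and demonstratives).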

Key Generalizations

  1. Two-way demonstrative distance systems (proximal/distal) are the most common cross-linguistically (54.3%), followed by three-way systems (37.6%).
  2. Definite articles occur without indefinite articles more often than the reverse: 81 languages have a definite article but no indefinite article, while only 41 have an indefinite article with no definite article.
  3. In a majority of languages (125 of 225), third-person pronouns show some relationship to demonstratives, consistent with the well-attested diachronic pathway demonstrative -> 3rd-person pronoun.
  4. The grammaticalization cline demonstrative -> definite article -> definite affix is a well-established diachronic pathway.

Definite article type (WALS Ch 37, @cite{dryer-haspelmath-2013}).

Classifies languages by how (or whether) they mark definiteness on nouns. Three of the categories correspond to stages of the grammaticalization cline demonstrative -> definite word -> definite affix; the other two cover languages with no definite marking.

  • definiteWord : DefiniteArticleType

    Definite word distinct from demonstratives (e.g. English "the").

  • definiteAffix : DefiniteArticleType

    Definite affix on the noun (e.g. Danish "-en", Arabic "al-").

  • demonstrativeUsed : DefiniteArticleType

    No dedicated definite article; a demonstrative is used for definiteness (e.g. Ojibwa, Swahili).

  • noDefButIndef : DefiniteArticleType

    No definite article, but language has an indefinite article.

  • noArticle : DefiniteArticleType

    Neither definite nor indefinite article.
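
As a rough illustration, the five categories can be written as a Lean inductive type. The constructor names are taken from the list above; the helpers `clineStage` and `hasDefiniteMarking`, and the stage numbering, are hypothetical additions that make the cline explicit. They are not part of the module's documented API.

```lean
/-- The five Ch 37 categories; constructor names as documented above. -/
inductive DefiniteArticleType where
  | definiteWord
  | definiteAffix
  | demonstrativeUsed
  | noDefButIndef
  | noArticle
  deriving DecidableEq, Repr

/-- Hypothetical helper: position on the cline
    demonstrative (0) -> definite word (1) -> definite affix (2).
    Languages with no definite marking are off the cline (`none`). -/
def DefiniteArticleType.clineStage : DefiniteArticleType → Option Nat
  | .demonstrativeUsed => some 0
  | .definiteWord      => some 1
  | .definiteAffix     => some 2
  | .noDefButIndef     => none
  | .noArticle         => none

/-- Hypothetical helper: definite marking of any kind
    (word, affix, or demonstrative). -/
def DefiniteArticleType.hasDefiniteMarking (t : DefiniteArticleType) : Bool :=
  t.clineStage.isSome

-- Compile-time sanity checks.
example : DefiniteArticleType.hasDefiniteMarking .demonstrativeUsed = true := by decide
example : DefiniteArticleType.hasDefiniteMarking .noDefButIndef = false := by decide
```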

Indefinite article type (WALS Ch 38, @cite{dryer-haspelmath-2013}).

Languages either have a dedicated indefinite word (distinct from 'one'), use the numeral 'one' as an indefinite marker (the most common grammaticalization path), have an indefinite affix, or lack an indefinite article (with or without a definite article).
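
A comparable sketch for Ch 38. The constructor list is not rendered here, so the names below are an assumption borrowed from the field names of the Ch 38 count record documented further down; the helper `hasIndefinite` is likewise hypothetical.

```lean
/-- Sketch of the Ch 38 categories (constructor names assumed). -/
inductive IndefiniteArticleType where
  | indefiniteWord   -- dedicated indefinite word, distinct from 'one'
  | numeralOne       -- the numeral 'one' doubles as the article
  | indefiniteAffix  -- indefiniteness marked by an affix on the noun
  | noIndefButDef    -- no indefinite article, but a definite one
  | noArticle        -- neither article
  deriving DecidableEq, Repr

/-- Hypothetical helper: does the language have an indefinite article
    of any kind? -/
def IndefiniteArticleType.hasIndefinite : IndefiniteArticleType → Bool
  | .noIndefButDef | .noArticle => false
  | _ => true

example : IndefiniteArticleType.hasIndefinite .numeralOne = true := by decide
example : IndefiniteArticleType.hasIndefinite .noArticle = false := by decide
```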

Number of distance contrasts in adnominal demonstratives (WALS Ch 41, @cite{diessel-2013}).

Two-way systems (proximal/distal) are the most common (54.3%), followed by three-way systems (37.6%). Systems with four or more distinctions are rare (< 6%).

Three-way systems subdivide into distance-oriented (proximal/medial/distal, about 2/3) and person-oriented (near speaker / near hearer / away from both, about 1/3). Japanese ko/so/a is the canonical person-oriented example.
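
The percentages above are shares of the 234-language Ch 41 sample. The sketch below recomputes them with a hypothetical `permille` helper (tenths of a percent, rounded half up, integer arithmetic only). The two-way count of 127 is reported later in this module; the three-way count of 88 is an inference from 37.6% of 234, not a figure stated here.

```lean
/-- Hypothetical helper: `part / total` in tenths of a percent,
    rounded half up. -/
def permille (part total : Nat) : Nat :=
  (part * 1000 + total / 2) / total

def ch41Total : Nat := 234
def ch41TwoWay : Nat := 127    -- stated in this module
def ch41ThreeWay : Nat := 88   -- inferred from 37.6 % of 234

example : permille ch41TwoWay ch41Total = 543 := by decide    -- 54.3 %
example : permille ch41ThreeWay ch41Total = 376 := by decide  -- 37.6 %
```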

Whether a three-way demonstrative system is distance-oriented or person-oriented.

In a distance-oriented system (e.g. Hunzib), all three terms indicate relative distance from the speaker. In a person-oriented system (e.g. Japanese), one term specifically denotes proximity to the hearer.

@cite{diessel-2013} notes that about 2/3 of three-way systems are distance-oriented and about 1/3 are person-oriented.

  • distanceOriented : DemOrientationType

    All terms encode distance from speaker (proximal/medial/distal).

  • personOriented : DemOrientationType

    One term encodes proximity to the hearer (near-speaker/near-hearer/distal).

  • notApplicable : DemOrientationType

    Not applicable (system is not three-way).
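
A sketch of the orientation type with a hypothetical classifier `orientationOf`. Its two inputs (the number of distance terms and whether one term is anchored to the hearer) are an assumed encoding of a demonstrative system, not fields of the module's data. The checks restate the classifications given in this module for Japanese, Hunzib, and English.

```lean
inductive DemOrientationType where
  | distanceOriented  -- proximal / medial / distal
  | personOriented    -- near speaker / near hearer / away from both
  | notApplicable     -- the system is not three-way
  deriving DecidableEq, Repr

/-- Hypothetical classifier: `terms` is the number of distance contrasts;
    `hearerTerm` says whether one term denotes proximity to the hearer. -/
def orientationOf (terms : Nat) (hearerTerm : Bool) : DemOrientationType :=
  if terms ≠ 3 then .notApplicable
  else if hearerTerm then .personOriented
  else .distanceOriented

-- Japanese ko/so/a
example : orientationOf 3 true = DemOrientationType.personOriented := by decide
-- Hunzib
example : orientationOf 3 false = DemOrientationType.distanceOriented := by decide
-- English this/that (two-way)
example : orientationOf 2 false = DemOrientationType.notApplicable := by decide
```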

Relationship between pronominal and adnominal demonstratives (WALS Ch 42, @cite{diessel-2013}).

English uses the same forms ("this book" / "I like this"); French uses different stems (adnominal "ce"/"cette" vs pronominal "celui"/"celle"); Turkish uses the same stems but different inflectional features.

  • sameForms : DemFormRelation

    Same forms for pronominal and adnominal use (e.g. English).

  • differentStems : DemFormRelation

    Different stems (e.g. French ce/celui, Korean i + defective noun).

  • differentInflection : DemFormRelation

    Same stems but different inflectional features (e.g. Turkish).
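
The three-way split can be sketched as follows. `sharesStem` is a hypothetical helper that makes explicit how the two "different" values differ from each other: only `differentStems` breaks the stem identity.

```lean
inductive DemFormRelation where
  | sameForms            -- English: "this book" / "I like this"
  | differentStems       -- French: ce/cette vs celui/celle
  | differentInflection  -- Turkish: same stems, different inflection
  deriving DecidableEq, Repr

/-- Hypothetical helper: do pronominal and adnominal forms share a stem? -/
def DemFormRelation.sharesStem : DemFormRelation → Bool
  | .differentStems => false
  | _ => true

example : DemFormRelation.sharesStem .differentInflection = true := by decide
example : DemFormRelation.sharesStem .differentStems = false := by decide
```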

Relationship between third-person pronouns and demonstratives (WALS Ch 43, @cite{bhat-2013}).

In "two-person languages" (Bhat's term), 3rd-person pronouns are related to demonstratives: the pronoun is either identical to a demonstrative or derived from one. In "three-person languages", 3rd-person pronouns form part of the person paradigm and are unrelated to demonstratives.

A majority of the languages in the WALS sample (125/225 = 55.6%) show some relationship, supporting the diachronic pathway demonstrative -> 3rd-person pronoun.

  • unrelated : PronounDemRelation

    3rd-person pronouns unrelated to demonstratives (e.g. Ainu, Polish).

  • relatedAll : PronounDemRelation

    Related to all demonstratives (e.g. Basque, where any demonstrative can serve as 3rd-person pronoun).

  • relatedRemote : PronounDemRelation

    Related specifically to remote/distal demonstratives (e.g. Eastern Armenian: 3sg "na" = distal "na").

  • relatedNonRemote : PronounDemRelation

    Related specifically to non-remote (proximal or medial) demonstratives (e.g. Brahui: 3sg = medial demonstrative).

  • relatedGender : PronounDemRelation

    Related via shared gender/noun-class markers (e.g. Venda: both show 21-class distinctions).

  • relatedNonhuman : PronounDemRelation

    Demonstratives used as 3rd-person pronouns for nonhuman reference only (e.g. Jaqaru: 3sg "upa" for humans, demonstratives for nonhumans).

Whether 3rd-person pronouns show ANY relationship to demonstratives (Bhat's "two-person" vs "three-person" distinction).
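
A sketch of the six-way classification and of the yes/no predicate described above. The name `isRelated` is a hypothetical stand-in, since the documentation does not show the function's actual name.

```lean
inductive PronounDemRelation where
  | unrelated
  | relatedAll
  | relatedRemote
  | relatedNonRemote
  | relatedGender
  | relatedNonhuman
  deriving DecidableEq, Repr

/-- Bhat's split: two-person languages (any relationship) versus
    three-person languages (`unrelated`). -/
def PronounDemRelation.isRelated : PronounDemRelation → Bool
  | .unrelated => false
  | _ => true

-- Eastern Armenian 3sg "na" = distal "na" (relatedRemote, per the list above).
example : PronounDemRelation.isRelated .relatedRemote = true := by decide
example : PronounDemRelation.isRelated .unrelated = false := by decide
```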

A language's article and demonstrative profile across all five WALS chapters.

Not all chapters have data for every language (WALS samples vary by chapter), so each field is optional.
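
A minimal sketch of such a profile, assuming illustrative field names (`defArticle`, `distanceTerms`, `pronounDem`) and covering only three of the five chapters. An `Option` field with a `none` default models a chapter that has no data point for the language. The English values follow the English profile below; `unsampled` is a hypothetical language used only to show the defaults.

```lean
-- Stand-ins for two of the classification types documented above.
inductive DefiniteArticleType where
  | definiteWord | definiteAffix | demonstrativeUsed | noDefButIndef | noArticle
  deriving DecidableEq, Repr

inductive PronounDemRelation where
  | unrelated | relatedAll | relatedRemote | relatedNonRemote
  | relatedGender | relatedNonhuman
  deriving DecidableEq, Repr

/-- Hypothetical profile: each WALS-derived field is optional because
    chapter samples differ. Field names are illustrative only. -/
structure LanguageProfile where
  name          : String
  defArticle    : Option DefiniteArticleType := none  -- Ch 37
  distanceTerms : Option Nat := none                  -- Ch 41
  pronounDem    : Option PronounDemRelation := none   -- Ch 43
  deriving Repr

/-- English: "the", two-way this/that, pronouns unrelated to demonstratives. -/
def english : LanguageProfile where
  name          := "English"
  defArticle    := some .definiteWord
  distanceTerms := some 2
  pronounDem    := some .unrelated

/-- A hypothetical language absent from every sample: all fields `none`. -/
def unsampled : LanguageProfile := { name := "Unsampled" }

example : english.distanceTerms = some 2 := by decide
example : unsampled.defArticle = none := by decide
```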

English (Indo-European, Germanic). Definite article "the" distinct from demonstratives "this"/"that". Indefinite article "a/an" distinct from numeral "one". Two-way demonstrative distance: this (proximal) vs that (distal). Same forms for pronominal and adnominal demonstratives. 3rd-person pronouns ("he/she/it") unrelated to demonstratives.

French (Indo-European, Romance). Definite article "le/la/les" distinct from demonstratives. Indefinite article "un/une" historically from numeral 'one' but now distinct (WALS codes French as indefinite word distinct from 'one'). Two-way demonstrative distance: ce N-ci (proximal) vs ce N-là (distal), though adnominal "ce" alone is distance-neutral. Different stems for pronominal ("celui/celle") vs adnominal ("ce/cette"). 3rd-person pronouns ("il/elle") unrelated to demonstratives.

German (Indo-European, Germanic). Definite article "der/die/das" distinct from demonstratives. Indefinite article "ein" = numeral 'one' (phonological reduction in speech). Distance-neutral adnominal demonstratives: "dieser" and stressed "der/die/das" are deictically noncontrastive; distance expressed by adding adverbial "hier"/"da". Classified as no-contrast in WALS Ch 41. Different inflectional features: pronominal demonstratives inflect for case while adnominal demonstratives co-occur with inflected nouns. 3rd-person pronouns ("er/sie/es") unrelated to demonstratives.

Japanese (Japonic). No definite or indefinite articles. Three-way person-oriented demonstrative system: ko- (near speaker), so- (near hearer), a- (away from both). The canonical person-oriented system. Different stems: adnominal kono/sono/ano vs pronominal kore/sore/are. 3rd-person pronouns ("kare/kanojo") unrelated to demonstratives (relatively recent innovations).

Mandarin Chinese (Sino-Tibetan). No definite or indefinite articles (bare nouns are ambiguous for definiteness). Two-way demonstrative distance: zhe (proximal) vs na (distal). Same forms for pronominal and adnominal demonstratives (with optional classifier in adnominal use). 3rd-person pronoun "ta" unrelated to demonstratives.

Turkish (Turkic). No definite article; indefinite article "bir" = numeral 'one' (different NP position when used as article vs numeral, per @cite{kornfilt-1997}). Two-way demonstrative distance: bu (proximal) vs o (distal), with şu as a restricted medial form. WALS codes as two-way. Different inflectional features: pronominal demonstratives inflect for case and number, adnominal demonstratives are uninflected particles. 3rd-person pronoun "o" identical to distal demonstrative.

Arabic (Egyptian) (Afro-Asiatic, Semitic). Definite prefix "al-" on nouns (definite affix). No indefinite article (unmarked nouns are indefinite, though tanwin in Standard Arabic marks indefiniteness). Two-way demonstrative distance: hada (proximal) vs daak (distal). Same forms for pronominal and adnominal demonstratives. 3rd-person pronoun "huwa/hiya" unrelated to demonstratives.

Finnish (Uralic). No definite or indefinite articles. Two-way demonstrative distance: tämä (proximal) vs tuo/se (distal). Same forms for pronominal and adnominal demonstratives. 3rd-person pronoun "hän" (human) / "se" (nonhuman); "se" is identical to the distal demonstrative.

Hungarian (Uralic). Definite article "a/az" distinct from demonstratives. Indefinite article "egy" = numeral 'one'. Two-way demonstrative distance: ez (proximal) vs az (distal). Same forms for pronominal and adnominal demonstratives. 3rd-person pronoun "ő" unrelated to demonstratives.

Russian (Indo-European, Slavic). No definite or indefinite articles. Two-way demonstrative distance: etot (proximal) vs tot (distal). Same forms for pronominal and adnominal demonstratives. 3rd-person pronouns ("on/ona/ono") unrelated to demonstratives.

Swahili (Niger-Congo, Bantu). Demonstrative used as definite marker (precedes noun for definiteness, follows noun for deictic use; WALS Ch 37 value 2). No indefinite article. Three-way demonstrative distance: h- (proximal) / h-o (medial) / -le (distal). Same forms for pronominal and adnominal demonstratives. 3rd-person pronouns related via shared noun-class agreement prefixes (gender-marker relationship).

Tagalog (Austronesian, Philippine). Definite/indefinite distinction via case markers: "ang" (definite-like topic marker) vs "ng" (indefinite-like). WALS codes this as a definite word distinct from demonstratives. Demonstratives: ito (proximal) and iyon (distal), plus a middle term "iyan"; the system is sometimes analyzed as two-way, but WALS codes it as three-way. Same forms for pronominal and adnominal demonstratives. 3rd-person pronoun "siya" unrelated to demonstratives.

Latin (Indo-European, Italic). No definite or indefinite articles (bare nouns are ambiguous). Three-way distance-oriented demonstrative system: hic (proximal), iste (medial), ille (distal). This is the textbook three-way distance-oriented system. Same forms for pronominal and adnominal demonstratives (though with full case/gender/number inflection in both uses). 3rd-person pronoun "is/ea/id" distinct from but historically related to demonstratives (related to all demonstratives via shared inflection patterns).

Korean (Koreanic). No definite or indefinite articles (topic marker "-un"/"-nun" sometimes conveys definiteness pragmatically). Three-way person-oriented demonstrative system: i (near speaker), ku (near hearer), ce (away from both). Different stems: pronominal demonstratives formed by combining i/ku/ce with a "defective noun" like "il" (thing/fact), giving "i-il", "ku-il", etc. (@cite{sohn-1994}: 295, @cite{diessel-2013}). 3rd-person pronoun "ku" related to medial demonstrative "ku".

Danish (Indo-European, Germanic). Definite suffix "-en"/"-et" on nouns (definite affix); separate definite article "den/det" used when an adjective is present. Indefinite article "en/et" = numeral 'one'. Two-way demonstrative distance: denne (proximal) vs den (distal). Different inflectional features between pronominal and adnominal uses. 3rd-person pronouns ("han/hun/den/det"); "den/det" are related to demonstratives.

Hausa (Afro-Asiatic, Chadic). No definite article; no standard indefinite article (WALS Ch 37 value 5). Four-way person-oriented demonstrative system: nan (near speaker), nan (near hearer, tonal difference), can (away from both), can (far away). Hausa is a key example of a four-or-more-way system. Same forms for pronominal and adnominal demonstratives. 3rd-person pronouns related to demonstratives.

Basque (language isolate). Definite suffix "-a"/"-ak" (definite affix). No indefinite article (bare nouns are indefinite). Demonstrative distance: hau (proximal) vs hura (distal), plus the medial hori, which makes the system three-way. Same forms for pronominal and adnominal demonstratives. 3rd-person pronouns: hau/hori/hura function as both demonstratives and 3rd-person pronouns (@cite{saltarelli-etal-1988}: 213).

All 17 language profiles.

WALS Ch 37: Definite article distribution across 566 languages.

  • definiteWord : Nat
  • demonstrativeUsed : Nat
  • definiteAffix : Nat
  • noDefButIndef : Nat
  • noArticle : Nat
WALS Ch 37 distribution (@cite{dryer-haspelmath-2013}, n = 566).
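
The record can be sketched as a plain structure. The counts below are the ones quoted elsewhere in this module (197 definite words, 56 demonstratives, 84 affixes, 41 indefinite-only); the `noArticle` value of 188 is not stated directly but follows from 566 - 337 - 41. The helpers `withDefinite` and `permille` are hypothetical.

```lean
/-- Sketch of the Ch 37 count record; field names as listed above. -/
structure Ch37Counts where
  definiteWord      : Nat
  demonstrativeUsed : Nat
  definiteAffix     : Nat
  noDefButIndef     : Nat
  noArticle         : Nat

def Ch37Counts.total (c : Ch37Counts) : Nat :=
  c.definiteWord + c.demonstrativeUsed + c.definiteAffix + c.noDefButIndef + c.noArticle

/-- Hypothetical helper: languages with definite marking of any kind. -/
def Ch37Counts.withDefinite (c : Ch37Counts) : Nat :=
  c.definiteWord + c.demonstrativeUsed + c.definiteAffix

/-- Hypothetical helper: tenths of a percent, rounded half up. -/
def permille (part total : Nat) : Nat :=
  (part * 1000 + total / 2) / total

/-- Counts quoted in this module; `noArticle` = 566 - 337 - 41 (inferred). -/
def ch37 : Ch37Counts :=
  { definiteWord := 197, demonstrativeUsed := 56, definiteAffix := 84,
    noDefButIndef := 41, noArticle := 188 }

example : ch37.total = 566 := by decide
example : ch37.withDefinite = 337 := by decide
example : permille ch37.withDefinite ch37.total = 595 := by decide  -- 59.5 %
```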

WALS Ch 38: Indefinite article distribution across 473 languages.

  • indefiniteWord : Nat
  • numeralOne : Nat
  • indefiniteAffix : Nat
  • noIndefButDef : Nat
  • noArticle : Nat
WALS Ch 38 distribution (@cite{dryer-haspelmath-2013}, n = 473).

WALS Ch 41: Demonstrative distance contrasts across 234 languages.

WALS Ch 41 distribution (@cite{diessel-2013}, n = 234).

WALS Ch 42: Pronominal/adnominal demonstrative form across 201 languages.

  • sameForms : Nat
  • differentStems : Nat
  • differentInflection : Nat
WALS Ch 42 distribution (@cite{diessel-2013}, n = 201).

WALS Ch 43: Third-person pronoun ~ demonstrative relationship across 225 languages.

  • unrelated : Nat
  • relatedAll : Nat
  • relatedRemote : Nat
  • relatedNonRemote : Nat
  • relatedGender : Nat
  • relatedNonhuman : Nat
Total count of languages where 3rd-person pronouns show any relationship to demonstratives.

WALS Ch 43 distribution (@cite{bhat-2013}, n = 225).
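
A sketch of the Ch 43 record with the field names listed above. `relatedTotal` follows the description in this module (the sum of the five related subtypes); `total` and the lemma `total_split` are hypothetical additions showing that Bhat's two-person/three-person split partitions the sample. Because this module states only the related total (125 of 225) and one subtype count (52), the sketch checks the headline figures rather than instantiating a full record.

```lean
/-- Sketch of the Ch 43 count record; field names as listed above. -/
structure Ch43Counts where
  unrelated        : Nat
  relatedAll       : Nat
  relatedRemote    : Nat
  relatedNonRemote : Nat
  relatedGender    : Nat
  relatedNonhuman  : Nat

/-- Languages whose 3rd-person pronouns show any relationship to
    demonstratives: the sum of the five `related*` fields. -/
def Ch43Counts.relatedTotal (c : Ch43Counts) : Nat :=
  c.relatedAll + c.relatedRemote + c.relatedNonRemote + c.relatedGender + c.relatedNonhuman

/-- Hypothetical: sample size as the sum of all six fields. -/
def Ch43Counts.total (c : Ch43Counts) : Nat :=
  c.unrelated + c.relatedAll + c.relatedRemote + c.relatedNonRemote +
    c.relatedGender + c.relatedNonhuman

/-- Every language is either three-person (`unrelated`) or two-person. -/
theorem Ch43Counts.total_split (c : Ch43Counts) :
    c.total = c.unrelated + c.relatedTotal := by
  unfold Ch43Counts.total Ch43Counts.relatedTotal
  omega

/-- Hypothetical helper: tenths of a percent, rounded half up. -/
def permille (part total : Nat) : Nat :=
  (part * 1000 + total / 2) / total

-- Figures stated in this module: 125 related out of n = 225.
example : 225 - 125 = 100 := by decide          -- unrelated
example : permille 125 225 = 556 := by decide   -- 55.6 %
```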

Ch 37 distribution counts derived from WALS generated data (n = 620).

Ch 38 distribution counts derived from WALS generated data (n = 534).

Ch 41 distribution counts derived from WALS generated data (n = 234).

Ch 42 distribution counts derived from WALS generated data (n = 201).

Ch 43 distribution counts derived from WALS generated data (n = 225).

Two-way demonstrative systems (proximal/distal) are the most common type, accounting for 127 of 234 languages in the WALS sample (54.3%).

@cite{diessel-2013}: "The vast majority of the world's languages employ two or three distance-marked demonstratives: 54.3 per cent of all languages shown on the map have adnominal demonstratives that express a two-way contrast."

Two-way and three-way systems together account for over 90% of languages. No-contrast (one-way), four-way, and five-or-more-way systems together account for under 10%.
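
The "over 90%" and "under 10%" figures can be checked with integer arithmetic, again via a hypothetical `permille` helper. The 127 two-way systems are stated above; the 88 three-way systems are an inference from the 37.6% figure given for Ch 41, not a count stated in this module.

```lean
/-- Hypothetical helper: tenths of a percent, rounded half up. -/
def permille (part total : Nat) : Nat :=
  (part * 1000 + total / 2) / total

-- Ch 41 (n = 234): 127 two-way (stated) + 88 three-way (inferred).
example : 127 + 88 = 215 := by decide
example : permille 215 234 = 919 := by decide         -- 91.9 %, over 90 %
example : permille (234 - 215) 234 = 81 := by decide  -- 8.1 %, under 10 %
```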

Definite articles occur without indefinite articles more often than the reverse. The evidence: 81 languages have a definite article but no indefinite article (Ch 38, value 4), compared to 41 languages that have an indefinite article but no definite article (Ch 37, value 4).

This asymmetry suggests that definiteness marking is the typologically "prior" or more basic category: languages are more likely to grammaticalize definiteness without indefiniteness than vice versa.

Note: This is a tendency, not an absolute universal. The 41 exceptions (indefinite without definite) are concentrated in Asia (Turkey to Iran) and New Guinea.

Languages with some form of definite marking (word, affix, or demonstrative) outnumber those without. 337 of 566 languages (59.5%) have definite marking.

The most common subtype of pronoun-demonstrative relationship is "related to all demonstratives" (52 languages), where any demonstrative can serve as a 3rd-person pronoun.

In most languages (143 of 201 = 71.1%), pronominal and adnominal demonstratives have the same forms (@cite{diessel-2013}, Ch 42). Languages where adnominal demonstratives have different stems (37) or different inflectional features (21) are the minority.
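
These Ch 42 figures can be re-derived from a small record, sketched below with the field names listed for the Ch 42 distribution; the `permille` helper is hypothetical and the counts are those given in this paragraph.

```lean
/-- Sketch of the Ch 42 count record; field names as listed above. -/
structure Ch42Counts where
  sameForms           : Nat
  differentStems      : Nat
  differentInflection : Nat

def Ch42Counts.total (c : Ch42Counts) : Nat :=
  c.sameForms + c.differentStems + c.differentInflection

/-- Hypothetical helper: tenths of a percent, rounded half up. -/
def permille (part total : Nat) : Nat :=
  (part * 1000 + total / 2) / total

/-- Counts quoted in this module. -/
def ch42 : Ch42Counts :=
  { sameForms := 143, differentStems := 37, differentInflection := 21 }

example : ch42.total = 201 := by decide
example : permille ch42.sameForms ch42.total = 711 := by decide  -- 71.1 %
```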

                                                                                                              Languages with same-form demonstratives are especially prevalent in Australia (where no language in the sample differentiates the two uses) and North America (except Pacific Northwest).

                                                                                                              The grammaticalization cline: demonstrative -> definite article -> definite affix.

                                                                                                              This is supported by the WALS data:

• 56 languages use demonstratives as definite markers (start of the cline)
• 197 languages have definite words distinct from demonstratives (middle of the cline, where the article has diverged from its source demonstrative)
• 84 languages have definite affixes (end of the cline)

                                                                                                              The 56 "demonstrative used as definite marker" languages represent the transitional stage where this grammaticalization is actively underway.

                                                                                                              See @cite{greenberg-1978}, @cite{himmelmann-1997} for theoretical discussion.
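The cline can be modelled as an ordered list of positions, each paired with its WALS count. This is a minimal Lean sketch: the type `ClinePosition`, its `count` function and the list `cline` are hypothetical illustrations, not declarations from this module.

```lean
-- Hypothetical type: the three marked positions on the cline.
inductive ClinePosition where
  | demonstrativeUsed | definiteWord | definiteAffix

-- Language counts per position, as quoted above.
def ClinePosition.count : ClinePosition → Nat
  | .demonstrativeUsed => 56
  | .definiteWord      => 197
  | .definiteAffix     => 84

-- Positions listed in cline order: demonstrative -> definite word -> definite affix.
def cline : List ClinePosition := [.demonstrativeUsed, .definiteWord, .definiteAffix]

#eval cline.map ClinePosition.count  -- [56, 197, 84]
```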

Among languages without a dedicated definite article (neither a definite word nor a definite affix: 566 − 197 − 84 = 285 languages), a substantial proportion (56 of 285 = 19.6%) use demonstratives to mark definiteness. This is the "demonstrative used as definite article" category from Ch 37.

This is consistent with the typological observation that demonstratives are the usual source of definiteness marking: even languages that lack a grammaticalized definite article often recruit demonstratives for that function.
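The 285 denominator is the sample minus the languages with a definite word or a definite affix. A minimal Lean sketch of the subtraction and the 19.6% share; all names are hypothetical.

```lean
-- Hypothetical names; figures are the Ch 37 counts quoted in this section.
def sampleSize : Nat := 566
def definiteWordCount : Nat := 197
def definiteAffixCount : Nat := 84
def demonstrativeUsedCount : Nat := 56

-- Languages with neither a definite word nor a definite affix.
def withoutDedicatedArticle : Nat := sampleSize - definiteWordCount - definiteAffixCount

#eval withoutDedicatedArticle                                  -- 285
#eval demonstrativeUsedCount * 1000 / withoutDedicatedArticle  -- 196, i.e. 19.6%
```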

                                                                                                              Helper: does a profile have any definite marking?


                                                                                                                Helper: does a profile have any indefinite article?

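The bodies of the two helpers are not rendered here. The following is a minimal Lean sketch of what they plausibly compute, assuming a simplified hypothetical `ToyProfile` in place of the module's real `ArticleDemProfile`, which has more fields and its own Ch 38 indefinite-article type. "Definite marking" follows the definition used below: an article, an affix, or a demonstrative used as a definite marker.

```lean
-- Mirrors the documented Ch 37 constructors.
inductive DefiniteArticleType where
  | definiteWord | definiteAffix | demonstrativeUsed | noDefButIndef | noArticle

-- Hypothetical, simplified stand-in for `ArticleDemProfile`.
-- Nothing here enforces that `.noDefButIndef` implies `hasIndefArticle = true`.
structure ToyProfile where
  name : String
  definite : DefiniteArticleType
  hasIndefArticle : Bool

-- Any definite marking: word, affix, or demonstrative used as a definite marker.
def ToyProfile.hasDefinite (p : ToyProfile) : Bool :=
  match p.definite with
  | .definiteWord | .definiteAffix | .demonstrativeUsed => true
  | .noDefButIndef | .noArticle => false

-- Any indefinite article (read straight off the hypothetical flag).
def ToyProfile.hasIndefinite (p : ToyProfile) : Bool := p.hasIndefArticle

#eval (ToyProfile.mk "Turkish" .noDefButIndef true).hasDefinite  -- false
```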
theorem Phenomena.Reference.Typology.indef_implies_def_almost :
  have withIndef := List.filter (fun (x : ArticleDemProfile) => x.hasIndefinite) allLanguages;
  have withIndefAndDef := List.filter (fun (x : ArticleDemProfile) => x.hasDefinite) withIndef;
  withIndef.length = 6 ∧ withIndefAndDef.length = 5

                                                                                                                  In our 16-language sample, all but one language with an indefinite article also has some form of definite marking (article, affix, or demonstrative used). The exception is Turkish, which has the numeral "bir" as an indefinite article but no dedicated definite article (WALS Ch 37, value 4).

                                                                                                                  This is consistent with the WALS aggregate data, which shows 41 languages with indefinite but no definite articles, concentrated in the Turkey-to-Iran belt and New Guinea.
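The theorem's two-step filter can be reproduced on a small list. This minimal Lean sketch uses an illustrative four-language toy list (English, Danish, Turkish, Russian), not the module's 16-language `allLanguages`. `ToyProfile`, `toyLanguages`, `withIndef` and `withIndefAndDef` are hypothetical names.

```lean
inductive DefiniteArticleType where
  | definiteWord | definiteAffix | demonstrativeUsed | noDefButIndef | noArticle

-- Hypothetical, simplified profile.
structure ToyProfile where
  name : String
  definite : DefiniteArticleType
  hasIndefinite : Bool

def ToyProfile.hasDefinite (p : ToyProfile) : Bool :=
  match p.definite with
  | .noDefButIndef | .noArticle => false
  | _ => true

-- Illustrative toy list, NOT the module's `allLanguages`.
def toyLanguages : List ToyProfile :=
  [ ⟨"English", .definiteWord, true⟩,
    ⟨"Danish", .definiteAffix, true⟩,
    ⟨"Turkish", .noDefButIndef, true⟩,
    ⟨"Russian", .noArticle, false⟩ ]

-- Same shape as `indef_implies_def_almost`: filter by indefinite, then by definite.
def withIndef := toyLanguages.filter (fun p => p.hasIndefinite)
def withIndefAndDef := withIndef.filter (fun p => p.hasDefinite)

#eval (withIndef.length, withIndefAndDef.length)                          -- (3, 2)
#eval (withIndef.filter (fun p => !p.hasDefinite)).map (fun p => p.name)  -- ["Turkish"]
```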

                                                                                                                  Turkish is the one exception: indefinite article but no definite marking.

                                                                                                                  In our sample, languages with three-way or larger demonstrative systems tend to lack articles entirely. 5 of the 7 languages with 3+ distance contrasts have no definite or indefinite articles.


                                                                                                                    Count of languages in our sample with three-way-or-more demonstratives.
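The count and the "5 of 7 lack articles" generalization are both filter-and-count operations. This is a minimal Lean sketch, assuming a hypothetical `DemProfile` with a `distanceContrasts` field (2 = proximal/distal, 3 = three-way, ...) and a `hasAnyArticle` flag. The four entries are placeholder languages, not sample data.

```lean
-- Hypothetical, simplified profile for demonstrative systems.
structure DemProfile where
  name : String
  distanceContrasts : Nat  -- 2 = proximal/distal, 3 = three-way, ...
  hasAnyArticle : Bool     -- any definite or indefinite article

-- Languages with three or more distance contrasts.
def threeWayOrMore (xs : List DemProfile) : List DemProfile :=
  xs.filter (fun p => p.distanceContrasts ≥ 3)

-- How many of those lack both definite and indefinite articles.
def threeWayOrMoreArticleless (xs : List DemProfile) : Nat :=
  ((threeWayOrMore xs).filter (fun p => !p.hasAnyArticle)).length

-- Placeholder entries, not real languages.
def toy : List DemProfile :=
  [⟨"LangA", 2, true⟩, ⟨"LangB", 3, false⟩, ⟨"LangC", 4, false⟩, ⟨"LangD", 3, true⟩]

#eval ((threeWayOrMore toy).length, threeWayOrMoreArticleless toy)  -- (3, 2)
```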

Half of our 16 sample languages (8: Turkish, Finnish, Swahili, Latin, Korean, Danish, Hausa, Basque) show some relationship between 3rd-person pronouns and demonstratives. This is close to Bhat's finding that a majority of languages worldwide (125 of 225, about 56%) are "two-person languages", in which only 1st- and 2nd-person pronouns are true personal pronouns.
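Comparing the sample share with Bhat's worldwide share makes the point precise: 8 of 16 is exactly half, not a strict majority, while 125 of 225 is. A minimal Lean sketch; `permille` and `isMajority` are hypothetical helpers, and `permille` truncates.

```lean
-- Share in tenths of a percent (truncating division).
def permille (part whole : Nat) : Nat := part * 1000 / whole

-- Strict majority: more than half of the whole.
def isMajority (part whole : Nat) : Bool := 2 * part > whole

#eval permille 8 16       -- 500, i.e. 50.0% of our sample
#eval permille 125 225    -- 555 (truncated; 55.6% when rounded)
#eval isMajority 8 16     -- false
#eval isMajority 125 225  -- true
```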


                                                                                                                      The grammaticalization hierarchy for definiteness marking, attested cross-linguistically:

• Stage 0: No definiteness marking (bare nouns, e.g. Mandarin, Russian)
• Stage 1: Demonstrative used for definiteness (e.g. Swahili, Ojibwa)
• Stage 2: Definite word distinct from demonstrative (e.g. English, French)
• Stage 3: Definite affix (e.g. Danish, Arabic, Basque)

                                                                                                                      Each stage represents a further degree of grammaticalization: phonological reduction, semantic bleaching (loss of deictic content), and increased obligatoriness.
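The hierarchy maps each Ch 37 category to a stage number. A minimal Lean sketch follows. Both categories without definite marking (`noDefButIndef`, `noArticle`) are placed at Stage 0. The functions `stage` and `furtherAlong` are hypothetical and not declarations from this module.

```lean
inductive DefiniteArticleType where
  | definiteWord | definiteAffix | demonstrativeUsed | noDefButIndef | noArticle

-- Stage numbers from the hierarchy above (hypothetical helper).
def stage : DefiniteArticleType → Nat
  | .noDefButIndef | .noArticle => 0
  | .demonstrativeUsed => 1
  | .definiteWord => 2
  | .definiteAffix => 3

-- `a` is further along the grammaticalization cline than `b`.
def furtherAlong (a b : DefiniteArticleType) : Bool := stage a > stage b

#eval furtherAlong .definiteAffix .definiteWord   -- true
#eval furtherAlong .demonstrativeUsed .noArticle  -- true
```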
