
Linglib.Phenomena.Modality.Typology

Cross-Linguistic Typology of Modality and Evidentiality (WALS Chapters 74--78)

@cite{aikhenvald-2004} @cite{de-haan-2013} @cite{vanbogaert-2013} @cite{deandradedehaanValenzuela-2013}

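Cross-linguistic data on modality and evidentiality from the World Atlas of Language Structures, covering five parameters, one per chapter (WALS Chapters 74--78).

Cross-linguistic data on grammatical evidentiality, covering two parameters: the number of evidential distinctions a language grammaticalizes (WALS Ch 77) and how evidentiality is morphologically expressed (WALS Ch 78).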

Key findings

@cite{de-haan-2013} observes that evidentiality is areally concentrated: it is pervasive in the Americas (especially the Andes and Amazonia), common across Central and Inner Asia (Tibetan, Turkic), and well-attested in the Balkans and Caucasus. In other parts of the world --- most of Africa, most of Western Europe, most of East Asia --- grammatical evidentials are absent. When present, evidentials are overwhelmingly verbal affixes; particles and clitics are comparatively rare. Systems with three or more evidential choices always include direct evidence as a grammaticalized category.

WALS Ch 77: How many evidential distinctions a language grammaticalizes.

Four values on a scale of increasing complexity: (1) No grammatical evidentials: evidential source is conveyed lexically or pragmatically, never by obligatory morphology. (2) Indirect evidential only: the language has a single evidential marker indicating indirect (reported, inferred, or both) information source, but no dedicated marker for direct evidence. (3) Two-choice system (direct vs indirect): the language distinguishes direct evidence (visual/sensory witness) from indirect evidence (reportative, inferential, or both). (4) Three-or-more-choice system: the language distinguishes at least direct, reportative, and inferential evidence as separate categories. May include further distinctions (visual vs nonvisual, firsthand vs secondhand report, assumption vs inference from results).

  • noGrammatical : EvidentialSystem

    No grammatical evidentials. Evidential source may be conveyed by lexical adverbs ("apparently", "reportedly") or pragmatic inference, but is never obligatorily encoded in verbal morphology. (e.g., English, French, Mandarin, German)

  • indirectOnly : EvidentialSystem

    Indirect evidential only. A single marker indicates that the speaker's information comes from a non-direct source (inference, report, or both), with no dedicated direct-evidence marker. (e.g., Georgian, Tajik, West Greenlandic)

  • directAndIndirect : EvidentialSystem

    Two-choice system: direct vs indirect evidence. The language obligatorily distinguishes firsthand sensory witness from all other information sources. (e.g., Turkish, Bulgarian, Tibetan, Abkhaz)

  • threeOrMore : EvidentialSystem

    Three or more evidential choices. The language distinguishes at least direct, reportative, and inferential as separate grammatical categories. May include further splits. (e.g., Quechua, Tuyuca, Kashaya, Aymara)


      WALS Ch 78: How evidentiality is morphologically expressed.

      Only applicable to languages that HAVE grammatical evidentials. Four coding strategies: (1) Verbal affix: evidential is a bound morpheme on the verb. (2) Clitic: evidential is a clitic (phrasal affix, not bound to verb). (3) Modal particle: evidential is a free-standing particle. (4) Part of the TAM system: evidential distinctions are fused with tense-aspect-mood marking and cannot be separated.

      • verbalAffix : EvidentialCoding

        Evidential is a verbal affix (a bound morpheme on the verb). WALS Ch 78 counts affixes and clitics together as the dominant strategy worldwide (131/418 languages). (e.g., Quechua ‑mi, ‑si, ‑chá; Tuyuca verbal suffixes)

      • clitic : EvidentialCoding

        Evidential is a clitic (phrasal-level bound morpheme, not specific to the verb). WALS Ch 78 groups this with verbal affixes. (e.g., Tsafiki =ti, Kham =re)

      • particle : EvidentialCoding

        Evidential is a free-standing, separate particle (65/418 in WALS Ch 78). (e.g., Lhasa Tibetan 'dug, Kalmyk gej)

      • partOfTAM : EvidentialCoding

        Evidential distinctions are fused into the tense-aspect-mood paradigm and cannot be isolated as a separate morpheme. (e.g., Bulgarian, Georgian, Abkhaz, some Turkic languages)

      • notApplicable : EvidentialCoding

        Not applicable: language has no grammatical evidentials (Ch 77 value 1). Used for cross-chapter profile consistency.


          A single row in a WALS frequency table: a category label and its count.


                Chapter 77 distribution: semantic distinctions of evidentiality (N = 418).


                  Chapter 78 distribution: coding of evidentiality (N = 418). Both chapters 77 and 78 cover the same 418-language sample.

                    theorem Phenomena.Modality.Typology.ch78_wals_affix_dominant :
                    (List.filter (fun (x : Core.WALS.Datapoint Core.WALS.F78A.EvidentialityCoding) => x.value == Core.WALS.F78A.EvidentialityCoding.verbalAffixOrClitic) Phenomena.Modality.Typology.ch78✝).length > (List.filter (fun (x : Core.WALS.Datapoint Core.WALS.F78A.EvidentialityCoding) => x.value == Core.WALS.F78A.EvidentialityCoding.separateParticle) Phenomena.Modality.Typology.ch78✝¹).length ∧
                    (List.filter (fun (x : Core.WALS.Datapoint Core.WALS.F78A.EvidentialityCoding) => x.value == Core.WALS.F78A.EvidentialityCoding.verbalAffixOrClitic) Phenomena.Modality.Typology.ch78✝²).length > (List.filter (fun (x : Core.WALS.Datapoint Core.WALS.F78A.EvidentialityCoding) => x.value == Core.WALS.F78A.EvidentialityCoding.partOfTheTenseSystem) Phenomena.Modality.Typology.ch78✝³).length ∧
                    (List.filter (fun (x : Core.WALS.Datapoint Core.WALS.F78A.EvidentialityCoding) => x.value == Core.WALS.F78A.EvidentialityCoding.verbalAffixOrClitic) Phenomena.Modality.Typology.ch78✝⁴).length > (List.filter (fun (x : Core.WALS.Datapoint Core.WALS.F78A.EvidentialityCoding) => x.value == Core.WALS.F78A.EvidentialityCoding.modalMorpheme) Phenomena.Modality.Typology.ch78✝⁵).length ∧
                    (List.filter (fun (x : Core.WALS.Datapoint Core.WALS.F78A.EvidentialityCoding) => x.value == Core.WALS.F78A.EvidentialityCoding.verbalAffixOrClitic) Phenomena.Modality.Typology.ch78✝⁶).length > (List.filter (fun (x : Core.WALS.Datapoint Core.WALS.F78A.EvidentialityCoding) => x.value == Core.WALS.F78A.EvidentialityCoding.mixed) Phenomena.Modality.Typology.ch78✝⁷).length

                    Ch 78 (WALS): Verbal affix/clitic is the most common coding strategy among languages with evidentials.

                    A language's evidentiality profile across WALS Chapters 77--78.


                          English (Indo-European, Germanic). No grammatical evidentials. Evidential source is conveyed lexically by adverbs like "apparently", "reportedly", "evidently", or by hedging expressions like "I hear that...", "it seems that...". None of these are obligatory or part of the verbal paradigm.


                            French (Indo-European, Romance). No grammatical evidentials. The conditional can convey reportative meaning in journalistic French ("le président serait malade", 'the president is reportedly sick'), but this is not a dedicated evidential marker; it is a secondary use of the conditional.


                              German (Indo-European, Germanic). No grammatical evidentials. The modal verbs "sollen" (reportative) and "wollen" (self-report) have evidential-like uses but are full modal verbs, not grammaticalized evidential markers.


                                Mandarin Chinese (Sino-Tibetan). No grammatical evidentials. Evidential source is conveyed by lexical items such as "tingshuo" (听说, 'I hear that'), "juede" (觉得, 'I feel that'), or sentence-final particles like "ba" (吧) for tentativeness.


                                  Japanese (Japonic). No grammatical evidentials in the strict sense. The hearsay particle "soo da" (そうだ) and inferential "rashii" (らしい) have evidential-like functions but are analyzed as modal rather than evidential morphology by @cite{de-haan-2013}. WALS classifies Japanese as lacking grammatical evidentials.


                                    Korean (Koreanic). No grammatical evidentials. Korean has evidential-like constructions (e.g., "-deo-" retrospective, "-da-" reported speech) but these are not classified as grammaticalized evidentials in WALS.


                                      Turkish (Turkic). Two-choice evidential system: direct vs indirect. The past tense paradigm contrasts direct-evidence past (-DI, witnessed) with indirect-evidence past (-mIş, inferred or reported). The -mIş suffix is the best-known example of an indirect evidential in a major language. The distinction is obligatory in past-tense contexts. Coded as part of the TAM system (evidentiality is fused with past tense).


                                        Bulgarian (Indo-European, Slavic). Two-choice evidential system: direct (witnessed) vs indirect (reported, nonwitnessed). Bulgarian is the best-known European language with grammatical evidentials. The distinction is marked by a contrast between the aorist (direct/witnessed) and a separate evidential paradigm (indirect/nonwitnessed). Fused with the TAM system.


                                          Tibetan (Sino-Tibetan, Tibeto-Burman). Two-choice evidential system: direct (egophoric/sensory) vs indirect. Lhasa Tibetan uses the copula/auxiliary contrast: "yin" and "yod" for personal knowledge/direct evidence, "red" and "'dug" for indirect/new information. The evidential markers are particles/auxiliaries.


                                            Georgian (Kartvelian). Indirect evidential only. Georgian has an evidential perfect (the "I screeve") that marks the proposition as based on inference or report, but has no dedicated direct-evidence marker. The evidential distinction is fused with the TAM system (part of the verbal screeve paradigm).


                                              Quechua (Cuzco) (Quechuan). Three-or-more-choice system: direct (‑mi, ‑n), reportative (‑si, ‑s), and conjectural (‑chá). The three enclitics are obligatory on finite clauses and encode the speaker's information source. Quechua is one of the canonical examples of a three-way evidential system. Coded as verbal affixes (enclitics on the verb or predicate).


                                                Aymara (Aymaran). Three-or-more-choice system: direct/personal knowledge, reportative, and non-personal knowledge (inferential). Like Quechua, Aymara has obligatory evidential suffixes marking information source. Coded as verbal affixes.


                                                  Tuyuca (Tucanoan). Three-or-more-choice system with one of the richest evidential inventories known: five evidential categories --- visual, nonvisual sensory, apparent (inferential), secondhand (reported), and assumed. All five are obligatorily encoded as verbal suffixes. @cite{barnes-1984} is the classic description. Coded as verbal affixes.


                                                    Kashaya (Pomoan). Three-or-more-choice system: performative/factual (direct), visual, auditory, inferential, and reportative. Coded as verbal suffixes. Kashaya is notable for distinguishing visual from auditory direct evidence. @cite{oswalt-1986} is the primary source.


                                                      Tariana (Arawakan). Three-or-more-choice system with five evidential categories, as in Tuyuca: visual, nonvisual, inferred, assumed, and reported. It is spoken in the multilingual Vaupés area of Brazil, where elaborate evidential systems are an areal feature. Coded as verbal affixes.


                                                        West Greenlandic (Eskimo-Aleut). Indirect evidential only. West Greenlandic has an inferential mood (expressed by verbal suffixes) but no grammaticalized direct-evidence marker. The speaker uses the inferential when the proposition is based on reasoning from observable effects.


                                                          Abkhaz (Northwest Caucasian). Two-choice system: direct (witnessed) vs indirect (nonwitnessed/reported). The evidential distinction is part of the complex verbal morphology and is fused with tense-aspect marking.


                                                            Finnish (Uralic). No grammatical evidentiality system. Finnish has modal verbs (voida 'can', täytyä 'must', saattaa 'may') but evidential meanings are expressed lexically, not as part of obligatory verbal morphology.


                                                              All language profiles in the sample.


                                                                Does a language have grammatical evidentials?


                                                                  Does a language have a direct evidence category?


                                                                    Count of languages in the sample with a given system type.


                                                                      Count of languages in the sample with a given coding type.


                                                                        Number of languages in our sample.

                                                                        In our sample, over a third of languages lack grammatical evidentials (7 out of 18). The sample deliberately overrepresents languages with evidentials for typological diversity.
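The counting claim above can be checked mechanically. The following is a minimal sketch, not the module's actual definitions: the namespace `SampleCountSketch`, the flat `(language, system)` encoding, and the names `sample` and `countBySystem` are assumptions made for illustration. The system assignments are the ones stated in the language entries on this page.

```lean
namespace SampleCountSketch

/-- Ch 77 system types, mirroring the four constructors documented on this page. -/
inductive EvidentialSystem where
  | noGrammatical
  | indirectOnly
  | directAndIndirect
  | threeOrMore
  deriving DecidableEq, Repr

open EvidentialSystem

/-- Hypothetical flat encoding of the sample: (language, Ch 77 system). -/
def sample : List (String × EvidentialSystem) :=
  [ ("English", noGrammatical), ("French", noGrammatical),
    ("German", noGrammatical), ("Mandarin", noGrammatical),
    ("Japanese", noGrammatical), ("Korean", noGrammatical),
    ("Finnish", noGrammatical),
    ("Georgian", indirectOnly), ("West Greenlandic", indirectOnly),
    ("Turkish", directAndIndirect), ("Bulgarian", directAndIndirect),
    ("Tibetan", directAndIndirect), ("Abkhaz", directAndIndirect),
    ("Quechua", threeOrMore), ("Aymara", threeOrMore),
    ("Tuyuca", threeOrMore), ("Kashaya", threeOrMore),
    ("Tariana", threeOrMore) ]

/-- Count of sample languages with a given system type. -/
def countBySystem (s : EvidentialSystem) : Nat :=
  (sample.filter (fun p => p.2 == s)).length

theorem sample_size : sample.length = 18 := by decide

theorem no_grammatical_count : countBySystem noGrammatical = 7 := by decide

/-- "Over a third": 7/18 > 1/3, stated without division as 3 * 7 > 18. -/
theorem over_a_third :
    3 * countBySystem noGrammatical > sample.length := by decide

end SampleCountSketch
```

Stating the proportion as `3 * count > total` keeps the statement in `Nat`, so `decide` can evaluate it directly without rational arithmetic.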

                                                                        theorem Phenomena.Modality.Typology.verbal_affix_dominant :
                                                                        (List.filter (fun (x : Core.WALS.Datapoint Core.WALS.F78A.EvidentialityCoding) => x.value == Core.WALS.F78A.EvidentialityCoding.verbalAffixOrClitic) Phenomena.Modality.Typology.ch78✝).length > (List.filter (fun (x : Core.WALS.Datapoint Core.WALS.F78A.EvidentialityCoding) => x.value == Core.WALS.F78A.EvidentialityCoding.separateParticle) Phenomena.Modality.Typology.ch78✝¹).length ∧
                                                                        (List.filter (fun (x : Core.WALS.Datapoint Core.WALS.F78A.EvidentialityCoding) => x.value == Core.WALS.F78A.EvidentialityCoding.verbalAffixOrClitic) Phenomena.Modality.Typology.ch78✝²).length > (List.filter (fun (x : Core.WALS.Datapoint Core.WALS.F78A.EvidentialityCoding) => x.value == Core.WALS.F78A.EvidentialityCoding.partOfTheTenseSystem) Phenomena.Modality.Typology.ch78✝³).length ∧
                                                                        (List.filter (fun (x : Core.WALS.Datapoint Core.WALS.F78A.EvidentialityCoding) => x.value == Core.WALS.F78A.EvidentialityCoding.verbalAffixOrClitic) Phenomena.Modality.Typology.ch78✝⁴).length > (List.filter (fun (x : Core.WALS.Datapoint Core.WALS.F78A.EvidentialityCoding) => x.value == Core.WALS.F78A.EvidentialityCoding.modalMorpheme) Phenomena.Modality.Typology.ch78✝⁵).length ∧
                                                                        (List.filter (fun (x : Core.WALS.Datapoint Core.WALS.F78A.EvidentialityCoding) => x.value == Core.WALS.F78A.EvidentialityCoding.verbalAffixOrClitic) Phenomena.Modality.Typology.ch78✝⁶).length > (List.filter (fun (x : Core.WALS.Datapoint Core.WALS.F78A.EvidentialityCoding) => x.value == Core.WALS.F78A.EvidentialityCoding.mixed) Phenomena.Modality.Typology.ch78✝⁷).length

                                                                        Ch 78: Verbal affix or clitic (131/418) is the most common way to encode evidentiality among languages that have it.

                                                                        theorem Phenomena.Modality.Typology.modal_morpheme_rarest :
                                                                        (List.filter (fun (x : Core.WALS.Datapoint Core.WALS.F78A.EvidentialityCoding) => x.value == Core.WALS.F78A.EvidentialityCoding.modalMorpheme) Phenomena.Modality.Typology.ch78✝).length < (List.filter (fun (x : Core.WALS.Datapoint Core.WALS.F78A.EvidentialityCoding) => x.value == Core.WALS.F78A.EvidentialityCoding.mixed) Phenomena.Modality.Typology.ch78✝¹).length ∧
                                                                        (List.filter (fun (x : Core.WALS.Datapoint Core.WALS.F78A.EvidentialityCoding) => x.value == Core.WALS.F78A.EvidentialityCoding.modalMorpheme) Phenomena.Modality.Typology.ch78✝²).length < (List.filter (fun (x : Core.WALS.Datapoint Core.WALS.F78A.EvidentialityCoding) => x.value == Core.WALS.F78A.EvidentialityCoding.partOfTheTenseSystem) Phenomena.Modality.Typology.ch78✝³).length ∧
                                                                        (List.filter (fun (x : Core.WALS.Datapoint Core.WALS.F78A.EvidentialityCoding) => x.value == Core.WALS.F78A.EvidentialityCoding.modalMorpheme) Phenomena.Modality.Typology.ch78✝⁴).length < (List.filter (fun (x : Core.WALS.Datapoint Core.WALS.F78A.EvidentialityCoding) => x.value == Core.WALS.F78A.EvidentialityCoding.separateParticle) Phenomena.Modality.Typology.ch78✝⁵).length ∧
                                                                        (List.filter (fun (x : Core.WALS.Datapoint Core.WALS.F78A.EvidentialityCoding) => x.value == Core.WALS.F78A.EvidentialityCoding.modalMorpheme) Phenomena.Modality.Typology.ch78✝⁶).length < (List.filter (fun (x : Core.WALS.Datapoint Core.WALS.F78A.EvidentialityCoding) => x.value == Core.WALS.F78A.EvidentialityCoding.verbalAffixOrClitic) Phenomena.Modality.Typology.ch78✝⁷).length

                                                                        Ch 78: Modal morpheme is the rarest evidential coding strategy (7/418).
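The two Ch 78 theorems compare category counts obtained by filtering the full datapoint list. A smaller sketch over a frequency table shows the same comparisons; it uses only the three counts quoted on this page (131, 65, 7) and the shared sample size 418. The namespace `WalsCountSketch` and all definition names are hypothetical, and the remaining Ch 78 rows are deliberately left out rather than guessed.

```lean
namespace WalsCountSketch

/-- A row in a frequency table: a category label and its count. -/
structure WALSCount where
  label : String
  count : Nat
  deriving Repr

-- Only the Ch 78 counts quoted on this page; the other rows are omitted.
def affixOrClitic : WALSCount := ⟨"Verbal affix or clitic", 131⟩
def separateParticle : WALSCount := ⟨"Separate particle", 65⟩
def modalMorpheme : WALSCount := ⟨"Modal morpheme", 7⟩

/-- Sample size shared by chapters 77 and 78. -/
def totalN : Nat := 418

theorem affix_beats_particle :
    affixOrClitic.count > separateParticle.count := by decide

theorem modal_below_particle :
    modalMorpheme.count < separateParticle.count := by decide

/-- The three quoted rows account for 203 of the 418 languages. -/
theorem quoted_rows_sum :
    affixOrClitic.count + separateParticle.count + modalMorpheme.count = 203 := by
  decide

theorem quoted_rows_within_sample :
    affixOrClitic.count + separateParticle.count + modalMorpheme.count ≤ totalN := by
  decide

end WalsCountSketch
```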

                                                                        Languages with three or more evidential choices always include a direct evidence category. This follows from the definition: three-choice systems distinguish direct, reportative, and inferential. No language is known to have three evidential categories without including direct evidence.

                                                                        In our sample, every three-or-more language has a direct category.

                                                                        The converse does not hold: two-choice systems also have direct evidence. In fact, in our sample, every language with direct evidence has either a two-choice or three-or-more system.
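The implication and its failed converse can be stated at the level of the system type alone. This is a sketch under assumptions: the namespace `DirectEvidenceSketch` and the function `hasDirect` are illustrative stand-ins, and the module's own predicate may be defined on profiles rather than on system types.

```lean
namespace DirectEvidenceSketch

inductive EvidentialSystem where
  | noGrammatical
  | indirectOnly
  | directAndIndirect
  | threeOrMore
  deriving DecidableEq, Repr

open EvidentialSystem

/-- Does the system type include a grammaticalized direct-evidence category? -/
def EvidentialSystem.hasDirect : EvidentialSystem → Bool
  | .directAndIndirect | .threeOrMore => true
  | .noGrammatical | .indirectOnly => false

/-- Every three-or-more system has direct evidence. -/
theorem threeOrMore_hasDirect (s : EvidentialSystem) (h : s = threeOrMore) :
    s.hasDirect = true := by
  subst h
  rfl

/-- The converse fails: a two-choice system has direct evidence too. -/
theorem hasDirect_not_only_threeOrMore :
    ∃ s : EvidentialSystem, s.hasDirect = true ∧ s ≠ threeOrMore :=
  ⟨directAndIndirect, rfl, by decide⟩

/-- Direct evidence occurs exactly in two-choice and three-or-more systems. -/
theorem hasDirect_iff (s : EvidentialSystem) :
    s.hasDirect = true ↔ s = directAndIndirect ∨ s = threeOrMore := by
  cases s <;> decide

end DirectEvidenceSketch
```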

                                                                        Evidentiality fused with the TAM system is characteristic of the Balkans and Caucasus. In our sample, Turkish, Bulgarian, Georgian, and Abkhaz all use TAM-fused evidentials.

                                                                        Quechua and Aymara, representing the two major Andean language families (Quechuan and Aymaran), both have three-or-more-choice evidential systems coded as verbal affixes. This is a well-known areal feature of the Andes.

                                                                        The Vaupés-Amazonian area has some of the richest evidential systems. Both Tuyuca and Tariana (from different families but in contact in the Vaupés) have three-or-more-choice systems with five evidential categories each. This suggests areal diffusion of complex evidential systems.

                                                                        Ch 77: Among languages WITH evidentials, indirect-only systems (38 languages) are less common than two-choice systems (71); by the counts cited here, only three-or-more-choice systems (28) are rarer. Indirect-only languages mark only non-direct evidence, leaving direct evidence unmarked.

                                                                        In our sample, exactly 2 languages have indirect-only systems.

                                                                        In our sample, all languages with three-or-more evidential choices use verbal affixes as their coding strategy. This is consistent with the cross-linguistic generalization that complex evidential systems tend to use morphologically integrated (affixal) coding.

                                                                        In our sample, the three Western European languages (English, French, German) all lack grammatical evidentials. This is consistent with the broader pattern: grammatical evidentials are essentially absent from Western Europe. Within Europe as a whole, the Balkan Sprachbund is the notable exception.


                                                                          In our sample, every language without grammatical evidentials (Ch 77) has a notApplicable coding (Ch 78).

                                                                          In our sample, every language WITH grammatical evidentials has a real (non-notApplicable) coding strategy.

                                                                          The system and coding fields are consistent: the set of languages with notApplicable coding is exactly the set with noGrammatical system.
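The cross-chapter consistency condition is a biconditional between two fields of a profile. The sketch below checks it on an illustrative subset of the sample, restricted to languages whose coding is stated explicitly in the entries on this page. The namespace `ConsistencySketch`, the structure and field names, `miniSample`, and `consistent` are all assumptions for illustration, not the module's own declarations.

```lean
namespace ConsistencySketch

inductive EvidentialSystem where
  | noGrammatical
  | indirectOnly
  | directAndIndirect
  | threeOrMore
  deriving DecidableEq, Repr

inductive EvidentialCoding where
  | verbalAffix
  | clitic
  | particle
  | partOfTAM
  | notApplicable
  deriving DecidableEq, Repr

/-- Hypothetical profile record pairing the Ch 77 and Ch 78 values. -/
structure EvidentialityProfile where
  language : String
  system : EvidentialSystem
  coding : EvidentialCoding
  deriving Repr

/-- Illustrative subset of the sample, not the full 18-language list. -/
def miniSample : List EvidentialityProfile :=
  [ ⟨"English", .noGrammatical, .notApplicable⟩,
    ⟨"Finnish", .noGrammatical, .notApplicable⟩,
    ⟨"Georgian", .indirectOnly, .partOfTAM⟩,
    ⟨"Turkish", .directAndIndirect, .partOfTAM⟩,
    ⟨"Tibetan", .directAndIndirect, .particle⟩,
    ⟨"Quechua", .threeOrMore, .verbalAffix⟩ ]

/-- Coding is `notApplicable` exactly when the system is `noGrammatical`. -/
def consistent (p : EvidentialityProfile) : Bool :=
  (p.system == .noGrammatical) == (p.coding == .notApplicable)

theorem miniSample_consistent : miniSample.all consistent = true := by decide

end ConsistencySketch
```

Comparing the two Boolean tests with `==` encodes both directions of the condition at once, so a single `List.all` check covers the two one-directional statements above it.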

                                                                          Languages with direct evidence marking in our sample.

                                                                          The evidential complexity hierarchy: the four system types are strictly ordered by the number of evidential choices they distinguish:

                                                                          threeOrMore.numChoices > directAndIndirect.numChoices > indirectOnly.numChoices > noGrammatical.numChoices
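The ordering above can be reproduced with any strictly increasing assignment of numbers to the four system types. The sketch below assumes the values 0, 1, 2, 3; the page shows only the ordering, not the values the module actually uses, and the namespace `ComplexitySketch` is hypothetical.

```lean
namespace ComplexitySketch

inductive EvidentialSystem where
  | noGrammatical
  | indirectOnly
  | directAndIndirect
  | threeOrMore
  deriving DecidableEq, Repr

open EvidentialSystem

/-- Assumed numeric scale: a lower bound on the number of evidential choices. -/
def EvidentialSystem.numChoices : EvidentialSystem → Nat
  | .noGrammatical => 0
  | .indirectOnly => 1
  | .directAndIndirect => 2
  | .threeOrMore => 3

/-- The strict chain, written with function application instead of dot notation. -/
theorem complexity_chain :
    numChoices threeOrMore > numChoices directAndIndirect ∧
    numChoices directAndIndirect > numChoices indirectOnly ∧
    numChoices indirectOnly > numChoices noGrammatical := by
  decide

end ComplexitySketch
```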

                                                                          In our sample, every language with a three-or-more system also has a direct evidence category (entailed by the type definition, but worth verifying against the data).

                                                                          In our sample, every language with a two-choice system also has a direct evidence category (the two choices are direct vs indirect).

                                                                          East Asian languages in our sample (Mandarin, Japanese, Korean) all lack grammatical evidentials. This is consistent with the broader pattern that grammatical evidentials are absent from most of East Asia.


                                                                            Americas languages in our sample (Quechua, Aymara, Tuyuca, Kashaya, Tariana) all have three-or-more evidential categories. The Americas have the highest density of complex evidential systems worldwide.


                                                                              All Americas languages in our sample use verbal affixes.

                                                                              Deontic necessity is not universally split into strong and weak

                                                                              Narrog (2010, 2012; cited in @cite{rubinstein-2014} Table 1) surveys 200 genealogically diverse languages for grammaticalized deontic necessity. The sample shows that grammaticalized weak deontic necessity is far from universal: only 62 of 200 languages (31%) grammaticalize it. See Rubinstein2014.lean for the full typological data and implications for the comparative analysis of weak necessity.

                                                                              Data imported from Core.Modality.DeonticNecessity.

                                                                              Strong deontic necessity (60 languages) is slightly less common than weak (62), showing that the strong/weak split itself is not universal.
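The arithmetic behind these figures is small enough to check directly. The sketch below hard-codes the three numbers quoted on this page (200, 60, 62) instead of importing them from Core.Modality.DeonticNecessity; the namespace `DeonticSketch` and the definition names are hypothetical.

```lean
namespace DeonticSketch

-- Counts quoted on this page for the 200-language survey.
def total : Nat := 200
def strongCount : Nat := 60
def weakCount : Nat := 62

/-- 62 of 200 is exactly 31%, stated without division. -/
theorem weak_share : weakCount * 100 = 31 * total := by decide

/-- Strong necessity is attested in slightly fewer languages than weak. -/
theorem strong_lt_weak : strongCount < weakCount := by decide

/-- Neither category is grammaticalized in every language of the sample. -/
theorem neither_universal : strongCount < total ∧ weakCount < total := by decide

end DeonticSketch
```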