Cross-Linguistic Typology of Modality and Evidentiality (WALS Chapters 74--78) #
@cite{aikhenvald-2004} @cite{de-haan-2013} @cite{vanbogaert-2013} @cite{deandradedehaanValenzuela-2013}
Cross-linguistic data on modality and evidentiality from the World Atlas of Language Structures, covering five parameters:
Ch 74: Situational Possibility: How situational (root, dynamic) possibility ('can', 'be able to') is expressed --- verbal constructions, affixes on verbs, or other markers. Verbal constructions (modal verbs) are the dominant strategy (158/234 = 68%).
Ch 75: Epistemic Possibility: How epistemic possibility ('may', 'might', 'perhaps') is expressed. Unlike situational possibility, affixes on verbs (84/240 = 35%) and other strategies (91/240 = 38%) together outweigh verbal constructions (65/240 = 27%).
Ch 76: Overlap between Situational and Epistemic Modal Marking: Whether the same morpheme(s) express both situational and epistemic modality. Most languages show no overlap (105/207 = 51%), meaning they use distinct forms for root vs epistemic possibility. Some overlap for either possibility or necessity (66/207 = 32%), and fewer overlap for both (36/207 = 17%).
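The proportions quoted for Chapters 74 to 76 can be sanity-checked with plain natural-number arithmetic. The following is only a sketch: it uses bare numerals copied from the summaries above, not the count tables defined in this module.

```lean
-- Ch 74: 158 of 234 languages use verbal constructions (more than two thirds).
example : 3 * 158 > 2 * 234 := by decide

-- Ch 75: the three strategies add up to the 240-language sample,
-- and affixes plus other strategies together outweigh verbal constructions.
example : 65 + 84 + 91 = 240 := by decide
example : 84 + 91 > 65 := by decide

-- Ch 76: the three overlap values add up to 207,
-- and "no overlap" is a (narrow) absolute majority.
example : 105 + 66 + 36 = 207 := by decide
example : 2 * 105 > 207 := by decide
```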
Cross-linguistic data on grammatical evidentiality, covering two parameters:
Ch 77: Semantic Distinctions of Evidentiality: How many and which evidential distinctions a language grammaticalizes. Evidentials encode the speaker's source of information for a proposition: whether they witnessed it directly, inferred it from indirect evidence, or received it via report. Languages range from no grammatical evidentials at all (English, Mandarin) to systems with three or more obligatory distinctions (Tuyuca, Quechua). A plurality of the world's languages (181/418 = 43%) lack grammatical evidentials entirely.
Ch 78: Coding of Evidentiality: How evidentiality is morphologically expressed. Six strategies: no grammatical evidentials, verbal affix or clitic (the dominant pattern among languages with evidentials, 131/418), part of the tense system, separate particle, modal morpheme, or mixed. Both chapters cover the same 418-language sample.
Key findings #
@cite{de-haan-2013} observes that evidentiality is areally concentrated: it is pervasive in the Americas (especially the Andes and Amazonia), common across Central and Inner Asia (Tibetan, Turkic), and well-attested in the Balkans and Caucasus. In other parts of the world (most of Africa, most of Western Europe, most of East Asia) grammatical evidentials are absent. When present, evidentials are most often verbal affixes or clitics (131 of the 237 languages with evidentials in Ch 78); separate particles are the second most common strategy (65). Systems with three or more evidential choices always include direct evidence as a grammaticalized category.
WALS Ch 77: How many evidential distinctions a language grammaticalizes.
Four values on a scale of increasing complexity: (1) No grammatical evidentials: evidential source is conveyed lexically or pragmatically, never by obligatory morphology. (2) Indirect evidential only: the language has a single evidential marker indicating indirect (reported, inferred, or both) information source, but no dedicated marker for direct evidence. (3) Two-choice system (direct vs indirect): the language distinguishes direct evidence (visual/sensory witness) from indirect evidence (reportative, inferential, or both). (4) Three-or-more-choice system: the language distinguishes at least direct, reportative, and inferential evidence as separate categories. May include further distinctions (visual vs nonvisual, firsthand vs secondhand report, assumption vs inference from results).
- noGrammatical : EvidentialSystem
No grammatical evidentials. Evidential source may be conveyed by lexical adverbs ("apparently", "reportedly") or pragmatic inference, but is never obligatorily encoded in verbal morphology. (e.g., English, French, Mandarin, German)
- indirectOnly : EvidentialSystem
Indirect evidential only. A single marker indicates that the speaker's information comes from a non-direct source (inference, report, or both), with no dedicated direct-evidence marker. (e.g., Georgian, Tajik, West Greenlandic)
- directAndIndirect : EvidentialSystem
Two-choice system: direct vs indirect evidence. The language obligatorily distinguishes firsthand sensory witness from all other information sources. (e.g., Turkish, Bulgarian, Tibetan, Abkhaz)
- threeOrMore : EvidentialSystem
Three or more evidential choices. The language distinguishes at least direct, reportative, and inferential as separate grammatical categories. May include further splits. (e.g., Quechua, Tuyuca, Kashaya, Aymara)
Equations
- Phenomena.Modality.Typology.instBEqEvidentialSystem.beq x✝ y✝ = (x✝.ctorIdx == y✝.ctorIdx)
Whether a language has any grammatical evidential marking.
Equations
- Phenomena.Modality.Typology.EvidentialSystem.noGrammatical.hasEvidentials = false
- Phenomena.Modality.Typology.EvidentialSystem.indirectOnly.hasEvidentials = true
- Phenomena.Modality.Typology.EvidentialSystem.directAndIndirect.hasEvidentials = true
- Phenomena.Modality.Typology.EvidentialSystem.threeOrMore.hasEvidentials = true
Whether a language grammaticalizes a direct evidence category.
Equations
- Phenomena.Modality.Typology.EvidentialSystem.directAndIndirect.hasDirect = true
- Phenomena.Modality.Typology.EvidentialSystem.threeOrMore.hasDirect = true
- Phenomena.Modality.Typology.EvidentialSystem.noGrammatical.hasDirect = false
- Phenomena.Modality.Typology.EvidentialSystem.indirectOnly.hasDirect = false
Number of evidential choices in the system (0, 1, 2, or 3+).
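The equations for this function are not shown on the rendered page, so the sketch below assumes the obvious mapping to 0, 1, 2, 3, with 3 standing in for "three or more". It is wrapped in a hypothetical `Sketch` namespace to avoid clashing with the real declarations, and it also checks that the values rise strictly along the scale.

```lean
namespace Sketch

/-- WALS Ch 77 system types, listed in order of increasing complexity. -/
inductive EvidentialSystem where
  | noGrammatical
  | indirectOnly
  | directAndIndirect
  | threeOrMore
  deriving DecidableEq, Repr

/-- Assumed mapping (not taken from the rendered page); 3 means "3 or more". -/
def EvidentialSystem.numChoices : EvidentialSystem → Nat
  | .noGrammatical     => 0
  | .indirectOnly      => 1
  | .directAndIndirect => 2
  | .threeOrMore       => 3

-- Each richer system type has strictly more choices than the one before it.
example :
    EvidentialSystem.noGrammatical.numChoices < EvidentialSystem.indirectOnly.numChoices ∧
    EvidentialSystem.indirectOnly.numChoices < EvidentialSystem.directAndIndirect.numChoices ∧
    EvidentialSystem.directAndIndirect.numChoices < EvidentialSystem.threeOrMore.numChoices := by
  decide

end Sketch
```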
WALS Ch 78: How evidentiality is morphologically expressed.
Only applicable to languages that HAVE grammatical evidentials. Four coding strategies: (1) Verbal affix: evidential is a bound morpheme on the verb. (2) Clitic: evidential is a clitic (phrasal affix, not bound to verb). (3) Modal particle: evidential is a free-standing particle. (4) Part of the TAM system: evidential distinctions are fused with tense-aspect-mood marking and cannot be separated.
- verbalAffix : EvidentialCoding
Evidential is a verbal affix (bound morpheme on the verb). WALS Ch 78 counts verbal affixes and clitics as a single value, which is the dominant strategy worldwide (131/418 languages). (e.g., Quechua ‑mi, ‑si, ‑chá; Turkish ‑mIş; Tuyuca verbal suffixes)
- clitic : EvidentialCoding
Evidential is a clitic (phrasal-level bound morpheme, not specific to the verb). WALS Ch 78 groups this with verbal affixes. (e.g., Tsafiki =ti, Kham =re)
- particle : EvidentialCoding
Evidential is a free separate particle. (65/418 in WALS Ch 78). (e.g., Lhasa Tibetan 'dug, Kalmyk gej)
- partOfTAM : EvidentialCoding
Evidential distinctions are fused into the tense-aspect-mood paradigm and cannot be isolated as a separate morpheme. (e.g., Bulgarian, Georgian, Abkhaz, some Turkic languages)
- notApplicable : EvidentialCoding
Not applicable: language has no grammatical evidentials (Ch 77 value 1). Used for cross-chapter profile consistency.
Equations
- Phenomena.Modality.Typology.instBEqEvidentialCoding.beq x✝ y✝ = (x✝.ctorIdx == y✝.ctorIdx)
Whether the coding strategy involves a bound morpheme (affix or clitic).
Equations
- Phenomena.Modality.Typology.EvidentialCoding.verbalAffix.isBound = true
- Phenomena.Modality.Typology.EvidentialCoding.clitic.isBound = true
- Phenomena.Modality.Typology.EvidentialCoding.particle.isBound = false
- Phenomena.Modality.Typology.EvidentialCoding.partOfTAM.isBound = false
- Phenomena.Modality.Typology.EvidentialCoding.notApplicable.isBound = false
A single row in a WALS frequency table: a category label and its count.
Sum of counts in a WALS table.
Equations
- Phenomena.Modality.Typology.WALSCount.totalOf cs = List.foldl (fun (acc : Nat) (c : Phenomena.Modality.Typology.WALSCount) => acc + c.count) 0 cs
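To illustrate the fold, here is a minimal sketch in a hypothetical `Sketch` namespace. The field name `count` appears in the equation above; the field name `label`, the row labels, and the name `ch77Rows` are assumptions made for the example. The three counts are the Ch 77 figures quoted in this module.

```lean
namespace Sketch

/-- One row of a frequency table: a category label and a language count. -/
structure WALSCount where
  label : String
  count : Nat
  deriving Repr

/-- Left fold that adds up the `count` fields, starting from 0. -/
def WALSCount.totalOf (cs : List WALSCount) : Nat :=
  cs.foldl (fun acc c => acc + c.count) 0

/-- Ch 77 rows, using the counts 181, 166 and 71 quoted in this module. -/
def ch77Rows : List WALSCount :=
  [ ⟨"No grammatical evidentials", 181⟩,
    ⟨"Indirect only", 166⟩,
    ⟨"Direct and indirect", 71⟩ ]

-- The rows add up to the 418-language sample.
example : WALSCount.totalOf ch77Rows = 418 := by decide

end Sketch
```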
Chapter 77 distribution: semantic distinctions of evidentiality (N = 418).
Chapter 78 distribution: coding of evidentiality (N = 418). Both chapters 77 and 78 cover the same 418-language sample.
Ch 77 total: 418 languages.
Ch 78 total: 418 languages.
Ch 77 and Ch 78 cover the same 418-language sample.
Ch 77 and Ch 78 use the same sample in WALS v2020.4.
Ch 74: Verbal constructions are the dominant strategy for situational possibility (158/234 = 68%).
Ch 75: The three coding strategies for epistemic possibility are more evenly distributed than for situational possibility.
Ch 76: Most languages show no overlap between situational and epistemic modal marking (105/207 = 51%).
Ch 77 (WALS): Languages without grammatical evidentials form the largest single category.
Ch 78 (WALS): Verbal affix/clitic is the most common coding strategy among languages with evidentials.
A language's evidentiality profile across WALS Chapters 77--78.
- language : String
Language name
- iso : String
ISO 639-3 code
- family : String
Language family
- system : EvidentialSystem
WALS Ch 77: evidential system type
- coding : EvidentialCoding
WALS Ch 78: coding strategy
Evidential marker forms (if applicable)
- notes : String
Notes on the evidential system
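A record with these fields might look like the sketch below. The name of the field holding the marker forms is not shown on the rendered page, so `markers : List String` is an assumption, as are the default values and the `Sketch` namespace. The Turkish entry follows the profile description given in this module.

```lean
namespace Sketch

inductive EvidentialSystem where
  | noGrammatical | indirectOnly | directAndIndirect | threeOrMore
  deriving DecidableEq, BEq, Repr

inductive EvidentialCoding where
  | verbalAffix | clitic | particle | partOfTAM | notApplicable
  deriving DecidableEq, BEq, Repr

/-- A language's evidentiality profile across WALS Chapters 77 and 78. -/
structure EvidentialityProfile where
  language : String
  iso      : String
  family   : String
  system   : EvidentialSystem
  coding   : EvidentialCoding
  markers  : List String := []   -- hypothetical field name
  notes    : String := ""
  deriving BEq, Repr

/-- Turkish: two-choice system, fused with the past-tense paradigm. -/
def turkish : EvidentialityProfile :=
  { language := "Turkish", iso := "tur", family := "Turkic",
    system := EvidentialSystem.directAndIndirect,
    coding := EvidentialCoding.partOfTAM,
    markers := ["-DI", "-mIş"] }

example : turkish.system = EvidentialSystem.directAndIndirect := rfl
example : turkish.coding = EvidentialCoding.partOfTAM := rfl

end Sketch
```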
Equations
- One or more equations did not get rendered due to their size.
- Phenomena.Modality.Typology.instBEqEvidentialityProfile.beq x✝¹ x✝ = false
English (Indo-European, Germanic). No grammatical evidentials. Evidential source is conveyed lexically by adverbs like "apparently", "reportedly", "evidently", or by hedging expressions like "I hear that...", "it seems that...". None of these are obligatory or part of the verbal paradigm.
French (Indo-European, Romance). No grammatical evidentials. The conditional can convey reportative meaning in journalistic French ("le président serait malade", 'the president is reportedly sick'), but this is not a dedicated evidential marker; it is a secondary use of the conditional.
German (Indo-European, Germanic). No grammatical evidentials. The modal verbs "sollen" (reportative) and "wollen" (self-report) have evidential-like uses but are full modal verbs, not grammaticalized evidential markers.
Mandarin Chinese (Sino-Tibetan). No grammatical evidentials. Evidential source is conveyed by lexical items such as "tingshuo" (听说, 'I hear that'), "juede" (觉得, 'I feel that'), or sentence-final particles like "ba" (吧) for tentativeness.
Japanese (Japonic). No grammatical evidentials in the strict sense. The hearsay particle "soo da" (そうだ) and inferential "rashii" (らしい) have evidential-like functions but are analyzed as modal rather than evidential morphology by @cite{de-haan-2013}. WALS classifies Japanese as lacking grammatical evidentials.
Korean (Koreanic). No grammatical evidentials. Korean has evidential-like constructions (e.g., "-deo-" retrospective, "-da-" reported speech) but these are not classified as grammaticalized evidentials in WALS.
Turkish (Turkic). Two-choice evidential system: direct vs indirect. The past tense paradigm contrasts direct-evidence past (-DI, witnessed) with indirect-evidence past (-mIş, inferred or reported). The -mIş suffix is the best-known example of an indirect evidential in a major language. The distinction is obligatory in past-tense contexts. Coded as part of the TAM system (evidentiality is fused with past tense).
Bulgarian (Indo-European, Slavic). Two-choice evidential system: direct (witnessed) vs indirect (reported, nonwitnessed). Bulgarian is the best-known European language with grammatical evidentials. The distinction is marked by a contrast between the aorist (direct/witnessed) and a separate evidential paradigm (indirect/nonwitnessed). Fused with the TAM system.
Tibetan (Sino-Tibetan, Tibeto-Burman). Two-choice evidential system: direct (egophoric/sensory) vs indirect. Lhasa Tibetan uses the copula/auxiliary contrast: "red" and "yod" for personal knowledge/direct evidence, "yin" and "'dug" for indirect/new information. The evidential markers are particles/auxiliaries.
Georgian (Kartvelian). Indirect evidential only. Georgian has an evidential perfect (the "I screeve") that marks the proposition as based on inference or report, but has no dedicated direct-evidence marker. The evidential distinction is fused with the TAM system (part of the verbal screeve paradigm).
Quechua (Cuzco) (Quechuan). Three-or-more-choice system: direct (‑mi, ‑n), reportative (‑si, ‑s), and conjectural (‑chá). The three enclitics are obligatory on finite clauses and encode the speaker's information source. Quechua is one of the canonical examples of a three-way evidential system. Coded as verbal affixes (enclitics on the verb or predicate).
Aymara (Aymaran). Three-or-more-choice system: direct/personal knowledge, reportative, and non-personal knowledge (inferential). Like Quechua, Aymara has obligatory evidential suffixes marking information source. Coded as verbal affixes.
Tuyuca (Tucanoan). Three-or-more-choice system with one of the richest evidential inventories known: five evidential categories --- visual, nonvisual sensory, apparent (inferential), secondhand (reported), and assumed. All five are obligatorily encoded as verbal suffixes. @cite{barnes-1984} is the classic description. Coded as verbal affixes.
Kashaya (Pomoan). Three-or-more-choice system: performative/factual (direct), visual, auditory, inferential, and reportative. Coded as verbal suffixes. Kashaya is notable for distinguishing visual from auditory direct evidence. @cite{oswalt-1986} is the primary source.
Tariana (Arawakan). Three-or-more-choice system with five evidential categories: visual, nonvisual, inferred, assumed, and reported. Like Tuyuca, Tariana has a five-way system. It is spoken in the multilingual Vaupés area of Brazil where elaborate evidential systems are an areal feature. Verbal affixes.
West Greenlandic (Eskimo-Aleut). Indirect evidential only. West Greenlandic has an inferential mood (expressed by verbal suffixes) but no grammaticalized direct-evidence marker. The speaker uses the inferential when the proposition is based on reasoning from observable effects.
Abkhaz (Northwest Caucasian). Two-choice system: direct (witnessed) vs indirect (nonwitnessed/reported). The evidential distinction is part of the complex verbal morphology and is fused with tense-aspect marking.
Finnish (Uralic). No grammatical evidentiality system. Finnish has modal verbs (voida 'can', täytyä 'must', saattaa 'may') but evidential meanings are expressed lexically, not as part of obligatory verbal morphology.
All language profiles in the sample.
Does a language have grammatical evidentials?
Does a language have a direct evidence category?
Count of languages in the sample with a given system type.
Equations
- Phenomena.Modality.Typology.countBySystem langs s = (List.filter (fun (x : Phenomena.Modality.Typology.EvidentialityProfile) => x.system == s) langs).length
Count of languages in the sample with a given coding type.
Equations
- Phenomena.Modality.Typology.countByCoding langs c = (List.filter (fun (x : Phenomena.Modality.Typology.EvidentialityProfile) => x.coding == c) langs).length
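The counting helpers are a filter followed by a length. The sketch below shows `countBySystem` over the 18 languages profiled in this module, using a cut-down record (`Profile`, a hypothetical name with only two fields) inside a hypothetical `Sketch` namespace. The system value for each language follows the profile descriptions above.

```lean
namespace Sketch

inductive EvidentialSystem where
  | noGrammatical | indirectOnly | directAndIndirect | threeOrMore
  deriving DecidableEq, BEq, Repr

/-- Cut-down profile: only the fields needed for counting by system. -/
structure Profile where
  language : String
  system   : EvidentialSystem

/-- Number of languages in `langs` whose system type is `s`. -/
def countBySystem (langs : List Profile) (s : EvidentialSystem) : Nat :=
  (langs.filter (fun x => x.system == s)).length

def sample : List Profile :=
  [ ⟨"English", .noGrammatical⟩, ⟨"French", .noGrammatical⟩,
    ⟨"German", .noGrammatical⟩, ⟨"Mandarin", .noGrammatical⟩,
    ⟨"Japanese", .noGrammatical⟩, ⟨"Korean", .noGrammatical⟩,
    ⟨"Finnish", .noGrammatical⟩,
    ⟨"Georgian", .indirectOnly⟩, ⟨"West Greenlandic", .indirectOnly⟩,
    ⟨"Turkish", .directAndIndirect⟩, ⟨"Bulgarian", .directAndIndirect⟩,
    ⟨"Tibetan", .directAndIndirect⟩, ⟨"Abkhaz", .directAndIndirect⟩,
    ⟨"Quechua", .threeOrMore⟩, ⟨"Aymara", .threeOrMore⟩,
    ⟨"Tuyuca", .threeOrMore⟩, ⟨"Kashaya", .threeOrMore⟩,
    ⟨"Tariana", .threeOrMore⟩ ]

example : sample.length = 18 := by decide
example : countBySystem sample EvidentialSystem.noGrammatical = 7 := by decide
example : countBySystem sample EvidentialSystem.indirectOnly = 2 := by decide
example : countBySystem sample EvidentialSystem.directAndIndirect = 4 := by decide
example : countBySystem sample EvidentialSystem.threeOrMore = 5 := by decide

end Sketch
```

`countByCoding` has the same shape, filtering on the `coding` field instead.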
Number of languages in our sample.
Ch 77: The plurality of languages (181/418 = 43%) lack grammatical evidentials entirely. This is the single largest category.
Ch 77: Languages without grammatical evidentials do NOT outnumber all languages with evidentials combined (181 vs 166 + 71 = 237).
In our sample, over a third of languages lack grammatical evidentials (7 out of 18). The sample deliberately overrepresents languages with evidentials for typological diversity.
Ch 78: Verbal affix or clitic (131/418) is the most common way to encode evidentiality among languages that have it.
Ch 78: Among languages WITH evidentials, verbal affixes account for more than half of all coding strategies (131 out of 237).
Ch 78: Modal morpheme is the rarest evidential coding strategy (7/418).
Ch 77: Among languages with evidentials, indirect-only systems (166) are more common than direct-and-indirect systems (71).
Ch 77: Indirect-only systems are the most common type among languages that HAVE evidentials.
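The Ch 77 and Ch 78 comparisons stated so far reduce to a few inequalities over the quoted counts. As a sketch with bare numerals (not the module's own count tables):

```lean
-- Ch 77: 181 none, 166 indirect-only, 71 direct-and-indirect.
example : 181 + 166 + 71 = 418 := by decide
example : 166 + 71 = 237 := by decide

-- "No evidentials" is the largest single category (a plurality) ...
example : 181 > 166 ∧ 181 > 71 := by decide
-- ... but it does not outnumber the languages with evidentials combined.
example : 181 < 166 + 71 := by decide

-- Ch 78: verbal affix or clitic covers more than half of the 237
-- languages that have evidentials.
example : 2 * 131 > 237 := by decide
-- Modal morpheme (7) is below both affix/clitic (131) and particle (65).
example : 7 < 65 ∧ 65 < 131 := by decide
```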
Languages with three or more evidential choices always include a direct evidence category. This follows from the definition: three-choice systems distinguish direct, reportative, and inferential. No language is known to have three evidential categories without including direct evidence.
In our sample, every three-or-more language has a direct category.
The converse does not hold: two-choice systems also have direct evidence. In fact, in our sample, every language with direct evidence has either a two-choice or three-or-more system.
Evidentiality fused with the TAM system is characteristic of the Balkans and Caucasus. In our sample, Turkish, Bulgarian, Georgian, and Abkhaz all use TAM-fused evidentials.
Separate particle is the second most common coding strategy after verbal affix or clitic (65/418 vs 131/418).
Quechua and Aymara, representing the two major Andean language families (Quechuan and Aymaran), both have three-or-more-choice evidential systems coded as verbal affixes. This is a well-known areal feature of the Andes.
The Vaupés-Amazonian area has some of the richest evidential systems. Both Tuyuca and Tariana (from different families but in contact in the Vaupés) have three-or-more evidential categories with five distinctions. This suggests areal diffusion of complex evidential systems.
Ch 77: Indirect-only systems (166 languages) are the most common type among languages WITH evidentials (vs 71 with a direct vs indirect contrast). These are languages that only mark non-direct evidence, leaving direct evidence unmarked.
In our sample, exactly 2 languages have indirect-only systems.
In our sample, all languages with three-or-more evidential choices use verbal affixes as their coding strategy. This is consistent with the cross-linguistic generalization that complex evidential systems tend to use morphologically integrated (affixal) coding.
In our sample, the three Western European languages (English, French, German) all lack grammatical evidentials. This is consistent with the broader pattern: grammatical evidentials are essentially absent from Western Europe (the Balkan Sprachbund is the notable exception).
In our sample, every language without grammatical evidentials (Ch 77) has a notApplicable coding (Ch 78).
In our sample, every language WITH grammatical evidentials has a real (non-notApplicable) coding strategy.
The system and coding fields are consistent: the set of languages with notApplicable coding is exactly the set with noGrammatical system.
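One way to phrase this cross-chapter consistency condition is as a Boolean predicate that is then checked over a list of profiles. The sketch below uses a cut-down record and a three-entry demo list; `Profile`, `consistent`, `demo`, and the `Sketch` namespace are hypothetical names, not the module's own.

```lean
namespace Sketch

inductive EvidentialSystem where
  | noGrammatical | indirectOnly | directAndIndirect | threeOrMore
  deriving DecidableEq, BEq, Repr

inductive EvidentialCoding where
  | verbalAffix | clitic | particle | partOfTAM | notApplicable
  deriving DecidableEq, BEq, Repr

/-- Cut-down profile holding only the Ch 77 and Ch 78 values. -/
structure Profile where
  system : EvidentialSystem
  coding : EvidentialCoding

/-- True when `notApplicable` coding and `noGrammatical` system go together. -/
def consistent (p : Profile) : Bool :=
  (p.coding == EvidentialCoding.notApplicable) ==
    (p.system == EvidentialSystem.noGrammatical)

def demo : List Profile :=
  [ ⟨.noGrammatical, .notApplicable⟩,
    ⟨.directAndIndirect, .partOfTAM⟩,
    ⟨.threeOrMore, .verbalAffix⟩ ]

-- Every demo profile satisfies the condition.
example : demo.all consistent = true := by decide
-- A mismatched profile is rejected.
example : consistent ⟨.indirectOnly, .notApplicable⟩ = false := by decide

end Sketch
```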
System type distribution in our sample.
Coding strategy distribution in our sample (excluding notApplicable).
Languages with evidentials in our sample.
Languages with direct evidence marking in our sample.
The evidential complexity hierarchy: numChoices increases strictly along the scale of system types, so each richer system has more evidential choices than every simpler one:
threeOrMore.numChoices > directAndIndirect.numChoices > indirectOnly.numChoices > noGrammatical.numChoices
In our sample, every language with a three-or-more system also has a direct evidence category (entailed by the type definition, but worth verifying against the data).
In our sample, every language with a two-choice system also has a direct evidence category (the two choices are direct vs indirect).
East Asian languages in our sample (Mandarin, Japanese, Korean) all lack grammatical evidentials. This is consistent with the broader pattern that grammatical evidentials are absent from most of East Asia.
Americas languages in our sample (Quechua, Aymara, Tuyuca, Kashaya, Tariana) all have three-or-more evidential categories. The Americas have the highest density of complex evidential systems worldwide.
All Americas languages in our sample use verbal affixes.
Deontic necessity is not universally split into strong and weak #
Narrog (2010, 2012; cited in @cite{rubinstein-2014} Table 1) surveys
200 genealogically diverse languages for grammaticalized deontic necessity.
The sample shows that grammaticalized weak deontic necessity is far from universal:
only 62 of 200 languages (31%) grammaticalize it. See
Rubinstein2014.lean for the full typological data and implications
for the comparative analysis of weak necessity.
Data imported from Core.Modality.DeonticNecessity.
Only 62 of 200 languages grammaticalize weak deontic necessity (31%).
Strong deontic necessity (60 languages) is slightly less common than weak (62), showing that the strong/weak split itself is not universal.