Indefinite Pronoun Typology (@cite{haspelmath-1997} / WALS Ch 46) #
@cite{haspelmath-1997} @cite{haspelmath-2013} @cite{kadmon-landman-1993} @cite{ladusaw-1979}
Formalizes the core results of @cite{haspelmath-1997}'s cross-linguistic study of indefinite pronouns, among the best-known findings in semantic typology.
The Implicational Map #
The key insight: indefinite pronouns across languages can be classified by which functions they can serve. Nine function types form an implicational map — if a single pronoun series covers two non-adjacent functions, it must also cover all functions between them on the map.
The nine functions, ordered on the map:
specificKnown — specificUnknown — irrealis — question — conditional — indirectNeg — directNeg
|
freeChoice — comparative ———+
A pronoun series can cover any contiguous region on this map. This explains why, e.g., English "any" covers {question, conditional, indirectNeg, comparative, freeChoice}, a contiguous set, whereas the map predicts that no language uses a single form for {specificKnown, directNeg} without also covering all intermediate functions.
WALS Chapter 46 #
WALS classifies languages by the morphological source of their indefinite pronouns. The chapter (@cite{haspelmath-2013}) builds on @cite{haspelmath-1997} and covers a sample of 326 languages:
| Value | Count |
|---|---|
| Interrogative-based | 194 |
| Generic-noun-based | 85 |
| Special | 22 |
| Mixed | 23 |
| Existential construction | 2 |
| Total | 326 |
The nine function types on @cite{haspelmath-1997}'s implicational map for indefinite pronouns. These represent the semantic/pragmatic contexts in which indefinite pronoun forms are used.
The ordering reflects positions on the map, from most referential (specificKnown) to most universal (freeChoice).
- specificKnown : IndefiniteFunction
Specific known: "Somebody called. I know who it was." Speaker has a specific referent in mind and knows their identity.
- specificUnknown : IndefiniteFunction
Specific unknown: "Somebody called. I don't know who." Speaker has a specific referent in mind but doesn't know their identity.
- irrealis : IndefiniteFunction
Irrealis non-specific: "Please bring me something to read." The referent is hypothetical, not yet established in the discourse.
- question : IndefiniteFunction
Question: "Did anybody call?" In polar or wh-questions.
- conditional : IndefiniteFunction
Conditional: "If anybody calls, tell them I'm busy." In the antecedent of a conditional.
- indirectNeg : IndefiniteFunction
Indirect negation: "I don't think that anybody called." In the semantic scope of negation but not in the same clause.
- directNeg : IndefiniteFunction
Direct negation: "Nobody called." / "I didn't see anybody." Direct clause negation: the indefinite is in the same clause as the negative marker.
- comparative : IndefiniteFunction
Comparative: "She's taller than anybody." Standard of comparison in comparative constructions.
- freeChoice : IndefiniteFunction
Free choice: "Anybody can do that." Universal-like meaning: all members of the domain qualify.
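As a minimal sketch, the nine constructors listed above could be declared as a plain enumeration. The derived instances and the helper name `inMapOrder` are assumptions made for this illustration, not the module's confirmed declaration.

```lean
/-- Sketch of the nine-way function type; constructor names follow the list above. -/
inductive IndefiniteFunction where
  | specificKnown
  | specificUnknown
  | irrealis
  | question
  | conditional
  | indirectNeg
  | directNeg
  | comparative
  | freeChoice
  deriving DecidableEq, Repr

namespace IndefiniteFunction

/-- All nine functions in map order (`inMapOrder` is a hypothetical name). -/
def inMapOrder : List IndefiniteFunction :=
  [.specificKnown, .specificUnknown, .irrealis, .question, .conditional,
   .indirectNeg, .directNeg, .comparative, .freeChoice]

end IndefiniteFunction

#eval IndefiniteFunction.inMapOrder.length   -- 9
```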
All nine function types, listed in map order.
Adjacent functions on @cite{haspelmath-1997}'s implicational map.
The map forms a connected graph:
specKnown — specUnknown — irrealis — question — conditional — indNeg — dirNeg
|
freeChoice — comparative —+
The crucial typological claim: any pronoun series covers a contiguous region on this graph. If a form is used for functions A and C, and B lies on the unique path between A and C, then the form is also used for B.
Equations
- One or more equations did not get rendered due to their size.
- Phenomena.Polarity.Typology.adjacentFunctions Phenomena.Polarity.Typology.IndefiniteFunction.specificKnown = [Phenomena.Polarity.Typology.IndefiniteFunction.specificUnknown]
- Phenomena.Polarity.Typology.adjacentFunctions Phenomena.Polarity.Typology.IndefiniteFunction.freeChoice = [Phenomena.Polarity.Typology.IndefiniteFunction.comparative]
Adjacency is symmetric: if A is adjacent to B, then B is adjacent to A.
Every function has at least one neighbor (no function is isolated on the map).
The map has exactly 8 edges (undirected). We count ordered pairs and divide by 2: each edge appears once in each direction.
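The symmetry check and the "count ordered pairs, divide by 2" argument can be sketched as follows. This sketch encodes only the edges stated explicitly in the text: the six links of the main chain and the comparative/freeChoice link. As an assumption of the illustration, the edge joining comparative to the main chain is left out, so the count comes to 7 rather than the 8 edges of the full map. The names `knownAdjacent`, `mapOrder`, `symmetricOK`, and `orderedPairs` are hypothetical.

```lean
inductive IndefiniteFunction where
  | specificKnown | specificUnknown | irrealis
  | question | conditional | indirectNeg
  | directNeg | comparative | freeChoice
  deriving DecidableEq, Repr

/-- Neighbors along the explicitly stated edges only.
    ASSUMPTION: the edge linking `comparative` to the main chain is omitted. -/
def knownAdjacent : IndefiniteFunction → List IndefiniteFunction
  | .specificKnown   => [.specificUnknown]
  | .specificUnknown => [.specificKnown, .irrealis]
  | .irrealis        => [.specificUnknown, .question]
  | .question        => [.irrealis, .conditional]
  | .conditional     => [.question, .indirectNeg]
  | .indirectNeg     => [.conditional, .directNeg]
  | .directNeg       => [.indirectNeg]
  | .comparative     => [.freeChoice]
  | .freeChoice      => [.comparative]

def mapOrder : List IndefiniteFunction :=
  [.specificKnown, .specificUnknown, .irrealis, .question, .conditional,
   .indirectNeg, .directNeg, .comparative, .freeChoice]

/-- Symmetry as a Boolean check: every neighbor `b` of `a` lists `a` back. -/
def symmetricOK : Bool :=
  mapOrder.all fun a =>
    (knownAdjacent a).all fun b => (knownAdjacent b).contains a

/-- Number of ordered (a, b) pairs; each undirected edge is counted twice. -/
def orderedPairs : Nat :=
  (mapOrder.map fun a => (knownAdjacent a).length).foldl (· + ·) 0

#eval symmetricOK        -- true
#eval orderedPairs / 2   -- 7 (the omitted branch edge would make it 8)
```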
Check whether a function is in a given set (list membership).
BFS on the implicational map restricted to a given set of functions.
Starting from start, explore all nodes reachable through edges whose endpoints are both in funcs, and return the set of reachable nodes. The fuel argument bounds the number of steps: when it reaches 0, the nodes visited so far are returned.
This is the core algorithm for checking contiguity: a set of functions is contiguous iff BFS from any member reaches all other members.
Equations
- Phenomena.Polarity.Typology.bfsReachable funcs start fuel = Phenomena.Polarity.Typology.bfsReachable.go funcs [start] [start] fuel
Equations
- One or more equations did not get rendered due to their size.
- Phenomena.Polarity.Typology.bfsReachable.go funcs queue visited 0 = visited
- Phenomena.Polarity.Typology.bfsReachable.go funcs [] visited fuel = visited
A set of functions is contiguous on the implicational map iff BFS from its first element reaches all elements in the set.
This is Haspelmath's key constraint: every pronoun series must cover a contiguous region on the map.
Equations
- One or more equations did not get rendered due to their size.
- Phenomena.Polarity.Typology.isContiguous [] = true
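A fuel-bounded BFS and the contiguity check built on it might look like the sketch below. It is a self-contained illustration, not the module's own code: `knownAdjacent` and `bfsGo` are hypothetical names, the fuel value 9 is an assumption (one step per function suffices), and the edge joining comparative to the main chain is left out. The checked sets are chosen so that their verdicts do not depend on that omitted edge; the full nine-function set is not checked here because it does.

```lean
inductive IndefiniteFunction where
  | specificKnown | specificUnknown | irrealis
  | question | conditional | indirectNeg
  | directNeg | comparative | freeChoice
  deriving DecidableEq, Repr

/-- ASSUMPTION: only the explicitly stated edges; the branch attachment is omitted. -/
def knownAdjacent : IndefiniteFunction → List IndefiniteFunction
  | .specificKnown   => [.specificUnknown]
  | .specificUnknown => [.specificKnown, .irrealis]
  | .irrealis        => [.specificUnknown, .question]
  | .question        => [.irrealis, .conditional]
  | .conditional     => [.question, .indirectNeg]
  | .indirectNeg     => [.conditional, .directNeg]
  | .directNeg       => [.indirectNeg]
  | .comparative     => [.freeChoice]
  | .freeChoice      => [.comparative]

/-- BFS over `queue`, following only edges whose endpoints lie in `funcs`.
    Recursion is structural on the fuel argument. -/
def bfsGo (funcs : List IndefiniteFunction) :
    List IndefiniteFunction → List IndefiniteFunction → Nat → List IndefiniteFunction
  | _, visited, 0 => visited
  | [], visited, _ => visited
  | node :: rest, visited, fuel + 1 =>
    -- Unvisited neighbors of `node` that belong to the restricted set.
    let fresh := (knownAdjacent node).filter fun n =>
      funcs.contains n && !(visited.contains n)
    bfsGo funcs (rest ++ fresh) (visited ++ fresh) fuel

/-- Contiguous iff BFS from the first element reaches every element. -/
def isContiguous : List IndefiniteFunction → Bool
  | [] => true
  | f :: fs =>
    let reached := bfsGo (f :: fs) [f] [f] 9
    (f :: fs).all fun g => reached.contains g

#eval isContiguous [.specificKnown, .specificUnknown]            -- true
#eval isContiguous [.question, .conditional, .indirectNeg]       -- true
#eval isContiguous [.comparative, .freeChoice]                   -- true
#eval isContiguous [.specificKnown, .directNeg]                  -- false
#eval isContiguous [.specificKnown, .freeChoice]                 -- false
```

Because the search only follows edges inside the candidate set, a pair such as {specificKnown, directNeg} fails immediately: the only neighbor of specificKnown is outside the set.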
An indefinite pronoun series: a named form (or morphological pattern) together with the set of functions it covers on the map.
- form : String
Surface form or morphological marker (e.g., "some-", "any-", "no-").
- functions : List IndefiniteFunction
The functions this series covers on the implicational map.
- notes : String
Optional notes on the series.
Equations
- One or more equations did not get rendered due to their size.
- Phenomena.Polarity.Typology.instBEqIndefinitePronounSeries.beq x✝¹ x✝ = false
Number of functions covered by a series.
Number of distinct indefinite pronoun series in a language.
Equations
- p.seriesCount = p.series.length
All functions covered across all series.
Equations
- p.allFunctions = (List.flatMap (fun (x : Phenomena.Polarity.Typology.IndefinitePronounSeries) => x.functions) p.series).eraseDups
Whether every series in a profile is contiguous on the map.
Whether the profile covers all nine functions.
Whether the series in a profile have disjoint function sets (no function appears in two different series).
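The profile-level checks above can be sketched on a small example. The field names `form`, `functions`, `notes`, and `series` follow the text; the `language` field, the default for `notes`, and the bodies of `coversAll` and `isDisjoint` are assumptions of this illustration. In particular, this `isDisjoint` compares the total number of listed functions with the number of distinct ones, which also rejects a duplicate inside a single series. The Russian data are transliterated from the Russian profile described in this module.

```lean
inductive IndefiniteFunction where
  | specificKnown | specificUnknown | irrealis
  | question | conditional | indirectNeg
  | directNeg | comparative | freeChoice
  deriving DecidableEq, Repr

structure IndefinitePronounSeries where
  form : String
  functions : List IndefiniteFunction
  notes : String := ""
  deriving Repr

structure IndefinitePronounProfile where
  language : String   -- ASSUMPTION: hypothetical field name
  series : List IndefinitePronounSeries
  deriving Repr

namespace IndefinitePronounProfile

def seriesCount (p : IndefinitePronounProfile) : Nat := p.series.length

/-- Every function mentioned by any series, with repeats. -/
def listed (p : IndefinitePronounProfile) : List IndefiniteFunction :=
  p.series.flatMap fun (s : IndefinitePronounSeries) => s.functions

def allFunctions (p : IndefinitePronounProfile) : List IndefiniteFunction :=
  p.listed.eraseDups

/-- Nine distinct functions means all nine are covered. -/
def coversAll (p : IndefinitePronounProfile) : Bool :=
  p.allFunctions.length == 9

/-- No function is listed twice (sketch; see the note above). -/
def isDisjoint (p : IndefinitePronounProfile) : Bool :=
  p.listed.length == p.allFunctions.length

end IndefinitePronounProfile

def russian : IndefinitePronounProfile where
  language := "Russian"
  series := [
    { form := "kto-to",     functions := [.specificKnown] },
    { form := "kto-nibud'", functions := [.specificUnknown, .irrealis] },
    { form := "kto-libo",   functions := [.question, .conditional, .indirectNeg] },
    { form := "nikto",      functions := [.directNeg] },
    { form := "kto ugodno", functions := [.freeChoice, .comparative] } ]

#eval russian.seriesCount   -- 5
#eval russian.coversAll     -- true
#eval russian.isDisjoint    -- true
```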
A single row in a WALS frequency table.
Sum of counts in a WALS table.
Equations
- Phenomena.Polarity.Typology.WALSCount.totalOf cs = List.foldl (fun (acc : Nat) (c : Phenomena.Polarity.Typology.WALSCount) => acc + c.count) 0 cs
WALS Chapter 46 distribution (N = 326).
The five values classify languages by the morphological source of their indefinite pronouns, computed from the WALS 46A dataset.
Ch 46 total: 326 languages.
Interrogative-based is the most common single category.
Interrogative-based languages outnumber all other categories combined.
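The three checks on the Chapter 46 table reduce to simple arithmetic over the counts listed earlier. In this sketch the `count` field and the fold follow the `totalOf` equation above, while the `label` field and the name `ch46` are assumptions.

```lean
structure WALSCount where
  label : String   -- ASSUMPTION: hypothetical field name
  count : Nat
  deriving Repr

/-- Sum of counts, mirroring the fold shown for `WALSCount.totalOf`. -/
def totalOf (cs : List WALSCount) : Nat :=
  cs.foldl (fun (acc : Nat) (c : WALSCount) => acc + c.count) 0

def ch46 : List WALSCount :=
  [ { label := "Interrogative-based",      count := 194 },
    { label := "Generic-noun-based",       count := 85 },
    { label := "Special",                  count := 22 },
    { label := "Mixed",                    count := 23 },
    { label := "Existential construction", count := 2 } ]

#eval totalOf ch46                          -- 326
-- Interrogative-based (194) against all other categories combined (132).
#eval decide (194 > totalOf ch46 - 194)     -- true
```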
Look up a language profile's WALS 46A morphological source classification.
Equations
- p.wals46A = Option.map (fun (x : Core.WALS.Datapoint Core.WALS.F46A.IndefinitePronouns) => x.value) (Core.WALS.F46A.lookupISO p.iso)
English (Indo-European, Germanic) #
English has four main indefinite pronoun series:
- some- series: specific known + specific unknown
- any- (NPI): irrealis + question + conditional + indirect negation
- no- series: direct negation
- any- (FC) / every-: free choice + comparative
The split between NPI-any and FC-any is well-known; Haspelmath treats them as distinct series. English "some-" also has an irrealis use in some dialects, but the canonical analysis gives irrealis to "any".
Russian (Indo-European, Slavic) #
Russian is a classic example of a language with many indefinite series, corresponding to fine-grained function distinctions:
- кто-то (kto-to): specific known
- кто-нибудь (kto-nibud'): specific unknown + irrealis
- кто-либо (kto-libo): question + conditional + indirect negation
- никто (nikto): direct negation (with negative concord)
- кто угодно (kto ugodno): free choice + comparative
German (Indo-European, Germanic) #
German has a rich indefinite system with five series:
- jemand: specific known + specific unknown
- irgend(ein/wer): irrealis + question
- wer (in conditionals): conditional
- niemand: direct negation + indirect negation
- jeder: free choice + comparative
Japanese (Japonic) #
Japanese indefinite pronouns are built compositionally from wh-words (dare 'who', nani 'what') plus particles (-ka, -mo, -demo):
- dare-ka: specific known + specific unknown + irrealis + question
- dare-mo (negative): direct negation + indirect negation
- dare-demo: free choice + comparative + conditional
Mandarin Chinese (Sino-Tibetan) #
Mandarin uses a small set of indefinite forms with wide functional range:
- yǒu rén (有人): specific known + specific unknown
- shéi (谁, non-interrogative): irrealis + question + conditional + indirect negation + direct negation + comparative + free choice
The wh-word shéi in non-interrogative uses covers a remarkably wide contiguous region, from irrealis to free choice.
Turkish (Turkic) #
Turkish uses birisi/biri for specific functions and a combination of kimse and hiç kimse for negative and polarity-sensitive functions:
- birisi: specific known + specific unknown
- biri: irrealis
- kimse: question + conditional + indirect negation
- hiç kimse: direct negation
- herhangi biri: free choice + comparative
Hindi-Urdu (Indo-European, Indo-Aryan) #
Hindi uses koii as a general indefinite with wide distribution, plus specialized negative and free choice forms:
- koii: specific known + specific unknown + irrealis + question + conditional
- koii nahiiN: direct negation + indirect negation
- koii bhii: free choice + comparative
Italian (Indo-European, Romance) #
Italian distinguishes qualcuno (specific) from negative nessuno and FC qualunque/qualsiasi:
- qualcuno: specific known + specific unknown + irrealis
- nessuno: question + conditional + indirect negation + direct negation
- qualunque/qualsiasi: free choice + comparative
Finnish (Uralic) #
Finnish has a differentiated system with five series:
- joku: specific known + specific unknown
- jokin (non-human): irrealis
- kukaan: question + conditional + indirect negation
- ei kukaan: direct negation
- kuka tahansa: free choice + comparative
Korean (Koreanic) #
Korean, like Japanese, uses wh-words as indefinites with particles:
- nwukwu-nka: specific known + specific unknown
- nwukwu: irrealis + question + conditional
- nwukwu-to (neg): indirect negation + direct negation
- nwukwu-na / nwukwu-tunci: free choice + comparative
Hungarian (Uralic) #
Hungarian is notable for having many series, including a dedicated interrogative indefinite:
- valaki: specific known + specific unknown
- valaki (irrealis): irrealis
- valaki (question): question
- senki sem: conditional + indirect negation + direct negation
- akárki / bárki: free choice + comparative
Georgian (Kartvelian) #
Georgian has a system with 4-5 series, using suffixes -γac, -me, and reduplication for different functions:
- vinme: specific known + specific unknown + irrealis
- vinme (question): question + conditional
- aravin: indirect negation + direct negation
- nebismieri / vinc: free choice + comparative
Quechua (Quechuan) #
Quechua (Imbabura variety) uses a relatively undifferentiated system:
- pi-taj: specific known + specific unknown + irrealis + question
- pi-pash: conditional + indirect negation
- mana pi-pash: direct negation
- ima-pash / maijan-pash: free choice + comparative
Yoruba (Niger-Congo, Atlantic-Congo) #
Yoruba has a relatively undifferentiated system with 2 main series:
- ẹnìkan: specific known + specific unknown + irrealis + question + conditional
- ẹ̀nìkẹ́ni / kò sí ẹnìkan: indirect negation + direct negation + comparative + free choice
Thai (Kra-Dai) #
Thai uses khraj for most indefinite functions, with khraj kɔ̂ for free choice:
- khraj (ใคร): specific known + specific unknown + irrealis + question + conditional
- mâj mii khraj (ไม่มีใคร): indirect negation + direct negation
- khraj kɔ̂ (ใครก็): free choice + comparative
Tagalog (Austronesian) #
Tagalog uses may as existential and wala as negative existential:
- may isang: specific known + specific unknown + irrealis
- sinuman: question + conditional + indirect negation
- walang: direct negation
- kahit sino: free choice + comparative
Swahili (Niger-Congo, Bantu) #
Swahili uses mtu (person) with various modifiers:
- mtu (fulani): specific known + specific unknown + irrealis + question + conditional
- mtu ye yote (neg): indirect negation + direct negation
- mtu ye yote: free choice + comparative
All language profiles in our sample.
Number of languages in our sample.
The central typological claim: every pronoun series covers a contiguous region on Haspelmath's implicational map.
We verify this for every series in every language in our sample.
English: all series are contiguous on the map.
Russian: all series are contiguous.
German: all series are contiguous.
Japanese: all series are contiguous.
Mandarin: all series are contiguous.
Turkish: all series are contiguous.
Hindi: all series are contiguous.
Italian: all series are contiguous.
Finnish: all series are contiguous.
Korean: all series are contiguous.
Hungarian: all series are contiguous.
Georgian: all series are contiguous.
Quechua: all series are contiguous.
Yoruba: all series are contiguous.
Thai: all series are contiguous.
Tagalog: all series are contiguous.
Swahili: all series are contiguous.
Master contiguity theorem: every series in every language in our sample is contiguous on Haspelmath's implicational map.
Every language in our sample covers all nine functions.
Every language's series have disjoint function sets — no function appears in two different series. Together with coverage (§9), this means the series form a partition of the nine function types.
Master disjointness theorem: every language's series are disjoint.
Master partition theorem: every language's series form a partition of the nine function types (contiguous + covering + disjoint).
Generalization 1: Direct negation always has a strategy #
Every language in our sample has at least one series that covers directNeg. This reflects the functional universal that every language can express sentential negation of an indefinite.
Every language has a series covering direct negation.
Generalization 2: Free choice and comparative pattern together #
In our sample, free choice and comparative are always covered by the same series. This reflects their shared universal/widened-domain semantics. @cite{haspelmath-1997}: "the comparative function is semantically similar to free choice."
Free choice and comparative are always in the same series.
Generalization 3: Specific known is rarely shared with polarity functions #
Specific known is the most referential function, semantically distant from polarity-sensitive uses. In our sample, whenever specificKnown and directNeg are in different series (which is always), the specific-known series does not extend past conditional on the map.
Specific known and direct negation are never in the same series.
Generalization 4: More series means more precise function encoding #
Languages with more series make finer distinctions on the map. We verify that the average number of functions per series decreases as series count increases.
Mandarin (2 series) has a higher average coverage per series than Russian (5 series). This demonstrates that fewer series = broader coverage per form.
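Because each language's series partition the nine functions, a language with n series averages 9/n functions per series: 9/2 = 4.5 for Mandarin against 9/5 = 1.8 for Russian. A minimal sketch of that comparison, using cross-multiplication so that everything stays in the natural numbers; `ratioGt` is a hypothetical helper, not a name from the module.

```lean
/-- `a / b > c / d` for positive `b` and `d`, decided without division. -/
def ratioGt (a b c d : Nat) : Bool := decide (a * d > c * b)

-- Mandarin: 9 functions over 2 series; Russian: 9 functions over 5 series.
#eval ratioGt 9 2 9 5   -- true, since 9 * 5 = 45 > 9 * 2 = 18
```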
Generalization 5: Specific known is typically separate #
In languages with 3+ series, specific known tends to be separate from polarity-sensitive functions.
In languages with 3+ series, specificKnown is never in the same series as question or conditional.
Generalization 6: The polarity cluster #
Question, conditional, and indirect negation frequently pattern together (or at least contiguously). These are the classic polarity-sensitive contexts in the formal semantics literature (downward-entailing or nonveridical).
The polarity cluster: in every language, there exists a series that covers at least two of {question, conditional, indirectNeg}.
Count of languages with a given number of series.
Equations
- Phenomena.Polarity.Typology.countBySeriesCount langs n = (List.filter (fun (p : Phenomena.Polarity.Typology.IndefinitePronounProfile) => p.seriesCount == n) langs).length
Series count distribution in our sample.
Verify the series count and key properties for each language.
Every language in our sample appears in the WALS 46A dataset. We verify each profile's morphological source classification, bridging our Haspelmath function-map data to the WALS morphological-source typology.
Distribution in our 17-language sample:
- Interrogative-based: Russian, Japanese, Korean, Hungarian, Georgian, Quechua, Thai (7)
- Generic-noun-based: English, Turkish, Italian, Yoruba, Swahili (5)
- Special: Hindi, Finnish (2)
- Mixed: German, Mandarin (2)
- Existential construction: Tagalog (1)
Every language in our sample has a WALS 46A entry.
Wh-based indefinite systems #
Languages that use wh-words as indefinites (Japanese, Korean, Mandarin, Thai) tend to have fewer series, because the bare wh-word covers a wide contiguous range on the map.
Wh-based indefinite languages average fewer series than others.
Negative concord languages #
Languages with negative concord (Russian, Italian, Hungarian) tend to have the direct negation function grouped with adjacent polarity functions rather than isolated in a single-function series.
At least one negative concord language has the directNeg function in a series with more than one function.
WALS morphological source and series count #
Interrogative-based languages (which build indefinites from wh-words) should correlate with our wh-based language list. We verify this and test whether morphological source predicts series differentiation.
All four wh-based languages are classified as interrogative-based or mixed in WALS 46A.
Languages classified as interrogative-based in WALS 46A.
Seven of our 17 languages are interrogative-based.
Interrogative-based languages in our sample average ≤ 4 series per language (total series ≤ 28 across 7 languages).
We demonstrate that certain function sets are NOT contiguous on the map, confirming that Haspelmath's map correctly rules them out as possible single-series ranges.
A hypothetical series covering {specificKnown, directNeg} without the intervening functions is not contiguous.
A hypothetical series covering {specificKnown, freeChoice} without intervening functions is not contiguous.
A hypothetical series covering {specificKnown, comparative} is not contiguous.
A hypothetical series covering {specificUnknown, directNeg} skipping irrealis through indirectNeg is not contiguous.
But {specificKnown, specificUnknown} IS contiguous (adjacent).
And {question, conditional, indirectNeg} IS contiguous (a path).
The full set of all nine functions is contiguous (the map is connected).
Connection to Polarity Item Theory #
Haspelmath's implicational map connects directly to formal theories of polarity sensitivity:
- Functions 4–7 (question through directNeg) correspond to the classic downward-entailing / nonveridical environments that license NPIs.
- Functions 8–9 (comparative, freeChoice) correspond to free choice items, which have been analyzed as universal quantifiers or domain-widened indefinites.
- Functions 1–3 (specific known/unknown, irrealis) correspond to positive polarity and epistemic specificity, which are anti-licensed by negation.
The contiguity constraint on the map thus has a semantic explanation: adjacent functions share semantic properties (monotonicity, veridicality, specificity) that make it natural for a single form to cover them.
Whether a Haspelmath function corresponds to a downward-entailing (or nonveridical) licensing context: functions 4–7 on the map, i.e. question, conditional, indirect negation, and direct negation (@cite{ladusaw-1979}).
Whether a Haspelmath function corresponds to a free choice context: functions 8–9 on the map, i.e. comparative and freeChoice.
The NPI region (question through directNeg) is contiguous.
The FC region (comparative, freeChoice) is contiguous.
The specific/irrealis region is contiguous.
The NPI+FC region (question through freeChoice, the full polarity-sensitive span) is contiguous.
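The two region predicates described above can be sketched as simple pattern matches. The names `isNPIContext`, `isFCContext`, and `mapOrder` are hypothetical, since the module's own names are not shown; only the membership of each region is taken from the text.

```lean
inductive IndefiniteFunction where
  | specificKnown | specificUnknown | irrealis
  | question | conditional | indirectNeg
  | directNeg | comparative | freeChoice
  deriving DecidableEq, Repr

/-- Functions 4–7: the NPI-licensing region. -/
def isNPIContext : IndefiniteFunction → Bool
  | .question | .conditional | .indirectNeg | .directNeg => true
  | _ => false

/-- Functions 8–9: the free choice region. -/
def isFCContext : IndefiniteFunction → Bool
  | .comparative | .freeChoice => true
  | _ => false

def mapOrder : List IndefiniteFunction :=
  [.specificKnown, .specificUnknown, .irrealis, .question, .conditional,
   .indirectNeg, .directNeg, .comparative, .freeChoice]

#eval (mapOrder.filter isNPIContext).length                     -- 4
#eval (mapOrder.filter isFCContext).length                      -- 2
-- The two regions do not overlap; the remaining three functions
-- form the specific/irrealis region.
#eval mapOrder.any fun f => isNPIContext f && isFCContext f     -- false
```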
Minimum series count in our sample.
Maximum series count in our sample.
Total number of distinct series across all languages.
The most common series count in our sample is 4 (six languages).
Verify consistency between Typology profiles and Fragment PolarityItems entries. For each language with a Fragment PolarityItems file, we check that Fragment NPI entries are licensed in contexts corresponding to Haspelmath functions the Typology profile assigns to polarity-sensitive series, and that Fragment FCI entries have obligatory domain alternatives when the Typology profile assigns free choice functions.