Linglib.Phenomena.Polarity.Typology

Indefinite Pronoun Typology (@cite{haspelmath-1997} / WALS Ch 46)

@cite{haspelmath-1997} @cite{haspelmath-2013} @cite{kadmon-landman-1993} @cite{ladusaw-1979}

Formalizes the core findings of @cite{haspelmath-1997}'s cross-linguistic study of indefinite pronouns, one of the best-known results in semantic typology.

The Implicational Map

The key insight: indefinite pronouns across languages can be classified by the functions they can serve. Nine function types form an implicational map: if a single pronoun series covers two non-adjacent functions, it must also cover every function on the path between them.

The nine functions, ordered on the map:

specificKnown -- specificUnknown -- irrealis -- question -- conditional -- indirectNeg -- directNeg
                                                                                              |
                                                             freeChoice -- comparative -------+

Every pronoun series covers a contiguous region of this map. English "any", for instance, covers {question, conditional, indirectNeg, directNeg, comparative, freeChoice}, which is a contiguous set (its directNeg use is "I didn't see anybody"). By contrast, no language has a single form for {specificKnown, directNeg} that skips the intermediate functions.
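The diagram can be read together with the eight-edge count and the map-order listing on this page. On that reading, the map is a single path whose comparative and freeChoice end is folded back under directNeg. This is our assumption, not a statement of how the library defines adjacency. Under it, contiguity reduces to an interval test: a set is contiguous iff it contains every function positioned between its leftmost and rightmost members. A minimal Lean sketch follows; IndefFn, allFns, pos, and isIntervalOnPath are hypothetical names.

```lean
-- Hypothetical stand-in for the library's function type (constructor names from this page).
inductive IndefFn where
  | specificKnown | specificUnknown | irrealis | question | conditional
  | indirectNeg | directNeg | comparative | freeChoice
  deriving Repr, DecidableEq

-- All nine functions, in map order.
def allFns : List IndefFn :=
  [.specificKnown, .specificUnknown, .irrealis, .question, .conditional,
   .indirectNeg, .directNeg, .comparative, .freeChoice]

-- Assumption: the map is a single path, so each function has a position 0..8.
def IndefFn.pos : IndefFn → Nat
  | .specificKnown => 0
  | .specificUnknown => 1
  | .irrealis => 2
  | .question => 3
  | .conditional => 4
  | .indirectNeg => 5
  | .directNeg => 6
  | .comparative => 7
  | .freeChoice => 8

/-- Interval test: every function positioned between the set's leftmost and
    rightmost members must itself belong to the set. -/
def isIntervalOnPath (s : List IndefFn) : Bool :=
  match s.map IndefFn.pos with
  | [] => true  -- the empty set is treated as vacuously contiguous
  | p :: ps =>
    let lo := ps.foldl min p
    let hi := ps.foldl max p
    allFns.all fun f => !(decide (lo ≤ f.pos) && decide (f.pos ≤ hi)) || s.contains f

#eval isIntervalOnPath [.question, .conditional, .indirectNeg, .directNeg]  -- true
#eval isIntervalOnPath [.specificKnown, .directNeg]                         -- false
```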

WALS Chapter 46

WALS classifies languages by the morphological source of their indefinite pronouns. The chapter (@cite{haspelmath-2013}) builds on @cite{haspelmath-1997} and classifies a sample of 326 languages:

Value                      Count
Interrogative-based          194
Generic-noun-based            85
Special                       22
Mixed                         23
Existential construction       2
Total                        326

The nine function types on @cite{haspelmath-1997}'s implicational map for indefinite pronouns. These represent the semantic/pragmatic contexts in which indefinite pronoun forms are used.

The ordering reflects positions on the map, from most referential (specificKnown) to most universal (freeChoice).

  • specificKnown : IndefiniteFunction

    Specific known: "Somebody called. I know who it was." Speaker has a specific referent in mind and knows their identity.

  • specificUnknown : IndefiniteFunction

    Specific unknown: "Somebody called. I don't know who." Speaker has a specific referent in mind but doesn't know their identity.

  • irrealis : IndefiniteFunction

    Irrealis non-specific: "Please bring me something to read." The referent is hypothetical, not yet established in the discourse.

  • question : IndefiniteFunction

    Question: "Did anybody call?" In polar or wh-questions.

  • conditional : IndefiniteFunction

    Conditional: "If anybody calls, tell them I'm busy." In the antecedent of a conditional.

  • indirectNeg : IndefiniteFunction

    Indirect negation: "I don't think that anybody called." In the semantic scope of negation but not in the same clause.

  • directNeg : IndefiniteFunction

    Direct negation: "Nobody called." / "I didn't see anybody." Direct clause negation: the indefinite is in the same clause as the negative marker.

  • comparative : IndefiniteFunction

    Comparative: "She's taller than anybody." Standard of comparison in comparative constructions.

  • freeChoice : IndefiniteFunction

    Free choice: "Anybody can do that." Universal-like meaning: all members of the domain qualify.

All nine function types, listed in map order.
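Below is a minimal Lean sketch of the enumeration above, assuming a plain inductive type. The name IndefFn and the comments are ours; the comments reuse the example sentences from the constructor descriptions. The library's own declaration may carry additional instances.

```lean
/-- Hypothetical mirror of the nine function types, in map order. -/
inductive IndefFn where
  | specificKnown    -- "Somebody called. I know who it was."
  | specificUnknown  -- "Somebody called. I don't know who."
  | irrealis         -- "Please bring me something to read."
  | question         -- "Did anybody call?"
  | conditional      -- "If anybody calls, tell them I'm busy."
  | indirectNeg      -- "I don't think that anybody called."
  | directNeg        -- "Nobody called."
  | comparative      -- "She's taller than anybody."
  | freeChoice       -- "Anybody can do that."
  deriving Repr, DecidableEq

/-- All nine function types, listed in map order. -/
def IndefFn.all : List IndefFn :=
  [.specificKnown, .specificUnknown, .irrealis, .question, .conditional,
   .indirectNeg, .directNeg, .comparative, .freeChoice]

#eval (IndefFn.all).length  -- 9
```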

Adjacent functions on @cite{haspelmath-1997}'s implicational map.

The map forms a connected graph:

specKnown -- specUnknown -- irrealis -- question -- conditional -- indNeg -- dirNeg
                                                                               |
                                                  freeChoice -- comparative ---+

The crucial typological claim: every pronoun series covers a contiguous region on this graph. If a form is used for functions A and C, and B lies on the unique path between A and C, then the form is also used for B.

Adjacency is symmetric: if A is adjacent to B, then B is adjacent to A.

Every function has at least one neighbor (no function on the map is isolated).

The map has exactly 8 undirected edges. We count ordered pairs and divide by 2, since each edge appears once in each direction.
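The three properties can be checked mechanically. The Lean sketch below assumes the map is a single path in map order, so consecutive functions are adjacent and there are eight edges. IndefFn, allFns, edges, adjacent, and orderedPairs are hypothetical names. The library's own adjacency relation may be defined differently, so the counts below hold only for this sketch.

```lean
inductive IndefFn where
  | specificKnown | specificUnknown | irrealis | question | conditional
  | indirectNeg | directNeg | comparative | freeChoice
  deriving Repr, DecidableEq

def allFns : List IndefFn :=
  [.specificKnown, .specificUnknown, .irrealis, .question, .conditional,
   .indirectNeg, .directNeg, .comparative, .freeChoice]

/-- Assumed edge list: consecutive functions in map order. -/
def edges : List (IndefFn × IndefFn) :=
  [(.specificKnown, .specificUnknown), (.specificUnknown, .irrealis),
   (.irrealis, .question), (.question, .conditional),
   (.conditional, .indirectNeg), (.indirectNeg, .directNeg),
   (.directNeg, .comparative), (.comparative, .freeChoice)]

/-- Undirected adjacency: an edge may be listed in either direction. -/
def adjacent (a b : IndefFn) : Bool :=
  edges.contains (a, b) || edges.contains (b, a)

-- Symmetry, checked over all 81 ordered pairs.
#eval allFns.all fun a => allFns.all fun b => adjacent a b == adjacent b a  -- true

-- No isolated function: each has at least one neighbor.
#eval allFns.all fun a => allFns.any (adjacent a)  -- true

-- Count ordered adjacent pairs, then halve to get undirected edges.
def orderedPairs : Nat :=
  (allFns.map fun a => (allFns.filter (adjacent a)).length).foldl (· + ·) 0

#eval orderedPairs      -- 16
#eval orderedPairs / 2  -- 8
```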

Check whether a function is in a given set (list membership).

BFS on the implicational map restricted to a given set of functions. Starting from start, explore all nodes reachable through edges whose endpoints both lie in funcs. Returns the set of reachable nodes.

This is the core algorithm for checking contiguity: a set of functions is contiguous iff BFS from any member reaches all other members.

A set of functions is contiguous on the implicational map iff BFS from its first element reaches every element in the set.

This is Haspelmath's key constraint: every pronoun series must cover a contiguous region on the map.
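The BFS-based contiguity test can be sketched in Lean. This is a reconstruction under stated assumptions, not the library's code. It reads the map as a single path in map order, makes termination structural with a fuel argument, and counts the empty set as contiguous. All names (IndefFn, allFns, pos, adjacent, reach, isContiguous) are hypothetical.

```lean
inductive IndefFn where
  | specificKnown | specificUnknown | irrealis | question | conditional
  | indirectNeg | directNeg | comparative | freeChoice
  deriving Repr, DecidableEq

def allFns : List IndefFn :=
  [.specificKnown, .specificUnknown, .irrealis, .question, .conditional,
   .indirectNeg, .directNeg, .comparative, .freeChoice]

-- Position in map order (assumption: the map is a single path).
def IndefFn.pos : IndefFn → Nat
  | .specificKnown => 0
  | .specificUnknown => 1
  | .irrealis => 2
  | .question => 3
  | .conditional => 4
  | .indirectNeg => 5
  | .directNeg => 6
  | .comparative => 7
  | .freeChoice => 8

-- Assumed adjacency: consecutive positions on the path.
def adjacent (a b : IndefFn) : Bool :=
  a.pos + 1 == b.pos || b.pos + 1 == a.pos

/-- Level-by-level BFS from `start`, visiting only nodes that lie in `funcs`.
    The first argument of `go` is fuel; nine rounds suffice for nine nodes. -/
def reach (funcs : List IndefFn) (start : IndefFn) : List IndefFn :=
  go allFns.length [start] [start]
where
  go : Nat → List IndefFn → List IndefFn → List IndefFn
    | 0, visited, _ => visited
    | _ + 1, visited, [] => visited
    | n + 1, visited, frontier =>
      let next := allFns.filter (fun f =>
        funcs.contains f && !(visited.contains f) && frontier.any (adjacent f))
      go n (visited ++ next) next

/-- Contiguous iff BFS from the first element reaches every element. -/
def isContiguous (funcs : List IndefFn) : Bool :=
  match funcs with
  | [] => true  -- assumption: the empty set counts as contiguous
  | s :: _ => funcs.all (reach funcs s).contains

#eval reach [.specificUnknown, .irrealis, .directNeg] .specificUnknown  -- reaches only specificUnknown, irrealis
#eval isContiguous [.specificUnknown, .irrealis, .directNeg]            -- false
#eval isContiguous [.irrealis, .question, .conditional]                 -- true
```

Using level-by-level BFS with fuel avoids a well-founded-recursion proof. Each round either adds a new node or empties the frontier, so nine rounds are enough for a nine-node map.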

An indefinite pronoun series: a named form (or morphological pattern) together with the set of functions it covers on the map.

  • form : String

    Surface form or morphological marker (e.g., "some-", "any-", "no-").

  • The functions this series covers on the implicational map.

  • notes : String

    Optional notes on the series.

Number of functions covered by a series.
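Below is a hypothetical Lean mirror of the series record and its size. This page does not name the field that holds the covered functions, so the name functions is our placeholder. IndefFn, IndefSeries, and exampleSeries are placeholders too, and the example series is invented.

```lean
inductive IndefFn where
  | specificKnown | specificUnknown | irrealis | question | conditional
  | indirectNeg | directNeg | comparative | freeChoice
  deriving Repr, DecidableEq

/-- Hypothetical mirror of a pronoun series: a form plus the functions it covers. -/
structure IndefSeries where
  form : String
  functions : List IndefFn  -- placeholder name; not shown on this page
  notes : String := ""
  deriving Repr

/-- Number of functions covered by a series. -/
def IndefSeries.size (s : IndefSeries) : Nat := s.functions.length

-- A made-up series for illustration; not data from any language profile.
def exampleSeries : IndefSeries :=
  { form := "X-", functions := [.question, .conditional, .indirectNeg] }

#eval exampleSeries.size  -- 3
```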

A language's indefinite pronoun profile: its name, series inventory, and metadata.

Number of distinct indefinite pronoun series in a language.

Whether the series in a profile have disjoint function sets (no function appears in two different series).
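Disjointness can be tested pairwise by comparing each series with every later one. The minimal sketch below simplifies a profile to a list of function lists and runs the test on a made-up four-series profile. pairwiseDisjoint and toyProfile are hypothetical names.

```lean
inductive IndefFn where
  | specificKnown | specificUnknown | irrealis | question | conditional
  | indirectNeg | directNeg | comparative | freeChoice
  deriving Repr, DecidableEq

/-- True when no function occurs in two different series of a profile. -/
def pairwiseDisjoint : List (List IndefFn) → Bool
  | [] => true
  | s :: rest =>
    rest.all (fun t => s.all (fun f => !(t.contains f))) && pairwiseDisjoint rest

-- Hypothetical four-series profile (not taken from the sample).
def toyProfile : List (List IndefFn) :=
  [[.specificKnown, .specificUnknown, .irrealis],
   [.question, .conditional, .indirectNeg],
   [.directNeg],
   [.comparative, .freeChoice]]

#eval pairwiseDisjoint toyProfile                                  -- true
#eval pairwiseDisjoint [[.question, .conditional], [.conditional]]  -- false
```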

A single row in a WALS frequency table.

WALS Chapter 46 distribution (N = 326).

The five values classify languages by the morphological source of their indefinite pronouns, computed from the WALS 46A dataset.
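The five counts should add up to the stated N = 326. Below is a small Lean check. A hypothetical WALSRow record stands in for the library's row type, whose real field names are not shown on this page.

```lean
/-- Hypothetical row type for a WALS frequency table. -/
structure WALSRow where
  value : String
  count : Nat
  deriving Repr

/-- WALS 46A counts as listed on this page. -/
def wals46 : List WALSRow :=
  [⟨"Interrogative-based", 194⟩, ⟨"Generic-noun-based", 85⟩, ⟨"Special", 22⟩,
   ⟨"Mixed", 23⟩, ⟨"Existential construction", 2⟩]

/-- Sum of the counts in a table. -/
def total (rows : List WALSRow) : Nat :=
  (rows.map (·.count)).foldl (· + ·) 0

#eval total wals46  -- 326

-- The row counts must add up to the stated N.
example : total wals46 = 326 := by decide
```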

English (Indo-European, Germanic)

English has four main indefinite pronoun series:

The split between NPI-any and FC-any is well-known; Haspelmath treats them as distinct series. English "some-" also has an irrealis use in some dialects, but the canonical analysis gives irrealis to "any".

Russian (Indo-European, Slavic)

Russian is a classic example of a language with many indefinite series, corresponding to fine-grained function distinctions:

German (Indo-European, Germanic)

German has a rich indefinite system with five series:

Japanese (Japonic)

Japanese indefinite pronouns are built compositionally from wh-words (dare 'who', nani 'what') plus particles (-ka, -mo, -demo):

Mandarin Chinese (Sino-Tibetan)

Mandarin uses a small set of indefinite forms with a wide functional range:

The wh-word shéi, in its non-interrogative uses, covers a remarkably wide contiguous region, from irrealis to free choice.

Turkish (Turkic)

Turkish uses birisi/biri for specific functions and a combination of kimse and hiç kimse for negative and polarity-sensitive functions:

Hindi-Urdu (Indo-European, Indo-Aryan)

Hindi uses koii as a general indefinite with wide distribution, plus specialized negative and free choice forms:

Italian (Indo-European, Romance)

Italian distinguishes qualcuno (specific) from negative nessuno and FC qualunque/qualsiasi:

Finnish (Uralic)

Finnish has a differentiated system with five series:

Korean (Koreanic)

Korean, like Japanese, uses wh-words as indefinites with particles:

Hungarian (Uralic)

Hungarian is notable for having many series, including a dedicated interrogative indefinite:

Georgian (Kartvelian)

Georgian has a system with 4-5 series, using the suffixes -γac and -me and reduplication for different functions:

Quechua (Quechuan)

Quechua (Imbabura variety) uses a relatively undifferentiated system:

Yoruba (Niger-Congo, Atlantic-Congo)

Yoruba has a relatively undifferentiated system with two main series:

Thai (Kra-Dai)

Thai uses khraj for most indefinite functions, with khraj kɔ̂ for free choice:

Tagalog (Austronesian)

Tagalog uses may (existential) and wala (negative existential):

Swahili (Niger-Congo, Bantu)

Swahili uses mtu 'person' with various modifiers:

All language profiles in our sample.

Number of languages in our sample.

The central typological claim: every pronoun series covers a contiguous region on Haspelmath's implicational map.

We verify this for every series in every language in our sample.

English: all series are contiguous on the map.

Master contiguity theorem: every series in every language in our sample is contiguous on Haspelmath's implicational map.

Every language in our sample covers all nine functions.

Every language's series have disjoint function sets: no function appears in two different series. Together with coverage (every language covers all nine functions), this means the series form a partition of the nine function types.

Master disjointness theorem: every language's series are disjoint.

Master partition theorem: in every language, the series are contiguous and form a partition of the nine function types (they cover all nine and are pairwise disjoint).
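The three conditions combine into a single Boolean check. The sketch below reads the map as a single path, which is our assumption. On that reading, a series is contiguous when it forms one unbroken run in map order. The sketch tests two hypothetical profiles. None of the names are the library's, and neither profile is data from the sample.

```lean
inductive IndefFn where
  | specificKnown | specificUnknown | irrealis | question | conditional
  | indirectNeg | directNeg | comparative | freeChoice
  deriving Repr, DecidableEq

def allFns : List IndefFn :=
  [.specificKnown, .specificUnknown, .irrealis, .question, .conditional,
   .indirectNeg, .directNeg, .comparative, .freeChoice]

/-- Path reading: contiguous iff the members form at most one run in map order.
    The accumulator holds (number of runs started, was the previous function in `s`). -/
def contiguous (s : List IndefFn) : Bool :=
  let step := fun (acc : Nat × Bool) (f : IndefFn) =>
    if s.contains f && !acc.2 then (acc.1 + 1, true) else (acc.1, s.contains f)
  decide ((allFns.foldl step (0, false)).1 ≤ 1)

/-- Every one of the nine functions is covered by some series. -/
def covers (profile : List (List IndefFn)) : Bool :=
  allFns.all fun f => profile.any (·.contains f)

/-- No function appears in two different series. -/
def pairwiseDisjoint : List (List IndefFn) → Bool
  | [] => true
  | s :: rest =>
    rest.all (fun t => s.all (fun f => !(t.contains f))) && pairwiseDisjoint rest

def isPartition (profile : List (List IndefFn)) : Bool :=
  profile.all contiguous && covers profile && pairwiseDisjoint profile

-- Hypothetical profiles for illustration only.
#eval isPartition [[.specificKnown, .specificUnknown, .irrealis],
                   [.question, .conditional, .indirectNeg],
                   [.directNeg], [.comparative, .freeChoice]]  -- true
#eval isPartition [[.specificKnown, .specificUnknown, .irrealis, .directNeg],
                   [.question, .conditional, .indirectNeg],
                   [.comparative, .freeChoice]]                -- false (first series not contiguous)
```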

Generalization 1: Direct negation always has a strategy

Every language has at least one series that covers directNeg. This reflects the functional universal that every language can express sentential negation of an indefinite.

Generalization 2: Free choice and comparative pattern together

In our sample, free choice and comparative are always covered by the same series. This reflects their shared universal/widened-domain semantics. @cite{haspelmath-1997}: "the comparative function is semantically similar to free choice."

Generalization 3: Specific known is rarely shared with polarity functions

Specific known is the most referential function, semantically distant from polarity-sensitive uses. In our sample, specificKnown and directNeg always fall in different series, and the series containing specificKnown never extends past conditional on the map.

Generalization 4: More series means more precise function encoding

Languages with more series make finer distinctions on the map. We verify that the average number of functions per series decreases as the series count increases.

Mandarin (2 series) has a higher average coverage per series than Russian (5 series). This illustrates that fewer series means broader coverage per form.
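The comparison follows directly from the master partition theorem. The series of a language L partition the nine functions, so their sizes sum to 9. The average coverage per series is therefore 9 divided by the number of series n_L. The notation s̄(L) for that average is ours:

```latex
\bar{s}(L) \;=\; \frac{1}{n_L} \sum_{S \in L} |S| \;=\; \frac{9}{n_L},
\qquad \bar{s}(\text{Mandarin}) = \frac{9}{2} = 4.5,
\qquad \bar{s}(\text{Russian}) = \frac{9}{5} = 1.8
```

So, for every language that satisfies the partition theorem, the average strictly decreases as the series count grows, not only for these two languages.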

Generalization 5: Specific known is typically separate

In languages with 3+ series, specific known tends to be separate from polarity-sensitive functions.

Generalization 6: The polarity cluster

Question, conditional, and indirect negation frequently pattern together (or at least contiguously). These are the classic polarity-sensitive contexts in the formal semantics literature (downward-entailing or nonveridical).

The polarity cluster: in every language, there exists a series that covers at least two of {question, conditional, indirectNeg}.
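The cluster condition reduces to a count per series. Below is a minimal Lean sketch with hypothetical names, run on two made-up profiles that are not languages from the sample.

```lean
inductive IndefFn where
  | specificKnown | specificUnknown | irrealis | question | conditional
  | indirectNeg | directNeg | comparative | freeChoice
  deriving Repr, DecidableEq

/-- The three classic polarity-sensitive contexts. -/
def polarityCluster : List IndefFn := [.question, .conditional, .indirectNeg]

/-- Some series covers at least two members of the polarity cluster. -/
def hasPolarityCluster (profile : List (List IndefFn)) : Bool :=
  profile.any fun s => decide ((polarityCluster.filter s.contains).length ≥ 2)

-- Hypothetical profiles, for illustration only.
#eval hasPolarityCluster [[.specificKnown, .specificUnknown, .irrealis],
                          [.question, .conditional, .indirectNeg],
                          [.directNeg], [.comparative, .freeChoice]]   -- true
#eval hasPolarityCluster [[.specificKnown, .specificUnknown, .irrealis, .question],
                          [.conditional], [.indirectNeg, .directNeg],
                          [.comparative, .freeChoice]]                 -- false
```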

Series count distribution in our sample.

Verify the series count and key properties for each language.

Every language in our sample appears in the WALS 46A dataset. We verify each profile's morphological source classification, bridging our Haspelmath function-map data to the WALS morphological-source typology.

Distribution in our 17-language sample:
- Interrogative-based: Russian, Japanese, Korean, Hungarian, Georgian, Quechua, Thai (7)
- Generic-noun-based: English, Turkish, Italian, Yoruba, Swahili (5)
- Special: Hindi, Finnish (2)
- Mixed: German, Mandarin (2)
- Existential construction: Tagalog (1)

Every language in our sample has a WALS 46A entry.

Wh-based indefinite systems

Languages that use wh-words as indefinites (Japanese, Korean, Mandarin, Thai) tend to have fewer series, because the bare wh-word covers a wide contiguous range on the map.

Wh-based indefinite languages average fewer series than others.

Negative concord languages

Languages with negative concord (Russian, Italian, Hungarian) tend to have the direct negation function grouped with adjacent polarity functions rather than isolated in a single-function series.

Among the negative concord languages, at least one places directNeg in a series with more than one function.

WALS morphological source and series count

Interrogative-based languages (which build indefinites from wh-words) should correlate with our wh-based language list. We verify this and test whether morphological source predicts series differentiation.

Languages classified as interrogative-based in WALS 46A.

Interrogative-based languages in our sample average at most 4 series per language (at most 28 series in total across 7 languages).

We demonstrate that certain function sets are not contiguous on the map, confirming that Haspelmath's map rules them out as possible single-series ranges.

A hypothetical series covering {specificKnown, directNeg} without the intervening functions is not contiguous.

A hypothetical series covering {specificKnown, freeChoice} without the intervening functions is not contiguous.

A hypothetical series covering {specificUnknown, directNeg} while skipping irrealis through indirectNeg is not contiguous.

The full set of all nine functions is contiguous (the map is connected).
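These cases can be checked by brute force, and so can how restrictive contiguity is overall. We read the diagram as a single path in map order; this is our assumption, and the library's adjacency may differ. On that reading, a nonempty set is contiguous exactly when it is an interval of consecutive positions. A path of nine nodes has 9 · 10 / 2 = 45 such intervals among its 2^9 - 1 = 511 nonempty subsets. The Lean sketch below uses hypothetical names and enumerates all subsets to confirm that count.

```lean
inductive IndefFn where
  | specificKnown | specificUnknown | irrealis | question | conditional
  | indirectNeg | directNeg | comparative | freeChoice
  deriving Repr, DecidableEq

def allFns : List IndefFn :=
  [.specificKnown, .specificUnknown, .irrealis, .question, .conditional,
   .indirectNeg, .directNeg, .comparative, .freeChoice]

/-- Path reading: contiguous iff the members form at most one run in map order. -/
def contiguous (s : List IndefFn) : Bool :=
  let step := fun (acc : Nat × Bool) (f : IndefFn) =>
    if s.contains f && !acc.2 then (acc.1 + 1, true) else (acc.1, s.contains f)
  decide ((allFns.foldl step (0, false)).1 ≤ 1)

/-- All sublists of a list (all subsets, for duplicate-free input). -/
def subsets : List α → List (List α)
  | [] => [[]]
  | x :: xs => let r := subsets xs; r ++ r.map (x :: ·)

#eval contiguous [.specificKnown, .directNeg]    -- false
#eval contiguous [.specificKnown, .freeChoice]   -- false
#eval contiguous [.specificUnknown, .directNeg]  -- false
#eval contiguous allFns                          -- true

-- Nonempty subsets that pass the test.
#eval ((subsets allFns).filter fun s => !s.isEmpty && contiguous s).length  -- 45
```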

Connection to Polarity Item Theory

Haspelmath's implicational map connects directly to formal theories of polarity sensitivity:

1. Functions 4–7 (question through directNeg) correspond to the classic downward-entailing / nonveridical environments that license NPIs.

2. Functions 8–9 (comparative, freeChoice) correspond to free choice items, which have been analyzed as universal quantifiers or as domain-widened indefinites (@cite{kadmon-landman-1993}).

3. Functions 1–3 (specific known/unknown, irrealis) correspond to positive polarity and epistemic specificity, which are anti-licensed by negation.

The contiguity constraint on the map thus has a semantic explanation: adjacent functions share semantic properties (monotonicity, veridicality, specificity) that make it natural for a single form to cover them.

Whether a Haspelmath function corresponds to a downward-entailing (or nonveridical) licensing context: functions 4–7 on the map, namely question, conditional, indirect negation, and direct negation (@cite{ladusaw-1979}).

Whether a Haspelmath function corresponds to a free choice context: functions 8–9 on the map, namely comparative and freeChoice.
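The two predicates can be sketched as Boolean classifiers over the nine functions. IndefFn, allFns, isDE, and isFC are hypothetical names. The checks confirm that, with these definitions, no function falls in both classes and exactly functions 1–3 fall in neither.

```lean
inductive IndefFn where
  | specificKnown | specificUnknown | irrealis | question | conditional
  | indirectNeg | directNeg | comparative | freeChoice
  deriving Repr, DecidableEq

def allFns : List IndefFn :=
  [.specificKnown, .specificUnknown, .irrealis, .question, .conditional,
   .indirectNeg, .directNeg, .comparative, .freeChoice]

/-- Functions 4–7: question through directNeg. -/
def IndefFn.isDE : IndefFn → Bool
  | .question | .conditional | .indirectNeg | .directNeg => true
  | _ => false

/-- Functions 8–9: comparative and freeChoice. -/
def IndefFn.isFC : IndefFn → Bool
  | .comparative | .freeChoice => true
  | _ => false

-- No function is classified as both.
#eval allFns.all fun f => !(f.isDE && f.isFC)              -- true
-- The remaining functions (1–3) are neither.
#eval (allFns.filter fun f => !f.isDE && !f.isFC).length   -- 3
```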

Total number of distinct series across all languages.

Verify consistency between Typology profiles and Fragment PolarityItems entries. For each language with a Fragment PolarityItems file, we check two things: (i) Fragment NPI entries are licensed in contexts that correspond to Haspelmath functions the Typology profile assigns to polarity-sensitive series, and (ii) Fragment FCI entries have obligatory domain alternatives when the Typology profile assigns free choice functions.