Linglib.Phenomena.FillerGap.Typology

Cross-Linguistic Typology of Relativization (WALS Chapters 122--123)

@cite{comrie-1989} @cite{keenan-comrie-1977} @cite{comrie-2013}

Cross-linguistic data on relative clause formation strategies from two WALS chapters by @cite{comrie-2013}, supplemented with the @cite{keenan-comrie-1977} Accessibility Hierarchy.

Ch 122: Relativization on Subjects

How languages form relative clauses on subject position. The main strategies are: gap (the relativized position is simply empty), pronoun retention (a resumptive pronoun fills the relativized position), and relative pronoun (a dedicated wh-element or relative pronoun fills the position and typically fronts). Additional types include non-reduction (the head noun is repeated inside the relative clause).

Sample: 166 languages (WALS v2020.4). Gap strategy is the most common for subject relativization (125/166 = 75.3%), reflecting the high accessibility of the subject position on the @cite{keenan-comrie-1977} hierarchy.

Ch 123: Relativization on Obliques

Whether oblique positions (instrumentals, locatives, etc.) can be relativized, and if so by what strategy. Many languages that use the gap strategy on subjects switch to pronoun retention or relative pronouns for obliques, or cannot relativize obliques at all.

Sample: 112 languages (WALS v2020.4). Gap remains the most common strategy (55/112 = 49.1%), but pronoun retention is much more common than for subjects (20/112 = 17.9% vs 5/166 = 3.0%), and 10 languages cannot relativize obliques at all.
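The percentages quoted for the two samples can be rechecked in plain `Nat` arithmetic. The following is a minimal sketch, not part of the module: `perMille` is a hypothetical helper that returns a share in tenths of a percent, rounded to nearest, and the comparison of the two pronoun-retention shares uses cross-multiplication to avoid fractions.

```lean
namespace ShareSketch

/-- Share of `n` out of `d` in tenths of a percent, rounded to nearest.
    Hypothetical helper for this sketch only. -/
def perMille (n d : Nat) : Nat := (n * 1000 + d / 2) / d

example : perMille 125 166 = 753 := rfl  -- 75.3%: gap on subjects
example : perMille 55 112 = 491 := rfl   -- 49.1%: gap on obliques
example : perMille 5 166 = 30 := rfl     -- 3.0%: pronoun retention on subjects
example : perMille 20 112 = 179 := rfl   -- 17.9%: pronoun retention on obliques

/-- 5/166 < 20/112, checked by cross-multiplication to stay in `Nat`. -/
example : 5 * 112 < 20 * 166 := by decide

end ShareSketch
```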

@cite{keenan-comrie-1977} Accessibility Hierarchy

The Accessibility Hierarchy ranks grammatical positions by their accessibility to relativization:

Subject > Direct Object > Indirect Object > Oblique > Genitive > Object of Comparison

The paper states three Hierarchy Constraints (HCs):

  1. HC₁: A language must be able to relativize subjects.
  2. HC₂ (Continuity): Any RC-forming strategy must apply to a continuous segment of the AH.
  3. HC₃ (Cut-off): Strategies that apply at one point of the AH may in principle cease to apply at any lower point.

From these, the Primary Relativization Constraint follows: if a language's primary strategy can apply to a low position on the AH, it can also apply to all higher positions. Non-primary strategies need not satisfy this — they may cover a continuous segment that excludes subjects (e.g., the +case strategy covering IO-OBL-GEN-OCOMP but not SU-DO in Welsh and Arabic).
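One way to picture HC₂ and the primary/non-primary distinction is to model a strategy as an interval of ranks on the hierarchy. This is a sketch under stated assumptions: `AHPos`, `rank`, `Segment`, `primary`, and `plusCase` are names invented here and are not claimed to match `Core.AHPosition` or any definition in this module.

```lean
namespace AHSketch

/-- Positions on the Accessibility Hierarchy, most accessible first. -/
inductive AHPos where
  | subject | directObject | indirectObject | oblique | genitive | objComparison
  deriving Repr

/-- Rank on the hierarchy: 0 is the most accessible position. -/
def AHPos.rank : AHPos → Nat
  | .subject => 0
  | .directObject => 1
  | .indirectObject => 2
  | .oblique => 3
  | .genitive => 4
  | .objComparison => 5

/-- A strategy modeled as a continuous segment of the AH (HC₂):
    it covers exactly the positions whose rank lies between `lo` and `hi`. -/
structure Segment where
  lo : Nat
  hi : Nat

def Segment.covers (s : Segment) (p : AHPos) : Bool :=
  Nat.ble s.lo p.rank && Nat.ble p.rank s.hi

/-- A primary strategy starts at the subject, so its lower bound is 0. -/
def primary (hi : Nat) : Segment := ⟨0, hi⟩

/-- The +case segment IO-OBL-GEN-OCOMP mentioned in the text. -/
def plusCase : Segment := ⟨2, 5⟩

example : plusCase.covers .subject = false := rfl
example : plusCase.covers .oblique = true := rfl
example : (primary 3).covers .directObject = true := rfl

end AHSketch
```

Because a segment is an interval by construction, continuity holds automatically in this encoding; only the cut-off point `hi` varies between strategies.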

The hierarchy is one of the most robust typological universals in syntax, supported by data from hundreds of languages. It correlates with processing difficulty, frequency of relativization in corpora, and acquisition order.

Relative Clause Position

Cross-linguistically, post-nominal relative clauses (the man [who left]) are more common than pre-nominal ones ([left who] man), and internally-headed relative clauses are rare. This correlates with basic word order: VO languages strongly prefer post-nominal, while OV languages may use pre-nominal.

WALS Ch 122: Strategy used to relativize the subject position.

The strategy dimension classifies how the relativized position inside the relative clause is handled: is it left empty (gap), filled by a resumptive pronoun (pronoun retention), or filled by a relative pronoun that also marks the clause boundary (relative pronoun)?

Additional types capture non-reduction (head noun repeated inside the RC), mixed strategies within a single language, and the absence of relative clauses entirely.

  • gap : SubjRelStrategy

    Gap strategy: the relativized position is simply empty. E.g., English "the man [that _ left]", Japanese "[ _ kaetta] hito". The most common strategy for subjects (125/166 in WALS).

  • pronounRetention : SubjRelStrategy

    Pronoun-retention: a resumptive pronoun occupies the relativized position. E.g., Arabic (dialectal) "ar-rajul [illi huwa raah]" 'the-man [that he left]'. Rare for subjects (5/166 in WALS) but much more common for lower positions on the AH.

  • relativePronoun : SubjRelStrategy

    Relative pronoun: a dedicated relative pronoun or wh-word fills the relativized position and typically fronts to clause-initial position. E.g., English "the man [who left]", German "der Mann [der ging]". Concentrated in European languages (12/166 in WALS).

  • nonReduction : SubjRelStrategy

    Non-reduction: the head noun is repeated (or a full NP appears) inside the relative clause. The relativized position is not "reduced" to a gap or pronoun. E.g., Bambara "tye [tye ye so san] ye n deme" 'man [man PST horse buy] PST me help'. (24/166 in WALS).

  • mixed : SubjRelStrategy

    Mixed: the language productively uses more than one of the above strategies for subject relativization. E.g., English uses both gap ("the man [that _ left]") and relative pronoun ("the man [who left]"). WALS does not distinguish a "mixed" category; this value is used only in our language profiles.
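The constructor list above corresponds to an enumeration along the following lines. The constructor names are taken from the documentation; `walsCount` is a hypothetical function added here to tie the quoted WALS counts to the constructors, and assigning 0 to `mixed` is an assumption of this sketch (the text only says WALS has no such category).

```lean
namespace SubjSketch

inductive SubjRelStrategy where
  | gap | pronounRetention | relativePronoun | nonReduction | mixed
  deriving Repr

/-- WALS Ch 122 counts quoted in the constructor docs.
    `mixed` is not a WALS value, so it is given 0 here (assumption). -/
def walsCount : SubjRelStrategy → Nat
  | .gap => 125
  | .pronounRetention => 5
  | .relativePronoun => 12
  | .nonReduction => 24
  | .mixed => 0

/-- The four WALS categories add up to the 166-language sample. -/
example : walsCount .gap + walsCount .pronounRetention
    + walsCount .relativePronoun + walsCount .nonReduction = 166 := rfl

end SubjSketch
```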

      WALS Ch 123: Strategy used to relativize oblique positions (instrumental, locative, etc.), or whether obliques can be relativized at all.

      Obliques are low on the @cite{keenan-comrie-1977} Accessibility Hierarchy, and many languages that freely relativize subjects cannot relativize obliques. Those that can often use a different strategy than for subjects (typically shifting from gap to pronoun retention or relative pronoun).

      • gap : OblRelStrategy

        Gap strategy on obliques. Still the most common strategy, though less dominant than for subjects. E.g., English (with stranding): "the city [that I lived in _]". (55/112 in WALS).

      • pronounRetention : OblRelStrategy

        Pronoun-retention on obliques. Much more common than for subjects, since resumptive pronouns help recover the oblique role. E.g., Arabic "al-madina [illi saafartu ila-ha]" 'the-city [that I-traveled to-it]'. (20/112 in WALS).

      • relativePronoun : OblRelStrategy

        Relative pronoun on obliques. E.g., English "the city [in which I lived _]", German "die Stadt [in der ich wohnte]". (13/112 in WALS).

      • nonReduction : OblRelStrategy

        Non-reduction on obliques. (14/112 in WALS).

      • mixed : OblRelStrategy

        Mixed strategies for oblique relativization. WALS does not distinguish a "mixed" category; used only in our profiles.

      • notRelativizable : OblRelStrategy

        Obliques cannot be relativized in this language. The language uses alternative constructions (e.g., nominalization, paraphrase). (10/112 in WALS).

          A single row in a WALS frequency table: a category label and its count.
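A row type of this kind, and a total over a table, might look as follows. This is only a sketch: the structure name `WALSCount`, the field names `label` and `count`, the label strings, and the `total` helper are assumptions; the counts are the Ch 123 figures quoted in this module.

```lean
namespace TableSketch

/-- One row of a frequency table: a category label and its count. -/
structure WALSCount where
  label : String
  count : Nat
  deriving Repr

/-- Ch 123 (obliques) rows, using the counts quoted in this module. -/
def ch123 : List WALSCount :=
  [ ⟨"Gap", 55⟩,
    ⟨"Pronoun retention", 20⟩,
    ⟨"Non-reduction", 14⟩,
    ⟨"Relative pronoun", 13⟩,
    ⟨"Not possible", 10⟩ ]

/-- Sum of the counts in a table. -/
def total (rows : List WALSCount) : Nat :=
  rows.foldl (fun acc r => acc + r.count) 0

/-- The rows add up to the 112-language sample. -/
example : total ch123 = 112 := rfl

end TableSketch
```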

                Chapter 122 distribution: relativization strategies on subjects. Computed from the WALS v2020.4 CLDF data in Core.WALS.F122A. @cite{comrie-kuteva-2013a}.

                  Chapter 123 distribution: relativization strategies on obliques. Computed from the WALS v2020.4 CLDF data in Core.WALS.F123A. @cite{comrie-kuteva-2013b}.

                  The sample for Ch 123 is smaller than Ch 122 because some languages in the Ch 122 sample could not be assessed for oblique relativization.

                    Ch 122 total: 166 languages (from WALS v2020.4).

                    Ch 123 total: 112 languages (from WALS v2020.4).

                    A language's relativization profile across WALS Chapters 122--123 plus additional typological properties.

                    Each profile records the strategy used for subject and oblique relativization, the relative clause position, the lowest position on the Accessibility Hierarchy that can be relativized, and whether the language is European (for areal generalizations such as the concentration of relative pronouns in Europe).

                    • language : String

                      Language name.

                    • iso : String

                      ISO 639-3 code.

                    • subjStrategy : SubjRelStrategy

                      Ch 122: Strategy for relativizing subjects.

                    • oblStrategy : OblRelStrategy

                      Ch 123: Strategy for relativizing obliques.

                    • rcPosition : Core.RCPosition

                      Position of relative clause with respect to head noun.

                    • lowestRelativizable : Core.AHPosition

                      Lowest position on the AH that can be relativized. If a language can relativize obliques, this is .oblique or lower; if it can only relativize subjects, this is .subject.

                    • isEuropean : Bool

                      Whether the language is in Europe (for areal generalization about relative pronoun concentration).

                    • notes : String

                      Notes on the relativization system.
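Put together, the fields above describe a record along these lines. The sketch is self-contained, so it declares local stand-ins for `Core.RCPosition` and `Core.AHPosition`; their constructor names (other than `.subject` and `.oblique`, which the field docs mention), the structure name `RelProfile`, the empty default for `notes`, and the example values are all assumptions rather than the module's actual definitions.

```lean
namespace ProfileSketch

inductive SubjRelStrategy where
  | gap | pronounRetention | relativePronoun | nonReduction | mixed
  deriving Repr

inductive OblRelStrategy where
  | gap | pronounRetention | relativePronoun | nonReduction | mixed | notRelativizable
  deriving Repr

/-- Stand-in for `Core.RCPosition`; constructor names are assumed. -/
inductive RCPosition where
  | postNominal | preNominal | internallyHeaded | correlative
  deriving Repr

/-- Stand-in for `Core.AHPosition`; constructor names are partly assumed. -/
inductive AHPosition where
  | subject | directObject | indirectObject | oblique | genitive | objComparison
  deriving Repr

/-- Hypothetical name for the profile record documented above. -/
structure RelProfile where
  language : String
  iso : String
  subjStrategy : SubjRelStrategy
  oblStrategy : OblRelStrategy
  rcPosition : RCPosition
  lowestRelativizable : AHPosition
  isEuropean : Bool
  notes : String := ""
  deriving Repr

/-- Illustrative values for a gap-strategy language with pre-nominal RCs. -/
def japaneseSketch : RelProfile :=
  { language := "Japanese", iso := "jpn",
    subjStrategy := .gap, oblStrategy := .gap,
    rcPosition := .preNominal,
    lowestRelativizable := .objComparison,
    isEuropean := false,
    notes := "Illustrative values only; not the module's actual entry." }

end ProfileSketch
```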

                        English: gap strategy for both subjects and obliques, with relative pronouns as an alternative. "The man [that _ left]" (gap on subject), "the city [that I lived in _]" (gap with preposition stranding on oblique), "the man [who _ left]" (relative pronoun on subject), "the city [in which I lived _]" (relative pronoun on oblique). English can relativize all positions on the AH.

                          German: relative pronoun strategy (der/die/das) for both subjects and obliques. "Der Mann [der _ ging]" (subject), "die Stadt [in der ich wohnte]" (oblique). Like English, German can relativize all AH positions, using relative pronouns at all levels.

                            French: relative pronoun strategy. "L'homme [qui est parti]" (subject), "la ville [dans laquelle j'habitais]" / "la ville [où j'habitais]" (oblique). Uses different relative pronouns for different AH positions: qui (subject), que (direct object), dont (genitive), lequel (oblique).

                              Russian: relative pronoun kotoryj (declined for case, gender, number). "Chelovek [kotoryj ushol]" 'man [who left]' (subject), "gorod [v kotorom ja zhil]" 'city [in which I lived]' (oblique). All AH positions relativizable.

                                Arabic (Modern Standard): gap strategy on subjects, pronoun retention on obliques. "ar-rajul [alladhi ghadara _]" 'the-man [who left _]' (gap/relative pronoun hybrid on subject); "al-madina [allati saafarta ilay-ha]" 'the-city [which traveled-2SG to-it]' (pronoun retention on oblique). The shift from gap to resumptive on lower AH positions is a textbook illustration of the Accessibility Hierarchy.

                                  Hebrew (Modern): gap strategy on subjects, pronoun retention on obliques. "Ha-ish [she-halakh _]" 'the-man [that-left _]' (gap on subject), "ha-ir [she-garti ba-h]" 'the-city [that-lived-I in-it]' (resumptive on oblique). Like Arabic, exemplifies the gap-to-resumptive shift.

                                    Japanese: gap strategy with pre-nominal relative clauses. "[ _ kaetta] hito" '[ _ left] person'. No relative pronoun or complementizer in the RC. All AH positions can be relativized using the gap strategy, though oblique relativization may require context to identify the role.

                                      Korean: gap strategy with pre-nominal relative clauses, parallel to Japanese. "[ _ tteonagan] saram" '[ _ left] person'. Pre-nominal RC with no overt relative pronoun.

                                        Mandarin Chinese: gap strategy with pre-nominal relative clauses, marked by the relativizer de. "[ _ zou-le de] ren" '[ _ left DE] person'. Gap strategy extends to obliques, though lower positions become progressively harder.

                                          Turkish: gap strategy with pre-nominal relative clauses. Uses participial suffixes (-en, -dik) on the verb inside the RC. "[ _ gid-en] adam" '[ _ go-PART] man'. Gap strategy used for subjects; obliques use a different participial suffix.

                                            Hindi-Urdu: correlative strategy (double-headed). "Jo aadmii aayaa, vo aadmii meraa bhaaii hai" 'which man came, that man my brother is'. The relative clause is left-adjoined with jo as a relative pronoun; the main clause has a correlative demonstrative vo. Can also use post-nominal RCs in formal/written registers.

                                              Bambara: internally-headed relative clauses with non-reduction strategy. "Ne ye [tye ye so san] tye ye" 'I saw [man PST horse buy] man PRT'. The head noun appears inside the RC rather than external to it. One of the best-known examples of internally-headed RCs.

                                                Swahili: gap strategy on subjects with relative marker amba-. "Mtu [amba-ye ali-ondoka _]" 'person [REL-who PST-leave _]'. Pronoun retention on lower positions. Can relativize down to obliques.

                                                  Tagalog: gap strategy with post-nominal relative clauses, introduced by the linker na/ng. "Ang lalaki [na umalis _]" 'the man [LNK left _]'. Subject-only relativization in many analyses (reflecting the voice system: only the "ang-phrase" can be relativized, requiring voice alternation to relativize non-subjects).

                                                    Malagasy: gap strategy with post-nominal RCs. Like Tagalog, requires voice alternation to relativize non-subjects (only the subject/topic can be gapped). "Ny lehilahy [izay nandao _]" 'the man [that left _]'.

                                                      Finnish: relative pronoun joka (declines for case). "Mies [joka lähti]" 'man [who left]' (subject), "kaupunki [jossa asuin]" 'city [where I-lived]' (oblique). The relative pronoun takes the case required by its role inside the RC.

                                                        Welsh: gap strategy with a pre-verbal particle a (subject) or y (non-subject). "Y dyn [a adawodd _]" 'the man [PRT left _]'. Uses pronoun retention for lower AH positions. Post-nominal RCs in a VSO language.

                                                          Navajo: internally-headed/correlative relative clauses. The head noun appears inside the RC, often with a demonstrative in the main clause. Pre-nominal word order. Limited relativization to higher AH positions.

                                                            Yoruba: non-reduction strategy with a post-nominal RC introduced by ti. Pronoun retention common for obliques. "Okunrin [ti o lo _]" 'man [that he left _]'. Can relativize subjects with both gap and pronoun retention.

                                                              Mam (SJO): gap strategy on subjects via Agent Focus (AF) voice alternation. Obliques can be extracted with dedicated morphology (=(y)a'), but oblique relativization is limited. Extraction is tracked by =(y)a' on Voice⁰/Dir⁰. @cite{elkins-torrence-brown-2026}.

                                                                All relativization profiles in the sample.

                                                                  Can a language relativize the given AH position? Per the accessibility hierarchy: if the language can relativize its lowest position, it can relativize everything above it.
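The check described here amounts to comparing ranks on the hierarchy. The following is a minimal sketch, assuming a rank function on positions; `AHPos`, `rank`, the two-argument signature of `canRelativize`, and the lemma name are invented for illustration and need not match the module's definition.

```lean
namespace CanRelSketch

inductive AHPos where
  | subject | directObject | indirectObject | oblique | genitive | objComparison
  deriving Repr

/-- Rank on the hierarchy: 0 is the most accessible position. -/
def AHPos.rank : AHPos → Nat
  | .subject => 0
  | .directObject => 1
  | .indirectObject => 2
  | .oblique => 3
  | .genitive => 4
  | .objComparison => 5

/-- `p` is relativizable iff it is at or above the lowest relativizable position. -/
def canRelativize (lowest p : AHPos) : Bool :=
  Nat.ble p.rank lowest.rank

example : canRelativize .oblique .directObject = true := rfl
example : canRelativize .subject .oblique = false := rfl

/-- If `p` is relativizable, so is every position `q` at or above `p`. -/
theorem canRelativize_upward (lowest p q : AHPos)
    (hq : q.rank ≤ p.rank) (hp : canRelativize lowest p = true) :
    canRelativize lowest q = true := by
  unfold canRelativize at hp ⊢
  exact Nat.ble_eq_true_of_le (Nat.le_trans hq (Nat.le_of_ble_eq_true hp))

end CanRelSketch
```

With this encoding the implicational direction of the hierarchy holds by construction, so a sample-level check of it confirms consistency of the coding rather than providing independent evidence.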

                                                                    Count of languages with a given subject relativization strategy.

                                                                      Count of languages with a given oblique relativization strategy.

                                                                        Number of languages in our sample.

                                                                        WALS Ch 122: The gap strategy is the single most common strategy for subject relativization, reflecting the high accessibility of the subject position. Gap (125) > Non-reduction (24) > Rel pronoun (12) > Pronoun retention (5). Computed from WALS data.

                                                                        Pronoun retention is rare for subjects (5/166 = 3.0%) but much more common for obliques (20/112 = 17.9%). This is a key prediction of the Accessibility Hierarchy: lower positions require "heavier" strategies to recover the grammatical role. Computed from WALS data.

                                                                        Some languages cannot relativize obliques at all (10/112 = 8.9% in WALS). This contrasts with subjects, where the WALS 122A enum does not even include a "not possible" value — all sampled languages can relativize subjects. Computed from WALS data.

                                                                        In our sample, all languages with the relative pronoun strategy on subjects are either European or have European-contact influence. Relative pronouns are concentrated in Indo-European and Uralic families. Hindi-Urdu (jo) is Indo-European; Finnish (joka) is Uralic but geographically European.

                                                                        Internally-headed relative clauses are rare cross-linguistically. In our sample, only Bambara uses this strategy.

                                                                        Post-nominal relative clauses are more common than pre-nominal ones in our sample, reflecting the cross-linguistic preference.

                                                                        The Accessibility Hierarchy holds in our sample: every language that can relativize obliques can also relativize direct objects and subjects. This is consistent with the implicational direction of the hierarchy.

                                                                        The gap-to-resumptive shift: languages that use gap on subjects but pronoun retention on obliques exist in our sample (Hebrew, Arabic, Welsh, Swahili). This shift is predicted by the AH: gap is the "lightest" strategy and suffices for accessible positions, but lower positions need the "heavier" resumptive strategy to recover the grammatical role.
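A check for this pattern over a list of profiles could be sketched as follows. `Strategy`, `MiniProfile`, `sample`, and `gapToResumptive` are hypothetical names; the four shifting languages are coded as the paragraph describes them, with Japanese (gap on both positions) added as a contrast case.

```lean
namespace ShiftSketch

inductive Strategy where
  | gap | pronounRetention | relativePronoun
  deriving Repr

/-- A cut-down profile: language name, subject strategy, oblique strategy. -/
structure MiniProfile where
  language : String
  subj : Strategy
  obl : Strategy

def sample : List MiniProfile :=
  [ ⟨"Hebrew", .gap, .pronounRetention⟩,
    ⟨"Arabic", .gap, .pronounRetention⟩,
    ⟨"Welsh", .gap, .pronounRetention⟩,
    ⟨"Swahili", .gap, .pronounRetention⟩,
    ⟨"Japanese", .gap, .gap⟩ ]

/-- True when the language uses gap on subjects and pronoun retention on obliques. -/
def gapToResumptive : MiniProfile → Bool
  | ⟨_, .gap, .pronounRetention⟩ => true
  | _ => false

/-- Four of the five languages in this toy sample show the shift. -/
example : (sample.filter gapToResumptive).length = 4 := rfl

-- Lists the names of the shifting languages, in sample order.
#eval (sample.filter gapToResumptive).map (·.language)

end ShiftSketch
```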

                                                                        In our sample, all pre-nominal RC languages use the gap strategy on subjects. This makes structural sense: pre-nominal RCs are typically participial or nominalized, and the gap strategy is the simplest way to form them. Relative pronouns, which need clause-initial position, are incompatible with pre-nominal structure.

                                                                        Some languages can only relativize subjects (lowest = subject on AH). In our sample, Tagalog and Malagasy have this restriction, reflecting the Austronesian voice-and-extraction system where only the "subject" (topic/pivot) can be extracted.

                                                                        European relative-pronoun languages in our sample can relativize all positions on the AH (down to object of comparison or oblique). The inflected relative pronoun strategy is powerful enough to handle all positions, since the pronoun can take the case required by its role.

                                                                        Every language in our sample can relativize subjects. This is consistent with the AH: the subject position is the most accessible, and virtually all languages with relative clauses can relativize it.