Lexical Typology (WALS Chapters 129--142)
@cite{dryer-haspelmath-2013}
Cross-linguistic data on lexical categorization from 16 WALS features spanning body-part terminology (Ch 129--130), colour terminology (Ch 132--135), pronominal root patterns (Ch 136--137), the Wanderwort "tea" (Ch 138), sign language features (Ch 139--140), writing systems (Ch 141), and para-linguistic click usage (Ch 142).
These chapters address a question that sits at the intersection of lexical semantics and anthropological linguistics: how do languages carve up conceptual space into words? The body-part and colour chapters are classic case studies in the universals-vs-relativity debate. The pronoun chapters probe whether certain phonological shapes are universally associated with person reference. The tea chapter traces a single Wanderwort across the globe, providing a window into contact history through lexical borrowing.
Body-Part Terms (Ch 129--130)
- F129A: Hand and Arm -- whether a language uses the same or different words for 'hand' and 'arm'. Identical forms (228/617 = 37%) vs distinct forms (389/617 = 63%).
- F130A: Finger and Hand -- whether a language uses the same or different words for 'finger' and 'hand'. Identical forms are rare (72/593 = 12%).
- F130B: Cultural Categories -- among languages with finger=hand identity, the cultural type: hunter-gatherers (46/72), farmer-foragers (18/72), full-fledged farmers (8/72).
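The rounded percentages quoted for the body-part features can be re-derived from the raw counts. The following Lean sketch does so with plain `Nat` arithmetic. `roundPct` and the `Sketch` namespace are hypothetical names introduced here for illustration and are not claimed to be part of the documented module.

```lean
namespace Sketch

/-- Hypothetical helper: `n / d` as a percentage, rounded to the nearest integer. -/
def roundPct (n d : Nat) : Nat := (200 * n + d) / (2 * d)

-- F129A: identical vs different 'hand'/'arm' (617 languages in total).
example : 228 + 389 = 617 := rfl
example : roundPct 228 617 = 37 := rfl
example : roundPct 389 617 = 63 := rfl

-- F130A: finger = hand identity.
example : roundPct 72 593 = 12 := rfl

-- F130B: the three cultural types partition the 72 identity languages.
example : 46 + 18 + 8 = 72 := rfl

end Sketch
```

Each `example` is checked by the kernel, so a mistyped count would fail to compile rather than silently produce a wrong percentage.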
Colour Terms (Ch 132--135)
- F132A: Non-Derived Basic Colour Categories -- how many non-derived basic colour categories a language has (3 to 6).
- F133A: Basic Colour Categories -- total number of basic colour categories including derived ones (3--4 to 11).
- F134A: Green and Blue -- whether a language distinguishes green from blue, merges them (grue), or has other patterns.
- F135A: Red and Yellow -- whether a language distinguishes red from yellow or merges them.
Pronominal Roots (Ch 136--137)
- F136A: M-T Pronouns -- whether the language has an m/t pattern in 1SG/2SG pronouns (paradigmatic or non-paradigmatic).
- F136B: M in 1SG -- whether 1SG has an m-initial or m-containing form.
- F137A: N-M Pronouns -- whether the language has an n/m pattern in 1SG/2SG pronouns.
- F137B: M in 2SG -- whether 2SG has an m-initial or m-containing form.
Tea (Ch 138)
- F138A: Tea -- whether the word for 'tea' derives from Sinitic cha, Min Nan te, or is an independent form.
Sign Language Features (Ch 139--140)
- F139A: Irregular Negatives -- how many irregular negative signs a sign language has.
- F140A: Question Particles -- whether a sign language uses question particles.
Writing Systems (Ch 141) and Clicks (Ch 142)
- F141A: Writing Systems -- type of writing system (only 6 languages in WALS sample; highly incomplete).
- F142A: Para-Linguistic Usages of Clicks -- whether clicks are used for logical meanings (negation/affirmation) or affective meanings (disgust/annoyance).
Whether a language uses the same or different lexemes for 'hand' and 'arm'. Many languages worldwide use a single term covering both concepts (e.g., Japanese te, Russian ruka), while others lexically distinguish them (e.g., English hand vs arm).
- identical : HandArmRelation
The same word covers both 'hand' and 'arm'.
- different : HandArmRelation
Distinct words for 'hand' and 'arm'.
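A two-constructor feature like this one maps naturally onto a Lean enumeration. The sketch below reuses the constructor names from the listing above. The `deriving` clause, the `Sketch` namespace, and the helper `coversArm` are assumptions made for illustration, not a reproduction of the module's source.

```lean
namespace Sketch

inductive HandArmRelation where
  | identical  -- one word covers both 'hand' and 'arm'
  | different  -- distinct words for 'hand' and 'arm'
  deriving DecidableEq, Repr

/-- Hypothetical helper: does the 'hand' word also cover 'arm'? -/
def HandArmRelation.coversArm : HandArmRelation → Bool
  | .identical => true
  | .different => false

example : HandArmRelation.identical.coversArm = true := rfl
example : HandArmRelation.identical ≠ HandArmRelation.different := by decide

end Sketch
```

Deriving `DecidableEq` is what lets `decide` settle the inequality without a hand-written proof.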
Whether a language uses the same or different lexemes for 'finger' and 'hand'. Identity of 'finger' and 'hand' is cross-linguistically rare (12% of sample) and correlates with subsistence type.
- identical : FingerHandRelation
The same word covers both 'finger' and 'hand'.
- different : FingerHandRelation
Distinct words for 'finger' and 'hand'.
Number of non-derived basic colour categories (F132A). Follows the Berlin & Kay hierarchy: values range from 3 to 6 non-derived basic colour categories, with half-step values (3.5, 4.5, 5.5) between the whole numbers.
- three : NonDerivedColourCount
- threeHalf : NonDerivedColourCount
- four : NonDerivedColourCount
- fourHalf : NonDerivedColourCount
- five : NonDerivedColourCount
- fiveHalf : NonDerivedColourCount
- six : NonDerivedColourCount
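Because the values include half-steps, comparing them is easier if each one is mapped to a natural number. The sketch below assumes that the `Half` constructors stand for 3.5, 4.5, and 5.5, and encodes each value as twice its count. The function `doubled`, the `deriving` clause, and the `Sketch` namespace are hypothetical.

```lean
namespace Sketch

inductive NonDerivedColourCount where
  | three
  | threeHalf
  | four
  | fourHalf
  | five
  | fiveHalf
  | six
  deriving DecidableEq, Repr

/-- Hypothetical encoding: twice the category count, so half-steps stay in `Nat`. -/
def NonDerivedColourCount.doubled : NonDerivedColourCount → Nat
  | .three     => 6
  | .threeHalf => 7
  | .four      => 8
  | .fourHalf  => 9
  | .five      => 10
  | .fiveHalf  => 11
  | .six       => 12

example : NonDerivedColourCount.six.doubled = 12 := rfl
example :
    NonDerivedColourCount.threeHalf.doubled < NonDerivedColourCount.four.doubled := by
  decide

end Sketch
```

Doubling avoids rationals entirely, at the cost of having to halve the result when displaying it.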
Total number of basic colour categories including derived ones (F133A). Ranges from 3--4 (minimal systems) to 11 (maximal, e.g., English, Russian).
- v3to4 : BasicColourCount
- v4to5 : BasicColourCount
- v6to6h : BasicColourCount
- v7to7h : BasicColourCount
- v8to8h : BasicColourCount
- v9to10 : BasicColourCount
- v11 : BasicColourCount
How a language treats the green-blue region of colour space (F134A): separate green and blue terms, a single 'grue' term, or a merger of green and/or blue with black or yellow.
- distinct : GreenBlueRelation
Separate terms for green and blue.
- merged : GreenBlueRelation
A single 'grue' term covering both green and blue.
- blackGreenBlue : GreenBlueRelation
A single term covering black, green, and blue.
- blackBlueVsGreen : GreenBlueRelation
Black/blue merged, green separate.
- yellowGreenBlue : GreenBlueRelation
Yellow, green, blue all merged.
- yellowGreenVsBlue : GreenBlueRelation
Yellow/green merged, blue separate.
- noTerm : GreenBlueRelation
No green or blue term at all.
How a language treats the red-yellow region of colour space (F135A).
- distinct : RedYellowRelation
Separate terms for red and yellow.
- merged : RedYellowRelation
A single term covering both red and yellow.
- yellowGreenBlueVsRed : RedYellowRelation
Yellow/green/blue merged, vs red.
- yellowGreenVsRed : RedYellowRelation
Yellow/green merged, vs red.
- noTerm : RedYellowRelation
No red or yellow term at all.
M-T pronoun pattern (F136A): whether 1SG has /m/ and 2SG has /t/, as in French moi/toi. The pattern recurs across unrelated families but is a minority pattern in the WALS sample (30/230 languages).
- absent : MTPronounPattern
No M-T pattern in the pronoun paradigm.
- paradigmatic : MTPronounPattern
M-T pattern is paradigmatic (systematic across forms).
- nonParadigmatic : MTPronounPattern
M-T pattern is non-paradigmatic (sporadic).
Whether 1SG has an m-initial or m-containing form (F136B).
Equations
- Phenomena.LexicalTypology.Typology.instBEqMIn1SG.beq x✝ y✝ = (x✝.ctorIdx == y✝.ctorIdx)
N-M pronoun pattern (F137A): whether 1SG has /n/ and 2SG has /m/.
- absent : NMPronounPattern
No N-M pattern.
- paradigmatic : NMPronounPattern
N-M pattern is paradigmatic.
- nonParadigmatic : NMPronounPattern
N-M pattern is non-paradigmatic.
Whether 2SG has an m-initial or m-containing form (F137B).
Equations
- Phenomena.LexicalTypology.Typology.instBEqMIn2SG.beq x✝ y✝ = (x✝.ctorIdx == y✝.ctorIdx)
Origin of the word for 'tea' (F138A). One of the most striking Wanderwörter: nearly all words for tea worldwide derive from either Sinitic cha (spread overland via the Silk Road) or Min Nan te (spread by sea via Dutch trade).
- cha : TeaWordOrigin
Derived from Sinitic cha (e.g., Hindi chai, Russian chaj, Turkish çay, Japanese cha, Arabic shay).
- te : TeaWordOrigin
Derived from Min Nan Chinese te (e.g., English tea, French thé, German Tee, Spanish té, Finnish tee).
- other : TeaWordOrigin
Independent form, not from either source.
Number of irregular negative signs in a sign language (F139A).
- none : IrregularNegativeCount
- one : IrregularNegativeCount
- some : IrregularNegativeCount
- many : IrregularNegativeCount
Question particle usage in sign languages (F140A).
- none : SignQuestionParticle
- one : SignQuestionParticle
- moreThanOne : SignQuestionParticle
Type of writing system (F141A). Note: WALS sample is tiny (6 languages) and only covers non-alphabetic systems in the Americas and West Africa.
- alphabetic : WritingSystemType
- consonantal : WritingSystemType
- alphasyllabic : WritingSystemType
- syllabic : WritingSystemType
- logographic : WritingSystemType
- mixedLogographicSyllabic : WritingSystemType
Para-linguistic usage of clicks (F142A). Click sounds are used para-linguistically even in languages that lack phonemic clicks.
- logical : ClickUsage
Clicks used for logical meanings: negation (a tongue-click meaning "no"), affirmation, or other propositional functions.
- affective : ClickUsage
Clicks used for affective/expressive meanings: annoyance, disapproval, sympathy, or attention-getting.
- otherOrNone : ClickUsage
Other usage or no para-linguistic clicks attested.
A language's lexical typology profile across WALS Chapters 129--142.
The feature fields are Option because coverage varies enormously across features: the body-part chapters cover ~600 languages, the pronoun/tea chapters cover ~230, the colour chapters ~120, and writing systems only 6. The identifying fields (language, iso, family) are plain strings.
- language : String
Language name.
- iso : String
ISO 639-3 code.
- family : String
Language family.
- handArm : Option HandArmRelation
F129A: Whether 'hand' and 'arm' are the same word.
- fingerHand : Option FingerHandRelation
F130A: Whether 'finger' and 'hand' are the same word.
- nonDerivedColours : Option NonDerivedColourCount
F132A: Number of non-derived basic colour categories.
- basicColours : Option BasicColourCount
F133A: Total number of basic colour categories.
- greenBlue : Option GreenBlueRelation
F134A: Green-blue distinction.
- redYellow : Option RedYellowRelation
F135A: Red-yellow distinction.
- mtPronouns : Option MTPronounPattern
F136A: M-T pronoun pattern.
F136B: M in 1SG.
- nmPronouns : Option NMPronounPattern
F137A: N-M pronoun pattern.
F137B: M in 2SG.
- tea : Option TeaWordOrigin
F138A: Origin of the word for 'tea'.
- clicks : Option ClickUsage
F142A: Para-linguistic click usage.
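The field listing above can be pictured as a Lean structure whose feature fields default to `none`. The sketch below is a reduced version under stated assumptions: the structure name `LexicalProfile`, the `Sketch` namespace, the default values, and the two example definitions are hypothetical, and only four of the documented fields are reproduced.

```lean
namespace Sketch

inductive HandArmRelation where
  | identical
  | different
  deriving DecidableEq, Repr

inductive TeaWordOrigin where
  | cha
  | te
  | other
  deriving DecidableEq, Repr

/-- Reduced profile (hypothetical name): feature fields are optional. -/
structure LexicalProfile where
  language : String
  iso      : String
  handArm  : Option HandArmRelation := none
  tea      : Option TeaWordOrigin := none
  deriving Repr

/-- Illustrative value following the prose description of Russian. -/
def russian : LexicalProfile :=
  { language := "Russian", iso := "rus",
    handArm := some .identical, tea := some .cha }

/-- Illustrative value with `handArm` left unspecified, to show a missing feature. -/
def hindi : LexicalProfile :=
  { language := "Hindi", iso := "hin", tea := some .cha }

example : hindi.handArm = none := rfl
example : russian.tea = hindi.tea := rfl

end Sketch
```

Using `Option` with a `none` default means a profile only has to mention the features for which data exists.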
English (Indo-European, Germanic). Distinct hand/arm and finger/hand. 6 non-derived colour categories, 11 total basic colours (the Berlin-Kay maximum). Green and blue are distinct; red and yellow are distinct. No M-T pronoun pattern but 1SG "me/my" has /m/. Tea from Min Nan te. Clicks used affectively (tut-tut for disapproval).
French (Indo-European, Romance). Distinct hand/arm (main vs bras) and finger/hand (doigt vs main). 6 non-derived, 11 total colour categories. Green (vert) and blue (bleu) distinct. M-T paradigmatic (moi/toi). Tea from te (thé).
German (Indo-European, Germanic). Distinct hand/arm (Hand vs Arm) and finger/hand (Finger vs Hand). 6 non-derived, 11 total colour categories. Green (grün) and blue (blau) distinct. M-T paradigmatic (mich/dich). Tea from te (Tee). Clicks used affectively.
Spanish (Indo-European, Romance). Distinct hand/arm (mano vs brazo) and finger/hand (dedo vs mano). 6 non-derived, 11 total colour categories. Green (verde) and blue (azul) distinct. M-T paradigmatic (me/te). Tea from te. Clicks affective.
Russian (Indo-European, Slavic). Ruka covers both 'hand' and 'arm' (identical). Distinct finger (palec) and hand (ruka). 6 non-derived colour categories; 11 total basic colours (Russian famously distinguishes sinij 'dark blue' from goluboj 'light blue'; both are counted as basic terms here). Green (zelenyj) and blue (sinij) distinct. M-T paradigmatic (menja/tebja). Tea from cha (chaj). Clicks affective.
Japanese (Japonic). Te covers both 'hand' and 'arm' (identical). Distinct finger (yubi) and hand (te). 6 non-derived, 11 total colour categories. Green (midori) and blue (ao) are distinct in modern Japanese (though ao historically covered both). Red (aka) and yellow (kiiro) distinct. No M-T pattern. Tea from cha. Clicks affective.
Mandarin Chinese (Sino-Tibetan). Distinct hand/arm (shou vs bei/gebo) and finger/hand. 6 non-derived colour categories; 8--8.5 total basic colours. Green (lü) and blue (lan) distinct. Red (hong) and yellow (huang) distinct. No M-T pattern; no /m/ in 1SG (wo). Tea from cha.
Korean (Koreanic). Distinct hand/arm and finger/hand. 6 non-derived, 11 total colour categories. Green and blue distinct. No M-T pattern. Tea from cha. Clicks affective.
Turkish (Turkic). Distinct hand/arm (el vs kol) and finger/hand (parmak vs el). M-T paradigmatic (ben/sen with older forms showing m/t). Tea from cha (çay). Clicks used for logical meanings (tongue-click for negation).
Finnish (Uralic). Distinct hand/arm (käsi vs käsivarsi) and finger/hand (sormi vs käsi). M-T paradigmatic (minä/sinä). Tea from te (tee). Clicks affective.
Hungarian (Uralic). Distinct hand/arm (kéz vs kar) and finger/hand (ujj vs kéz). M-T paradigmatic (like Finnish). Tea from te. Clicks affective.
Hindi (Indo-European, Indo-Aryan). M-T paradigmatic (main/tum show m/t). 1SG has /m/ (main). Tea from cha (chai). Clicks used for logical meanings (tongue-click for negation in many South Asian languages).
Arabic (Egyptian) (Afro-Asiatic, Semitic). No M-T pattern; no /m/ in 1SG (ana). Tea from cha (shay).
Swahili (Niger-Congo, Bantu). Mkono covers both 'hand' and 'arm' (identical). Distinct finger/hand. No M-T pattern; 1SG has /m/ (mimi). Tea from cha. Clicks affective.
Tagalog (Austronesian). Distinct hand/arm and finger/hand. No M-T pattern. Tea from cha. Clicks: other/none.
All language profiles in the sample.
Most languages distinguish 'hand' from 'arm' (389 vs 228).
Finger-hand identity is rare: only 72/593 = 12% of languages.
Among languages with finger=hand identity, hunter-gatherers dominate (46/72 = 64%).
Grue (merged green/blue) is the majority pattern for the green-blue dimension (68/120 = 57%), far exceeding the distinct-terms pattern (30).
Red and yellow are almost always distinguished: 98/120 = 82% of languages have separate terms.
The M-T pronoun pattern is a minority pattern: only 30/230 languages show any form of it. Most languages (200/230) lack it.
The cha route for tea is more common than te: 110 vs 84 languages. Both far outnumber independent forms (36).
Affective click usage is more common than logical click usage (71 vs 47 languages).
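The WALS-wide generalisations above reduce to comparisons between small natural numbers, so they can be stated as machine-checked facts. The sketch below restates the quoted counts as anonymous Lean `example`s. It is an illustration of the arithmetic only and does not claim to reproduce the module's own theorem names or proofs.

```lean
-- Colour features (120 languages with data).
example : 2 * 68 > 120 := by decide   -- grue is an absolute majority
example : 68 > 2 * 30 := by decide    -- more than twice the distinct-terms count
-- 98/120 lies between 81.5% and 82.5%, so it rounds to 82%.
example : 815 * 120 ≤ 1000 * 98 ∧ 1000 * 98 < 825 * 120 := by decide

-- Pronoun and tea features (230 languages with data).
example : 30 + 200 = 230 := rfl
example : 110 + 84 + 36 = 230 := rfl
example : 110 > 84 ∧ 84 > 36 := by decide

-- Para-linguistic clicks: affective vs logical usage.
example : 71 > 47 := by decide
```

Cross-multiplying keeps every comparison inside `Nat`, which avoids both rationals and the truncation of natural-number division.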
In our sample, all European languages with an M-T pattern also have /m/ in 1SG (expected, since the M-T pattern implies 1SG=m).
The cha/te split in our sample follows geography: the East and South Asian languages and Russian use cha, while the Western European languages use te.
All languages in our sample with colour data have exactly 6 non-derived basic colour categories.
All languages in our sample with colour data distinguish green from blue (no grue languages in our major-language sample).
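A sample-level claim of this kind can be checked by folding a Boolean test over the list of profiles, treating a missing value as vacuously satisfying it. The sketch below uses a reduced, hypothetical `ColourProfile` structure and a three-language illustrative list. The Turkish entry is set to `none` only to show how missing data is handled.

```lean
namespace Sketch

inductive GreenBlueRelation where
  | distinct
  | merged
  | blackGreenBlue
  | blackBlueVsGreen
  | yellowGreenBlue
  | yellowGreenVsBlue
  | noTerm
  deriving DecidableEq, Repr

/-- Reduced, hypothetical profile with a single feature field. -/
structure ColourProfile where
  language  : String
  greenBlue : Option GreenBlueRelation
  deriving Repr

/-- Illustrative sample, not the documented profile list. -/
def sample : List ColourProfile :=
  [ { language := "English",  greenBlue := some .distinct },
    { language := "Japanese", greenBlue := some .distinct },
    { language := "Turkish",  greenBlue := none } ]

/-- True when a feature value is either missing or `distinct`. -/
def distinctOrMissing : Option GreenBlueRelation → Bool
  | some .distinct => true
  | some _         => false
  | none           => true

-- Every profile with green-blue data records `distinct`.
example : sample.all (fun p => distinctOrMissing p.greenBlue) = true := rfl

end Sketch
```

Adding a grue language to the list would make the final `example` fail to compile, which is the intended safeguard for a claim that holds only for this particular sample.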