Lexicalization: Efficient Encoding of Emerging Concepts
@cite{xu-etal-2024}
Inaugural module of Theories/Diachronic/: formal theories of language change.
Xu et al. (2024) unify word reuse and combination (compounding) under a single information-theoretic account. Both strategies for encoding novel concepts are shaped by the same tradeoff between minimizing speaker effort (word length) and minimizing information loss (listener confusion). Attested encodings in English, French, and Finnish sit near the Pareto frontier of this tradeoff.
Architecture
The model extends RSA's speaker-listener framework to lexicon evolution:
- The listener uses prototype-based categorization (Eq. 1): an L0 whose meaning function is exp(-γ · d(c, q_w)).
- The speaker trades off informativity against word-length cost: an S1 with beliefAction scoring, where cost(w) = β · l(w).
- The encoding E* is the set of (concept, form) pairs for emerging concepts; its efficiency is measured against the Pareto frontier of possible encodings (Core.Efficiency).
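The L0 meaning function can be sketched directly in Lean (a sketch, not the module's actual definition; `Concept`, `Form`, the metric `d`, and the prototype map `q` are assumed names):

```lean
abbrev Concept := String  -- assumption: concepts keyed by string labels
abbrev Form := String     -- assumption: word forms as strings

/-- Prototype-based L0 meaning (Eq. 1, unnormalized):
    m̂_{w,L}(c) ∝ exp(−γ · d(c, q_w)). -/
def protoMeaning (γ : Float) (d : Concept → Concept → Float)
    (q : Form → Concept) (c : Concept) (w : Form) : Float :=
  Float.exp (-(γ * d c (q w)))
```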
Connection to LCEC
The LCEC (Morphology.WP.LCEC) states that I-complexity (conditional entropy
across paradigm cells) is uniformly low despite high E-complexity. The analog
here: lexicons maintain low information loss despite high polysemy, because
reuse and compounding are informationally efficient. Both are instances of a
general principle: natural languages achieve low integrative complexity despite
high enumerative complexity, via structured redundancy.
Strategy by which a novel concept enters the lexicon. @cite{xu-etal-2024} Table 1: reuse items (R) vs. compounds (C).
- reuse : Strategy
Reuse an existing word for a new meaning. E.g., "mouse" (rodent → peripheral), "dish" (plate → antenna).
- combination : Strategy
Combine existing words into a compound. E.g., "birthday card", "spreadsheet", "urban renewal".
Instances For
Equations
- Diachronic.Lexicalization.instBEqStrategy.beq x✝ y✝ = (x✝.ctorIdx == y✝.ctorIdx)
Instances For
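The constructor listing above corresponds to an inductive declaration along these lines (a reconstruction from the docs; the deriving clause is inferred from the `BEq` instance shown):

```lean
/-- Strategy by which a novel concept enters the lexicon. -/
inductive Strategy where
  | reuse        -- extend an existing word to a new sense ("mouse", "dish")
  | combination  -- compound existing words ("birthday card", "spreadsheet")
  deriving BEq, DecidableEq, Repr
```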
Literality of the form-meaning relationship. Literal items are semantically transparent and tend to be more communicatively efficient (@cite{xu-etal-2024} §Item-Level Variation).
- literal : Literality
Literal: form directly relates to the intended concept.
- Reuse: intended meaning is a hyponym of the existing sense.
- Combination: endocentric compound (head = superordinate).
- nonliteral : Literality
Nonliteral: metaphorical or metonymic relationship.
- Reuse: e.g., "mouse" for computer peripheral.
- Combination: exocentric, e.g., "boîte noire" = flight recorder.
Instances For
Equations
- Diachronic.Lexicalization.instBEqLiterality.beq x✝ y✝ = (x✝.ctorIdx == y✝.ctorIdx)
Instances For
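As with `Strategy`, the `Literality` type is plausibly a two-constructor inductive (a sketch reconstructed from the constructor docs above):

```lean
/-- Literality of the form-meaning relationship. -/
inductive Literality where
  | literal     -- hyponymic reuse or endocentric compound
  | nonliteral  -- metaphorical reuse or exocentric compound
  deriving BEq, DecidableEq, Repr
```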
A form-concept pair in an emerging encoding (one entry in E*).
- form : String
- concept : String
- strategy : Strategy
- literality : Literality
- formLength : ℕ
Instances For
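The field listing above corresponds to a structure declaration of this shape (a sketch; the deriving clause is an assumption):

```lean
/-- A form-concept pair in an emerging encoding (one entry in E*). -/
structure EncodingPair where
  form       : String
  concept    : String
  strategy   : Strategy
  literality : Literality
  formLength : Nat
  deriving BEq, Repr
```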
Communicative costs of an encoding, parameterized by a listener model.
listenerScore concept form is the probability that the listener
recovers concept from form. In @cite{xu-etal-2024}, this is the
prototype-based categorization model (Eq. 1):
m̂_{w,L}(c) ∝ exp{-γ · d(c, q_w)}.
Effort (Eq. 2) is the expected word length; information loss (Eq. 3) is the expected surprisal under the listener distribution. Both expectations are weighted by the concept need probability.
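A plausible Lean reconstruction of Eqs. 2–3 as computed over an encoding (a sketch only; the module's actual `encodingCosts` signature, return type, and field names are assumptions):

```lean
/-- Sketch: effort (Eq. 2) and information loss (Eq. 3) of an encoding,
    each weighted by the concept need probability. -/
def encodingCostsSketch (pairs : List EncodingPair)
    (needProb : String → Float)
    (listenerScore : String → String → Float) : Float × Float :=
  -- effort: need-weighted expected word length
  let effort := pairs.foldl
    (fun acc p => acc + needProb p.concept * p.formLength.toFloat) 0.0
  -- info loss: need-weighted surprisal −log p(c | w)
  let infoLoss := pairs.foldl
    (fun acc p =>
      acc + needProb p.concept * (-Float.log (listenerScore p.concept p.form))) 0.0
  (effort, infoLoss)
```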
Unified objective (Eq. 5): L_β = info_loss + β · effort. Parameterizes the Pareto frontier.
Equations
- Diachronic.Lexicalization.unifiedObjective pairs needProb listenerScore β = Core.Efficiency.weightedCost (Diachronic.Lexicalization.encodingCosts pairs needProb listenerScore) β
Instances For
The prototype-based listener IS an RSA L0, and the unified objective
IS an S1 with beliefAction scoring.
To instantiate, set RSAConfigData with:
- U := Form
- W := Concept
- meaning _ c w := exp(-γ · d(c, prototype(w)))
- s1Spec := .beliefAction (fun w => β * length(w))
This function constructs the corresponding S1 scoring rule.
Equations
- Diachronic.Lexicalization.asS1ScoreSpec β length = RSA.S1ScoreSpec.beliefAction fun (w : String) => β * length w
Instances For
Efficiency Claim (Figs. 2–3): attested encodings are closer to the Pareto frontier than baseline encodings (random or near-synonym).
Equations
- Diachronic.Lexicalization.moreEfficientThan attested baseline optimalAt βs = (Core.Efficiency.efficiencyLoss attested optimalAt βs < Core.Efficiency.efficiencyLoss baseline optimalAt βs)
Instances For
Strategy Tradeoff (§Strategy Comparison): reuse items tend to be shorter, while compounds tend to be more informative. The two strategies occupy complementary regions of the effort-informativity space.
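The tradeoff claim can be stated as a proposition along these lines (a sketch; the `avg` helper is hypothetical and the module's actual statement may differ):

```lean
/-- Mean of a list of Floats (0 for the empty list). Hypothetical helper. -/
def avg (xs : List Float) : Float :=
  if xs.isEmpty then 0.0 else xs.foldl (· + ·) 0.0 / xs.length.toFloat

/-- Sketch of the strategy tradeoff: reuse items are shorter on average,
    while compounds score higher with the listener. -/
def strategyTradeoffSketch (pairs : List EncodingPair)
    (listenerScore : String → String → Float) : Prop :=
  let reuse := pairs.filter (·.strategy == .reuse)
  let comb  := pairs.filter (·.strategy == .combination)
  avg (reuse.map (·.formLength.toFloat)) ≤ avg (comb.map (·.formLength.toFloat)) ∧
  avg (reuse.map (fun p => listenerScore p.concept p.form)) ≤
    avg (comb.map (fun p => listenerScore p.concept p.form))
```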
Literal Advantage (§Item-Level Variation): literal items (hyponymic reuse, endocentric compounds) are more efficient than nonliteral ones, because semantic transparency reduces information loss.
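The literal-advantage claim plausibly has this shape (a sketch; the hypothetical `effLoss` parameter stands in for `Core.Efficiency.efficiencyLoss`, whose exact signature is taken on trust from `moreEfficientThan` above):

```lean
/-- Sketch: the literal sub-encoding incurs no more efficiency loss than
    the nonliteral one, for each tradeoff parameter β in `βs`. -/
def literalAdvantageSketch (pairs : List EncodingPair)
    (effLoss : List EncodingPair → Float → Float) (βs : List Float) : Prop :=
  ∀ β ∈ βs,
    effLoss (pairs.filter (·.literality == .literal)) β ≤
      effLoss (pairs.filter (·.literality == .nonliteral)) β
```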