Linglib.Theories.Syntax.DependencyGrammar.Formal.MemorySurprisal.MorphemeOrder

Study 3: Morpheme Order Optimization (Japanese & Sesotho)

@cite{bybee-1985} @cite{demuth-1992} @cite{doke-mofokeng-1967} @cite{hahn-degen-futrell-2021} @cite{kaiser-yamamoto-2013}

@cite{hahn-degen-futrell-2021} Study 3 finds that the morpheme orders of Japanese verb suffixes and Sesotho verb affixes are near-optimal for the memory-surprisal trade-off: the real orders achieve a lower area under the trade-off curve (AUC) than random baseline orders.

Japanese Verb Suffixes (SI §4.1, Table 2)

Seven suffix slots ordered from stem outward:

Derivation: -su (suru)
Valence: causative -(s)ase
Voice: passive/potential -are, -rare
Mood: desiderative -ta, politeness -mas
Negation: -na
Tense/Aspect/Mood: -ta (past), -yoo (future)
Nonfinite: -te

Real order AUC ≈ 47.0 (morpheme level), baselines ≈ 47.2 (SI Figure 6).

Sesotho Verb Affixes (SI §4.2)

Prefixes: subject agreement, negation, tense/aspect, object agreement

Suffixes: reversive, causative, neuter, applicative, completive, reciprocal, passive, tense, mood, interrogative/relative

Real order AUC ≈ 38.7 (morpheme level), baselines ≈ 38.8 (SI Figure 8).

@cite{bybee-1985} Relevance Hierarchy

stem < valence < voice < aspect < tense < mood < agreement

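Sesotho suffixes respect this hierarchy throughout (sesotho_suffixes_respect_bybee), and Japanese suffixes respect it for the first four slots (japanese_partial_bybee). The underlying idea is that morphemes closer to the stem are more semantically relevant to the verb.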
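
As an illustration of what a hierarchy check of this kind could look like, here is a minimal Lean sketch. It is not the definition of Core.Morphology.respectsRelevanceHierarchy, which this page does not show. The type Cat, the numeric ranks (including placing derivation at rank 0), and the functions respectsFrom and respects are all assumptions made for the example.

```lean
/-- Hypothetical stand-in for `Core.Morphology.MorphCategory`. -/
inductive Cat where
  | derivation | valence | voice | aspect | tense | mood | agreement

/-- Assumed rank in the hierarchy: smaller means closer to the stem. -/
def Cat.rank : Cat → Nat
  | .derivation => 0
  | .valence    => 1
  | .voice      => 2
  | .aspect     => 3
  | .tense      => 4
  | .mood       => 5
  | .agreement  => 6

/-- Checks that ranks never decrease after a slot of rank `prev`. -/
def respectsFrom (prev : Nat) : List Cat → Bool
  | [] => true
  | c :: rest => decide (prev ≤ c.rank) && respectsFrom c.rank rest

/-- A slot list passes when ranks never decrease from the stem outward. -/
def respects (slots : List Cat) : Bool := respectsFrom 0 slots

-- A stem-outward order in hierarchy order passes the check.
example : respects [.derivation, .valence, .voice, .mood] = true := by decide

-- Placing mood inside valence fails it.
example : respects [.mood, .valence] = false := by decide
```

Under this reading, "respecting the hierarchy" only requires that ranks never decrease, so a template may skip categories or repeat one (several valence suffixes in a row, for example).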

Data Source

https://github.com/m-hahn/memory-surprisal. Morpheme order data from SI §4.1-4.2, AUC values from SI Figures 6 and 8.

Japanese verb suffixes

From SI §4.1, ordered from stem outward. The numbering follows @cite{kaiser-yamamoto-2013} and the UD segmentation used in the paper.

Slot	Category	Morpheme	Gloss
1	derivation	-su (suru)	derives verbs from Sino-Japanese
2	valence	-(s)ase	causative
3	voice	-are, -rare	passive/potential
4	mood	-ta (desiderative)	"want to"
5	agreement	-mas	politeness
6	negation	-na	negation
7	tense	-ta (past), -yoo	past/future

def DepGrammar.MemorySurprisal.MorphemeOrder.japaneseSuffixSlots :

List Core.Morphology.MorphCategory

Japanese suffix slots from stem outward (SI Table 2).

Equations

One or more equations did not get rendered due to their size.

theorem DepGrammar.MemorySurprisal.MorphemeOrder.japanese_partial_bybee :

have slots := [Core.Morphology.MorphCategory.derivation, Core.Morphology.MorphCategory.valence, Core.Morphology.MorphCategory.voice, Core.Morphology.MorphCategory.mood]; Core.Morphology.respectsRelevanceHierarchy slots = true

Japanese suffix order respects Bybee's hierarchy through the mood slot (slot 4).

The checked ordering is derivation < valence < voice < mood, which matches the relevance hierarchy. Negation and tense come after mood; the theorem does not cover these later slots.

Sesotho verb affixes

From SI §4.2 and @cite{demuth-1992}, @cite{doke-mofokeng-1967}.

Prefixes (from word edge inward toward stem):

Subject agreement (sm)
Negation (-sa-)
Tense/Aspect/Mood (t')
Object agreement (om) / Reflexive (rf)

Suffixes (from stem outward):

Reversive (rv) — valence
Causative (c) — valence
Neuter (nt) — valence
Applicative (ap) — valence
Completive (cl) — valence (reduplication of applicative)
Reciprocal (rc) — voice
Passive (p) — voice
Tense (t^) — tense (perfect -il-)
Mood (m^) — mood (imperative, subjunctive, indicative)
Interrogative/Relative (wh/rl) — nonfinite

def DepGrammar.MemorySurprisal.MorphemeOrder.sesothoSuffixSlots :

List Core.Morphology.MorphCategory

Sesotho suffix template: morpheme categories from stem outward (SI §4.2).

Equations

One or more equations did not get rendered due to their size.

def DepGrammar.MemorySurprisal.MorphemeOrder.sesothoPrefixSlots :

List Core.Morphology.MorphCategory

Sesotho prefix template: morpheme categories from word edge inward.

Equations

One or more equations did not get rendered due to their size.

theorem DepGrammar.MemorySurprisal.MorphemeOrder.sesotho_suffixes_respect_bybee :

Core.Morphology.respectsRelevanceHierarchy sesothoSuffixSlots = true

Sesotho suffixes respect the relevance hierarchy: valence < voice < tense < mood < nonfinite.

Trade-off curves

Approximate AUC values from SI Figures 6 and 8. AUC is computed over the memory-surprisal trade-off curve (lower = more efficient). Values are multiplied by 100 for integer arithmetic.
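
To make the "lower = more efficient" comparison concrete, here is a minimal Lean sketch that computes twice the trapezoidal area under a toy trade-off curve using only natural numbers. The trapezoid rule, the names twiceAreaFrom, twiceArea, toyReal and toyBaseline, and the data points are all assumptions for illustration; the SI may compute AUC differently.

```lean
/-- Twice the trapezoidal area contributed by the points after `prev`.
    Points are (memory, surprisal) pairs, assumed sorted by memory. -/
def twiceAreaFrom (prev : Nat × Nat) : List (Nat × Nat) → Nat
  | [] => 0
  | p :: rest => (p.1 - prev.1) * (prev.2 + p.2) + twiceAreaFrom p rest

/-- Twice the area under a piecewise-linear curve (doubling avoids division by 2). -/
def twiceArea : List (Nat × Nat) → Nat
  | [] => 0
  | p :: rest => twiceAreaFrom p rest

/-- Toy curves, not data from the paper. -/
def toyReal : List (Nat × Nat) := [(0, 10), (2, 6), (4, 4)]
def toyBaseline : List (Nat × Nat) := [(0, 10), (2, 8), (4, 4)]

example : twiceArea toyReal = 52 := by decide
example : twiceArea toyBaseline = 60 := by decide

-- The curve whose surprisal drops faster has the smaller area.
example : twiceArea toyReal < twiceArea toyBaseline := by decide
```

Keeping everything in natural numbers mirrors the × 100 scaling used by the constants below: comparisons stay decidable and can be checked by evaluation.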

def DepGrammar.MemorySurprisal.MorphemeOrder.japaneseRealAUC100 :

Japanese morpheme-level AUC × 100. Real ≈ 47.0, Random ≈ 47.2.

Equations

DepGrammar.MemorySurprisal.MorphemeOrder.japaneseRealAUC100 = 4700

def DepGrammar.MemorySurprisal.MorphemeOrder.japaneseRandomAUC100 :

Equations

DepGrammar.MemorySurprisal.MorphemeOrder.japaneseRandomAUC100 = 4720

def DepGrammar.MemorySurprisal.MorphemeOrder.sesothoRealAUC100 :

Sesotho morpheme-level AUC × 100. Real ≈ 38.7, Random ≈ 38.8.

Equations

DepGrammar.MemorySurprisal.MorphemeOrder.sesothoRealAUC100 = 3870

def DepGrammar.MemorySurprisal.MorphemeOrder.sesothoRandomAUC100 :

Equations

DepGrammar.MemorySurprisal.MorphemeOrder.sesothoRandomAUC100 = 3880

theorem DepGrammar.MemorySurprisal.MorphemeOrder.japanese_morpheme_efficient :

japaneseRealAUC100 < japaneseRandomAUC100

Japanese real morpheme order is more efficient than random baselines.

theorem DepGrammar.MemorySurprisal.MorphemeOrder.sesotho_morpheme_efficient :

sesothoRealAUC100 < sesothoRandomAUC100

Sesotho real morpheme order is more efficient than random baselines.
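
Both efficiency theorems are closed comparisons between integer constants. The proofs are not shown on this page; the sketch below only illustrates one way such a statement can be discharged in Lean. The names realAUC100 and randomAUC100 are hypothetical local copies, and the type Nat is an assumption.

```lean
/-- Hypothetical local copies of the scaled Japanese constants. -/
def realAUC100 : Nat := 4700    -- 47.0 × 100
def randomAUC100 : Nat := 4720  -- 47.2 × 100

-- `<` on `Nat` is decidable, so the comparison is checked by evaluation.
example : realAUC100 < randomAUC100 := by decide

-- The gap is 20 in scaled units, i.e. 0.2 in the original AUC units.
example : randomAUC100 - realAUC100 = 20 := by decide
```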

Prediction accuracy

From SI Figure 7 (Japanese) and Figure 9 (Sesotho). The optimized morpheme order achieves higher pairwise prediction accuracy than random baselines, and similar accuracy to the real order.

Values are accuracy × 1000.
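
The following Lean sketch shows how a pairwise accuracy on the × 1000 scale could be computed. It assumes that "pairwise accuracy" means the share of morpheme pairs whose relative order matches the predicted slot order; the SI's exact definition is not reproduced here. The functions correctWith, pairCounts and accuracy1000 and the input are hypothetical.

```lean
/-- Number of later entries whose rank is at least `x`,
    i.e. pairs that appear in the predicted order. -/
def correctWith (x : Nat) (later : List Nat) : Nat :=
  (later.filter (fun y => decide (x ≤ y))).length

/-- Returns (correctly ordered pairs, total pairs) for one observed form,
    given as the predicted ranks of its morphemes in surface order. -/
def pairCounts : List Nat → Nat × Nat
  | [] => (0, 0)
  | x :: rest =>
    let (c, t) := pairCounts rest
    (c + correctWith x rest, t + rest.length)

/-- Pairwise accuracy × 1000, rounded down; a form with no pairs counts as 1000. -/
def accuracy1000 (form : List Nat) : Nat :=
  let (c, t) := pairCounts form
  if t = 0 then 1000 else c * 1000 / t

-- Toy form with one swapped pair (ranks 3 and 2), not data from the paper.
#eval pairCounts [0, 1, 3, 2]    -- (5, 6)
#eval accuracy1000 [0, 1, 3, 2]  -- 833
```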

def DepGrammar.MemorySurprisal.MorphemeOrder.japanesePairsOptimized :

Japanese morpheme prediction: optimized pairs accuracy × 1000.

Equations

DepGrammar.MemorySurprisal.MorphemeOrder.japanesePairsOptimized = 953

def DepGrammar.MemorySurprisal.MorphemeOrder.japanesePairsRandom :

Japanese morpheme prediction: random baseline pairs accuracy × 1000.

Equations

DepGrammar.MemorySurprisal.MorphemeOrder.japanesePairsRandom = 496

def DepGrammar.MemorySurprisal.MorphemeOrder.japanesePairsReal :

Japanese morpheme prediction: real order pairs accuracy × 1000.

Equations

DepGrammar.MemorySurprisal.MorphemeOrder.japanesePairsReal = 953

theorem DepGrammar.MemorySurprisal.MorphemeOrder.japanese_optimized_beats_random :

japanesePairsOptimized > japanesePairsRandom

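The optimized order far outperforms the random baseline for Japanese (953 vs. 496, i.e. 0.953 vs. 0.496). By japanese_real_matches_optimized, the same holds for the real order.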

theorem DepGrammar.MemorySurprisal.MorphemeOrder.japanese_real_matches_optimized :

japanesePairsReal = japanesePairsOptimized

The real Japanese order achieves accuracy matching the optimized order.

Bridge: Japanese causative is the innermost functional suffix

The Bybee hierarchy predicts valence morphemes should be closest to the stem. Japanese -(s)ase (causative = valence) is slot 2, the first functional suffix after derivation. This is consistent with both:

The relevance hierarchy (valence is closest to stem meaning)
The memory-surprisal explanation (valence affects subcategorization, so placing it near the stem concentrates predictive information locally)

theorem DepGrammar.MemorySurprisal.MorphemeOrder.japanese_causative_is_valence :

japaneseSuffixSlots[1]? = some Core.Morphology.MorphCategory.valence

Japanese causative -(s)ase is in the valence slot (slot 2, i.e. zero-based index 1).

theorem DepGrammar.MemorySurprisal.MorphemeOrder.valence_is_innermost_functional :

japaneseSuffixSlots[0]? = some Core.Morphology.MorphCategory.derivation ∧ japaneseSuffixSlots[1]? = some Core.Morphology.MorphCategory.valence

The valence slot is the first functional slot (after derivation).
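
Both statements above use Lean's optional indexing xs[i]?, which is zero-based and returns none when the index is out of range. The short sketch below illustrates this on a hypothetical list, demoSlots, which is not the real japaneseSuffixSlots.

```lean
/-- Hypothetical three-slot list used only for this illustration. -/
def demoSlots : List String := ["derivation", "valence", "voice"]

-- Index 0 is slot 1, index 1 is slot 2.
example : demoSlots[0]? = some "derivation" := rfl
example : demoSlots[1]? = some "valence" := rfl

-- Out-of-range indices give `none` rather than an error.
example : demoSlots[3]? = none := rfl
```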

Bridge: Japanese -(s)ase = Song's COMPACT morphological causative

Japanese -(s)ase is classified as a morphological COMPACT causative in @cite{song-1996}. The ik_ase entry in Fragments/Japanese/Predicates confirms this: causativeBuilder = some .make.

theorem DepGrammar.MemorySurprisal.MorphemeOrder.ik_ase_is_causative :

Fragments.Japanese.Predicates.ik_ase.causativeBuilder.isSome = true

The ik_ase entry for Japanese -(s)ase counts as causative because its causativeBuilder is set (isSome).

theorem DepGrammar.MemorySurprisal.MorphemeOrder.ik_ase_is_make :

Fragments.Japanese.Predicates.ik_ase.causativeBuilder = some NadathurLauer2020.Builder.CausativeBuilder.make

Japanese -(s)ase uses the .make causative builder (direct causation).

Bridge: Relevance hierarchy ↔ Information locality

The Bybee hierarchy orders morphemes by semantic relevance to the stem. The memory-surprisal framework predicts that information-locality-optimal orderings place highly predictive morphemes close to the stem. These two predictions converge: semantic relevance correlates with predictive information, so the relevance hierarchy is an instantiation of information locality at the morpheme level.

The near-optimality of Japanese and Sesotho morpheme orders (japanese_morpheme_efficient, sesotho_morpheme_efficient) shows that real morpheme orders are close to the information-locality optimum. That the same orders respect the relevance hierarchy (japanese_partial_bybee, sesotho_suffixes_respect_bybee) suggests that the hierarchy captures the relevant notion of locality.

theorem DepGrammar.MemorySurprisal.MorphemeOrder.relevance_hierarchy_implies_locality :

Core.Morphology.respectsRelevanceHierarchy sesothoSuffixSlots = true ∧ sesothoRealAUC100 < sesothoRandomAUC100