Linglib.Theories.Syntax.DependencyGrammar.Formal.MemorySurprisal.MorphemeOrder

Study 3: Morpheme Order Optimization (Japanese & Sesotho) #

@cite{bybee-1985} @cite{demuth-1992} @cite{doke-mofokeng-1967} @cite{hahn-degen-futrell-2021} @cite{kaiser-yamamoto-2013}

@cite{hahn-degen-futrell-2021} Study 3: morpheme orders in Japanese verb suffixes and Sesotho verb affixes are close to optimal for the memory-surprisal trade-off. The attested orders achieve a slightly lower area under the trade-off curve (AUC) than random baseline orders.

Japanese Verb Suffixes (SI §4.1, Table 2) #

Seven suffix slots ordered from stem outward:

  1. Derivation: -su (suru)
  2. VALENCE: causative -(s)ase
  3. VOICE: passive/potential -are, -rare
  4. MOOD: desiderative -ta, politeness -mas
  5. NEGATION: -na
  6. TENSE/ASPECT/MOOD: -ta (past), -yoo (future)
  7. Nonfinite: -te

Real order AUC ≈ 47.0 (morpheme level), baselines ≈ 47.2 (SI Figure 6).

Sesotho Verb Affixes (SI §4.2) #

Prefixes: subject agreement, negation, tense/aspect, object agreement

Suffixes: reversive, causative, neuter, applicative, completive, reciprocal, passive, tense, mood, interrogative/relative

Real order AUC ≈ 38.7 (morpheme level), baselines ≈ 38.8 (SI Figure 8).

@cite{bybee-1985} Relevance Hierarchy #

stem < valence < voice < aspect < tense < mood < agreement

Sesotho suffixes respect this hierarchy throughout, and Japanese suffixes respect it through the voice slot: morphemes closer to the stem are more semantically relevant to the verb.
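
The hierarchy can be read as a rank function on categories. The following is a minimal Lean 4 sketch of that reading; `Cat`, `Cat.rank`, and `closer` are hypothetical names chosen for illustration and are not taken from the module.

```lean
namespace RelevanceSketch

/-- The seven positions of the hierarchy, innermost first (hypothetical encoding). -/
inductive Cat where
  | stem | valence | voice | aspect | tense | mood | agreement
  deriving DecidableEq, Repr

/-- Rank in the hierarchy: a smaller number means closer to the stem. -/
def Cat.rank : Cat → Nat
  | .stem      => 0
  | .valence   => 1
  | .voice     => 2
  | .aspect    => 3
  | .tense     => 4
  | .mood      => 5
  | .agreement => 6

/-- `closer a b` holds when `a` is predicted to sit nearer the stem than `b`. -/
def closer (a b : Cat) : Bool := decide (a.rank < b.rank)

-- Valence is predicted to precede tense; mood is not predicted to precede voice.
example : closer Cat.valence Cat.tense = true := by decide
example : closer Cat.mood Cat.voice = false := by decide

end RelevanceSketch
```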

Data Source #

https://github.com/m-hahn/memory-surprisal. Morpheme order data from SI §4.1-4.2, AUC values from SI Figures 6 and 8.

Japanese verb suffixes #

From SI §4.1, ordered from stem outward. The numbering follows @cite{kaiser-yamamoto-2013} and the UD segmentation used in the paper.

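| Slot | Category | Morpheme | Example |
|------|----------|----------|---------|
| 1 | derivation | -su (suru) | derives verbs from Sino-Japanese |
| 2 | valence | -(s)ase | causative |
| 3 | voice | -are, -rare | passive/potential |
| 4 | mood | -ta (desiderative) | "want to" |
| 5 | agreement | -mas | politeness |
| 6 | negation | -na | negation |
| 7 | tense | -ta (past), -yoo | past/future |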

Japanese suffix slots from stem outward (SI Table 2).
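
As an illustration of how such a slot list might be encoded, here is a minimal Lean 4 sketch. The names `SuffixCat`, `Slot`, and `suffixSlots` are assumptions made for this example and may differ from the module's actual declarations; the entries are copied from the slot table.

```lean
namespace JapaneseSlotsSketch

/-- Category labels from the slot table (hypothetical encoding). -/
inductive SuffixCat where
  | derivation | valence | voice | mood | agreement | negation | tense
  deriving DecidableEq, Repr

/-- One suffix slot: position counted from the stem, category, surface form. -/
structure Slot where
  index : Nat
  cat   : SuffixCat
  form  : String
  deriving Repr

/-- The seven slots, listed from the stem outward. -/
def suffixSlots : List Slot :=
  [ { index := 1, cat := SuffixCat.derivation, form := "-su (suru)" },
    { index := 2, cat := SuffixCat.valence,    form := "-(s)ase" },
    { index := 3, cat := SuffixCat.voice,      form := "-are, -rare" },
    { index := 4, cat := SuffixCat.mood,       form := "-ta (desiderative)" },
    { index := 5, cat := SuffixCat.agreement,  form := "-mas" },
    { index := 6, cat := SuffixCat.negation,   form := "-na" },
    { index := 7, cat := SuffixCat.tense,      form := "-ta (past), -yoo" } ]

-- Sanity checks: seven slots, numbered consecutively from the stem.
example : suffixSlots.length = 7 := rfl
example : suffixSlots.map (fun s => s.index) = [1, 2, 3, 4, 5, 6, 7] := rfl

end JapaneseSlotsSketch
```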

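Japanese suffix order respects Bybee's hierarchy through the voice slot.

The inner slots run derivation < valence < voice, with mood following voice, which matches the relevance hierarchy. The outer slots do not all line up with it: tense comes after mood, whereas the hierarchy places tense before mood, and negation has no position in the hierarchy as stated. The result is therefore partial.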
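
The partial result can be checked mechanically. The sketch below is an assumption-laden illustration, not the module's proof: `Cat`, `Cat.rank`, `respects`, and `japaneseRanked` are hypothetical names, the ranks follow the hierarchy as stated on this page, and derivation and negation are left out because the hierarchy does not rank them.

```lean
namespace JapaneseBybeeSketch

/-- Categories ranked by the relevance hierarchy (stem omitted). -/
inductive Cat where
  | valence | voice | aspect | tense | mood | agreement
  deriving DecidableEq, Repr

/-- Smaller rank means closer to the stem. -/
def Cat.rank : Cat → Nat
  | .valence   => 1
  | .voice     => 2
  | .aspect    => 3
  | .tense     => 4
  | .mood      => 5
  | .agreement => 6

/-- A stem-outward template respects the hierarchy when ranks never
    decrease from one slot to the next. -/
def respects : List Cat → Bool
  | a :: b :: rest => decide (a.rank ≤ b.rank) && respects (b :: rest)
  | _ => true

/-- Table slots 2, 3, 4, 5, 7 (assumption: unranked slots 1 and 6 are skipped). -/
def japaneseRanked : List Cat :=
  [Cat.valence, Cat.voice, Cat.mood, Cat.agreement, Cat.tense]

-- The prefix through the voice slot respects the hierarchy ...
example : respects (japaneseRanked.take 2) = true := by decide
-- ... but the full ranked sequence does not, since tense follows agreement.
example : respects japaneseRanked = false := by decide

end JapaneseBybeeSketch
```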

Sesotho verb affixes #

From SI §4.2 and @cite{demuth-1992}, @cite{doke-mofokeng-1967}.

Prefixes (from word edge inward toward stem):

1. Subject agreement (sm)
2. Negation (-sa-)
3. Tense/Aspect/Mood (t')
4. Object agreement (om) / Reflexive (rf)

Suffixes (from stem outward):

1. Reversive (rv): valence
2. Causative (c): valence
3. Neuter (nt): valence
4. Applicative (ap): valence
5. Completive (cl): valence (reduplication of applicative)
6. Reciprocal (rc): voice
7. Passive (p): voice
8. Tense (t^): tense (perfect -il-)
9. Mood (m^): mood (imperative, subjunctive, indicative)
10. Interrogative/Relative (wh/rl): nonfinite

Sesotho suffix template: morpheme categories from stem outward (SI §4.2).

Sesotho prefix template: morpheme categories from word edge inward.

Sesotho suffixes respect the relevance hierarchy: valence < voice < tense < mood < nonfinite.
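
A minimal Lean 4 sketch of this check, assuming the ten suffix slots and category labels listed for Sesotho on this page. The names `Cat`, `suffixTemplate`, and `nondecreasing` are hypothetical and are not the module's own declarations.

```lean
namespace SesothoSketch

/-- Suffix categories in the order valence < voice < tense < mood < nonfinite. -/
inductive Cat where
  | valence | voice | tense | mood | nonfinite
  deriving DecidableEq, Repr

/-- Smaller rank means closer to the stem. -/
def Cat.rank : Cat → Nat
  | .valence   => 0
  | .voice     => 1
  | .tense     => 2
  | .mood      => 3
  | .nonfinite => 4

/-- The ten suffix slots from the stem outward, by category. -/
def suffixTemplate : List Cat :=
  [ Cat.valence,   -- reversive
    Cat.valence,   -- causative
    Cat.valence,   -- neuter
    Cat.valence,   -- applicative
    Cat.valence,   -- completive
    Cat.voice,     -- reciprocal
    Cat.voice,     -- passive
    Cat.tense,     -- tense
    Cat.mood,      -- mood
    Cat.nonfinite  -- interrogative/relative
  ]

/-- True when no element is larger than its right-hand neighbour. -/
def nondecreasing (xs : List Nat) : Bool :=
  (xs.zip xs.tail).all (fun p => decide (p.1 ≤ p.2))

example : suffixTemplate.length = 10 := rfl
example : nondecreasing (suffixTemplate.map Cat.rank) = true := by decide

end SesothoSketch
```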

Trade-off curves #

Approximate AUC values from SI Figures 6 and 8. AUC is the area under the memory-surprisal trade-off curve (lower = more efficient). Values are multiplied by 100 for integer arithmetic.

Japanese morpheme-level AUC × 100. Real ≈ 47.0, Random ≈ 47.2.

Sesotho morpheme-level AUC × 100. Real ≈ 38.7, Random ≈ 38.8.

Japanese real morpheme order is more efficient than random baselines.

Sesotho real morpheme order is more efficient than random baselines.
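
With the values scaled by 100, both efficiency claims reduce to comparisons of natural numbers. The sketch below shows one way this could look in Lean 4. The structure `AUC100` and the definitions `japanese` and `sesotho` are hypothetical, the theorem names only mirror those cited on this page, and the numbers are the approximate figures quoted here (47.0 vs 47.2, 38.7 vs 38.8).

```lean
namespace AUCSketch

/-- AUC × 100 for the real order and for the random baselines. -/
structure AUC100 where
  real   : Nat
  random : Nat
  deriving Repr

def japanese : AUC100 := { real := 4700, random := 4720 }
def sesotho  : AUC100 := { real := 3870, random := 3880 }

-- Lower AUC means a more efficient memory-surprisal trade-off.
theorem japanese_morpheme_efficient : japanese.real < japanese.random := by decide
theorem sesotho_morpheme_efficient : sesotho.real < sesotho.random := by decide

end AUCSketch
```

Storing scaled integers keeps both statements decidable by plain computation, at the cost of fixing the precision of the approximate values in advance.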

Prediction accuracy #

From SI Figure 7 (Japanese) and Figure 9 (Sesotho). The optimized morpheme order achieves higher pairwise prediction accuracy than random baselines, and similar accuracy to the real order.

Values are accuracy × 1000.

Japanese morpheme prediction: optimized pairs accuracy × 1000.

Japanese morpheme prediction: random baseline pairs accuracy × 1000.

Japanese morpheme prediction: real order pairs accuracy × 1000.

Optimized and real orders vastly outperform the random baseline for Japanese.

The real Japanese order achieves accuracy matching the optimized order.

Bridge: Japanese causative is the innermost functional suffix #

The Bybee hierarchy predicts that valence morphemes should be closest to the stem. Japanese -(s)ase (causative = valence) occupies slot 2, the first functional suffix after derivation. This is consistent with both:

1. The relevance hierarchy (valence is closest to stem meaning)
2. The memory-surprisal explanation (valence affects subcategorization, so placing it near the stem concentrates predictive information locally)

Bridge: Japanese -(s)ase = Song's COMPACT morphological causative #

Japanese -(s)ase is classified as a morphological COMPACT causative in @cite{song-1996}. The ik_ase entry in Fragments/Japanese/Predicates confirms this: causativeBuilder = some.make.

The Japanese -(s)ase causative entry is causative (derived from causativeBuilder).

Bridge: Relevance hierarchy ↔ Information locality #

The Bybee hierarchy orders morphemes by semantic relevance to the stem. The memory-surprisal framework predicts that information-locality-optimal orderings place highly predictive morphemes close to the stem. On the assumption that semantic relevance correlates with predictive information, the two predictions converge, and the relevance hierarchy can be read as an instance of information locality at the morpheme level.

The near-optimality of Japanese and Sesotho morpheme orders (japanese_morpheme_efficient, sesotho_morpheme_efficient) shows that real morpheme orders are close to the information-locality optimum. That these orders also follow the relevance hierarchy, in full for Sesotho suffixes and through the voice slot for Japanese (japanese_partial_bybee, sesotho_suffixes_respect_bybee), suggests that the hierarchy tracks the same notion of locality.