Study 1: Artificial Language Learning (@cite{fedzechkina-newport-2012}/2017) #
Study 1 of @cite{hahn-degen-futrell-2021} reanalyzes @cite{fedzechkina-newport-2012}: learners of an artificial language with flexible word order converge toward orders that minimize dependency length — and these orders also achieve more efficient memory-surprisal trade-offs.
Setup #
Two mini-languages with identical lexicons but different word orders:
- Language A: complex NP placed sentence-initially (short dependencies)
- Language B: complex NP placed sentence-finally (long dependencies)
Mixed-complexity sentences (one simple NP + one complex NP) create the critical contrast. Language A's order minimizes dependency length because the verb is closer to both arguments.
Key Result #
Learners exposed to a 50/50 mixture of both orders converge toward Language A's order (~67% use by end of training), showing a learning bias for dependency-length-minimizing (= memory-efficient) orders.
Word order for a transitive sentence.
Concrete examples #
"The big cat chased the dog" with complex NP = "the big cat" (3 words) and simple NP = "the dog" (2 words).
Language A (SOV, complex first): the big cat | the dog | chased
Language B (SOV, complex last): the dog | the big cat | chased

Language A SOV: "the-big-cat the-dog chased"
Words: the(0) big(1) cat(2) the(3) dog(4) chased(5)
Dependencies:
- det: cat(2) ← the(0) length 2
- amod: cat(2) ← big(1) length 1
- nsubj: chased(5) ← cat(2) length 3
- det: dog(4) ← the(3) length 1
- obj: chased(5) ← dog(4) length 1

Total = 8
Language B SOV: "the-dog the-big-cat chased"
Words: the(0) dog(1) the(2) big(3) cat(4) chased(5)
Dependencies:
- det: dog(1) ← the(0) length 1
- nsubj: chased(5) ← dog(1) length 4 (long!)
- det: cat(4) ← the(2) length 2
- amod: cat(4) ← big(3) length 1
- obj: chased(5) ← cat(4) length 1

Total = 9
Language A has shorter total dependency length than Language B (8 vs. 9).
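The totals above can be checked with a short script (a sketch, not part of the formalization; the head/dependent positions are those listed in the examples above):

```python
# Total dependency length: sum of |head position - dependent position|
# over all dependencies, using 0-indexed word positions.

def total_dependency_length(deps):
    """deps: list of (head_index, dependent_index) pairs."""
    return sum(abs(h - d) for h, d in deps)

# Language A: the(0) big(1) cat(2) the(3) dog(4) chased(5)
lang_a = [(2, 0),   # det:   cat <- the,     length 2
          (2, 1),   # amod:  cat <- big,     length 1
          (5, 2),   # nsubj: chased <- cat,  length 3
          (4, 3),   # det:   dog <- the,     length 1
          (5, 4)]   # obj:   chased <- dog,  length 1

# Language B: the(0) dog(1) the(2) big(3) cat(4) chased(5)
lang_b = [(1, 0),   # det:   dog <- the,     length 1
          (5, 1),   # nsubj: chased <- dog,  length 4
          (4, 2),   # det:   cat <- the,     length 2
          (4, 3),   # amod:  cat <- big,     length 1
          (5, 4)]   # obj:   chased <- cat,  length 1

print(total_dependency_length(lang_a), total_dependency_length(lang_b))  # -> 8 9
```

The asymmetry comes entirely from the nsubj arc: placing the complex NP first keeps the subject head close to the verb.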
Language A trade-off curve (5 points, from Figure 7). Lower AUC = more efficient.
Language B trade-off curve (5 points, from Figure 7). Higher AUC = less efficient.
Language A has lower AUC (more efficient trade-off).
Language A satisfies the efficient trade-off hypothesis vs B.
Learner convergence rate: the proportion of productions choosing Language A's order, scaled by 1000.
By end of training, ~67% of productions used the short-dependency order (@cite{fedzechkina-newport-2012}, Figure 2). This exceeds chance (50%).
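As an illustrative check (the counts here are hypothetical, not the study's actual trial numbers), a one-sided exact binomial test shows that a proportion around 67% over a sample of this size comfortably exceeds chance:

```python
from math import comb

def binom_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): one-sided test against chance."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical counts: 67 of 100 productions use Language A's order.
p_value = binom_tail(100, 67)
assert p_value < 0.05  # well above chance under these illustrative counts
```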
Learners converge above chance toward the efficient order.
Bridge theorem: the language with shorter dependencies also has the more efficient memory-surprisal trade-off.
This connects DLM (structural) to information-theoretic efficiency: shorter dependencies concentrate predictive information locally, yielding steeper I_t decay and better trade-off curves.
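A minimal sketch of this connection, with purely hypothetical I_t values: treating I_t as the predictive information carried by the word t positions back, truncating context at distance T costs roughly Σ_{t≤T} t·I_t in memory and gives up Σ_{t>T} I_t in prediction. A faster-decaying I_t then traces a lower trade-off curve:

```python
# Sketch: faster decay of I_t yields a more efficient memory-surprisal
# trade-off. The I_t values below are hypothetical, purely for illustration.

def tradeoff_curve(I):
    """(memory, excess surprisal) points for truncation T = 0..len(I).
    memory_T = sum_{t<=T} t * I_t  (cost of retaining t-back context)
    excess_T = sum_{t>T}  I_t      (predictive information given up)"""
    points = []
    for T in range(len(I) + 1):
        memory = sum(t * I[t - 1] for t in range(1, T + 1))
        excess = sum(I[t - 1] for t in range(T + 1, len(I) + 1))
        points.append((memory, excess))
    return points

def auc(points):
    """Trapezoidal area under a piecewise-linear trade-off curve."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

fast_decay = [0.8, 0.1, 0.05, 0.03, 0.02]  # information concentrated locally
slow_decay = [0.4, 0.3, 0.15, 0.1, 0.05]   # same total, spread out in time

assert abs(sum(fast_decay) - sum(slow_decay)) < 1e-9
assert auc(tradeoff_curve(fast_decay)) < auc(tradeoff_curve(slow_decay))
```

Both sequences carry the same total predictive information; only its distribution over distances differs, mirroring how Language A and Language B share a lexicon but differ in where dependencies place information.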