
Linglib.Phenomena.WordOrder.Studies.ArnoldEtAl2000

Heaviness vs. Newness in Constituent Ordering #

@cite{arnold-wasow-losongco-ginstrom-2000}

A corpus analysis and an elicitation experiment disentangle two confounded predictors of English constituent ordering:

  1. Heaviness — structural complexity, measured by relative word count
  2. Newness — discourse status: given/inferable vs. new

These factors are naturally confounded: new referents require more descriptive material, so they tend to be heavier. Arnold et al. use logistic regression to show that in both constructions studied — dative alternation and heavy NP shift — both weight and newness independently predict construction choice.

Studies #

Constructions #

The "heavy/new last" principle: speakers place heavier and newer constituents later. In the dative alternation (DA), the double-object variant (DO) puts the theme last, while the prepositional-dative variant (PD) puts the recipient last. In heavy NP shift (HNPS), shifting puts the direct object after the PP, i.e. in the later position.

Central Finding #

Both heaviness and newness independently contribute to ordering in both constructions. Neither factor can be reduced to the other. The interaction between them (significant only in the experiment) shows they function as competing constraints: each factor's effect is larger when the other is less constraining.

Bridges #

Constructions studied in the corpus analysis.

  • dativeAlternation : Construction

    Dative alternation with "give": DO (V Rec Theme) vs. PD (V Theme to-Rec).

  • heavyNPShift : Construction

    Heavy NP shift: nonshifted (V DO PP) vs. shifted (V PP DO). Uses "bring...to" and "take...into account."
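The two constructions each contrast an early- vs. late-placement variant. A minimal Lean sketch of the variant inventory (the inductive shape and variant names here are assumptions for illustration, not the library's actual `Construction` definition):

```lean
-- Sketch only: variant names are hypothetical.
inductive DAVariant
  | doubleObject   -- DO: V Rec Theme (theme last)
  | prepositional  -- PD: V Theme to-Rec (recipient last)

inductive HNPSVariant
  | nonshifted     -- V DO PP
  | shifted        -- V PP DO (direct object last)
```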

Corpus verb token counts (Table 1).

Heaviness categories for dative alternation (Table 2). Measured as relative length: theme NP length − goal NP length.

  • themeShorter : DAHeaviness

    Theme shorter: theme − goal ≤ −2

  • themeEqualGoal : DAHeaviness

    Theme ≈ goal: theme − goal between −1 and 1

  • themeLonger : DAHeaviness

    Theme longer: theme − goal ≥ 2
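The three categories partition the integer difference theme − goal. A hedged helper making the thresholds explicit (the function name is hypothetical and not part of the library; `DAHeaviness` is the type defined above):

```lean
-- Hypothetical helper: classify a length difference into the
-- Table 2 categories.
def classifyDAHeaviness (themeLen goalLen : Int) : DAHeaviness :=
  let d := themeLen - goalLen
  if d ≤ -2 then .themeShorter      -- theme − goal ≤ −2
  else if d ≥ 2 then .themeLonger   -- theme − goal ≥ 2
  else .themeEqualGoal              -- −1 ≤ d ≤ 1
```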

Heaviness categories for heavy NP shift (Table 3). Measured as relative length: DO length − PP length.

Most DA items have a theme longer than the goal (57%): English datives typically have longer themes, consistent with the heavy-last tendency.

The experiment ran 48 participants (24 pairs); 42 sessions were retained after exclusions, yielding 1684 instructions in the final analysis.

Newness conditions in the experiment.

  • themeGiven : ExpNewness

    Theme is given (= goal is new)

  • bothGiven : ExpNewness

    Both constituents are given

  • goalGiven : ExpNewness

    Goal is given (= theme is new)

"Both given" is extremely rare (< 2%), confirming that the experiment successfully manipulated newness as a between-constituent contrast.

Which factors were selected by the logistic regression for each analysis.

  • label : String

  • heavinessSig : Bool

    Heaviness is a significant predictor

  • newnessSig : Bool

    Newness is a significant predictor

  • interactionSig : Bool

    Newness × heaviness interaction is significant

Corpus DA: both heaviness and newness significant, no interaction.

Corpus HNPS: heaviness, newness, and verb all significant, no interactions. (Verb effect: "take...into account" shifts at a higher rate than "bring...to", likely because it is an opaque collocation.)

Experiment DA: heaviness, newness, and their interaction all significant. (Production difficulty was also significant but is omitted from the structure.)

No interaction in either corpus analysis: heaviness and newness contribute independently.

The experiment finds a significant interaction: heaviness has its largest effect when both constituents share newness status, and newness has its largest effect when the constituents are comparable in weight.
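The three significance patterns can be transcribed into the factor-selection structure described above. A self-contained sketch (the structure name `SigPattern` is hypothetical; the field names are those listed on this page, and the values follow the findings just stated):

```lean
structure SigPattern where
  label          : String
  heavinessSig   : Bool
  newnessSig     : Bool
  interactionSig : Bool

def corpusDA : SigPattern :=
  { label := "Corpus DA", heavinessSig := true, newnessSig := true, interactionSig := false }

def corpusHNPS : SigPattern :=
  { label := "Corpus HNPS", heavinessSig := true, newnessSig := true, interactionSig := false }

def experimentDA : SigPattern :=
  { label := "Experiment DA", heavinessSig := true, newnessSig := true, interactionSig := true }
```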

−2 × Log Likelihood Ratio values (× 10 for integer encoding) from the paper's logistic regressions. Larger values = stronger predictor.

In the experiment, newness dominates: its effect is 30× larger than that of heaviness. This reversal reflects the narrower heaviness range in the experiment (Table 6: −8 to 20 words) vs. the corpus (−29 to 35).

The heaviness effect is stronger in the corpus than in the experiment, consistent with the wider weight range in naturally occurring data.

The newness effect is stronger in the experiment than in the corpus, consistent with the experiment's more controlled newness manipulation (immediate mention vs. within-agenda-item mention).

Average difference in NP length (phrase 1 − phrase 2, × 10) for each heaviness category, from Table 6. Shows the actual weight contrasts across the three data sets.

For DA: phrase 1 = theme NP, phrase 2 = goal NP. For HNPS: phrase 1 = direct object NP, phrase 2 = prepositional phrase.


The corpus data spans a far wider heaviness range than the experiment. This explains why heaviness dominates in the corpus but not in the experiment: with less variation in weight, there is less for the weight factor to predict.

HNPS has the widest heaviness range overall, spanning 65 words of difference between the lightest and heaviest items.

Arnold et al.'s "given" (previously mentioned or inferable from something mentioned within the current agenda item, in the corpus; established by question or by mention in the immediately preceding utterance, in the experiment) maps to DiscourseStatus.given.

Their classification collapses @cite{prince-1981}'s three-way given/inferable/new distinction into two categories: inferables are grouped with given.


Arnold et al.'s "new" (not previously mentioned and not inferable) maps to DiscourseStatus.new. This is broader than @cite{kratzer-selkirk-2020}'s .new — it includes material that K&S would mark as .focused ([FoC]-marked, contrasted).


DLM: Correct on weight, blind to discourse #

totalDepLength is defined over Dependency = (headIdx × depIdx × DepRel). The function never accesses t.words, so no property of the words — form, category, features, discourse status — enters the computation.

Arnold et al.'s finding that newness significantly predicts ordering in BOTH constructions (even after controlling for heaviness) means DLM alone is insufficient as a complete account of constituent ordering.

theorem Phenomena.WordOrder.Studies.ArnoldEtAl2000.totalDepLength_word_invariant (deps : List DepGrammar.Dependency) (rootIdx : ℕ) (words1 words2 : List Word) :
DepGrammar.DependencyLength.totalDepLength { words := words1, deps := deps, rootIdx := rootIdx } = DepGrammar.DependencyLength.totalDepLength { words := words2, deps := deps, rootIdx := rootIdx }

DLM word-invariance. totalDepLength yields the same value for any two trees sharing the same dependency structure, regardless of the words.

theorem Phenomena.WordOrder.Studies.ArnoldEtAl2000.dlm_discourse_blind (deps : List DepGrammar.Dependency) (rootIdx : ℕ) (givenWords newWords : List Word) :
DepGrammar.DependencyLength.totalDepLength { words := givenWords, deps := deps, rootIdx := rootIdx } = DepGrammar.DependencyLength.totalDepLength { words := newWords, deps := deps, rootIdx := rootIdx }

DLM assigns identical cost to trees differing only in whether NPs are discourse-given or discourse-new.

theorem Phenomena.WordOrder.Studies.ArnoldEtAl2000.depLength_ignores_relation (h d : ℕ) (r1 r2 : UD.DepRel) :
DepGrammar.DependencyLength.depLength { headIdx := h, depIdx := d, depType := r1 } = DepGrammar.DependencyLength.depLength { headIdx := h, depIdx := d, depType := r2 }

Even at the single-dependency level, depLength ignores the grammatical relation. The cost is purely |headIdx − depIdx|.
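A definition shape consistent with the three theorems above (an assumption for illustration, not the library's verbatim code): the cost reads only the two indices, so word-invariance and relation-invariance hold definitionally.

```lean
-- Sketch: |headIdx − depIdx| on ℕ, summed over the dependency list.
-- Neither the words nor the relation label is read, which is exactly
-- what the invariance theorems above assert.
def depLengthSketch (headIdx depIdx : Nat) : Nat :=
  max headIdx depIdx - min headIdx depIdx

def totalDepLengthSketch (deps : List (Nat × Nat)) : Nat :=
  (deps.map fun (h, d) => depLengthSketch h d).sum
```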

A pure-discourse ordering model: the preference for placing a constituent in late position is determined solely by its discourse status.

A pure-discourse model is weight-blind by type: for a fixed discourse status, it assigns the same preference regardless of constituent length.

Arnold et al.'s corpus results refute pure-discourse accounts: heaviness is significant in BOTH constructions even after controlling for newness. A weight-blind model cannot explain these results.

@[reducible, inline]

The minimal adequate model type: a function of both weight and discourse status, encoding Arnold et al.'s central finding.
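The typing contrast can be sketched as follows (type names hypothetical; only the shapes matter): a pure-discourse model cannot see weight by construction, while the minimal adequate model takes both arguments.

```lean
inductive Status | given | new  -- stand-in for DiscourseStatus

/-- Weight-blind by type: preference depends only on discourse status. -/
def PureDiscourseModel := Status → Rat

/-- Minimal adequate model: preference depends on both weight
    (word count) and discourse status, per Arnold et al.'s finding. -/
def WeightDiscourseModel := Nat → Status → Rat
```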