
Linglib.Phenomena.Ellipsis.Studies.AnandHardtMcCloskey2021

@cite{anand-hardt-mccloskey-2021}: Corpus Data

@cite{anand-hardt-mccloskey-2021}

Distributional findings from the Santa Cruz Sluicing Corpus (SCSS), a 4,700-example annotated data set of naturally occurring English sluices.

Key Findings

  1. Sprouting dominates: 65.5% of sluices have no overt correlate, overturning the theoretical literature's focus on merger.

  2. Why dominates: sluices with the wh-remnant why make up 53.8% of all sluices (2,529/4,700).

  3. Mismatches are systematic: tense (129), modality (394), and polarity (28) mismatches between antecedent and ellipsis site are attested. Argument-structure mismatches (voice, transitivity) are entirely absent.

  4. Embedding is the norm: 72.4% of sluices are embedded.

Whether a sluice has an overt correlate in the antecedent.

• Merger: "Someone₁ left, but I don't know who₁"; someone is the correlate.
• Sprouting: "John left, but I don't know why"; there is no overt correlate.
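As a minimal sketch, this two-way classification can be modeled as an enumeration with a predicate recording whether the antecedent supplies an overt correlate. The names CorrelateKind and hasOvertCorrelate are hypothetical and need not match the actual Linglib declaration.

```lean
-- Hypothetical names; not necessarily Linglib's actual declarations.
inductive CorrelateKind where
  | merger     -- overt correlate: "Someone₁ left, but I don't know who₁"
  | sprouting  -- no overt correlate: "John left, but I don't know why"
  deriving DecidableEq, Repr

/-- `true` exactly when the antecedent supplies an overt correlate. -/
def CorrelateKind.hasOvertCorrelate : CorrelateKind → Bool
  | .merger    => true
  | .sprouting => false

example : CorrelateKind.hasOvertCorrelate .merger = true := by decide
example : CorrelateKind.hasOvertCorrelate .sprouting = false := by decide
```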


Embedding context of a sluice.


Dimension along which the antecedent and the ellipsis site can differ.

The SCSS documents which mismatches are attested and which are absent. Attested mismatches challenge strict syntactic identity requirements; absent mismatches (especially in argument structure) constrain theories of ellipsis licensing.


Discourse contexts licensing polarity reversal under sluicing (§5.3).

Polarity reversal is pragmatically conditioned: it requires the discourse context to make the reversed polarity salient.


Subtypes of novel lexical material in sluice paraphrases (§5.4).

These are cases where the paraphrase of the elided material contains words not present in the antecedent clause.


Summary statistics from the SCSS.

Percentages are stored as tenths of a percent (655 = 65.5%) to avoid rationals while preserving the paper's reported precision.
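The tenths encoding can be illustrated with two hypothetical helpers (pctTenths and showTenths are not Linglib names): one computes a count's share of a total in tenths of a percent, assuming round-half-up, and one renders a stored value for display. Checking the why figure from the key findings is a quick sanity test of the encoding.

```lean
-- Hypothetical helpers; not Linglib's API.

/-- Share of `part` in `total`, in tenths of a percent, rounded half up.
    Returns 0 when `total = 0` (Lean's `Nat` division by zero). -/
def pctTenths (part total : Nat) : Nat :=
  (part * 1000 + total / 2) / total

/-- Render a stored tenths value for display, e.g. 655 as "65.5%". -/
def showTenths (t : Nat) : String :=
  s!"{t / 10}.{t % 10}%"

-- why-sluices: 2,529 of 4,700
example : pctTenths 2529 4700 = 538 := by decide

#eval showTenths 655   -- "65.5%"
#eval showTenths 724   -- "72.4%"
```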

• totalSluices
• sproutingPctTenths
• mergerPctTenths
• rootPctTenths
• embeddedPctTenths
• whyCount
• whyPctTenths
• antecedentlessCount

The SCSS corpus summary.
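A reduced, hypothetical version of the summary record, filled in only with figures this page states; mergerPctTenths, rootPctTenths and antecedentlessCount are omitted because their values are not given here. The structure name SCSSSketch is illustrative. The checks show how the tenths fields can be tied back to the raw counts.

```lean
-- Hypothetical, reduced record: only fields whose values this page states.
structure SCSSSketch where
  totalSluices       : Nat
  sproutingPctTenths : Nat
  embeddedPctTenths  : Nat
  whyCount           : Nat
  whyPctTenths       : Nat
  deriving Repr

def scssSketch : SCSSSketch :=
  { totalSluices := 4700, sproutingPctTenths := 655, embeddedPctTenths := 724,
    whyCount := 2529, whyPctTenths := 538 }

-- The stored why percentage agrees with the raw counts (floor division:
-- 2529 * 1000 / 4700 = 538).
example : scssSketch.whyCount * 1000 / scssSketch.totalSluices
    = scssSketch.whyPctTenths := by decide

-- No stored percentage exceeds 100.0%.
example : scssSketch.sproutingPctTenths ≤ 1000 ∧
    scssSketch.embeddedPctTenths ≤ 1000 := by decide
```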


Every attested dimension has a nonzero count; every unattested dimension has a zero count.
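This correspondence can be stated as a decidable check over a finite list of dimensions. The sketch assumes an enumeration containing only the five dimensions named on this page (tense, modality, polarity, voice, transitivity); the real type may have more constructors, and MismatchDim, corpusCount, attested and allDims are hypothetical names.

```lean
-- Hypothetical: only the five dimensions named on this page.
inductive MismatchDim where
  | tense | modality | polarity | voice | transitivity
  deriving DecidableEq, Repr

/-- Attestation counts reported for the SCSS. -/
def corpusCount : MismatchDim → Nat
  | .tense        => 129
  | .modality     => 394
  | .polarity     => 28
  | .voice        => 0
  | .transitivity => 0

/-- Whether mismatches along this dimension are attested. -/
def attested : MismatchDim → Bool
  | .tense | .modality | .polarity => true
  | .voice | .transitivity         => false

def allDims : List MismatchDim :=
  [.tense, .modality, .polarity, .voice, .transitivity]

-- Attested exactly when the count is nonzero, for every listed dimension.
example : allDims.all (fun d => attested d == (corpusCount d != 0)) = true := by decide
```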

Head pairs for a simple transitive vP: v selects V, and V selects D. This is the argument-domain structure of "someone left" and "John ate something", that is, of any clause with a single verb and a DP argument.


Head pairs for an intransitive vP: v selects V only. Used for the antecedent of sprouting examples like "John left."
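A sketch of head pairs as selection relations, assuming a pair (h₁, h₂) records that h₁ selects h₂ and an argument domain is represented as a list of such pairs. Head, HeadPair, transitiveVP and intransitiveVP are hypothetical names for illustration.

```lean
-- Hypothetical encoding: a pair (h₁, h₂) means "h₁ selects h₂".
inductive Head where
  | v | V | D
  deriving DecidableEq, Repr

abbrev HeadPair := Head × Head

/-- v selects V, and V selects D ("John ate something"). -/
def transitiveVP : List HeadPair := [(Head.v, Head.V), (Head.V, Head.D)]

/-- v selects V only ("John left"). -/
def intransitiveVP : List HeadPair := [(Head.v, Head.V)]

-- The two argument domains differ.
example : transitiveVP ≠ intransitiveVP := by decide
```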


SIC licenses basic sluicing: "Someone left, but I don't know who."

The antecedent "someone left" and the ellipsis site "[who] left" share the same verb, so their argument domains have identical head pairs.

SIC licenses object sluicing: "John ate something, but I don't know what."

The antecedent and the ellipsis site share the verb "ate", so their head pairs are identical.

A SluicingLicense for same-verb sluices is licensed.

The SIC correctly predicts that basicSluice is grammatical.

The SIC correctly predicts that objectSluice is grammatical.
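The SIC check behind these predictions can be sketched as equality of the two argument domains' head-pair lists. This is a simplification: the list representation, the order-sensitive comparison and all names (Head, sicLicensed, ateVP) are hypothetical, not Linglib's definitions.

```lean
inductive Head where
  | v | V | D
  deriving DecidableEq, Repr

abbrev HeadPair := Head × Head

/-- Hypothetical SIC check: identical head pairs in both argument domains. -/
def sicLicensed (antecedent ellipsis : List HeadPair) : Bool :=
  decide (antecedent = ellipsis)

/-- Argument domain of "ate something": v selects V, V selects D. -/
def ateVP : List HeadPair := [(Head.v, Head.V), (Head.V, Head.D)]

-- "John ate something, but I don't know what": same verb, same head pairs.
example : sicLicensed ateVP ateVP = true := by decide

-- A transitivity mismatch (an ellipsis site where v selects V only) is blocked.
example : sicLicensed ateVP [(Head.v, Head.V)] = false := by decide
```

Comparing lists is order-sensitive; if head pairs are not listed in a canonical order, a multiset comparison would be the natural alternative.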

Head pairs for a dative-assigning transitive vP. V assigns dative case to its DP complement (e.g., German helfen).


Head pairs for an accusative-assigning transitive vP. V assigns accusative case to its DP complement (e.g., German sehen).


The SIC correctly predicts that germanCaseMatch is grammatical: the dative wh-phrase matches the dative correlate. The SIC is satisfied because the two argument domains have structurally identical head pairs (both assign dative).

The SIC correctly predicts that germanCaseMismatch is ungrammatical: the accusative wh-phrase does not match the dative correlate. The SIC blocks sluicing because dative ≠ accusative within the argument domain (@cite{merchant-2001}: German wem/wen data).
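Case can be folded into the same comparison by annotating the (V, D) pair with the case V assigns. In this hypothetical encoding (CaseFeat, a HeadPair structure with an assigned field, transitiveWith), a dative-assigning antecedent like helfen and an accusative-assigning ellipsis site like sehen yield different argument domains, so the check fails exactly where the wem/wen contrast predicts.

```lean
-- Hypothetical encoding of case on the (V, D) selection pair.
inductive Head where
  | v | V | D
  deriving DecidableEq, Repr

inductive CaseFeat where
  | dat | acc
  deriving DecidableEq, Repr

/-- A selection pair, with the case the selector assigns (if any). -/
structure HeadPair where
  selector : Head
  selected : Head
  assigned : Option CaseFeat
  deriving DecidableEq, Repr

def sicLicensed (antecedent ellipsis : List HeadPair) : Bool :=
  decide (antecedent = ellipsis)

/-- v selects V; V selects D and assigns it case `c`. -/
def transitiveWith (c : CaseFeat) : List HeadPair :=
  [HeadPair.mk Head.v Head.V none, HeadPair.mk Head.V Head.D (some c)]

-- dative correlate (helfen), dative remnant wem: licensed
example : sicLicensed (transitiveWith .dat) (transitiveWith .dat) = true := by decide
-- dative correlate (helfen), accusative remnant wen (sehen-type V): blocked
example : sicLicensed (transitiveWith .dat) (transitiveWith .acc) = false := by decide
```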

T is outside the argument domain, so tense mismatches are licit. The corpus confirms this: 129 tense mismatches are attested.

Mod is outside the argument domain, so modal mismatches are licit. The corpus confirms this: 394 modal mismatches are attested, the most frequent type.

The SIC cleanly separates tolerated from untolerated mismatches: every mismatch dimension inside the argument domain has 0 corpus attestations, and every dimension outside it has a nonzero count.
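The separation claim can be checked mechanically once each dimension is classified as inside or outside the argument domain. The classification below is an assumption: voice and transitivity are placed on v/V (inside), while tense, modality and polarity are placed on higher heads (outside). The five-dimension enumeration and all names are hypothetical.

```lean
-- Hypothetical: the five dimensions named on this page.
inductive MismatchDim where
  | tense | modality | polarity | voice | transitivity
  deriving DecidableEq, Repr

/-- Assumed classification: voice and transitivity are features of v/V
    (inside the argument domain); tense, modality and polarity live on
    higher heads (outside it). -/
def insideArgDomain : MismatchDim → Bool
  | .voice | .transitivity         => true
  | .tense | .modality | .polarity => false

/-- SCSS attestation counts. -/
def corpusCount : MismatchDim → Nat
  | .tense        => 129
  | .modality     => 394
  | .polarity     => 28
  | .voice        => 0
  | .transitivity => 0

def allDims : List MismatchDim :=
  [.tense, .modality, .polarity, .voice, .transitivity]

-- Inside the argument domain exactly when the corpus count is zero.
example : allDims.all (fun d => insideArgDomain d == (corpusCount d == 0)) = true := by decide
```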

Head pairs for an active transitive vP with voice flavor.


Head pairs for a passive transitive vP with voice flavor.


The SIC correctly blocks voice mismatches in sluicing: active v[agentive] ≠ passive v[nonThematic], and both are within the argument domain (F1 ≤ F1). The corpus confirms this: 0 voice mismatches are attested.
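The voice result follows if v carries a flavor feature. In this sketch (Flavor, Head and sicLicensed are hypothetical names), active and passive vPs differ only in the flavor on v, and because v is inside the argument domain the head-pair comparison fails.

```lean
-- Hypothetical: v carries a flavor feature.
inductive Flavor where
  | agentive | nonThematic
  deriving DecidableEq, Repr

inductive Head where
  | v (f : Flavor) | V | D
  deriving DecidableEq, Repr

abbrev HeadPair := Head × Head

def activeVP : List HeadPair :=
  [(Head.v .agentive, Head.V), (Head.V, Head.D)]

def passiveVP : List HeadPair :=
  [(Head.v .nonThematic, Head.V), (Head.V, Head.D)]

/-- Hypothetical SIC check: identical head pairs in both argument domains. -/
def sicLicensed (antecedent ellipsis : List HeadPair) : Bool :=
  decide (antecedent = ellipsis)

-- Active antecedent, passive ellipsis site: the v flavors differ, so blocked.
example : sicLicensed activeVP passiveVP = false := by decide
-- Matching voice: licensed.
example : sicLicensed activeVP activeVP = true := by decide
```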

This resolves the voice puzzle from the earlier analysis. The puzzle arose from treating voice as outside the argument domain (like T/Mod), but @cite{anand-hardt-mccloskey-2021} shows that voice flavor is encoded on v, which IS inside the argument domain.

Sprouting is the dominant sluice kind (65.5%), overturning the literature's focus on merger.

Why-sluices account for the majority of sluices (53.8%). Since virtually all why-sluices are sprouting (reason adjuncts lack overt correlates), the prototypical sluice is "John left, but I don't know why" (sprouting, reason), not "Someone left, but I don't know who" (merger, entity).

Merchant's deletion-domain theory converges with the corpus: in sluicing, [E] sits on C (C[E]), which predicts that both voice and transitivity mismatches are blocked, and the SCSS finds exactly 0 attestations of each.

This is a convergent prediction with the SIC (§7.2): the SIC (structural identity within the argument domain) and the deletion-domain analysis ([E] on C, so Voice is inside the deleted TP) independently predict that voice and argument-structure mismatches are blocked in sluicing.
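The deletion-domain reasoning can be sketched as a clausal spine in which [E] on a head deletes everything in its complement, with identity enforced on the deleted material. The spine C > T > Voice > v > V and the names SpineHead, spine and deletionDomain are simplifying assumptions, not Merchant's formalism or Linglib's code.

```lean
-- Hypothetical clausal spine, highest head first.
inductive SpineHead where
  | C | T | Voice | v | V
  deriving DecidableEq, Repr

def spine : List SpineHead :=
  [SpineHead.C, SpineHead.T, SpineHead.Voice, SpineHead.v, SpineHead.V]

/-- Heads in the complement of the head bearing [E], i.e. the deleted material. -/
def deletionDomain (e : SpineHead) : List SpineHead :=
  (spine.dropWhile (· != e)).drop 1

-- Sluicing: [E] on C deletes T and everything below it.
example : deletionDomain .C =
    [SpineHead.T, SpineHead.Voice, SpineHead.v, SpineHead.V] := by decide

-- Voice and the argument-structure heads v, V all fall inside the deleted domain,
-- so identity is enforced on them and voice/transitivity mismatches are blocked.
example : [SpineHead.Voice, SpineHead.v, SpineHead.V].all
    (fun h => (deletionDomain .C).contains h) = true := by decide
```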