@cite{anand-hardt-mccloskey-2021}: Corpus Data
Distributional findings from the Santa Cruz Sluicing Corpus (SCSS), a 4,700-example annotated data set of naturally occurring English sluices.
Key Findings
Sprouting dominates: 65.5% of sluices have no overt correlate, which runs counter to the theoretical literature's focus on merger.
Why dominates: why accounts for 53.8% of all sluices (2,529/4,700).
Mismatches are systematic: tense (129), modality (394), and polarity (28) mismatches between antecedent and ellipsis site are attested, whereas argument structure mismatches (voice, transitivity) are entirely absent.
Embedding is the norm: 72.4% of sluices are embedded.
Whether a sluice has an overt correlate in the antecedent.
Merger: "Someone₁ left, but I don't know who₁" — someone is the correlate. Sprouting: "John left, but I don't know why" — no overt correlate.
- merger : SluiceKind
- sprouting : SluiceKind
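The two-way classification can be sketched in Lean as a plain enumeration. This is a minimal, self-contained illustration under stated assumptions, not the module's own source: the `Sketch` namespace and the helper `hasOvertCorrelate` are hypothetical names introduced here.

```lean
namespace Sketch

/-- Whether a sluice has an overt correlate in its antecedent. -/
inductive SluiceKind where
  | merger     -- "Someone left, but I don't know who"
  | sprouting  -- "John left, but I don't know why"
  deriving DecidableEq, Repr

/-- Hypothetical helper (not in the source): only merger sluices
    have an overt correlate in the antecedent. -/
def SluiceKind.hasOvertCorrelate : SluiceKind → Bool
  | .merger    => true
  | .sprouting => false

example : SluiceKind.merger.hasOvertCorrelate = true := rfl
example : SluiceKind.sprouting.hasOvertCorrelate = false := rfl

end Sketch
```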
Dimension along which antecedent and ellipsis site can differ.
The SCSS documents which mismatches are attested and which are absent. Attested mismatches challenge strict syntactic identity requirements; absent mismatches (especially argument structure) constrain theories of ellipsis licensing.
- tense : MismatchDimension
- modality : MismatchDimension
- polarity : MismatchDimension
- newWords : MismatchDimension
- voice : MismatchDimension
- argumentStructure : MismatchDimension
Whether a mismatch dimension is attested in the corpus.
Equations
- Phenomena.Ellipsis.Studies.AnandHardtMcCloskey2021.MismatchDimension.tense.attested = true
- Phenomena.Ellipsis.Studies.AnandHardtMcCloskey2021.MismatchDimension.modality.attested = true
- Phenomena.Ellipsis.Studies.AnandHardtMcCloskey2021.MismatchDimension.polarity.attested = true
- Phenomena.Ellipsis.Studies.AnandHardtMcCloskey2021.MismatchDimension.newWords.attested = true
- Phenomena.Ellipsis.Studies.AnandHardtMcCloskey2021.MismatchDimension.voice.attested = false
- Phenomena.Ellipsis.Studies.AnandHardtMcCloskey2021.MismatchDimension.argumentStructure.attested = false
Corpus count for each mismatch dimension (SCSS §5).
Equations
- Phenomena.Ellipsis.Studies.AnandHardtMcCloskey2021.MismatchDimension.tense.corpusCount = 129
- Phenomena.Ellipsis.Studies.AnandHardtMcCloskey2021.MismatchDimension.modality.corpusCount = 394
- Phenomena.Ellipsis.Studies.AnandHardtMcCloskey2021.MismatchDimension.polarity.corpusCount = 28
- Phenomena.Ellipsis.Studies.AnandHardtMcCloskey2021.MismatchDimension.newWords.corpusCount = 71
- Phenomena.Ellipsis.Studies.AnandHardtMcCloskey2021.MismatchDimension.voice.corpusCount = 0
- Phenomena.Ellipsis.Studies.AnandHardtMcCloskey2021.MismatchDimension.argumentStructure.corpusCount = 0
Discourse contexts licensing polarity reversal under sluicing (§5.3).
Polarity reversal is pragmatically conditioned — it requires the discourse context to make the reversed polarity salient.
- negRaising : PolarityReversalContext
- withoutAdjunct : PolarityReversalContext
- disjunction : PolarityReversalContext
- qudPartialAnswer : PolarityReversalContext
- howEmbedded : PolarityReversalContext
Subtypes of novel lexical material in sluice paraphrases (§5.4).
These are cases where the paraphrase of the elided material contains words not present in the antecedent clause.
- copularClause : NewWordsSubtype
- existentialInterp : NewWordsSubtype
- strandedPreposition : NewWordsSubtype
Summary statistics from the SCSS.
Percentages are stored in tenths of a percentage point (655 = 65.5%) to avoid rationals while preserving the paper's reported precision.
- totalSluices : ℕ
- sproutingPctTenths : ℕ
- mergerPctTenths : ℕ
- rootPctTenths : ℕ
- embeddedPctTenths : ℕ
- whyCount : ℕ
- whyPctTenths : ℕ
- antecedentlessCount : ℕ
The SCSS corpus summary.
Sprouting + merger = 100%.
Root + embedded = 100%.
Sprouting is the majority kind.
Why is the majority wh-remnant type.
Embedding is the majority context.
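The summary checks above reduce to natural-number arithmetic on tenths of a percentage point. The following is a hedged sketch rather than the module's actual definitions. The field names follow the structure listed earlier, but `antecedentlessCount` is omitted because its value is not stated on this page, and the merger (34.5%) and root (27.6%) figures are inferred here from the stated totals of 100%.

```lean
namespace Sketch

/-- Summary statistics, with percentages in tenths of a point. -/
structure CorpusSummary where
  totalSluices       : Nat
  sproutingPctTenths : Nat
  mergerPctTenths    : Nat
  rootPctTenths      : Nat
  embeddedPctTenths  : Nat
  whyCount           : Nat
  whyPctTenths       : Nat
  deriving Repr

/-- Values from the key findings; 345 and 276 are inferred as
    complements of 655 and 724 (an assumption of this sketch). -/
def scss : CorpusSummary where
  totalSluices       := 4700
  sproutingPctTenths := 655
  mergerPctTenths    := 345
  rootPctTenths      := 276
  embeddedPctTenths  := 724
  whyCount           := 2529
  whyPctTenths       := 538

-- Sprouting + merger = 100%; root + embedded = 100%.
example : scss.sproutingPctTenths + scss.mergerPctTenths = 1000 := by decide
example : scss.rootPctTenths + scss.embeddedPctTenths = 1000 := by decide

-- Majority claims: sprouting, embedding, and "why" each exceed 50%.
example : scss.sproutingPctTenths > scss.mergerPctTenths := by decide
example : scss.embeddedPctTenths > scss.rootPctTenths := by decide
example : scss.whyPctTenths > 500 := by decide

-- The stored "why" percentage agrees with the raw count (floor division).
example : scss.whyCount * 1000 / scss.totalSluices = scss.whyPctTenths := by decide

end Sketch
```

Storing tenths as `Nat` keeps every check decidable by evaluation, at the cost of rounding: 2,529/4,700 is about 53.81%, which floor division reports as 538.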
Modality mismatches are the most frequent mismatch type.
Voice mismatches are absent.
Argument structure mismatches are absent.
Every attested dimension has a nonzero count; every unattested dimension has a zero count.
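The mismatch claims can be checked by exhaustive case analysis over the six dimensions. The sketch below restates the `attested` and `corpusCount` equations listed earlier inside a hypothetical `Sketch` namespace; the theorem names are illustrative and need not match the module's.

```lean
namespace Sketch

inductive MismatchDimension where
  | tense | modality | polarity | newWords | voice | argumentStructure
  deriving DecidableEq, Repr

/-- Corpus counts as listed in the equations above. -/
def MismatchDimension.corpusCount : MismatchDimension → Nat
  | .tense             => 129
  | .modality          => 394
  | .polarity          => 28
  | .newWords          => 71
  | .voice             => 0
  | .argumentStructure => 0

/-- Attestation flags as listed in the equations above. -/
def MismatchDimension.attested : MismatchDimension → Bool
  | .tense             => true
  | .modality          => true
  | .polarity          => true
  | .newWords          => true
  | .voice             => false
  | .argumentStructure => false

/-- A dimension is attested exactly when its count is positive. -/
theorem attested_iff_pos (d : MismatchDimension) :
    d.attested = true ↔ 0 < d.corpusCount := by
  cases d <;> decide

/-- Modality has the largest count of any dimension. -/
theorem modality_most_frequent (d : MismatchDimension) :
    d.corpusCount ≤ MismatchDimension.modality.corpusCount := by
  cases d <;> decide

end Sketch
```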
Head pairs for a simple transitive vP: v selects V, V selects D. This is the argument domain structure of "someone left" / "John ate something" — any clause with a single verb and a DP argument.
Equations
- Phenomena.Ellipsis.Studies.AnandHardtMcCloskey2021.transitiveVP = [{ head := Minimalism.Cat.v, complement := Minimalism.Cat.V }, { head := Minimalism.Cat.V, complement := Minimalism.Cat.D }]
Head pairs for an intransitive vP: v selects V only. Used for the antecedent of sprouting examples like "John left."
Equations
- Phenomena.Ellipsis.Studies.AnandHardtMcCloskey2021.intransitiveVP = [{ head := Minimalism.Cat.v, complement := Minimalism.Cat.V }]
SIC licenses basic sluicing: "Someone left, but I don't know who."
Antecedent "someone left" and ellipsis "[who] left" share the same verb, so their argument domains have identical head pairs.
SIC licenses object sluicing: "John ate something, but I don't know what."
Same verb "ate" → same head pairs.
A SluicingLicense for same-verb sluices is licensed.
SIC correctly predicts basicSluice is grammatical.
SIC correctly predicts objectSluice is grammatical.
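As a rough illustration of why same-verb sluices pass the identity check, the sketch below treats structural identity as plain equality of head-pair lists. This is a simplifying assumption of the sketch: the local `Cat` type stands in for `Minimalism.Cat`, and `HeadPair` and `structurallyIdentical` are hypothetical names, not the module's `SluicingLicense` machinery.

```lean
namespace Sketch

/-- Stand-in for `Minimalism.Cat`, restricted to the categories used here. -/
inductive Cat where
  | v | V | D
  deriving DecidableEq, Repr

/-- A head together with the category of its complement. -/
structure HeadPair where
  head       : Cat
  complement : Cat
  deriving DecidableEq, Repr

/-- v selects V, V selects D (mirrors `transitiveVP` above). -/
def transitiveVP : List HeadPair :=
  [{ head := .v, complement := .V }, { head := .V, complement := .D }]

/-- Assumed simplification: identity is equality of head-pair lists. -/
def structurallyIdentical (antecedent ellipsis : List HeadPair) : Bool :=
  decide (antecedent = ellipsis)

/-- Any argument domain is identical to itself, so a sluice whose
    ellipsis site reuses the antecedent's verb passes the check. -/
theorem sameVerb_identical (hp : List HeadPair) :
    structurallyIdentical hp hp = true := by
  simp [structurallyIdentical]

-- "Someone left, but I don't know who" / "John ate something, but I don't know what".
example : structurallyIdentical transitiveVP transitiveVP = true :=
  sameVerb_identical transitiveVP

end Sketch
```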
Head pairs for a dative-assigning transitive vP. V assigns dative case to its DP complement (e.g., German helfen).
Head pairs for an accusative-assigning transitive vP. V assigns accusative case to its DP complement (e.g., German sehen).
Same-case head pairs are structurally identical (case match OK).
Case mismatch blocks structural identity.
SIC correctly predicts germanCaseMatch is grammatical: the dative wh-phrase matches the dative correlate. Sluicing is licensed because the argument domains have structurally identical head pairs (both assign dative).
SIC correctly predicts germanCaseMismatch is ungrammatical: the accusative wh-phrase does not match the dative correlate. The SIC blocks sluicing because dative ≠ accusative within the argument domain (@cite{merchant-2001}: German wem/wen data).
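The case-matching contrast can be sketched by annotating each head pair with the case it assigns, so that a dative/accusative difference makes the two lists unequal. The definitions on this page were not rendered, so everything below is an assumed encoding: `CaseValue`, `CaseHeadPair`, `dativeVP`, and `accusativeVP` are hypothetical names.

```lean
namespace Sketch

inductive Cat where
  | v | V | D
  deriving DecidableEq, Repr

inductive CaseValue where
  | dative | accusative
  deriving DecidableEq, Repr

/-- A head pair annotated with the case (if any) the head assigns. -/
structure CaseHeadPair where
  head         : Cat
  complement   : Cat
  assignedCase : Option CaseValue
  deriving DecidableEq, Repr

/-- V assigns dative to its DP complement (e.g. German helfen). -/
def dativeVP : List CaseHeadPair :=
  [{ head := .v, complement := .V, assignedCase := none },
   { head := .V, complement := .D, assignedCase := some .dative }]

/-- V assigns accusative to its DP complement (e.g. German sehen). -/
def accusativeVP : List CaseHeadPair :=
  [{ head := .v, complement := .V, assignedCase := none },
   { head := .V, complement := .D, assignedCase := some .accusative }]

-- Case match: the head pairs are identical.
example : dativeVP = dativeVP := rfl

-- Case mismatch: the lists differ in the case feature on V.
example : dativeVP ≠ accusativeVP := by decide

end Sketch
```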
V and v are inside the argument domain: argument structure must match. The corpus confirms: zero argument structure mismatches.
T is outside the argument domain: tense mismatches are licit. The corpus confirms: 129 tense mismatches attested.
Mod is outside the argument domain: modal mismatches are licit. The corpus confirms: 394 modal mismatches — the most frequent type.
The SIC cleanly separates tolerated from untolerated mismatches: every mismatch dimension inside the argument domain has 0 corpus attestations; every dimension outside has nonzero attestations.
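The separation claim amounts to a biconditional between argument-domain membership and a zero corpus count. In the sketch below, `insideArgumentDomain` is a hypothetical predicate. Voice and argument structure are placed inside, and tense and modality outside, as stated above. Placing polarity and new words outside follows from their nonzero counts together with the claim itself.

```lean
namespace Sketch

inductive MismatchDimension where
  | tense | modality | polarity | newWords | voice | argumentStructure
  deriving DecidableEq, Repr

def MismatchDimension.corpusCount : MismatchDimension → Nat
  | .tense             => 129
  | .modality          => 394
  | .polarity          => 28
  | .newWords          => 71
  | .voice             => 0
  | .argumentStructure => 0

/-- Hypothetical predicate: does the dimension live on a head inside
    the argument domain (v, V) rather than outside it (T, Mod, ...)? -/
def MismatchDimension.insideArgumentDomain : MismatchDimension → Bool
  | .voice | .argumentStructure                => true
  | .tense | .modality | .polarity | .newWords => false

/-- Inside the argument domain iff there are no corpus attestations. -/
theorem inside_iff_unattested (d : MismatchDimension) :
    d.insideArgumentDomain = true ↔ d.corpusCount = 0 := by
  cases d <;> decide

end Sketch
```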
Head pairs for an active transitive vP with voice flavor.
Head pairs for a passive transitive vP with voice flavor.
The SIC correctly blocks voice mismatches in sluicing: active v[agentive] ≠ passive v[nonThematic], and both are within the argument domain (F1 ≤ F1). The corpus confirms: 0 voice mismatches.
This resolves the voice puzzle from the earlier analysis. The puzzle arose from treating voice as outside the argument domain (like T/Mod), but @cite{anand-hardt-mccloskey-2021} shows that voice flavor is encoded on v, which IS inside the argument domain.
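A minimal sketch of the voice argument, assuming voice flavor is recorded as a feature on the v head pair: once the flavor is part of the head pair, active and passive argument domains differ. `VoiceFlavor`, `FlavoredHeadPair`, `activeVP`, and `passiveVP` are hypothetical names chosen for this illustration.

```lean
namespace Sketch

inductive Cat where
  | v | V | D
  deriving DecidableEq, Repr

inductive VoiceFlavor where
  | agentive | nonThematic
  deriving DecidableEq, Repr

/-- A head pair whose head may carry a voice flavor (only v does here). -/
structure FlavoredHeadPair where
  head       : Cat
  complement : Cat
  flavor     : Option VoiceFlavor
  deriving DecidableEq, Repr

/-- Active: v[agentive] selects V, V selects D. -/
def activeVP : List FlavoredHeadPair :=
  [{ head := .v, complement := .V, flavor := some .agentive },
   { head := .V, complement := .D, flavor := none }]

/-- Passive: v[nonThematic] selects V, V selects D. -/
def passiveVP : List FlavoredHeadPair :=
  [{ head := .v, complement := .V, flavor := some .nonThematic },
   { head := .V, complement := .D, flavor := none }]

-- The flavor sits on v, inside the argument domain, so the head pairs differ.
example : activeVP ≠ passiveVP := by decide

end Sketch
```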
Sprouting is the dominant sluice kind (65.5%), which runs counter to the literature's focus on merger.
Why accounts for the majority of sluices. Since virtually all why-sluices are sprouting (reason adjuncts lack overt correlates), the prototypical sluice is "John left, but I don't know why" (sprouting, reason), not "Someone left, but I don't know who" (merger, entity).
Merchant's deletion domain theory converges with the corpus: for sluicing ([E] on C), it predicts that both voice and transitivity mismatches are blocked, and the SCSS finds exactly 0 attestations of each.
This is a convergent prediction with the SIC (§ 7.2): both the SIC (structural identity within the argument domain) and the deletion domain analysis ([E] on C → Voice inside TP) independently predict that voice and argument structure mismatches are blocked in sluicing.