Erk & Herbelot 2024 — How to Marry a Star #
@cite{erk-herbelot-2024}
Erk, K. & Herbelot, A. (2024). How to Marry a Star: Probabilistic Constraints for Meaning in Context. Journal of Semantics 40, 549–583.
Core Mechanism #
A Situation Description System (SDS) pairs a DRS (Discourse Representation Structure) with a directed graphical model over latent concepts, semantic roles, and scenarios. Word meaning in context is modeled as a distribution over concepts, computed via a Product of Experts:
P(concept | context) ∝ P_selectional(concept | role) × P_scenario(concept | frame)
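As a minimal sketch of this normalization step (the factor weights are illustrative assumptions, not the file's actual definitions or the paper's fitted values):

```lean
/-- Product-of-Experts posterior for a two-concept ambiguity.
    `sel` and `scen` hold each factor's unnormalized weight for
    concepts c₁ and c₂. Illustrative sketch only. -/
def poePosterior (sel scen : Rat × Rat) : Rat × Rat :=
  let j₁ := sel.1 * scen.1   -- unnormalized product for c₁
  let j₂ := sel.2 * scen.2   -- unnormalized product for c₂
  (j₁ / (j₁ + j₂), j₂ / (j₁ + j₂))

-- Strong selectional preference, flat scenario: the posterior
-- renormalizes to (9/10, 1/10).
#eval poePosterior (9/10, 1/10) (1/2, 1/2)
```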
Key Predictions #
- Agreement (both factors prefer same concept) → confident disambiguation
- Conflict (factors prefer different concepts) → pun/zeugma/ambiguity
- Dominance (one factor much stronger) → that factor determines reading
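These three regimes can be sketched as a classifier; the exact-tie test and the 1/2 threshold are assumptions for illustration, not the paper's definitions:

```lean
inductive Reading
  | confident   -- agreement: both factors prefer the same concept
  | pun         -- conflict with an exactly tied product
  | dominated   -- conflict, but one factor overwhelms the other
deriving Repr

/-- `p` and `q` are each factor's weight for concept c₁; concept c₂
    gets the complements `1 - p` and `1 - q`. Illustrative sketch. -/
def classify (p q : Rat) : Reading :=
  if decide (1/2 < p) == decide (1/2 < q) then .confident
  else if decide (p * q = (1 - p) * (1 - q)) then .pun
  else .dominated

#eval classify (9/10) (8/10)  -- agreement → Reading.confident
#eval classify (9/10) (1/10)  -- exact tie of products → Reading.pun
```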
Structure of This File #
- §1: Concept types for the paper's running examples
- §2: Selectional constraints only (§4 of paper)
- §3: Selectional + scenario constraints (§5 of paper)
- §4: Context variations on "marry a star"
- §5: Multi-word and chained disambiguation
- §6: Constraint strength interaction
- §7: Concept features projected into DRS (§6 of paper)
- §8: Fine-grained sense distinctions (§7 of paper)
- §9: Connection to copredication
Concepts for "bat": animal vs sports equipment.
- animal : BatConcept
- equipment : BatConcept
Concepts for "star": famous person vs celestial body.
- celebrity : StarConcept
- celestial : StarConcept
Concepts for "port": harbor, wine, or computer port.
- harbor : PortConcept
- wine : PortConcept
- computer : PortConcept
"A bat was sleeping" #
SLEEP provides a strong selectional preference for animate subjects. With no scenario constraint (neutral context), selectional dominates.
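A worked version of this example with assumed weights (a 9:1 selectional preference for the animate reading, flat scenario); the names and numbers are illustrative, not the file's definitions:

```lean
inductive BatReading
  | animal
  | equipment

/-- SLEEP strongly prefers an animate subject (illustrative weights). -/
def selSleep : BatReading → Rat
  | .animal    => 9/10
  | .equipment => 1/10

/-- Neutral context: the scenario factor is flat. -/
def scenNeutral : BatReading → Rat := fun _ => 1/2

/-- Normalized Product-of-Experts posterior. -/
def sleepPosterior (c : BatReading) : Rat :=
  (selSleep c * scenNeutral c) /
    (selSleep .animal * scenNeutral .animal +
     selSleep .equipment * scenNeutral .equipment)

-- A flat scenario cancels in the normalization, so the posterior
-- mirrors the selectional factor: P(ANIMAL) = 9/10.
#eval sleepPosterior .animal
```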
"A player was holding a bat" #
HOLD has weak selectional preference (both concepts are holdable), but "player" activates a SPORTS scenario that favors equipment.
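With assumed weights (HOLD only slightly prefers the animal; the SPORTS scenario strongly prefers equipment), the scenario overrides the weak selectional edge:

```lean
-- Illustrative numbers only: selectional × scenario for each concept.
def holdAnimal : Rat := (55/100) * (1/10)
def holdEquip  : Rat := (45/100) * (9/10)

-- P(EQUIPMENT) = (405/1000) / (460/1000) = 81/92 ≈ 0.88.
#eval holdEquip / (holdAnimal + holdEquip)
```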
"The astronomer married the star" #
The paper's signature example. Selectional and scenario constraints pull in opposite directions, producing a tie that predicts the pun reading.
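With symmetric assumed weights (MARRY prefers a person 9:1; the astronomy scenario prefers a celestial body 9:1), the products tie exactly:

```lean
-- Illustrative weights reproducing the predicted tie.
def marryCelebrity : Rat := (9/10) * (1/10)  -- selectional wins, scenario loses
def marryCelestial : Rat := (1/10) * (9/10)  -- scenario wins, selectional loses

-- Equal products → P(CELEBRITY) = 1/2: neither reading dominates,
-- which is the pun prediction.
#eval marryCelebrity / (marryCelebrity + marryCelestial)
```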
Neutral context: selectional dominates → CELEBRITY.
Hollywood context: both factors agree → CELEBRITY reinforced.
Sci-fi context: selectional weakened, scenario pulls to celestial.
Comparison of Contexts #
| Context | P(CELEBRITY) | Reading |
|---|---|---|
| Neutral | 0.90 | Selectional wins |
| Astronomer | 0.50 | Tie → pun |
| Producer | 0.99 | Agreement |
| Alien | 0.39 | Scenario wins |
"The sailor liked the port" #
Three-way ambiguity. LIKE is neutral; "sailor" activates NAUTICAL frame. HARBOR wins, WINE remains plausible (sailors drink!), COMPUTER unlikely.
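A sketch of the three-way case with assumed scenario weights (the ranking HARBOR > WINE > COMPUTER follows the prose above; the exact numbers are illustrative):

```lean
-- Illustrative weights: LIKE is selectionally flat (1/3 each), so the
-- NAUTICAL scenario activated by "sailor" determines the posterior.
def scenNautical : List (String × Rat) :=
  [("HARBOR", 6/10), ("WINE", 3/10), ("COMPUTER", 1/10)]

def nauticalTotal : Rat :=
  scenNautical.foldl (fun acc p => acc + p.2 * (1/3)) 0

-- A flat selectional factor cancels in the normalization:
-- HARBOR 3/5, WINE 3/10, COMPUTER 1/10.
#eval scenNautical.map (fun p => (p.1, p.2 * (1/3) / nauticalTotal))
```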
"The coach told the star to play" #
Multiple context words ("coach" + "play") reinforce SPORTS frame. TELL selects animate recipient. Both factors agree → confident CELEBRITY.
Varying Scenario Strength #
"The child saw the bat" with parameterized scenario strength. SAW is perceptually neutral; the scenario from CHILD varies.
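Since SAW is selectionally flat, the posterior equals the scenario weight; a sketch with the strength as an explicit parameter (flat selectional factor is an assumption matching the prose above):

```lean
/-- Posterior for ANIMAL when the selectional factor is flat (1/2 each)
    and the scenario gives weight `w` to ANIMAL. Algebraically this is
    just `w`: the scenario alone decides. -/
def pAnimal (w : Rat) : Rat :=
  ((1/2) * w) / ((1/2) * w + (1/2) * (1 - w))

#eval pAnimal (1/2)   -- flat scenario → 1/2
#eval pAnimal (9/10)  -- strong scenario → 9/10
```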
Feature Projection #
@cite{mcrae-etal-2005}: concepts have features with associated probabilities. After disambiguation, features are projected as additional DRS conditions.
For "a bat was sleeping":
- Posterior: P(ANIMAL) = 0.95
- Feature can_fly: P(can_fly | ANIMAL) = 1.0, P(can_fly | EQUIPMENT) = 0.0
- Projected: P(can_fly | context) = 0.95 × 1.0 + 0.05 × 0.0 = 0.95
- canFly : BatFeature
- isBlack : BatFeature
- hasWings : BatFeature
- isWooden : BatFeature
- isLong : BatFeature
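The projection arithmetic above can be written out directly (the 0.95 posterior is the document's illustrative number; the 0/1 feature probabilities are the idealized values used above):

```lean
def pAnimalPosterior : Rat := 95/100

/-- P(feature | context) = Σ_c P(feature | c) · P(c | context). -/
def projectFeature (pGivenAnimal pGivenEquip : Rat) : Rat :=
  pAnimalPosterior * pGivenAnimal + (1 - pAnimalPosterior) * pGivenEquip

#eval projectFeature 1 0   -- can_fly:   0.95·1 + 0.05·0 = 19/20
#eval projectFeature 0 1   -- is_wooden: 0.95·0 + 0.05·1 = 1/20
```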
After "the astronomer married the star" (tie), features are mixed.
- isHuman : StarFeature
- isCelestialBody : StarFeature
- emitsLight : StarFeature
The "argument" Example (Table 3) #
"She seems to revel in arguments and loses no opportunity to declare her political principles."
Three annotators rated 5 WordNet senses on a 1–5 scale. Annotators systematically disagree, reflecting genuine uncertainty about word meaning in context.
WordNet senses of "argument" used in the paper.
- evidence : ArgumentSense
- quarrel : ArgumentSense
- proCon : ArgumentSense
- parameter : ArgumentSense
- logicalReasoning : ArgumentSense
Graded sense applicability rating from a single annotator.
- sense : ArgumentSense
- rating : ℕ
Rating on 1–5 scale (5 = fits completely, 1 = does not fit at all)
Table 3: three annotators' ratings for "argument" in the sentence above.
All annotators agree that parameter does not apply.
At least one annotator gives quarrel the top rating.
SDS and Copredication #
Copredication (from Phenomena.Polysemy.Data) is a degenerate case of SDS
where both concepts have non-zero posterior under different selectional
constraints applied simultaneously.
"The book is heavy and interesting":
- "heavy" selects PHYSICAL → P(PHYSICAL | heavy) is high
- "interesting" selects INFORMATIONAL → P(INFORMATIONAL | interesting) is high
- Both aspects are active → copredication is acceptable
SDS predicts this: there is no scenario conflict (both aspects coexist in normal contexts), and neither selectional constraint zeros out the other aspect's concept.
Copredication is acceptable when both aspects survive selectional filtering.
This connects Polysemy.Data.bookHeavyInteresting to SDS: the acceptability
follows from both concepts having non-zero posterior.
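A minimal sketch of this prediction, with assumed selectional weights over the two aspects (PHYSICAL, INFORMATIONAL); each predicate prefers one aspect but does not zero out the other:

```lean
-- Illustrative weights only.
def selHeavy       : Rat × Rat := (8/10, 2/10)
def selInteresting : Rat × Rat := (2/10, 8/10)

/-- Product of both selectional constraints, per aspect. -/
def jointBook : Rat × Rat :=
  (selHeavy.1 * selInteresting.1, selHeavy.2 * selInteresting.2)

-- Both aspects survive filtering (4/25 each before normalization),
-- so copredication is predicted acceptable.
#eval decide (0 < jointBook.1) && decide (0 < jointBook.2)
```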
SDS and Humor: Formal Correspondence with @cite{kao-levy-goodman-2016} #
Both frameworks capture the same phenomenon from different angles:
| @cite{kao-levy-goodman-2016} | SDS |
|---|---|
| Multiple meanings m_a, m_b | Multiple concepts c_1, c_2 |
| Words w supporting meanings | Constraints from predicates/context |
| Ambiguity (entropy) | Posterior uncertainty |
| Distinctiveness (KL div) | Conflict between factors |
Kao's Distinctiveness measures whether different words support different meanings. SDS Conflict measures whether selectional and scenario factors prefer different concepts.
These are equivalent when we identify:
- Selectional constraints ≈ evidence from predicate words
- Scenario constraints ≈ evidence from context words
Posterior uncertainty: Gini impurity as entropy proxy for the posterior.
Corresponds to Kao's "ambiguity" measure.
- Returns 0 when one concept has probability 1 (no ambiguity)
- Returns exactly 0.5 for a two-concept system at maximum ambiguity (both posteriors 1/2)
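A sketch of the Gini proxy over a list of posterior probabilities (assumed to sum to 1; the file's own definition may be stated differently):

```lean
/-- Gini impurity 1 − Σ pᵢ² as an entropy proxy. -/
def gini (ps : List Rat) : Rat :=
  1 - ps.foldl (fun acc p => acc + p * p) 0

#eval gini [1, 0]        -- one certain concept → 0
#eval gini [1/2, 1/2]    -- maximal two-concept ambiguity → 1/2
```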
Two concepts are "tied" when their posteriors are approximately equal. Corresponds to high ambiguity in Kao's model.
A sentence is predicted to be a pun when:
- High posterior uncertainty (ambiguity) — both meanings plausible
- Conflict between factors (distinctiveness) — different support for each
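A sketch combining the two conditions; the 2/5 uncertainty threshold is an assumption for illustration (the file's own predicate may use a different formulation):

```lean
/-- Pun prediction: factor conflict plus high posterior uncertainty,
    measured here by Gini impurity over a two-concept posterior. -/
def punPredicted (p₁ p₂ : Rat) (conflict : Bool) : Bool :=
  let g := 1 - (p₁ * p₁ + p₂ * p₂)
  conflict && decide (2/5 ≤ g)

#eval punPredicted (1/2) (1/2) true    -- tie + conflict → true
#eval punPredicted (9/10) (1/10) true  -- confident posterior → false
```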
Funniness prediction based on conflict degree.
Kao found that distinctiveness (not ambiguity) predicts fine-grained funniness.
conflictDegree serves the same role.
"The magician got so mad he pulled his hare out" as SDS.
Selectional: "pulled out" slightly prefers hair (idiomatic), but magician context activates MAGIC frame favoring rabbits.
SDS conflict corresponds exactly to different argmax across factors.
For a disambiguation scenario, hasConflict is true iff the selectional and
scenario factors have different argmax concepts. This is the formal content
of the SDS↔Kao distinctiveness correspondence.
Summary: SDS and Humor #
| Concept | @cite{kao-levy-goodman-2016} | SDS |
|---|---|---|
| Latent variable | Meaning m | Concept c |
| Evidence integration | P(m\|w) via Bayes | Product of Experts |
| Uncertainty | Ambiguity (entropy) | Posterior uncertainty |
| Distinct support | Distinctiveness (KL div) | Conflict (argmax difference) |
| Humor prediction | Amb ↑ AND Dist ↑ | Uncertainty ↑ AND Conflict |
Both formalize the same intuition: Puns arise when different sources of evidence point to different interpretations.
- In Kao: different words support different meanings
- In SDS: selectional and scenario factors prefer different concepts
TODO: Full formalization requires formalizing Kao's generative model (KaoModel)
with relatedness : W → M → ℚ, defining kaoToSDS translation, and proving
quantitative bounds distinctiveness(model) ≥ f(conflictDegree(kaoToSDS model)).
This needs Real.log from Mathlib for KL divergence.