
Linglib.Phenomena.Polysemy.Studies.ErkHerbelot2024

Erk & Herbelot 2024 — How to Marry a Star #

@cite{erk-herbelot-2024}

Erk, K. & Herbelot, A. (2024). How to Marry a Star: Probabilistic Constraints for Meaning in Context. Journal of Semantics 40, 549–583.

Core Mechanism #

A Situation Description System (SDS) pairs a DRS with a directed graphical model over latent concepts, semantic roles, and scenarios. Word meaning in context is modeled as a distribution over concepts, computed via a Product of Experts:

P(concept | context) ∝ P_selectional(concept | role) × P_scenario(concept | frame)
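As a minimal sketch of this combination (hypothetical names, not the definitions in this file), the posterior over a finite list of concepts can be computed by normalizing the pointwise product of the two factors:

```lean
-- Sketch only: `poePosterior`, `selectional`, and `scenario` are
-- hypothetical stand-ins, not this file's actual SDS definitions.
-- Posterior over concepts as a normalized Product of Experts.
def poePosterior {Concept : Type} (concepts : List Concept)
    (selectional scenario : Concept → ℚ) (c : Concept) : ℚ :=
  let z := (concepts.map fun c' => selectional c' * scenario c').sum
  selectional c * scenario c / z
```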

Key Predictions #

  1. Agreement (both factors prefer same concept) → confident disambiguation
  2. Conflict (factors prefer different concepts) → pun/zeugma/ambiguity
  3. Dominance (one factor much stronger) → that factor determines reading

Structure of This File #

Concepts for "bat": animal vs sports equipment.

Concepts for "star": famous person vs celestial body.


Concepts for "port": harbor, wine, or computer port.


"A bat was sleeping" #

SLEEP provides a strong selectional preference for animate subjects. With no scenario constraint (neutral context), selectional dominates.
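With illustrative numbers (not taken from the paper): suppose SLEEP's selectional factor gives P_sel(ANIMAL) = 0.9, P_sel(EQUIPMENT) = 0.1, and the neutral scenario factor is uniform (0.5 each). Then

P(ANIMAL) ∝ 0.9 × 0.5 = 0.45    P(EQUIPMENT) ∝ 0.1 × 0.5 = 0.05

and after normalization P(ANIMAL) = 0.45 / 0.50 = 0.9: the posterior reproduces the selectional preference.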


"A player was holding a bat" #

HOLD has a weak selectional preference (both concepts are holdable), but "player" activates a SPORTS scenario that favors equipment.
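Under illustrative numbers (not from the paper): HOLD's selectional factor is flat (0.5 / 0.5), while the SPORTS scenario gives 0.8 to EQUIPMENT:

P(EQUIPMENT) ∝ 0.5 × 0.8 = 0.40    P(ANIMAL) ∝ 0.5 × 0.2 = 0.10

so P(EQUIPMENT) = 0.40 / 0.50 = 0.8: with a flat selectional factor, the scenario factor alone decides.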


"The astronomer married the star" #

The paper's signature example. Selectional and scenario constraints pull in opposite directions, producing a tie that predicts the pun reading.

Neutral context: selectional dominates → CELEBRITY.


Hollywood context: both factors agree → CELEBRITY reinforced.


Sci-fi context: selectional weakened, scenario pulls to celestial.


Comparison of Contexts #

| Context | P(CELEBRITY) | Reading |
| --- | --- | --- |
| Neutral | 0.90 | Selectional wins |
| Astronomer | 0.50 | Tie → pun |
| Producer | 0.99 | Agreement |
| Alien | 0.39 | Scenario wins |
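The 0.50 tie in the Astronomer row can be reproduced with illustrative factor values (not the paper's): MARRY's selectional factor prefers CELEBRITY at 0.9, while the astronomy scenario prefers CELESTIAL at 0.9:

P(CELEBRITY) ∝ 0.9 × 0.1 = 0.09    P(CELESTIAL) ∝ 0.1 × 0.9 = 0.09

Equal products normalize to 0.5 each, exactly the conflict configuration that predicts the pun.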

"The sailor liked the port" #

Three-way ambiguity. LIKE is selectionally neutral; "sailor" activates the NAUTICAL frame. HARBOR wins, WINE remains plausible (sailors drink!), COMPUTER is unlikely.
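A sketch with illustrative numbers (not the paper's): if LIKE's selectional factor is flat (1/3 each) and the NAUTICAL scenario gives HARBOR 0.6, WINE 0.3, COMPUTER 0.1, the posterior simply mirrors the scenario factor:

P(HARBOR) = 0.6    P(WINE) = 0.3    P(COMPUTER) = 0.1

reproducing the ranking described above.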


"The coach told the star to play" #

Multiple context words ("coach" + "play") reinforce the SPORTS frame. TELL selects an animate recipient. Both factors agree → confident CELEBRITY.


Varying Scenario Strength #

"The child saw the bat" with parameterized scenario strength. SAW is selectionally neutral (both concepts can be seen); the strength of the scenario evoked by CHILD varies.
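One natural way to parameterize this (an assumption about the encoding, stated in the document's own notation) is to mix the scenario factor with the uniform distribution by a strength s ∈ [0, 1]:

P_scen^s(c) = s × P_scen(c) + (1 − s) × 1/|C|

At s = 0 the scenario is inert and the neutral SAW leaves the posterior uniform; as s → 1 the CHILD scenario fully determines the reading.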


Feature Projection #

@cite{mcrae-etal-2005}: concepts carry features with associated probabilities. After disambiguation, features are projected as additional DRS conditions.

For "a bat was sleeping", disambiguation toward the animal concept licenses projecting that concept's high-probability features.


After "the astronomer married the star" (tie), features are mixed.


The "argument" Example (Table 3) #

"She seems to revel in arguments and loses no opportunity to declare her political principles."

Three annotators rated 5 WordNet senses on a 1–5 scale. Annotators systematically disagree, reflecting genuine uncertainty about word meaning in context.

WordNet senses of "argument" used in the paper.


Graded sense applicability rating from a single annotator.

• rating : rating on the 1–5 scale (5 = fits completely, 1 = does not fit at all)


Table 3: three annotators' ratings for "argument" in the sentence above.


SDS and Copredication #

Copredication (from Phenomena.Polysemy.Data) is a degenerate case of SDS in which both concepts have non-zero posterior under different selectional constraints applied simultaneously.

"The book is heavy and interesting": SDS predicts acceptability because there is no scenario conflict (both aspects coexist in normal contexts), and neither selectional constraint zeros out the other aspect's concept.

Copredication is acceptable when both aspects survive selectional filtering. This connects Polysemy.Data.bookHeavyInteresting to SDS: the acceptability follows from both concepts having non-zero posterior.
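The acceptability condition can be sketched in Lean (hypothetical names, not this file's definitions): copredication is fine when neither predicate's selectional factor zeros out the other aspect's concept.

```lean
-- Sketch only: hypothetical names, not this file's actual definitions.
-- `selHeavy`/`selInteresting` are the two predicates' selectional factors;
-- `physical`/`informational` are the two aspect concepts of the dot object.
def copredicationOK {Concept : Type} (selHeavy selInteresting : Concept → ℚ)
    (physical informational : Concept) : Prop :=
  selHeavy informational ≠ 0 ∧ selInteresting physical ≠ 0
```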

SDS and Humor: Formal Correspondence with @cite{kao-levy-goodman-2016} #

Both frameworks capture the same phenomenon from different angles:

| @cite{kao-levy-goodman-2016} | SDS |
| --- | --- |
| Multiple meanings m_a, m_b | Multiple concepts c_1, c_2 |
| Words w supporting meanings | Constraints from predicates/context |
| Ambiguity (entropy) | Posterior uncertainty |
| Distinctiveness (KL div) | Conflict between factors |

Kao's Distinctiveness measures whether different words support different meanings. SDS Conflict measures whether the selectional and scenario factors prefer different concepts.

These are equivalent when we identify:

Posterior uncertainty: Gini impurity as an entropy proxy for the posterior.

Corresponds to Kao's "ambiguity" measure.

• Returns 0 when one concept has probability 1 (no ambiguity)
• Returns near 0.5 for two-concept systems at maximum ambiguity
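The two bullet points describe the standard Gini impurity (assuming that is the underlying formula, stated here in the document's notation):

Gini(P) = 1 − Σ_c P(c)²

For P(c₁) = 1 this gives 1 − 1 = 0; for a two-concept posterior at 0.5 / 0.5 it gives 1 − (0.25 + 0.25) = 0.5, matching both listed behaviors.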
def Phenomena.Polysemy.Studies.ErkHerbelot2024.isTied {α : Type u_1} {Θ : Type u_2} [Semantics.Probabilistic.SDS.Core.SDSConstraintSystem α Θ] (sys : α) (c1 c2 : Θ) (tolerance : ℚ := 1 / 10) :

Two concepts are "tied" when their posteriors are approximately equal. Corresponds to high ambiguity in Kao's model.

def Phenomena.Polysemy.Studies.ErkHerbelot2024.isPredictedPun {α : Type u_1} {Θ : Type u_2} [Semantics.Probabilistic.SDS.Core.SDSConstraintSystem α Θ] [BEq Θ] (sys : α) (uncertaintyThreshold : ℚ := 4 / 10) :

A sentence is predicted to be a pun when:

1. High posterior uncertainty (ambiguity) — both meanings plausible
2. Conflict between factors (distinctiveness) — different support for each
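A minimal Lean sketch of these two conditions (hypothetical helper names, not the file's actual isPredictedPun):

```lean
-- Sketch only: hypothetical names mirroring the two conditions above.
-- Argmax over a non-empty list of concepts, under a scoring function.
def argmaxOn {Concept : Type} (f : Concept → ℚ) : List Concept → Option Concept
  | [] => none
  | c :: cs => some (cs.foldl (fun best x => if f x > f best then x else best) c)

-- Pun predicted: Gini uncertainty above threshold AND the two factors
-- disagree on their preferred concept.
def predictedPun {Concept : Type} (posterior sel scen : Concept → ℚ)
    (concepts : List Concept) (threshold : ℚ) : Prop :=
  threshold ≤ 1 - (concepts.map fun c => posterior c ^ 2).sum ∧
  argmaxOn sel concepts ≠ argmaxOn scen concepts
```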

Funniness prediction based on conflict degree.

Kao found that distinctiveness (not ambiguity) predicts fine-grained funniness. conflictDegree serves the same role.


Concepts for the hare/hair ambiguity.


"The magician got so mad he pulled his hare out" as SDS.

Selectional: "pulled out" slightly prefers hair (the idiomatic reading), but the magician context activates the MAGIC frame, favoring rabbits.


SDS conflict corresponds exactly to a difference in argmax across factors.

For a disambiguation scenario, hasConflict is true iff the selectional and scenario factors have different argmax concepts. This is the formal content of the SDS↔Kao distinctiveness correspondence.
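In the document's notation, the correspondence reads:

hasConflict ⟺ argmax_c P_selectional(c | role) ≠ argmax_c P_scenario(c | frame)

i.e. conflict holds exactly when the two factors of the Product of Experts rank different concepts first.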

Summary: SDS and Humor #

| Concept | @cite{kao-levy-goodman-2016} | SDS |
| --- | --- | --- |
| Latent variable | Meaning m | Concept c |
| Evidence integration | P(m\|w) via Bayes | Product of Experts |
| Uncertainty | Ambiguity (entropy) | Posterior uncertainty |
| Distinct support | Distinctiveness (KL div) | Conflict (argmax difference) |
| Humor prediction | Amb ↑ AND Dist ↑ | Uncertainty ↑ AND Conflict |

Both formalize the same intuition: puns arise when different sources of evidence point to different interpretations.

TODO: Full formalization requires a KaoModel structure for Kao's generative model with relatedness : W → M → ℚ, a kaoToSDS translation, and a proof of the quantitative bound distinctiveness(model) ≥ f(conflictDegree(kaoToSDS model)). This needs Real.log from Mathlib for KL divergence.