The Probability Monad #
@cite{lassiter-goodman-2017}
We define P α abstractly as a structure with pure and bind operations
satisfying the monad laws. This allows us to reason about probabilistic
programs without committing to a specific representation (PMF, measure, etc.).
In Grove & White's notation, this is P α.
Abstract probability monad interface.
A probability monad provides:
- pure: Lift a value to a trivial distribution
- bind: Sequence probabilistic computations
- Monad laws as equalities
This is the semantic interface; implementations may use PMFs, measures, etc.
Note: We use Type instead of Type* to avoid universe issues. For semantic work, this is typically sufficient.
- pure {α : Type} : α → P α
Trivial distribution concentrated at a value
- bind {α β : Type} : P α → (α → P β) → P β
Sequence: sample from m, then continue with k
Left identity:
pure v >>= k = k v
Right identity:
m >>= pure = m
- bind_assoc {α β γ : Type} (m : P α) (n : α → P β) (o : β → P γ) : bind (bind m n) o = bind m fun (x : α) => bind (n x) o
Associativity:
(m >>= n) >>= o = m >>= (λx. n x >>= o)
Instances
Map a function over a distribution
Equations
Instances For
Sequence two distributions, ignoring the first result
Equations
- Semantics.Dynamic.Probabilistic.ProbMonad.seq m n = Semantics.Dynamic.Probabilistic.ProbMonad.bind m fun (x : α) => n
Instances For
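The abstract interface above can be instantiated concretely. As a minimal sketch (not part of the formalization; WList and its operations are hypothetical names), finitely-supported weighted lists over ℚ give one model:

```lean
-- Hypothetical toy model of the abstract interface: finitely-supported
-- weighted lists (assumes Mathlib's ℚ is available).
abbrev WList (α : Type) := List (α × ℚ)

-- pure: all mass on a single value
def WList.pure {α : Type} (a : α) : WList α := [(a, 1)]

-- bind: sample a, then continue with k, multiplying weights along each path
def WList.bind {α β : Type} (m : WList α) (k : α → WList β) : WList β :=
  m.flatMap fun (a, w) => (k a).map fun (b, w') => (b, w * w')

-- map, as in the definition above: push a function through the distribution
def WList.map {α β : Type} (f : α → β) (m : WList α) : WList β :=
  WList.bind m fun a => WList.pure (f a)
```

The monad laws hold in this model after simplifying the list operations and multiplications by 1.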
Parameterized State Monad #
In Grove & White's parameterized state monad, the state type can change during computation. This models how discourse updates can modify the structure of the context (e.g., pushing questions onto the QUD stack).
P^σ_σ' α = σ → P(α × σ')
The parameters σ and σ' are the input and output state types.
Parameterized probabilistic state monad.
Maps input state σ to a distribution over (value, output state) pairs. The output state type σ' can differ from σ, allowing type-changing updates.
This is Grove & White's P^σ_σ' α.
Equations
- Semantics.Dynamic.Probabilistic.PState P σ σ' α = (σ → P (α × σ'))
Instances For
Bind for the parameterized state monad.
Sequences stateful-probabilistic computations, threading state through.
Grove & White: do { x ← m; k x } = λs. do { (x, s') ← m(s); k(x)(s') }
Equations
Instances For
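The state threading in the equation above can be sketched directly (bindP stands for the underlying monad's bind; the names here are illustrative, not the formalization's):

```lean
-- Parameterized state: input state σ, output state σ' (they may differ)
def PState (P : Type → Type) (σ σ' α : Type) : Type := σ → P (α × σ')

-- bind runs m on the incoming state, then feeds the value and intermediate
-- state σ' into k, which may change the state type again to σ''
def PState.bind {P : Type → Type} {σ σ' σ'' α β : Type}
    (bindP : {γ δ : Type} → P γ → (γ → P δ) → P δ)
    (m : PState P σ σ' α) (k : α → PState P σ' σ'' β) :
    PState P σ σ'' β :=
  fun s => bindP (m s) fun (x, s') => k x s'
```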
Set (put) a component of the state.
Returns the new state created by upd, with a trivial value.
Equations
Instances For
Conditioning #
Grove & White's observe operation conditions a distribution on a boolean.
observe : Bool → P Unit
observe true continues; observe false blocks
This is the mechanism for assertion: update the CG by observing the asserted proposition is true.
Extended probability monad with conditioning.
Adds observe for conditioning on boolean observations.
- fail {α : Type} : P α
The zero distribution (blocks all continuations)
Condition on a boolean: continue if true, block if false
Observing true is a no-op
Observing false blocks all continuations
fail is a left zero for bind
Instances
Observe filters: observe true then return is identity
Observe false blocks: any continuation after observe false gives fail
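In the weighted-list sketch (hypothetical names, not the formalization's), fail and observe are one-liners, and both laws are immediate:

```lean
abbrev WList (α : Type) := List (α × ℚ)

-- fail: the zero distribution, blocking every continuation
def WList.fail {α : Type} : WList α := []

-- observe true is pure (); observe false is fail, so any subsequent
-- bind produces the empty list and the branch contributes no mass
def WList.observe (b : Bool) : WList Unit :=
  if b then [((), 1)] else []
```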
Choice Operations #
RSA's S1 isn't just conditioning - it's choosing an utterance weighted by utility.
This requires a choose operation in addition to observe.
choose : (α → ℚ) → P α -- sample from weighted distribution
The relationship between observe and choose is captured by the choose_observe law below.
Probability monad with choice (for speaker models).
Adds choose for sampling from weighted distributions.
This is what S1's softmax requires.
Sample from a weighted distribution over a finite type
- choose_uniform {α : Type} [Fintype α] [Nonempty α] : (choose fun (x : α) => 1) = choose fun (x : α) => 1
choose with uniform weights is like a uniform prior
- choose_observe {α : Type} [Fintype α] (w : α → ℚ) (p : α → Bool) : (ProbMonad.bind (choose w) fun (a : α) => ProbMonad.bind (CondProbMonad.observe (p a)) fun (x : Unit) => ProbMonad.pure a) = choose fun (a : α) => w a * if p a = true then 1 else 0
choose then observe is like weighted observe
Instances
Softmax choice: choose with exp-scaled weights.
This is S1's decision rule: P(u) ∝ exp(α · utility(u))
Note: We use ℚ so can't compute exp directly. In practice, implementations use Float or work with log-probabilities. This definition is for the interface.
Equations
- Semantics.Dynamic.Probabilistic.ChoiceProbMonad.softmaxChoice utility temperature = Semantics.Dynamic.Probabilistic.ChoiceProbMonad.choose utility
Instances For
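As the note says, exp is unavailable over ℚ, so softmaxChoice degenerates to choose at the interface level. A Float-based implementation would compute the actual softmax weights; this sketch is illustrative (the names are not from the formalization):

```lean
-- Softmax weights: P(x) ∝ exp(temperature * utility x),
-- matching S1's rule P(u) ∝ exp(α · utility(u)) with temperature = α
def softmaxWeights {α : Type} (utility : α → Float) (temperature : Float)
    (xs : List α) : List (α × Float) :=
  let raw := xs.map fun x => (x, Float.exp (temperature * utility x))
  let z := raw.foldl (fun acc (_, w) => acc + w) 0
  raw.map fun (x, w) => (x, w / z)
```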
Threshold Semantics #
Threshold semantics + threshold uncertainty produces graded truth values. This is a special case of the PDS framework.
For a gradable adjective like "tall":
- measure : Entity → ℝ gives heights
- threshold θ is the standard of comparison
- ⟦tall⟧_θ(x) = measure(x) > θ is Boolean
With uncertainty over θ:
⟦tall⟧(x) = E_θ[⟦tall⟧_θ(x)] = P(measure(x) > θ)
This probability is the graded truth value.
Threshold semantics: entity satisfies predicate if measure exceeds threshold.
Equations
- Semantics.Dynamic.Probabilistic.thresholdSem measure threshold x = decide (measure x > threshold)
Instances For
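For example (illustrative domain and measure, not from the formalization):

```lean
inductive Person where
  | ann | bob

-- heights in centimetres
def height : Person → Nat
  | .ann => 180
  | .bob => 165

-- "tall" at a fixed threshold θ = 170 is Boolean
#eval decide (height .ann > 170)  -- true
#eval decide (height .bob > 170)  -- false
```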
Graded truth from threshold uncertainty.
Given a prior over thresholds, the graded truth is the probability that the entity's measure exceeds the threshold.
Equations
Instances For
For a point-mass prior (no uncertainty), graded truth reduces to Boolean.
This shows that graded semantics reduces to Boolean semantics when there's no parameter uncertainty.
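A small worked instance of this reduction, with a discrete uniform prior over three thresholds (illustrative values only):

```lean
def thresholds : List ℚ := [160, 170, 180]

-- graded truth: the fraction of prior mass on thresholds the measure exceeds
def gradedTall (h : ℚ) : ℚ :=
  ((thresholds.filter (fun θ => h > θ)).length : ℚ) / thresholds.length

-- a height of 175 exceeds 160 and 170 but not 180, giving graded truth 2/3
#eval gradedTall 175
-- with a point-mass prior such as [170], the result is 1 or 0: Boolean semantics
```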
Connection to RSA #
Grove & White's framework connects to RSA as follows:
- Literal meaning φ: a function ι → Bool (a proposition)
- Common ground: a distribution P ι over indices
- Assertion: observe(φ(i)) for i ← cg
- Probability computation: Pr[φ] = E_i[1_{φ(i)}]
RSA's graded φ emerges from:
- Boolean φ_θ indexed by parameters θ
- A prior distribution over θ
- Marginalization: φ(x) = E_θ[φ_θ(x)]
This is exactly Lassiter & Goodman's "threshold + uncertainty = graded".
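The assertion step above, written out in the weighted-list sketch (hypothetical names): conditioning the common ground on φ keeps exactly the indices where φ holds.

```lean
abbrev WList (α : Type) := List (α × ℚ)

-- bind cg (fun i => bind (observe (φ i)) (fun _ => pure i))
-- simplifies, in this model, to filtering the common ground by φ:
def assertUpdate {ι : Type} (cg : WList ι) (φ : ι → Bool) : WList ι :=
  cg.filter fun (i, _) => φ i
```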
Probability of a proposition in a finite distribution.
This is Pr[φ] = E_i[1_{φ(i)}] in Grove & White notation.
For finite distributions, this is the sum of masses where φ holds.
Equations
Instances For
Probability of a true proposition is the total mass (1 for normalized distributions).
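In the finite weighted-list sketch, probOf sums the surviving mass (hypothetical names, matching the informal definition above):

```lean
abbrev WList (α : Type) := List (α × ℚ)

-- Pr[φ] = sum of masses where φ holds; when φ is constantly true this is
-- the total mass, i.e. 1 for a normalized distribution
def probOf {ι : Type} (d : WList ι) (φ : ι → Bool) : ℚ :=
  (d.filter fun (i, _) => φ i).foldl (fun acc (_, w) => acc + w) 0
```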