Channel Capacity and Capacity-Achieving Priors #
@cite{cover-thomas-2006} @cite{zaslavsky-etal-2019}
A naming system (lexicon) can be viewed as a communication channel: the input is a meaning (color, object, etc.), the output is a word, and the conditional distribution p(w|c) specifies how likely each word is for each meaning. The channel capacity — the maximum mutual information achievable by any input distribution — measures the maximal amount of information about meanings that the lexicon can convey.
@cite{zaslavsky-etal-2019} show that a capacity-achieving prior (CAP) links communicative need p(c) and communicative precision S(c) through a simple linear relationship: −log p(c) = S(c) + log Z.
The CAP condition p(c) ∝ exp(−S(c)) is the first-order (KKT) condition for maximizing I(W;C) over the probability simplex; since I(W;C) is concave in the prior, it is both necessary and sufficient (@cite{cover-thomas-2006}; Blahut 1972; Arimoto 1972).
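A sketch of why the KKT condition takes this form (the standard argument, not part of the formalization below): Bayes' rule turns S(c) into a KL divergence plus log p(c), and the KKT conditions equate that divergence to the capacity on the support of p.

```latex
% Since p(c \mid w) = p(w \mid c)\,p(c)/p(w), averaging the log over p(w \mid c):
-S(c) \;=\; \sum_w p(w \mid c)\log p(c \mid w)
\;=\; D\!\left(p(\cdot \mid c)\,\middle\|\,p(\cdot)\right) + \log p(c).
% KKT at a maximizer: D(p(\cdot \mid c) \,\|\, p(\cdot)) = C^* for all c with p(c) > 0,
% hence -\log p(c) = S(c) + C^*, i.e. p(c) \propto \exp(-S(c)) with \log Z = C^*.
```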
Connection to RSA #
The naming channel p(w|c) is the RSA literal speaker S₀: given a state c, how likely is each utterance w? The posterior p(c|w) is the RSA literal listener L₀. Mutual information I(W;C) measures how much the listener learns from the speaker. Channel capacity is the best-case informativity.
Main definitions #
- NamingChannel: a conditional distribution p(w|c) (stochastic matrix)
- marginalWord: p(w) = Σ_c p(c) · p(w|c)
- posterior: p(c|w) via Bayes' rule
- commPrecision: expected surprisal S(c)
- mutualInfo: I(W;C)
- IsCAP: capacity-achieving prior predicate
- channelCapacity: sup_{p(c)} I(W;C)
Main results #
- cap_linear: at a CAP, −log p(c) = S(c) + log Z
- mutualInfo_eq_log_Z_of_cap: at a CAP, I(W;C) = log Z
- gibbs_inequality: KL divergence is non-negative
- mutualInfo_le_log_card: I(W;C) ≤ log |W|
- channelCapacity_le_log_card: C* ≤ log |W|
Marginal word probability p(w) = Σ_c p(c) · p(w|c).
Equations
- Core.ChannelCapacity.marginalWord nc prior w = ∑ c : C, prior c * nc.encode c w
Instances For
Posterior p(c|w) via Bayes' rule. p(c|w) = p(w|c) · p(c) / Σ_{c'} p(w|c') · p(c').
This is the listener's belief about the meaning after hearing word w (= RSA literal listener L₀).
Equations
- Core.ChannelCapacity.posterior nc prior w c = nc.encode c w * prior c / Core.ChannelCapacity.marginalWord nc prior w
Instances For
Communicative precision (expected surprisal) of a meaning c. S(c) = −Σ_w p(w|c) · log p(c|w).
Lower S(c) means the naming system communicates c more precisely: the listener can recover c from the word with less uncertainty.
Defined in @cite{zaslavsky-etal-2019}.
Equations
- Core.ChannelCapacity.commPrecision nc prior c = -∑ w : W, nc.encode c w * Real.log (Core.ChannelCapacity.posterior nc prior w c)
Instances For
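Averaging S(c) over the prior recovers the conditional entropy H(C|W). This identity is a direct consequence of the definition (it is not among the listed results):

```latex
\sum_c p(c)\,S(c)
\;=\; -\sum_{c,w} p(c)\,p(w \mid c)\,\log p(c \mid w)
\;=\; -\sum_{c,w} p(c,w)\,\log p(c \mid w)
\;=\; H(C \mid W).
```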
Mutual information I(W;C) for a naming channel under prior p(c). I(W;C) = Σ_{c,w} p(c) · p(w|c) · log(p(c|w) / p(c)).
Measures how much information the word conveys about the meaning on average.
Equations
- Core.ChannelCapacity.mutualInfo nc prior = ∑ c : C, ∑ w : W, prior c * nc.encode c w * Real.log (Core.ChannelCapacity.posterior nc prior w c / prior c)
Instances For
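Splitting the log ratio in the definition ties mutualInfo to commPrecision: mutual information is prior entropy minus expected surprisal.

```latex
I(W;C)
= \sum_{c,w} p(c)\,p(w \mid c)\,\log\frac{p(c \mid w)}{p(c)}
= \underbrace{-\sum_c p(c)\log p(c)}_{H(C)}
\;-\; \underbrace{\sum_c p(c)\,S(c)}_{H(C \mid W)}.
```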
A prior is capacity-achieving iff p(c) ∝ exp(−S(c)). This is the necessary and sufficient condition for p(c) to maximize I(W;C) over all priors on C.
The CAP condition from @cite{zaslavsky-etal-2019}; derived from the Blahut-Arimoto algorithm (@cite{cover-thomas-2006}).
Equations
- Core.ChannelCapacity.IsCAP nc prior = ∃ Z > 0, ∀ (c : C), prior c > 0 → prior c = Real.exp (-Core.ChannelCapacity.commPrecision nc prior c) / Z
Instances For
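For intuition, the classical Blahut-Arimoto iteration for channel capacity (as in @cite{cover-thomas-2006}; not formalized here) can be rewritten in the notation above, making the CAP condition appear as its fixed point:

```latex
p_{t+1}(c) \;=\; \frac{p_t(c)\,\exp\!\big(D(p(\cdot \mid c)\,\|\,q_t)\big)}{Z_t},
\qquad q_t(w) = \sum_c p_t(c)\,p(w \mid c).
% Using -S_t(c) = D(p(\cdot \mid c) \,\|\, q_t) + \log p_t(c), where S_t is
% commPrecision under the prior p_t, the update becomes
% p_{t+1}(c) = \exp(-S_t(c))/Z_t, so a CAP is exactly a fixed point.
```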
The CAP linear relation: at a capacity-achieving prior, −log p(c) = S(c) + log Z.
From @cite{zaslavsky-etal-2019}. It means that
communicative need (−log p(c)) and communicative precision (S(c))
are linearly related with slope 1 and intercept log Z.
At a CAP, log Z equals the mutual information (see
mutualInfo_eq_log_Z_of_cap).
Proof: take log of both sides of p(c) = exp(−S(c)) / Z, then negate. Uses log(a/b) = log a − log b and log(exp x) = x.
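The algebra, spelled out:

```latex
p(c) = \frac{e^{-S(c)}}{Z}
\;\Longrightarrow\;
\log p(c) = -S(c) - \log Z
\;\Longrightarrow\;
-\log p(c) = S(c) + \log Z.
```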
Variant of cap_linear taking an existential IsCAP witness.
Marginal word probability is non-negative.
The marginal word distribution sums to 1.
At a CAP, mutual information equals log Z.
Proof: factor p(c) out of the double sum and show each inner sum
Σ_w p(w|c) · log(p(c|w)/p(c)) = log Z. Using sum_encode_log_ratio,
the inner sum equals −S(c) − log p(c). Then cap_linear gives
−log p(c) = S(c) + log Z, so the inner sum = log Z.
This means the normalization constant Z in the CAP condition determines the channel capacity.
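The inner-sum computation in one line, using Σ_w p(w|c) = 1 and cap_linear:

```latex
\sum_w p(w \mid c)\,\log\frac{p(c \mid w)}{p(c)}
= \underbrace{\sum_w p(w \mid c)\log p(c \mid w)}_{-S(c)} - \log p(c)
= -S(c) + \big(S(c) + \log Z\big) = \log Z,
% so I(W;C) = \sum_c p(c)\,\log Z = \log Z.
```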
Gibbs inequality: KL divergence is non-negative. D_KL(p ∥ q) = Σ p(i) log(p(i)/q(i)) ≥ 0 for distributions p, q. Uses the fundamental inequality log x ≤ x − 1.
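The one-line derivation behind gibbs_inequality (terms with p(i) = 0 contribute nothing):

```latex
-D_{\mathrm{KL}}(p \,\|\, q)
= \sum_i p(i)\log\frac{q(i)}{p(i)}
\;\le\; \sum_i p(i)\left(\frac{q(i)}{p(i)} - 1\right)
= \sum_i q(i) - \sum_i p(i) = 0.
```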
Mutual information is bounded by the log of the vocabulary size: I(W;C) ≤ log |W|.
Proof chain: I(W;C) ≤ H(W) ≤ log |W|, where the first inequality uses conditional entropy H(W|C) ≥ 0, and the second uses Gibbs' inequality (KL divergence of p(w) from uniform is non-negative).
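The two inequalities in the chain, written out:

```latex
I(W;C) = H(W) - H(W \mid C) \;\le\; H(W),
\qquad
\log|W| - H(W) = D_{\mathrm{KL}}\!\left(p(w)\,\middle\|\,\tfrac{1}{|W|}\right) \;\ge\; 0.
```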
Channel capacity: the supremum of mutual information over all valid priors. C* = sup_{p(c)} I(W;C) (eq. 3 of @cite{zaslavsky-etal-2019}).
A capacity-achieving prior (CAP) attains this supremum.
Equations
Instances For
Channel capacity is bounded by log of the vocabulary size:
C* ≤ log |W|. Follows directly from mutualInfo_le_log_card.
Mutual information bound for a vocabulary of size k:
I(W;C) ≤ log k. Useful corollary when the word type is Fin k.