
Linglib.Core.ChannelCapacity

Channel Capacity and Capacity-Achieving Priors #

@cite{cover-thomas-2006} @cite{zaslavsky-etal-2019}

A naming system (lexicon) can be viewed as a communication channel: the input is a meaning (color, object, etc.), the output is a word, and the conditional distribution p(w|c) specifies how likely each word is for each meaning. The channel capacity — the maximum mutual information achievable by any input distribution — is the most information about meanings that the lexicon can possibly convey.

@cite{zaslavsky-etal-2019} show that a capacity-achieving prior (CAP) links communicative need p(c) and communicative precision S(c) through a simple linear relationship: −log p(c) = S(c) + log Z.

The CAP condition p(c) ∝ exp(−S(c)) is the necessary and sufficient first-order condition for maximizing I(W;C) over the probability simplex (@cite{cover-thomas-2006}; Blahut 1972; Arimoto 1972).
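As a numeric illustration (a plain-Python sketch, independent of the Lean formalization; the toy channel below is invented), the Blahut-Arimoto iteration repeatedly replaces the prior by exp(−S(c))/Z, and its fixed points are exactly the capacity-achieving priors:

```python
import math

# Toy channel (invented for illustration): 3 meanings, 2 words.
# Q[c][w] = p(w|c); each row sums to 1.
Q = [[0.9, 0.1],
     [0.5, 0.5],
     [0.1, 0.9]]
nC, nW = len(Q), len(Q[0])

def precision(p):
    """S(c) = -sum_w p(w|c) * log p(c|w) under prior p."""
    r = [sum(p[c] * Q[c][w] for c in range(nC)) for w in range(nW)]
    return [-sum(Q[c][w] * math.log(p[c] * Q[c][w] / r[w]) for w in range(nW))
            for c in range(nC)]

# Blahut-Arimoto: iterate p(c) <- exp(-S(c)) / Z until fixed.
p = [1.0 / nC] * nC
for _ in range(500):
    S = precision(p)
    Z = sum(math.exp(-s) for s in S)
    p = [math.exp(-s) / Z for s in S]

S = precision(p)
Z = sum(math.exp(-s) for s in S)
capacity = math.log(Z)  # I(W;C) at the CAP equals log Z
```

Here the middle meaning's row (0.5, 0.5) is uninformative, so the iteration drives its prior mass toward zero and the capacity matches that of the binary symmetric channel formed by the other two rows.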

Connection to RSA #

The naming channel p(w|c) is the RSA literal speaker S₀: given a state c, how likely is each utterance w? The posterior p(c|w) is the RSA literal listener L₀. Mutual information I(W;C) measures how much the listener learns from the speaker. Channel capacity is the best-case informativity.

Main definitions #

Main results #

structure Core.ChannelCapacity.NamingChannel (C W : Type) [Fintype C] [Fintype W]

A naming system as a communication channel. encode c w = p(w|c): probability of using word w for meaning c.

The row-stochastic constraint (nonneg + rows sum to 1) makes this a valid conditional distribution.

  • encode : C → W → ℝ

    p(w|c): probability of word w given meaning c.

  • encode_nonneg (c : C) (w : W) : 0 ≤ self.encode c w
  • encode_sum_one (c : C) : ∑ w : W, self.encode c w = 1
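For intuition, the structure is just a row-stochastic matrix. A minimal Python sketch (toy numbers, invented; not part of the library):

```python
# A naming channel as a row-stochastic matrix:
# encode[c][w] = p(w|c) for 2 meanings and 3 words.
encode = [[0.7, 0.2, 0.1],
          [0.1, 0.8, 0.1]]

# Runtime analogues of the two structure fields:
nonneg = all(0 <= x for row in encode for x in row)                 # encode_nonneg
rows_sum_one = all(abs(sum(row) - 1.0) < 1e-12 for row in encode)   # encode_sum_one
```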
noncomputable def Core.ChannelCapacity.marginalWord {C W : Type} [Fintype C] [Fintype W] (nc : NamingChannel C W) (prior : C → ℝ) (w : W) : ℝ

Marginal word probability p(w) = Σ_c p(c) · p(w|c).
noncomputable def Core.ChannelCapacity.posterior {C W : Type} [Fintype C] [Fintype W] (nc : NamingChannel C W) (prior : C → ℝ) (w : W) (c : C) : ℝ

Posterior p(c|w) via Bayes' rule: p(c|w) = p(w|c) · p(c) / Σ_{c'} p(w|c') · p(c').

This is the listener's belief about the meaning after hearing word w (= RSA literal listener L₀).
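A plain-Python sketch of these two definitions on a toy channel and prior (numbers invented, independent of the Lean code):

```python
# Toy channel and prior (invented): 2 meanings, 3 words.
encode = [[0.7, 0.2, 0.1],   # p(w|c0)
          [0.1, 0.8, 0.1]]   # p(w|c1)
prior = [0.6, 0.4]
nC, nW = 2, 3

def marginal_word(w):
    # p(w) = sum_c p(c) * p(w|c)
    return sum(prior[c] * encode[c][w] for c in range(nC))

def posterior(w, c):
    # Bayes' rule: p(c|w) = p(w|c) * p(c) / p(w)
    return encode[c][w] * prior[c] / marginal_word(w)
```

The marginal is a distribution over words, and for each heard word the posterior is a distribution over meanings.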
noncomputable def Core.ChannelCapacity.commPrecision {C W : Type} [Fintype C] [Fintype W] (nc : NamingChannel C W) (prior : C → ℝ) (c : C) : ℝ

Communicative precision (expected surprisal) of a meaning c: S(c) = −Σ_w p(w|c) · log p(c|w).

Lower S(c) means the naming system communicates c more precisely: the listener can recover c from the word with less uncertainty.

Defined in @cite{zaslavsky-etal-2019}.
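A numeric sketch of S(c) (plain Python, toy channel and prior invented for illustration). In this example the higher-need meaning c0, whose dominant word is rarely used for the other meaning, comes out more precise (lower S):

```python
import math

# Toy channel and prior (invented): 2 meanings, 3 words.
encode = [[0.7, 0.2, 0.1],
          [0.1, 0.8, 0.1]]
prior = [0.6, 0.4]
nC, nW = 2, 3

def posterior(w, c):
    pw = sum(prior[k] * encode[k][w] for k in range(nC))
    return encode[c][w] * prior[c] / pw

def comm_precision(c):
    # S(c) = -sum_w p(w|c) * log p(c|w): expected surprisal of c
    # after the listener hears the word chosen for c.
    return -sum(encode[c][w] * math.log(posterior(w, c)) for w in range(nW))
```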
noncomputable def Core.ChannelCapacity.mutualInfo {C W : Type} [Fintype C] [Fintype W] (nc : NamingChannel C W) (prior : C → ℝ) : ℝ

Mutual information I(W;C) for a naming channel under prior p(c): I(W;C) = Σ_{c,w} p(c) · p(w|c) · log(p(c|w) / p(c)).

Measures how much information the word conveys about the meaning on average.
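Since H(C|W) = Σ_c p(c) · S(c), this definition is equivalent to I(W;C) = H(C) − Σ_c p(c) · S(c): mutual information is prior entropy minus expected communicative precision. A plain-Python check on a toy example (numbers invented):

```python
import math

# Toy channel and prior (invented): 2 meanings, 3 words.
encode = [[0.7, 0.2, 0.1],
          [0.1, 0.8, 0.1]]
prior = [0.6, 0.4]
nC, nW = 2, 3

def posterior(w, c):
    pw = sum(prior[k] * encode[k][w] for k in range(nC))
    return encode[c][w] * prior[c] / pw

# I(W;C) = sum_{c,w} p(c) p(w|c) log(p(c|w) / p(c))
mutual_info = sum(prior[c] * encode[c][w] * math.log(posterior(w, c) / prior[c])
                  for c in range(nC) for w in range(nW))

# Equivalent form: prior entropy minus expected communicative precision.
S = [-sum(encode[c][w] * math.log(posterior(w, c)) for w in range(nW))
     for c in range(nC)]
H_C = -sum(x * math.log(x) for x in prior)
```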
def Core.ChannelCapacity.IsCAP {C W : Type} [Fintype C] [Fintype W] (nc : NamingChannel C W) (prior : C → ℝ) : Prop

A prior is capacity-achieving iff p(c) ∝ exp(−S(c)). This is the necessary and sufficient condition for p(c) to maximize I(W;C) over all priors on C.

The CAP condition from @cite{zaslavsky-etal-2019}; derived from the Blahut-Arimoto algorithm (@cite{cover-thomas-2006}).
theorem Core.ChannelCapacity.cap_linear {C W : Type} [Fintype C] [Fintype W] (nc : NamingChannel C W) (prior : C → ℝ) {Z : ℝ} (hZ : Z > 0) (hCAP : ∀ (c : C), prior c > 0 → prior c = Real.exp (-commPrecision nc prior c) / Z) {c : C} (hc : prior c > 0) :
    -Real.log (prior c) = commPrecision nc prior c + Real.log Z

The CAP linear relation: at a capacity-achieving prior, −log p(c) = S(c) + log Z.

From @cite{zaslavsky-etal-2019}. It means that communicative need (−log p(c)) and communicative precision (S(c)) are linearly related with slope 1 and intercept log Z. At a CAP, log Z equals the mutual information (see mutualInfo_eq_log_Z_of_cap).

Proof: take the log of both sides of p(c) = exp(−S(c)) / Z, then negate. Uses log(a/b) = log a − log b and log(exp x) = x.
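The log algebra in that proof sketch can be checked directly (plain Python, arbitrary toy values):

```python
import math

# If p(c) = exp(-S(c)) / Z, then taking logs and negating gives
# -log p(c) = S(c) + log Z.
S_c, Z = 0.37, 1.44   # toy values, invented
p_c = math.exp(-S_c) / Z
need = -math.log(p_c)
```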

theorem Core.ChannelCapacity.cap_linear' {C W : Type} [Fintype C] [Fintype W] (nc : NamingChannel C W) (prior : C → ℝ) (hCAP : IsCAP nc prior) {c : C} (hc : prior c > 0) :
    ∃ Z > 0, -Real.log (prior c) = commPrecision nc prior c + Real.log Z

Variant of cap_linear taking an IsCAP hypothesis and producing the normalizer Z existentially.

theorem Core.ChannelCapacity.marginalWord_nonneg {C W : Type} [Fintype C] [Fintype W] (nc : NamingChannel C W) (prior : C → ℝ) (hp : ∀ (c : C), 0 ≤ prior c) (w : W) :
    0 ≤ marginalWord nc prior w

Marginal word probability is non-negative.

theorem Core.ChannelCapacity.marginalWord_sum_one {C W : Type} [Fintype C] [Fintype W] (nc : NamingChannel C W) (prior : C → ℝ) (hsum : ∑ c : C, prior c = 1) :
    ∑ w : W, marginalWord nc prior w = 1

The marginal word distribution sums to 1.

theorem Core.ChannelCapacity.mutualInfo_eq_log_Z_of_cap {C W : Type} [Fintype C] [Fintype W] (nc : NamingChannel C W) (prior : C → ℝ) {Z : ℝ} (hZ : Z > 0) (hCAP : ∀ (c : C), prior c > 0 → prior c = Real.exp (-commPrecision nc prior c) / Z) (hprior_nonneg : ∀ (c : C), 0 ≤ prior c) (hprior_sum : ∑ c : C, prior c = 1) (hprior_pos : ∀ (c : C), prior c > 0) :
    mutualInfo nc prior = Real.log Z

At a CAP, mutual information equals log Z.

Proof: factor p(c) out of the double sum and show each inner sum Σ_w p(w|c) · log(p(c|w)/p(c)) = log Z. Using sum_encode_log_ratio, the inner sum equals −S(c) − log p(c). Then cap_linear gives −log p(c) = S(c) + log Z, so the inner sum = log Z.

This means the normalization constant Z in the CAP condition determines the channel capacity.
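A numeric check of this statement (plain Python, toy binary channel invented for illustration): run the exp(−S)/Z fixed-point iteration to a fully supported CAP, then compare I(W;C) with log Z:

```python
import math

# Toy symmetric channel (invented): its CAP is the uniform prior.
Q = [[0.9, 0.1],
     [0.1, 0.9]]
nC, nW = 2, 2

def stats(p):
    r = [sum(p[c] * Q[c][w] for c in range(nC)) for w in range(nW)]
    post = [[p[c] * Q[c][w] / r[w] for c in range(nC)] for w in range(nW)]
    S = [-sum(Q[c][w] * math.log(post[w][c]) for w in range(nW))
         for c in range(nC)]
    return post, S

p = [0.3, 0.7]           # deliberately non-uniform start
for _ in range(200):     # iterate p(c) <- exp(-S(c)) / Z
    _, S = stats(p)
    Z = sum(math.exp(-s) for s in S)
    p = [math.exp(-s) / Z for s in S]

post, S = stats(p)
Z = sum(math.exp(-s) for s in S)
mutual_info = sum(p[c] * Q[c][w] * math.log(post[w][c] / p[c])
                  for c in range(nC) for w in range(nW))
```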

theorem Core.ChannelCapacity.gibbs_inequality {ι : Type u_1} [Fintype ι] (p q : ι → ℝ) (hp_nn : ∀ (i : ι), 0 ≤ p i) (hq_pos : ∀ (i : ι), 0 < q i) (hp_sum : ∑ i : ι, p i = 1) (hq_sum : ∑ i : ι, q i = 1) :
    0 ≤ ∑ i : ι, p i * Real.log (p i / q i)

Gibbs' inequality: KL divergence is non-negative. D_KL(p ∥ q) = Σ p(i) log(p(i)/q(i)) ≥ 0 for distributions p, q. Uses the fundamental inequality log x ≤ x − 1.
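A quick numeric illustration (plain Python, toy distributions invented):

```python
import math

# D_KL(p || q) = sum_i p_i * log(p_i / q_i) is non-negative,
# and zero when p = q.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]

kl_pq = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
kl_pp = sum(pi * math.log(pi / pi) for pi in p)
```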

theorem Core.ChannelCapacity.mutualInfo_le_log_card {C W : Type} [Fintype C] [Fintype W] (nc : NamingChannel C W) (prior : C → ℝ) (hprior_nonneg : ∀ (c : C), 0 ≤ prior c) (hprior_sum : ∑ c : C, prior c = 1) :
    mutualInfo nc prior ≤ Real.log (Fintype.card W)

Mutual information is bounded by the log of the vocabulary size: I(W;C) ≤ log |W|.

Proof chain: I(W;C) ≤ H(W) ≤ log |W|, where the first inequality uses conditional entropy H(W|C) ≥ 0, and the second uses Gibbs' inequality (the KL divergence of p(w) from uniform is non-negative).
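The chain can be checked numerically on a toy example (plain Python, numbers invented), using the symmetric form log(p(c|w)/p(c)) = log(p(w|c)/p(w)):

```python
import math

# Toy channel and prior (invented): 2 meanings, 3 words.
encode = [[0.7, 0.2, 0.1],
          [0.1, 0.8, 0.1]]
prior = [0.6, 0.4]
nC, nW = 2, 3

pw = [sum(prior[c] * encode[c][w] for c in range(nC)) for w in range(nW)]
H_W = -sum(x * math.log(x) for x in pw)

# I(W;C) = sum_{c,w} p(c) p(w|c) log(p(w|c) / p(w))
mutual_info = sum(prior[c] * encode[c][w] * math.log(encode[c][w] / pw[w])
                  for c in range(nC) for w in range(nW))
```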

noncomputable def Core.ChannelCapacity.channelCapacity {C W : Type} [Fintype C] [Fintype W] (nc : NamingChannel C W) : ℝ

Channel capacity: the supremum of mutual information over all valid priors, C* = sup_{p(c)} I(W;C) (eq. 3 of @cite{zaslavsky-etal-2019}).

A capacity-achieving prior (CAP) attains this supremum.

Channel capacity is bounded by the log of the vocabulary size: C* ≤ log |W|. Follows directly from mutualInfo_le_log_card.

theorem Core.ChannelCapacity.mutualInfo_le_log_fin {C : Type} [Fintype C] {k : ℕ} (nc : NamingChannel C (Fin k)) (prior : C → ℝ) (hp : ∀ (c : C), 0 ≤ prior c) (hsum : ∑ c : C, prior c = 1) :
    mutualInfo nc prior ≤ Real.log k

Mutual information bound for a vocabulary of size k: I(W;C) ≤ log k. A useful corollary when the word type is Fin k.