Linglib.Core.InformationTheory

Information-Theoretic Primitives (ℚ-valued)

@cite{ackerman-malouf-2013} @cite{cheng-holyoak-1995} @cite{cover-thomas-2006} @cite{dunn-2025} @cite{ellis-2006}

Domain-agnostic information-theoretic functions over rational numbers, suitable for decidable computation. These are used by both pragmatic models (RSA) and morphological complexity metrics.

For the ℝ-valued counterpart, see Core.shannonEntropy in Linglib/Core/Agent/RationalAction.lean (§4), which uses Real.log and supports proofs of non-negativity, max-entropy bounds, and Gibbs VP.

Main definitions

Base-2 logarithm approximated as a rational number (for decidable proofs).

We use the simple rational approximation log₂(x) ≈ 3 * (x - 1) / (x + 1), which is exact at x = 1/2, 1, and 2 and degrades away from 1. It is only used for concrete computations; proofs use abstract properties. Exact proofs about limiting behavior would require Mathlib.Analysis.

Equations
Instances For
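A minimal standalone sketch of such an approximation (the name `log2Approx` follows the cross-references below, but this is an illustration, not the library's exact definition; it assumes Mathlib for the ℚ notation):

```lean
import Mathlib.Data.Rat.Defs

-- Hypothetical standalone sketch of the rational log₂ approximation
-- described above; not necessarily the library's exact definition.
def log2Approx (x : ℚ) : ℚ := 3 * (x - 1) / (x + 1)

-- Exact at x = 1/2, 1, and 2; accuracy degrades away from 1:
#eval log2Approx 1        -- 0
#eval log2Approx 2        -- 1
#eval log2Approx (1/2)    -- -1
#eval log2Approx 4        -- 9/5 (true value: 2)
```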
    def Core.InformationTheory.entropy {α : Type} [BEq α] (dist : List (α × ℚ)) : ℚ

    Shannon entropy of a distribution (in bits).

    H(X) = -Σ_x P(x) log P(x)

    Note: 0 log 0 is defined as 0 (standard convention).

    This is the ℚ counterpart of Core.shannonEntropy in RationalAction.lean, using log2Approx (rational linear approximation) instead of Real.log. Suitable for decidable computation; for mathematical proofs, use shannonEntropy and its theorems (shannonEntropy_nonneg, shannonEntropy_le_log_card, etc.).

    Equations
    Instances For
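The definition can be mirrored by a small standalone sketch (hypothetical names, assuming Mathlib's ℚ and a `log2Approx` as described above; the library's actual code may differ):

```lean
import Mathlib.Data.Rat.Defs

-- Illustrative rational log₂ approximation; not the library code.
def log2Approx (x : ℚ) : ℚ := 3 * (x - 1) / (x + 1)

-- H(X) = -Σ_x P(x) · log₂ P(x), with the 0 · log 0 = 0 convention.
def entropy {α : Type} (dist : List (α × ℚ)) : ℚ :=
  -(dist.foldl (fun acc xp =>
      let p := xp.2
      acc + if p = 0 then 0 else p * log2Approx p) 0)

-- A fair coin has exactly 1 bit of entropy here, since the
-- approximation is exact at 1/2:
#eval entropy [(true, (1 : ℚ)/2), (false, 1/2)]  -- 1
```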
    def Core.InformationTheory.mutualInformation {α β : Type} [BEq α] [BEq β] (joint : List ((α × β) × ℚ)) (marginalX : List (α × ℚ)) (marginalY : List (β × ℚ)) : ℚ

      Mutual information I(X;Y) = H(X) + H(Y) - H(X,Y).

      Alternative: I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)

      Equations
      Instances For
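As a worked instance of the entropy-based formula, take two fair coins. If Y always equals X, the joint distribution has only two equiprobable outcomes, so

```latex
H(X) = H(Y) = 1,\quad H(X,Y) = 1,\quad
I(X;Y) = 1 + 1 - 1 = 1 \text{ bit};
```

if instead X and Y are independent, H(X,Y) = 2 and I(X;Y) = 1 + 1 − 2 = 0.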
        def Core.InformationTheory.conditionalEntropy {α β : Type} [BEq α] [BEq β] (joint : List ((α × β) × ℚ)) (marginalX : List (α × ℚ)) : ℚ

        Conditional entropy H(Y|X) = H(X,Y) - H(X).

        Used by MemorySurprisal for computing average surprisal S_M = H(W_t | M_t), and by LCEC for paradigm cell conditional entropy H(Cᵢ|Cⱼ).

        Equations
        Instances For
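A worked instance of H(Y|X) = H(X,Y) − H(X), in bits, for two fair coins:

```latex
\text{perfectly correlated: } H(Y \mid X) = 1 - 1 = 0
\qquad
\text{independent: } H(Y \mid X) = 2 - 1 = 1 = H(Y)
```

In the correlated case, once X is known Y is determined, so the remaining uncertainty is zero.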
          def Core.InformationTheory.jsdOf {α : Type} [BEq α] (xs : List α) (p q : α → ℚ) : ℚ

          Jensen-Shannon divergence over an explicit inventory.

          JSD(p, q) = H(m) - (H(p) + H(q)) / 2 where m(x) = (p(x) + q(x)) / 2. Symmetric, bounded (0 ≤ JSD ≤ 1 bit), and its square root is a metric.

          Used for grammar distance, register comparison, and anywhere KL divergence's asymmetry is undesirable.

          Equations
          Instances For
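Both bounds are attained at the extremes. If p = q then m = p, so JSD = H(p) − (H(p) + H(p))/2 = 0. If p and q are point masses on distinct elements a ≠ b, then H(p) = H(q) = 0 while m is uniform on {a, b}, giving the maximum:

```latex
\mathrm{JSD}(p, q) = H(m) - \frac{H(p) + H(q)}{2} = 1 - 0 = 1 \text{ bit}.
```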

            ΔP: directional association measure (@cite{ellis-2006}, Table 1; via @cite{cheng-holyoak-1995} Probabilistic Contrast Model).

            ΔP(x → y) = P(y | x) - P(y | ¬x)

            Measures how much knowing x changes the probability of y. @cite{ellis-2006} uses ΔP to explain L2 morpheme acquisition: grammatical functors with low ΔP (many cue-outcome competitors) are acquired late. @cite{dunn-2025} uses ΔP to identify constructions from corpus data.

            Properties:

            • Bounded: ΔP ∈ [-1, 1] for valid probability inputs
            • ΔP = 0 when x and y are independent (see deltaP_eq_zero_of_independent)
            • Directional: ΔP(x→y) ≠ ΔP(y→x) in general

            Takes joint probability P(x,y), marginal P(x), and marginal P(y). Returns the directional association from x to y.

            Equations
            Instances For
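A standalone sketch of the computation (hypothetical definition matching the argument order used in `deltaP_eq_zero_of_independent` below; the library's actual code may differ):

```lean
import Mathlib.Data.Rat.Defs

-- ΔP(x→y) = P(y|x) - P(y|¬x), from P(x,y), P(x), P(y), using
--   P(y|x)  = P(x,y) / P(x)
--   P(y|¬x) = (P(y) - P(x,y)) / (1 - P(x)).
-- Illustrative only; assumes 0 < P(x) < 1.
def deltaP (pXY pX pY : ℚ) : ℚ :=
  pXY / pX - (pY - pXY) / (1 - pX)

#eval deltaP (1/4) (1/2) (1/2)  -- 0: independent, P(x,y) = P(x)·P(y)
#eval deltaP (1/2) (1/2) (1/2)  -- 1: y always follows x, never ¬x
```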

              ΔP from a 2×2 contingency table (@cite{ellis-2006}, Table 1).

              Given observed counts:

                     y    ¬y
              x      a    b
              ¬x     c    d
              ΔP(x → y) = a/(a+b) - c/(c+d)

              This is the standard form for contingency learning: a is the frequency of cue-outcome co-occurrence, b is cue without outcome, c is outcome without cue, d is neither. Also used in corpus-based CxG (@cite{dunn-2025}) for slot-filler association strength.

              Equations
              Instances For
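A standalone sketch of the count-based form (hypothetical name; counts are taken as ℚ so that the divisions stay exact):

```lean
import Mathlib.Data.Rat.Defs

-- ΔP from a 2×2 contingency table: ΔP(x→y) = a/(a+b) - c/(c+d).
-- Illustrative only; assumes a+b ≠ 0 and c+d ≠ 0.
def deltaPFromCounts (a b c d : ℚ) : ℚ :=
  a / (a + b) - c / (c + d)

-- Cue x strongly predicts outcome y (8 co-occurrences, 2 exceptions each way):
#eval deltaPFromCounts 8 2 2 8  -- 8/10 - 2/10 = 3/5
```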
                theorem Core.InformationTheory.deltaP_eq_zero_of_independent (pX pY : ℚ) (hpX_pos : 0 < pX) (hpX_lt : pX < 1) :
                deltaP (pX * pY) pX pY = 0

                ΔP vanishes under independence: if P(x,y) = P(x)·P(y), then ΔP = 0.

                This is the key property: ΔP measures departure from independence. When slot and filler are statistically independent (knowing the slot tells you nothing about the filler), ΔP is zero.
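The computation behind the theorem, under the hypotheses 0 < P(x) < 1 (which keep both denominators nonzero):

```latex
\Delta P
  = \frac{P(x)\,P(y)}{P(x)} - \frac{P(y) - P(x)\,P(y)}{1 - P(x)}
  = P(y) - \frac{P(y)\,(1 - P(x))}{1 - P(x)}
  = P(y) - P(y) = 0.
```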