Information-Theoretic Primitives (ℚ-valued) #
@cite{ackerman-malouf-2013} @cite{cheng-holyoak-1995} @cite{cover-thomas-2006} @cite{dunn-2025} @cite{ellis-2006}
Domain-agnostic information-theoretic functions over rational numbers, suitable for decidable computation. These are used by both pragmatic models (RSA) and morphological complexity metrics.
For the ℝ-valued counterpart, see Core.shannonEntropy in
Linglib/Core/Agent/RationalAction.lean (§4), which uses Real.log and
supports proofs of non-negativity, max-entropy bounds, and Gibbs VP.
Main definitions #
- log2Approx: rational approximation of log₂
- entropy: Shannon entropy H(X)
- conditionalEntropy: H(Y|X) = H(X,Y) - H(X)
- mutualInformation: I(X;Y) = H(X) + H(Y) - H(X,Y)
- deltaP: ΔP directional association measure
- deltaPCounts: ΔP from a 2×2 contingency table
Base-2 logarithm approximated as a rational (for decidable proofs).
We use a simple linear approximation log₂(x) ≈ 3 * (x - 1) / (x + 1), which happens to be exact at x = 1/2, 1, and 2. This is only used for concrete computations; proofs use abstract properties. Exact proofs about limiting behavior would require Mathlib.Analysis.
Equations
Instances For
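As a minimal sketch of what such an approximation could look like (the name log2ApproxSketch and the lack of edge-case handling are illustrative assumptions, not the library's actual definition):

```lean
import Mathlib

/-- Illustrative sketch of a rational log₂ approximation:
    3 * (x - 1) / (x + 1), exact at x = 1/2, 1, and 2. -/
def log2ApproxSketch (x : ℚ) : ℚ :=
  3 * (x - 1) / (x + 1)

#eval log2ApproxSketch 2      -- 1    (log₂ 2 = 1, exact here)
#eval log2ApproxSketch (1/2)  -- -1   (log₂ (1/2) = -1, exact here)
#eval log2ApproxSketch 4      -- 9/5  (vs. the true value 2)
```

The approximation degrades as x moves away from 1, which is acceptable here since probabilities lie in [0, 1].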
Shannon entropy of a distribution (in bits).
H(X) = -Σ_x P(x) log P(x)
Note: 0 log 0 is defined as 0 (standard convention).
This is the ℚ counterpart of Core.shannonEntropy in RationalAction.lean,
using log2Approx (rational linear approximation) instead of Real.log.
Suitable for decidable computation; for mathematical proofs, use
shannonEntropy and its theorems (shannonEntropy_nonneg,
shannonEntropy_le_log_card, etc.).
Equations
Instances For
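A sketch of how such a ℚ-valued entropy could be computed over a list of probabilities, assuming the linear log₂ approximation above and the 0 · log 0 = 0 convention (names and the List-based signature are assumptions for illustration):

```lean
import Mathlib

/-- Linear rational log₂ approximation, as described above (sketch). -/
def log2A (x : ℚ) : ℚ := 3 * (x - 1) / (x + 1)

/-- Sketch of a ℚ-valued Shannon entropy over a probability list,
    with the 0 · log 0 = 0 convention. -/
def entropySketch (p : List ℚ) : ℚ :=
  -(p.map (fun px => if px = 0 then 0 else px * log2A px)).sum

#eval entropySketch [1/2, 1/2]  -- 1 bit: the approximation is exact at 1/2
```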
Mutual information I(X;Y) = H(X) + H(Y) - H(X,Y).
Alternative: I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)
Equations
- Core.InformationTheory.mutualInformation joint marginalX marginalY = Core.InformationTheory.entropy marginalX + Core.InformationTheory.entropy marginalY - Core.InformationTheory.entropy joint
Instances For
Conditional entropy H(Y|X) = H(X,Y) - H(X).
Used by MemorySurprisal for computing average surprisal S_M = H(W_t | M_t), and by LCEC for paradigm cell conditional entropy H(Cᵢ|Cⱼ).
Equations
- Core.InformationTheory.conditionalEntropy joint marginalX = Core.InformationTheory.entropy joint - Core.InformationTheory.entropy marginalX
Instances For
Jensen-Shannon divergence over an explicit inventory.
JSD(p, q) = H(m) - (H(p) + H(q)) / 2, where m(x) = (p(x) + q(x)) / 2. It is symmetric and bounded (0 ≤ JSD ≤ 1 bit), and its square root is a metric.
Used for grammar distance, register comparison, and anywhere KL divergence's asymmetry is undesirable.
Equations
- One or more equations did not get rendered due to their size.
Instances For
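Since the equation above did not render, here is a sketch of the computation over pointwise-aligned distributions, reusing the entropy sketch from above (all names are illustrative assumptions):

```lean
import Mathlib

/-- Linear rational log₂ approximation (sketch). -/
def log2A (x : ℚ) : ℚ := 3 * (x - 1) / (x + 1)

/-- ℚ-valued entropy sketch; 0 · log 0 := 0. -/
def H (p : List ℚ) : ℚ :=
  -(p.map (fun px => if px = 0 then 0 else px * log2A px)).sum

/-- Sketch of JSD(p, q) = H(m) - (H(p) + H(q)) / 2,
    with m the pointwise average of p and q. -/
def jsdSketch (p q : List ℚ) : ℚ :=
  H (List.zipWith (fun a b => (a + b) / 2) p q) - (H p + H q) / 2

#eval jsdSketch [1, 0] [0, 1]  -- 1: disjoint supports give the 1-bit maximum
```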
ΔP: directional association measure (@cite{ellis-2006}, Table 1; via @cite{cheng-holyoak-1995} Probabilistic Contrast Model).
ΔP(x → y) = P(y | x) - P(y | ¬x)
Measures how much knowing x changes the probability of y. @cite{ellis-2006} uses ΔP to explain L2 morpheme acquisition: grammatical functors with low ΔP (many cue-outcome competitors) are acquired late. @cite{dunn-2025} uses ΔP to identify constructions from corpus data.
Properties:
- Bounded: ΔP ∈ [-1, 1] for valid probability inputs
- Independence: ΔP = 0 when x and y are independent (see deltaP_eq_zero_of_independent)
- Directional: ΔP(x→y) ≠ ΔP(y→x) in general
Takes joint probability P(x,y), marginal P(x), and marginal P(y). Returns the directional association from x to y.
Equations
Instances For
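A sketch of the computation from the joint and marginals, using P(y|x) = P(x,y)/P(x) and P(y|¬x) = (P(y) - P(x,y))/(1 - P(x)); the name and the guard for degenerate marginals are illustrative assumptions:

```lean
import Mathlib

/-- Sketch of ΔP(x → y) = P(y|x) - P(y|¬x), from the joint P(x,y)
    and the marginals P(x), P(y); degenerate P(x) ∈ {0, 1} is
    guarded to 0 (a modeling assumption). -/
def deltaPSketch (pxy px py : ℚ) : ℚ :=
  if px = 0 ∨ px = 1 then 0
  else pxy / px - (py - pxy) / (1 - px)

#eval deltaPSketch (1/4) (1/2) (1/2)  -- 0: independent, since 1/4 = 1/2 · 1/2
#eval deltaPSketch (1/2) (1/2) (1/2)  -- 1: x and y perfectly associated
```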
ΔP from a 2×2 contingency table (@cite{ellis-2006}, Table 1).
Given observed counts:
|    | y | ¬y |
|----|---|----|
| x  | a | b |
| ¬x | c | d |
ΔP(x → y) = a/(a+b) - c/(c+d)
This is the standard form for contingency learning: a is the frequency of
cue-outcome co-occurrence, b is cue without outcome, c is outcome
without cue, d is neither. Also used in corpus-based CxG (@cite{dunn-2025})
for slot-filler association strength.
Equations
Instances For
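The count-based form can be sketched directly, with ℕ counts cast to ℚ (the name and the zero-row guards are illustrative assumptions):

```lean
import Mathlib

/-- Sketch of ΔP from a 2×2 contingency table:
    ΔP(x → y) = a/(a+b) - c/(c+d), with empty rows guarded to 0. -/
def deltaPCountsSketch (a b c d : ℕ) : ℚ :=
  (if a + b = 0 then 0 else (a : ℚ) / (a + b)) -
  (if c + d = 0 then 0 else (c : ℚ) / (c + d))

-- Cue x co-occurs with outcome y 9 times out of 10;
-- without the cue, y occurs only 2 times out of 10.
#eval deltaPCountsSketch 9 1 2 8  -- 9/10 - 2/10 = 7/10
```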
ΔP vanishes under independence: if P(x,y) = P(x)·P(y), then ΔP = 0.
This is the key property: ΔP measures departure from independence. When slot and filler are statistically independent (knowing the slot tells you nothing about the filler), ΔP is zero.
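The algebra behind this fact is short. Assuming 0 < P(x) < 1 and substituting P(x,y) = P(x)·P(y) into ΔP = P(x,y)/P(x) - (P(y) - P(x,y))/(1 - P(x)):

```latex
\Delta P
  = \frac{P(x)\,P(y)}{P(x)} - \frac{P(y) - P(x)\,P(y)}{1 - P(x)}
  = P(y) - \frac{P(y)\,(1 - P(x))}{1 - P(x)}
  = P(y) - P(y)
  = 0.
```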