Information-Theoretic Primitives (ℚ-valued) #
@cite{ackerman-malouf-2013} @cite{cheng-holyoak-1995} @cite{cover-thomas-2006} @cite{dunn-2025} @cite{ellis-2006}
Domain-agnostic information-theoretic functions over rational numbers, suitable for decidable computation. These are used by both pragmatic models (RSA) and morphological complexity metrics.
For the ℝ-valued counterpart, see Core.shannonEntropy in
Linglib/Core/Agent/RationalAction.lean (§4), which uses Real.log and
supports proofs of non-negativity, max-entropy bounds, and Gibbs VP.
Main definitions #
- log2Approx: rational approximation of log₂
- entropy: Shannon entropy H(X)
- conditionalEntropy: H(Y|X) = H(X,Y) - H(X)
- mutualInformation: I(X;Y) = H(X) + H(Y) - H(X,Y)
- deltaP: ΔP directional association measure
- deltaPCounts: ΔP from a 2×2 contingency table
Base-2 logarithm approximated as a rational (for decidable proofs).
We use a simple linear approximation log₂(x) ≈ 3 * (x - 1) / (x + 1), which happens to be exact at x = 1/2, 1, and 2. This is only used for concrete computations; proofs use abstract properties. Exact proofs about limiting behavior would require Mathlib.Analysis.
Equations
Instances For
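As a minimal sketch of what such an approximation could look like (the name log2ApproxSketch and the lack of edge-case handling are illustrative assumptions, not the library's actual definition):

```lean
import Mathlib

/-- Illustrative sketch of a rational log₂ approximation:
    3 * (x - 1) / (x + 1), exact at x = 1/2, 1, and 2. -/
def log2ApproxSketch (x : ℚ) : ℚ :=
  3 * (x - 1) / (x + 1)

#eval log2ApproxSketch 2      -- 1    (log₂ 2 = 1, exact here)
#eval log2ApproxSketch (1/2)  -- -1   (log₂ (1/2) = -1, exact here)
#eval log2ApproxSketch 4      -- 9/5  (vs. the true value 2)
```

The approximation degrades as x moves away from 1, which is acceptable here since probabilities lie in [0, 1].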
Shannon entropy of a distribution (in bits).
H(X) = -Σ_x P(x) log P(x)
Note: 0 log 0 is defined as 0 (standard convention).
This is the ℚ counterpart of Core.shannonEntropy in RationalAction.lean,
using log2Approx (rational linear approximation) instead of Real.log.
Suitable for decidable computation; for mathematical proofs, use
shannonEntropy and its theorems (shannonEntropy_nonneg,
shannonEntropy_le_log_card, etc.).
Equations
Instances For
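A sketch of how such a ℚ-valued entropy could be computed over a list of probabilities, assuming the linear log₂ approximation above and the 0 · log 0 = 0 convention (names and the List-based signature are assumptions for illustration):

```lean
import Mathlib

/-- Linear rational log₂ approximation, as described above (sketch). -/
def log2A (x : ℚ) : ℚ := 3 * (x - 1) / (x + 1)

/-- Sketch of a ℚ-valued Shannon entropy over a probability list,
    with the 0 · log 0 = 0 convention. -/
def entropySketch (p : List ℚ) : ℚ :=
  -(p.map (fun px => if px = 0 then 0 else px * log2A px)).sum

#eval entropySketch [1/2, 1/2]  -- 1 bit: the approximation is exact at 1/2
```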
Mutual information I(X;Y) = H(X) + H(Y) - H(X,Y).
Alternative: I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)
Equations
- Core.InformationTheory.mutualInformation joint marginalX marginalY = Core.InformationTheory.entropy marginalX + Core.InformationTheory.entropy marginalY - Core.InformationTheory.entropy joint
Instances For
Conditional entropy H(Y|X) = H(X,Y) - H(X).
Used by MemorySurprisal for computing average surprisal S_M = H(W_t | M_t), and by LCEC for paradigm cell conditional entropy H(Cᵢ|Cⱼ).
Equations
- Core.InformationTheory.conditionalEntropy joint marginalX = Core.InformationTheory.entropy joint - Core.InformationTheory.entropy marginalX
Instances For
Jensen-Shannon divergence over an explicit inventory.
JSD(p, q) = H(m) - (H(p) + H(q)) / 2, where m(x) = (p(x) + q(x)) / 2. It is symmetric and bounded (0 ≤ JSD ≤ 1 bit), and its square root is a metric.
Used for grammar distance, register comparison, and anywhere KL divergence's asymmetry is undesirable.
Equations
- One or more equations did not get rendered due to their size.
Instances For
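Since the equation above did not render, here is a sketch of the computation over pointwise-aligned distributions, reusing the entropy sketch from above (all names are illustrative assumptions):

```lean
import Mathlib

/-- Linear rational log₂ approximation (sketch). -/
def log2A (x : ℚ) : ℚ := 3 * (x - 1) / (x + 1)

/-- ℚ-valued entropy sketch; 0 · log 0 := 0. -/
def H (p : List ℚ) : ℚ :=
  -(p.map (fun px => if px = 0 then 0 else px * log2A px)).sum

/-- Sketch of JSD(p, q) = H(m) - (H(p) + H(q)) / 2,
    with m the pointwise average of p and q. -/
def jsdSketch (p q : List ℚ) : ℚ :=
  H (List.zipWith (fun a b => (a + b) / 2) p q) - (H p + H q) / 2

#eval jsdSketch [1, 0] [0, 1]  -- 1: disjoint supports give the 1-bit maximum
```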
ΔP: directional association measure (@cite{ellis-2006}, Table 1; via @cite{cheng-holyoak-1995} Probabilistic Contrast Model).
ΔP(x → y) = P(y | x) - P(y | ¬x)
Measures how much knowing x changes the probability of y. @cite{ellis-2006} uses ΔP to explain L2 morpheme acquisition: grammatical functors with low ΔP (many cue-outcome competitors) are acquired late. @cite{dunn-2025} uses ΔP to identify constructions from corpus data.
Properties:
- Bounded: ΔP ∈ [-1, 1] for valid probability inputs
- Independence: ΔP = 0 when x and y are independent (see deltaP_eq_zero_of_independent)
- Directional: ΔP(x→y) ≠ ΔP(y→x) in general
Takes joint probability P(x,y), marginal P(x), and marginal P(y). Returns the directional association from x to y.
Equations
Instances For
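A sketch of the computation from the joint and marginals, using P(y|x) = P(x,y)/P(x) and P(y|¬x) = (P(y) - P(x,y))/(1 - P(x)); the name and the guard for degenerate marginals are illustrative assumptions:

```lean
import Mathlib

/-- Sketch of ΔP(x → y) = P(y|x) - P(y|¬x), from the joint P(x,y)
    and the marginals P(x), P(y); degenerate P(x) ∈ {0, 1} is
    guarded to 0 (a modeling assumption). -/
def deltaPSketch (pxy px py : ℚ) : ℚ :=
  if px = 0 ∨ px = 1 then 0
  else pxy / px - (py - pxy) / (1 - px)

#eval deltaPSketch (1/4) (1/2) (1/2)  -- 0: independent, since 1/4 = 1/2 · 1/2
#eval deltaPSketch (1/2) (1/2) (1/2)  -- 1: x and y perfectly associated
```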
ΔP from a 2×2 contingency table (@cite{ellis-2006}, Table 1).
Given observed counts:
|    | y | ¬y |
|----|---|----|
| x  | a | b |
| ¬x | c | d |
ΔP(x → y) = a/(a+b) - c/(c+d)
This is the standard form for contingency learning: a is the frequency of
cue-outcome co-occurrence, b is cue without outcome, c is outcome
without cue, d is neither. Also used in corpus-based CxG (@cite{dunn-2025})
for slot-filler association strength.
Equations
Instances For
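The count-based form can be sketched directly, with ℕ counts cast to ℚ (the name and the zero-row guards are illustrative assumptions):

```lean
import Mathlib

/-- Sketch of ΔP from a 2×2 contingency table:
    ΔP(x → y) = a/(a+b) - c/(c+d), with empty rows guarded to 0. -/
def deltaPCountsSketch (a b c d : ℕ) : ℚ :=
  (if a + b = 0 then 0 else (a : ℚ) / (a + b)) -
  (if c + d = 0 then 0 else (c : ℚ) / (c + d))

-- Cue x co-occurs with outcome y 9 times out of 10;
-- without the cue, y occurs only 2 times out of 10.
#eval deltaPCountsSketch 9 1 2 8  -- 9/10 - 2/10 = 7/10
```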
ΔP vanishes under independence: if P(x,y) = P(x)·P(y), then ΔP = 0.
This is the key property: ΔP measures departure from independence. When slot and filler are statistically independent (knowing the slot tells you nothing about the filler), ΔP is zero.
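The algebra behind this fact is short. Assuming 0 < P(x) < 1 and substituting P(x,y) = P(x)·P(y) into ΔP = P(x,y)/P(x) - (P(y) - P(x,y))/(1 - P(x)):

```latex
\Delta P
  = \frac{P(x)\,P(y)}{P(x)} - \frac{P(y) - P(x)\,P(y)}{1 - P(x)}
  = P(y) - \frac{P(y)\,(1 - P(x))}{1 - P(x)}
  = P(y) - P(y)
  = 0.
```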