Generalised Surprisal and Incremental Alternative Sampling #
@cite{giulianelli-etal-2026}
A parameterised family of processing difficulty measures that decomposes prediction into explicit temporal and representational dimensions, generalising standard surprisal.
Standard surprisal treats prediction error as a single scalar (−log P(next word)). The generalised framework disentangles this into:
- A warping function f mapping expected scores to processing measures
- A scoring function g measuring how well alternatives match the target
- A forecast horizon h: how many future symbols are considered
- A representational level: the abstraction at which alternatives are compared
Standard surprisal is the special case (negLog, indicator, 1, predictive). Incremental information value is the family (identity, distance, h, l).
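Schematically, a member of the family scores target w in context c by warping an expected score over sampled alternatives (a sketch of the decomposition above; the notation is illustrative, not the module's):

```
M[f, g, h, l](w | c) = f( E_{A ~ P(· | c)} [ g(A, w, c) ] )
```

With h = 1 and g = indicator, the expectation collapses to E[𝟙{A = w}] = P(w | c), so f = negLog yields −log P(w | c), recovering standard surprisal. With f = identity and g = distance, the expectation is the mean representational distance between sampled alternatives and the target, i.e. information value.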
Main definitions #
- SurprisalConfig: Complete generalised surprisal specification
- standardSurprisal: The configuration corresponding to @cite{levy-2008}
- informationValue: The IAS configuration at a given (horizon, level)
- PsychMeasure: Standard psycholinguistic response types
- ias_recovers_surprisal: Standard surprisal is a special case of IAS
Connection to existing infrastructure #
- Core.InformationTheory.conditionalEntropy: computes H(W|M), the expected surprisal under bounded memory
- Core.Divergence.kl_pointMass_eq_neg_log: KL with point mass = surprisal
- Core.ProcessingModel.ProcessingProfile: multi-dimensional processing cost, which IAS motivates decomposing by temporal and representational resolution
Equations
- Core.GeneralisedSurprisal.instBEqWarpingFn.beq x✝ y✝ = (x✝.ctorIdx == y✝.ctorIdx)
Instances For
Equations
- One or more equations did not get rendered due to their size.
Instances For
Scoring functions measuring prediction accuracy. g(a, w, c) evaluates alternative a against target w in context c.
- indicator : ScoringFn
𝟙{w ≤ a}: binary prefix match. With negLog → standard surprisal.
- distance : ScoringFn
d_r(a, w): representational distance. With identity → information value.
- similarity : ScoringFn
sim(r(a), r(w)): semantic similarity. @cite{meister-giulianelli-pimentel-2024}
Instances For
Equations
- Core.GeneralisedSurprisal.instBEqScoringFn.beq x✝ y✝ = (x✝.ctorIdx == y✝.ctorIdx)
Instances For
Equations
- One or more equations did not get rendered due to their size.
Instances For
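An illustrative semantics for the first two scoring functions, sketched over token sequences (the names `indicatorScore`/`distanceScore` and the token-list encoding are assumptions for illustration, not the module's actual definitions):

```lean
-- Sketch: scoring an alternative `a` against target `w`, both as token lists.
-- `indicatorScore` is the binary prefix match 𝟙{w ≤ a};
-- `distanceScore` applies an arbitrary representational distance `d`.
def indicatorScore (a w : List Nat) : Float :=
  if w.isPrefixOf a then 1 else 0

def distanceScore (d : List Nat → List Nat → Float) (a w : List Nat) : Float :=
  d a w
```

Under negLog warping, the expectation of `indicatorScore` at horizon 1 gives standard surprisal; under identity warping, the expectation of `distanceScore` gives information value.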
Forecast horizon: how many future symbols each alternative spans. h = 1 is standard surprisal's implicit horizon (next word only).
Instances For
Representational level at which predictions are evaluated.
Different layers of a neural language model capture different levels of linguistic processing. The key finding is that the most predictive level varies by psycholinguistic measure: lexical identity layers best predict explicit predictability; intermediate layers best predict reading times.
- lexical : RepLevel
Layer 0 / embedding: decontextualized lexical identity
- shallowSyntactic : RepLevel
Early-to-intermediate layers: shallow syntactic processing
- syntactic : RepLevel
Intermediate layers: deep syntactic, shallow semantic
- semantic : RepLevel
Deep layers: fully contextualized semantics
- predictive : RepLevel
Final layer: specialized for next-token prediction
Instances For
Equations
- Core.GeneralisedSurprisal.instBEqRepLevel.beq x✝ y✝ = (x✝.ctorIdx == y✝.ctorIdx)
Instances For
Equations
- One or more equations did not get rendered due to their size.
Instances For
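The layer correspondence above can be made concrete with a hypothetical assignment for a 12-layer model (the function name and the specific layer indices are illustrative assumptions, not part of the module):

```lean
-- Hypothetical layer assignment for a 12-layer transformer (illustrative only).
def layerOf : RepLevel → Nat
  | .lexical          => 0   -- embedding layer: decontextualized identity
  | .shallowSyntactic => 3   -- early layers
  | .syntactic        => 6   -- intermediate layers
  | .semantic         => 9   -- deep layers
  | .predictive       => 12  -- final layer: next-token head
```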
How pairwise distances between alternative sets are aggregated.
Different summaries capture different notions of predictability:
mean is the unbiased discrepancy estimate; min asks whether any
hypothesis is close to the outcome; max captures worst-case error.
Key finding: under min, surprisal correlates most strongly with
intermediate layers and medium horizons, revealing that surprisal's
predictability is closest to a best-case (closest-hypothesis) notion
rather than average discrepancy.
- mean : DistanceSummary
Average pairwise distance. Equivalent to the original information value definition.
- min : DistanceSummary
Minimum pairwise distance. Closest pre-observation hypothesis.
- max : DistanceSummary
Maximum pairwise distance. Worst-case prediction error.
Instances For
Equations
- Core.GeneralisedSurprisal.instBEqDistanceSummary.beq x✝ y✝ = (x✝.ctorIdx == y✝.ctorIdx)
Instances For
Equations
- One or more equations did not get rendered due to their size.
Instances For
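As a sketch, the three summaries can be computed from a nonempty sample of pairwise distances (the function name and the Float encoding are illustrative assumptions, not the module's definitions):

```lean
-- Sketch: aggregating pairwise distances d(aᵢ, w) over sampled alternatives.
-- Passing the head distance `d` separately guarantees a nonempty sample.
def summarize (s : DistanceSummary) (d : Float) (ds : List Float) : Float :=
  match s with
  | .mean => (ds.foldl (· + ·) d) / (1 + ds.length).toFloat
  | .min  => ds.foldl Float.min d   -- closest pre-observation hypothesis
  | .max  => ds.foldl Float.max d   -- worst-case prediction error
```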
A generalised surprisal model: the complete parameter set for a specific processing measure.
- warp : WarpingFn
- scoring : ScoringFn
- horizon : ForecastHorizon
- level : RepLevel
Instances For
Equations
- One or more equations did not get rendered due to their size.
Instances For
Equations
- One or more equations did not get rendered due to their size.
- Core.GeneralisedSurprisal.instBEqSurprisalConfig.beq x✝¹ x✝ = false
Instances For
Equations
- One or more equations did not get rendered due to their size.
Instances For
Standard surprisal: −log P(next word). @cite{levy-2008} @cite{smith-levy-2013}
Equations
- One or more equations did not get rendered due to their size.
Instances For
Incremental information value at temporal-representational resolution (h, l). @cite{giulianelli-etal-2026}
Equations
- Core.GeneralisedSurprisal.informationValue h l = { warp := Core.GeneralisedSurprisal.WarpingFn.identity, scoring := Core.GeneralisedSurprisal.ScoringFn.distance, horizon := h, level := l }
Instances For
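For example, the two named configurations could be instantiated as follows (assuming `ForecastHorizon` accepts a numeral literal; illustrative only):

```lean
-- Illustrative instantiations: information value at horizon 3 over
-- semantic representations, and the standard-surprisal configuration.
example : SurprisalConfig := informationValue 3 RepLevel.semantic
example : SurprisalConfig := standardSurprisal
```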
Standard psycholinguistic response types that index processing effort.
- predictabilityRating : PsychMeasure
- clozeProbability : PsychMeasure
- clozeSurprisal : PsychMeasure
- firstFixationRT : PsychMeasure
- firstPassRT : PsychMeasure
- rightBoundedRT : PsychMeasure
- goPastRT : PsychMeasure
- selfPacedRT : PsychMeasure
- n400 : PsychMeasure
- p600 : PsychMeasure
Instances For
Equations
- Core.GeneralisedSurprisal.instBEqPsychMeasure.beq x✝ y✝ = (x✝.ctorIdx == y✝.ctorIdx)
Instances For
Equations
- One or more equations did not get rendered due to their size.
Instances For
Explicit predictability judgements (cloze, rating) vs. implicit processing signatures (RTs, ERPs). Best-predicting IAS configurations differ between these classes: explicit measures peak at h = 1 with lexical-level representations; implicit measures benefit from longer horizons and intermediate representations.
Equations
Instances For
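The partition can be sketched as a predicate over `PsychMeasure` (the name `isExplicit` is an assumption for illustration; the module's unrendered equations above may define it differently):

```lean
-- Sketch: explicit predictability judgements vs. implicit processing signatures.
def isExplicit : PsychMeasure → Bool
  | .predictabilityRating | .clozeProbability | .clozeSurprisal => true
  | _ => false
```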
Expected sign of the relationship between information value and the measured response. Positive: higher information value → larger response. Negative: the inverse.
Equations
- Core.GeneralisedSurprisal.PsychMeasure.predictabilityRating.expectedSign = -1
- Core.GeneralisedSurprisal.PsychMeasure.clozeProbability.expectedSign = -1
- Core.GeneralisedSurprisal.PsychMeasure.clozeSurprisal.expectedSign = 1
- Core.GeneralisedSurprisal.PsychMeasure.firstFixationRT.expectedSign = 1
- Core.GeneralisedSurprisal.PsychMeasure.firstPassRT.expectedSign = 1
- Core.GeneralisedSurprisal.PsychMeasure.rightBoundedRT.expectedSign = 1
- Core.GeneralisedSurprisal.PsychMeasure.goPastRT.expectedSign = 1
- Core.GeneralisedSurprisal.PsychMeasure.selfPacedRT.expectedSign = 1
- Core.GeneralisedSurprisal.PsychMeasure.n400.expectedSign = -1
- Core.GeneralisedSurprisal.PsychMeasure.p600.expectedSign = 1
Instances For
Standard surprisal is IAS at horizon 1 with predictive-level representation and negLog/indicator replacing identity/distance. Subsumption by construction.
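A sketch of how such a subsumption statement might read (the actual statement of `ias_recovers_surprisal` is not rendered here; the field-update syntax and the `rfl` proof are illustrative):

```lean
-- Sketch: standard surprisal as the (negLog, indicator, 1, predictive) instance.
theorem ias_recovers_surprisal' :
    standardSurprisal =
      { informationValue 1 RepLevel.predictive with
        warp    := WarpingFn.negLog,
        scoring := ScoringFn.indicator } := by
  rfl
```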