Linglib.Core.GeneralisedSurprisal

Generalised Surprisal and Incremental Alternative Sampling

@cite{giulianelli-etal-2026}

Parameterized family of processing difficulty measures that decomposes prediction into explicit temporal and representational dimensions, generalizing standard surprisal.

Standard surprisal treats prediction error as a single scalar (−log P(next word)). The generalised framework disentangles this into:

  1. A warping function f mapping expected scores to processing measures
  2. A scoring function g measuring how well alternatives match the target
  3. A forecast horizon h: how many future symbols are considered
  4. A representational level: the abstraction at which alternatives are compared

Writing a configuration as (f, g, h, l), standard surprisal is the special case (negLog, indicator, 1, predictive), and incremental information value is the family (identity, distance, h, l), indexed by horizon h and representational level l.

Main definitions

Connection to existing infrastructure

Warping functions map expected scores to processing measures: γ(w; c) = f(E[g(a, w, c)]), where the expectation is taken over alternatives a, w is the target, and c is the context.

  • negLog : WarpingFn

    f(x) = −log(x): standard surprisal (bits)

  • identity : WarpingFn

    f(x) = x: information value (raw expected distance)
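
As a minimal Lean sketch, the two warping functions can be read as an enumeration with an evaluation helper. The constructor names follow the list above; the helper `WarpingFn.apply`, the use of `Float`, and the base-2 logarithm (chosen so that values are in bits) are assumptions made for illustration and may differ from the module's actual definitions.

```lean
-- Hypothetical reconstruction: constructor names are taken from the list above.
inductive WarpingFn where
  | negLog    -- f(x) = −log(x)
  | identity  -- f(x) = x

-- Illustrative helper (not a documented definition): apply f to an expected score.
def WarpingFn.apply : WarpingFn → Float → Float
  | .negLog,   x => -(Float.log2 x)  -- assumes base 2, so the result is in bits
  | .identity, x => x

#eval WarpingFn.apply .negLog 0.25    -- expected score 0.25 ↦ 2 bits
#eval WarpingFn.apply .identity 0.25  -- score passed through unchanged
```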

      Scoring functions measuring prediction accuracy. g(a, w, c) evaluates alternative a against target w in context c.

      • indicator : ScoringFn

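        𝟙{w ≤ a}: binary prefix match (1 if the target w is a prefix of the alternative a, 0 otherwise). Combined with negLog, this yields standard surprisal.

      • distance : ScoringFn

        d_r(a, w): representational distance. Combined with identity, this yields information value.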

      • similarity : ScoringFn

        sim(r(a), r(w)): semantic similarity. @cite{meister-giulianelli-pimentel-2024}
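
The interaction between a scoring function and a warping function can be sketched with a small Monte Carlo estimate of γ(w; c) = f(E[g(a, w, c)]) for the indicator score. Everything here is hypothetical: token sequences are modelled as `List String`, the sampled alternatives are made up, and the helper names (`indicatorScore`, `mean`, `expectedScore`) and the base-2 logarithm are illustrative choices rather than definitions from the module.

```lean
-- Illustrative helper: 𝟙{w ≤ a}, i.e. 1 when target w is a prefix of alternative a.
def indicatorScore (w a : List String) : Float :=
  if w.isPrefixOf a then 1.0 else 0.0

-- Sample mean; returns 0 for an empty sample by convention.
def mean (xs : List Float) : Float :=
  if xs.isEmpty then 0.0 else xs.foldl (· + ·) 0.0 / xs.length.toFloat

-- Hypothetical sampled alternatives, each spanning two symbols (h = 2).
def alternatives : List (List String) :=
  [["the", "cat"], ["the", "dog"], ["a", "cat"], ["the", "cat"]]

-- Expected indicator score for the target ["the"]: 3 of 4 alternatives match.
def expectedScore : Float :=
  mean (alternatives.map (indicatorScore ["the"]))

#eval expectedScore                -- 0.75
#eval -(Float.log2 expectedScore)  -- negLog warping: estimated surprisal in bits
```

With the identity warping and a distance score in place of the indicator, the same expectation would be reported directly as an information value.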

          @[reducible, inline]

          Forecast horizon: how many future symbols each alternative spans. h = 1 is standard surprisal's implicit horizon (next word only).

            Representational level at which predictions are evaluated.

            Different layers of a neural language model capture different levels of linguistic processing. The key finding is that the most predictive level varies by psycholinguistic measure: lexical-identity layers best predict explicit predictability judgements, whereas intermediate layers best predict reading times.

            • lexical : RepLevel

              Layer 0 / embedding: decontextualized lexical identity

            • shallowSyntactic : RepLevel

              Early-to-intermediate layers: shallow syntactic processing

            • syntactic : RepLevel

              Intermediate layers: deep syntactic, shallow semantic

            • semantic : RepLevel

              Deep layers: fully contextualized semantics

            • predictive : RepLevel

              Final layer: specialized for next-token prediction

                How pairwise distances between alternative sets are aggregated.

                Different summaries capture different notions of predictability: mean is the unbiased discrepancy estimate; min asks whether any hypothesis is close to the outcome; max captures worst-case error.

                Key finding: under min, surprisal correlates most strongly with intermediate layers and medium horizons. This suggests that the notion of predictability captured by surprisal is closer to a best-case (closest-hypothesis) notion than to average discrepancy.

                • mean : DistanceSummary

                  Average pairwise distance. Equivalent to the original information value definition.

                • min : DistanceSummary

                  Minimum pairwise distance. Closest pre-observation hypothesis.

                • max : DistanceSummary

                  Maximum pairwise distance. Worst-case prediction error.
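
The three summaries can be sketched in Lean as an enumeration with an aggregation helper. The constructor names follow the list above; `DistanceSummary.aggregate`, the `Float` representation, the example distances, and the convention of returning 0 for an empty list are assumptions made for illustration.

```lean
-- Hypothetical reconstruction: constructor names are taken from the list above.
inductive DistanceSummary where
  | mean
  | min
  | max

-- Illustrative helper (not a documented definition): summarise pairwise distances.
def DistanceSummary.aggregate : DistanceSummary → List Float → Float
  | _,     []      => 0.0  -- arbitrary convention for an empty list
  | .mean, d :: ds => (d :: ds).foldl (· + ·) 0.0 / (ds.length + 1).toFloat
  | .min,  d :: ds => ds.foldl (fun acc x => if x < acc then x else acc) d
  | .max,  d :: ds => ds.foldl (fun acc x => if acc < x then x else acc) d

-- Hypothetical pairwise distances.
def distances : List Float := [0.2, 0.5, 0.8]

#eval DistanceSummary.aggregate .mean distances  -- average discrepancy
#eval DistanceSummary.aggregate .min distances   -- closest hypothesis: 0.2
#eval DistanceSummary.aggregate .max distances   -- worst case: 0.8
```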

                    A generalised surprisal model: the complete parameter set for a specific processing measure.
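
One way to picture such a parameter set is as a Lean structure bundling the four components from the overview. This is a sketch under stated assumptions: the structure name `Model`, its field names, and the choice of `Nat` for the horizon are hypothetical, and the actual structure may carry different or additional fields.

```lean
-- Constructor names follow the documentation; everything else is illustrative.
inductive WarpingFn where
  | negLog | identity
  deriving DecidableEq, Repr

inductive ScoringFn where
  | indicator | distance | similarity
  deriving DecidableEq, Repr

inductive RepLevel where
  | lexical | shallowSyntactic | syntactic | semantic | predictive
  deriving DecidableEq, Repr

-- Assumption: the documentation only shows that the horizon is a reducible definition.
abbrev Horizon := Nat

-- Hypothetical structure and field names.
structure Model where
  warping : WarpingFn
  scoring : ScoringFn
  horizon : Horizon
  level   : RepLevel
  deriving DecidableEq, Repr

-- An arbitrary example configuration: information value at h = 3, syntactic level.
def sampleModel : Model :=
  { warping := .identity, scoring := .distance, horizon := 3, level := .syntactic }

#eval sampleModel
```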

                            Standard surprisal: −log P(next word). @cite{levy-2008} @cite{smith-levy-2013}

                              Incremental information value at temporal-representational resolution (h, l). @cite{giulianelli-etal-2026}

                                Standard psycholinguistic response types that index processing effort.

                                    Distinguishes explicit predictability judgements (cloze, rating) from implicit processing signatures (reading times, ERPs). The best-predicting IAS (Incremental Alternative Sampling) configurations differ between these classes: explicit measures peak at h = 1 with lexical-level representations, whereas implicit measures benefit from longer horizons and intermediate representations.

                                      Standard surprisal is IAS at horizon 1 with predictive-level representations, with negLog and indicator in place of identity and distance. The subsumption holds by construction.
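
The "by construction" claim can be illustrated with a Lean sketch in which the statement closes by `rfl`. All names other than the constructors (`Model`, its fields, `standardSurprisal`, `informationValue`, and the theorem name) are hypothetical stand-ins, and `Horizon := Nat` is an assumption; the module's actual statement and proof may differ.

```lean
-- Compact restatement of the parameter types so this block is self-contained.
inductive WarpingFn where | negLog | identity
inductive ScoringFn where | indicator | distance | similarity
inductive RepLevel where | lexical | shallowSyntactic | syntactic | semantic | predictive
abbrev Horizon := Nat  -- assumption

-- Hypothetical structure and field names.
structure Model where
  warping : WarpingFn
  scoring : ScoringFn
  horizon : Horizon
  level   : RepLevel

-- Standard surprisal: (negLog, indicator, 1, predictive).
def standardSurprisal : Model := ⟨.negLog, .indicator, 1, .predictive⟩

-- Incremental information value: (identity, distance, h, l).
def informationValue (h : Horizon) (l : RepLevel) : Model := ⟨.identity, .distance, h, l⟩

-- Swapping the warping and scoring functions at (1, predictive) recovers surprisal.
theorem standardSurprisal_eq :
    standardSurprisal =
      { informationValue 1 .predictive with warping := .negLog, scoring := .indicator } :=
  rfl
```

Because both sides unfold to the same constructor application, no reasoning beyond definitional unfolding is needed in this sketch.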