ICML 2026 · Full Paper  |  AAAI 2026 · Demo Paper

Cognitive Fatigue in Autoregressive Transformers

Riju Marwah1,2  ·  Ritvik Garimella2  ·  Vishal Pallagani2  ·  Atishay Jain2,3  ·  Michael Stewart2  ·  Amit Sheth2,4

1Guru Gobind Singh Indraprastha University, India    2Artificial Intelligence Institute of South Carolina, USA    3Indian Institute of Technology Kanpur, India    4Indian AI Research Organization, India

Attention decay (At) — mean last-layer attention to the prompt slice; a declining trend signals instruction loss.
Embedding drift (Dt) — ‖ht − h0‖₂ from the prompt anchor; growth signals representation wandering.
Entropy deviation (Et) — departure from a healthy band; too low signals repetition, too high erratic indecision.
Axioms: Monotonicity · Scale invariance · Boundedness · Temporal stability · Compositionality
Fatigue Index: FI = wAφA + wEφE + wDφD · FI ∈ [0,1] · computed at inference time (example value: 0.82)

Three lightweight inference-time signals — normalized under explicit axioms and aggregated into the Fatigue Index — enable real-time monitoring of long-horizon generation reliability without retraining.

Part I The Concept: Cognitive Fatigue in LLMs
What is cognitive fatigue?

Language models do not fail all at once. They drift. Over long generations, a model that starts focused and coherent gradually loses its grip on the original instruction, its internal representations shift away from the task, and its predictions grow erratic or repetitive. By the time the output looks wrong, the deterioration has already been underway for some time.

We formalize this as cognitive fatigue: a progressive, within-run deterioration in instruction adherence, representation stability, and predictive calibration. It is not random noise or an edge case. It is a systematic consequence of how transformer decoders operate over extended sequences, and it is measurable at inference time, token by token, without retraining.

The critical insight is that fatigue has internal signatures. Attention patterns, hidden states, and output distributions all show signs of degradation before the generated text visibly breaks down. This makes it possible to detect unreliable generation as it happens, not after the fact.

The three signals
Signal A
Prompt attention decay
As decoding progresses, the model increasingly conditions on its own recent outputs rather than the original instruction. Transformer decoders expose this through attention weights: the mean last-layer attention mass allocated to the initial prompt slice declines even when the instruction remains relevant. A declining At signals growing instruction neglect.
At = (1/H) Σh Σj≤Lp Attnh(xt, xj)
Weight: wA = 0.40 (highest priority)
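The attention signal above can be sketched in a few lines. This is a minimal illustration, assuming the current decode step's last-layer attention row is available as a (heads × seq_len) array; the array layout and variable names are assumptions, while the averaging formula follows the definition of At.

```python
import numpy as np

def attention_to_prompt(attn_last_layer, prompt_len):
    """Signal A_t: mean last-layer attention mass on the prompt slice.

    attn_last_layer: (num_heads, seq_len) attention row for the current
    decode step (hypothetical layout). prompt_len: prompt length L_p.
    """
    # Sum the mass landing on prompt positions per head, then average heads.
    return float(attn_last_layer[:, :prompt_len].sum(axis=1).mean())

# Toy check: two heads, 4-token context, 2-token prompt.
attn = np.array([[0.5, 0.3, 0.1, 0.1],
                 [0.2, 0.2, 0.3, 0.3]])
a_t = attention_to_prompt(attn, prompt_len=2)  # (0.8 + 0.4) / 2 = 0.6
```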
Signal D
Embedding drift
Long-horizon decoding repeatedly updates a shared residual stream, allowing small perturbations to accumulate. Hidden states gradually drift away from the representational subspace induced by the prompt — a latent degradation process that precedes surface-level incoherence. This internal drift is not directly observable from output text alone.
Dt = ‖ht − h0‖₂
Weight: wD = 0.25 (noisiest signal)
Signal E
Entropy deviation
During extended generation, models become overconfident (low entropy → repetition, degeneracy) or erratically uncertain (high entropy → unrelated to task difficulty). Shannon entropy of the next-token softmax provides a direct view of calibration state. Deviation from a healthy band in either direction signals fatigue.
Et = −Σ p log p  |  healthy band: [Hℓ, Hu]
Weight: wE = 0.35
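The entropy signal and its band deviation can be sketched as follows; the band edges here are illustrative tuning parameters, not values from the paper, and the deviation form (distance to the nearer band edge) is one reasonable reading of "departure from a healthy band".

```python
import numpy as np

def entropy_deviation(logits, h_lo, h_hi):
    """Signal E_t: Shannon entropy of the next-token softmax and its
    deviation from a healthy band [h_lo, h_hi] (band edges assumed)."""
    z = logits - logits.max()              # shift for numerical stability
    p = np.exp(z) / np.exp(z).sum()        # softmax
    h = float(-(p * np.log(p + 1e-12)).sum())
    # Zero inside the band; distance to the violated edge outside it.
    dev = max(h_lo - h, 0.0) + max(h - h_hi, 0.0)
    return h, dev

h, dev = entropy_deviation(np.array([2.0, 1.0, 0.5, 0.5]), h_lo=0.8, h_hi=2.5)
```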
The Fatigue Index
FIt = wA φA(At) + wE φE(Et) + wD φD(Dt)
Each φ is a fixed monotone normalization map to [0,1]. Weights wA = 0.40, wE = 0.35, wD = 0.25 encode domain priors — prompt attention most directly governs instruction following; entropy governs degeneracy and repetition; drift signals longer-horizon instability but is noisier. Weights are frozen across all experiments. Higher FI = greater fatigue. FI ∈ [0,1].
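The aggregation itself is a weighted sum, as shown below. The weights are the frozen values reported above, but the specific φ maps are illustrative monotone squashings into [0, 1] — the paper fixes their shape axiomatically without publishing closed forms here, so the scales and functional forms are assumptions.

```python
import numpy as np

# Frozen weights from the paper; the phi maps are illustrative stand-ins.
W_A, W_E, W_D = 0.40, 0.35, 0.25

def phi_attention(a_t):
    # Less prompt attention -> more fatigue (monotone decreasing in a_t).
    return float(np.clip(1.0 - a_t, 0.0, 1.0))

def phi_entropy(dev_t, scale=2.0):
    # Larger band deviation -> more fatigue; saturating map into [0, 1).
    return float(1.0 - np.exp(-dev_t / scale))

def phi_drift(d_t, scale=10.0):
    # Larger L2 drift from the prompt anchor -> more fatigue.
    return float(1.0 - np.exp(-d_t / scale))

def fatigue_index(a_t, dev_t, d_t):
    """FI_t = w_A phi_A(A_t) + w_E phi_E(E_t) + w_D phi_D(D_t), in [0, 1]."""
    return (W_A * phi_attention(a_t)
            + W_E * phi_entropy(dev_t)
            + W_D * phi_drift(d_t))

fi = fatigue_index(a_t=0.6, dev_t=0.0, d_t=0.0)  # only attention decay contributes
```

Because each term is bounded in [0, 1] and the weights sum to 1, FI stays in [0, 1] (A3), and the per-term products give the A5 attribution directly.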
Five axioms any valid fatigue measure must satisfy
A1
Monotonicity
Reduced prompt attention, increased entropy deviation, or increased embedding drift must each monotonically increase FI. Worsening signals must produce worsening scores — never the reverse.
A2
Scale invariance
Monotone reparameterizations of raw signals must preserve the fatigue ordering, ensuring comparability across decoding regimes and numerical scales.
A3
Boundedness
FI lies on a fixed, interpretable scale: FI ∈ [0, 1]. This supports stable thresholds and consistent online monitoring across runs and models without renormalization.
A4
Temporal stability
Small, transient perturbations in signals should not induce large, instantaneous FI changes. Prevents spurious oscillations under inference-time noise. Enforced via hysteresis with distinct activation and deactivation thresholds.
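The hysteresis mechanism behind A4 can be sketched as a small stateful alert; the two threshold values are illustrative, not the thresholds used in the paper.

```python
def make_hysteresis_alert(on_thresh=0.6, off_thresh=0.5):
    """Alert with distinct activation/deactivation thresholds (A4).
    Threshold values here are illustrative."""
    state = {"active": False}
    def step(fi):
        if state["active"]:
            if fi < off_thresh:       # only deactivate below the lower bar
                state["active"] = False
        elif fi > on_thresh:          # only activate above the upper bar
            state["active"] = True
        return state["active"]
    return step

alert = make_hysteresis_alert()
trace = [alert(fi) for fi in [0.55, 0.65, 0.58, 0.45, 0.55]]
# A single 0.6 threshold would drop the alert on the transient dip to 0.58;
# hysteresis holds it active until FI falls below 0.5.
```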
A5
Compositionality
FI must decompose into interpretable per-signal contributions: FI = Σ wk gk(sk,t). This enables attribution — identifying which failure mode is driving a high score — and supports simple stabilizing mechanisms. Together, A1–A3 and A5 characterize the family of valid additive fatigue measures (Theorem E.1).
Mechanisms that drive fatigue
Mechanism | Description | Observed effect
Attention dilution | As sequence length grows, softmax attention mass over prompt tokens is diluted; RoPE and ALiBi positional biases further favor recent tokens. | Declining At; worse performance when evidence is distant from the decode head.
Residual drift | Errors compound in the shared residual stream; deviations propagate rather than cancel due to autoregressive updates and LayerNorm dynamics. | Monotonic increase in Dt; higher drift aligns with repetition and lower F1.
Entropy collapse | Autoregressive training favors sharp predictions; greedy or low-temperature decoding amplifies overconfidence, especially under quantization. | Entropy leaves the healthy band; repetitive and degenerate outputs late in generation.
Context length stress | Long contexts strain numerical precision in KV caches; RoPE phase saturation without scaling degrades long-range recall. | Earlier FI onset; biased attention scores; unstable output distributions.
Reduced precision | 4-bit quantization destabilizes predictive calibration without disrupting prompt focus or representation stability. | Deeper, more variable entropy collapse under NF4 vs. FP16; At and Dt remain similar.
Part II Formalization and Measurement ICML 2026
Abstract

Autoregressive language models frequently degrade during long-horizon generation, producing repetitive text, losing instruction adherence, and exhibiting unstable entropy. Despite the prevalence of these failures, practitioners lack online diagnostics to detect them in real time as they occur. We formalize this degradation as cognitive fatigue, a measurable generation-time state characterized by decay in attention to the original prompt, representational drift, and entropy miscalibration. We introduce the Fatigue Index (FI), a lightweight, model-agnostic diagnostic that aggregates these three signals under explicit axioms, enabling reliable runtime monitoring. Across nine models (1B–13B parameters), FI trajectories exhibit structured temporal dynamics, predict task degradation (AUROC = 0.95) and repetition (ρ = 0.94), and reveal non-monotonic scaling behavior. Stress analyses further show that FI onset accelerates under longer contexts, middle-positioned evidence, and reduced numerical precision. These results establish cognitive fatigue as a coherent and measurable phenomenon, and position FI as a principled tool for runtime reliability monitoring in production LLM systems.

Key results
0.95
AUROC predicting task degradation
0.94
Spearman ρ with repetition ratio
>91%
Jitter reduction via hysteresis alerting
9
Models evaluated · 1B–13B params
Finding 1
Fatigue is domain-agnostic. FI accumulates consistently across HotpotQA (reasoning), TriviaQA (knowledge), and SQuAD (comprehension) — 27,405 generated sequences total — with mean values clustering tightly around 0.82 across all three. This rules out benchmark-specific artifacts and supports fatigue as a degradation process intrinsic to long-horizon decoding itself.
Finding 2
Aggregation is necessary. The full FI achieves AUROC = 0.977 on HotpotQA, significantly outperforming every individual signal in isolation: Entropy only (0.954), Drift only (0.930), Attention only (0.308). Attention alone performs particularly poorly — prompt focus is insufficient as a standalone indicator. The multi-component construction is empirically justified.
Finding 3
Degradation is cumulative, not Markovian. Spearman correlation between FI and repetition is ρ > 0.84 over full sequences but only ρ ≈ 0.40 over the first 20 tokens. Early-warning heuristics based on initial tokens are insufficient — an effective monitoring policy must track the full trajectory.
Finding 4
Context length accelerates FI onset. Longer contexts induce earlier and more sustained collapse of prompt-directed attention. Embedding drift increases across all conditions but becomes more variable with length, while entropy exhibits larger fluctuations — consistent with attention dilution and accumulation of residual deviations under extended sequences.
Finding 5
Middle-positioned evidence is systematically underused. Identical evidence placed at the start of a context receives substantially higher attention than the same evidence placed in the middle or end — a positional bias that accelerates prompt-forgetting and explains the "lost-in-the-middle" performance failures documented in prior work.
Finding 6
Quantization primarily destabilizes entropy. Comparing FP16 and 4-bit NF4 decoding under matched prompts and seeds: attention and drift trajectories remain similar, but entropy exhibits deeper and more variable collapse under quantization. Reduced precision disrupts predictive calibration far more than prompt focus or representation stability.
Finding 7
Non-monotonic scaling with instruction tuning. Instruction-tuned models below 3B parameters exhibit faster entropy collapse than base models under matched decoding — this trend reverses at 7B, where instruction tuning improves entropy calibration. At 13B, aggressive alignment produces a distinct failure mode: Llama-2-13B-Chat collapses into low-entropy refusal templates despite grammatical output ("safety fatigue"). Drift slopes remain approximately constant across all model sizes — larger models do not drift less, they drift more coherently.
Part III Chatsparent — Interactive Detection & Mitigation AAAI 2026 · Demo
Demo video
Overview

Today's chatbot interfaces offer little to no friction: seamless conversations conceal when the model is drifting, hallucinating, or failing. This lack of transparency fosters blind trust, even as models produce unstable or repetitive outputs. Chatsparent makes cognitive fatigue visible, measurable, and actionable. The system instruments all three token-level fatigue signals in real time, fuses them into the Fatigue Index, and streams FI live alongside model outputs, giving users a continuous view of generation reliability. When thresholds are crossed, the interface enables retrain-free interventions that restore generation stability without modifying model weights. By turning passive chatbot interaction into an interactive diagnostic experience, Chatsparent reframes autoregressive generation as an active control problem.

Retrain-free interventions
SCA · Soft context anchor
Attention-triggered prompt reinsertion
When At falls below a threshold, the original prompt is re-prepended and only a short recent tail of tokens is retained within the context limit. This "break-glass" action refocuses the model without editing the key–value cache. Triggered adaptively when attention crosses threshold τA = 0.010.
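SCA's context rebuild can be sketched as follows. Token sequences are plain Python lists here for illustration; τA = 0.010 and tail_keep = 128 follow the values reported on this page, but the function name and signature are assumptions.

```python
def soft_context_anchor(prompt_ids, generated_ids, a_t, tau_a=0.010, tail_keep=128):
    """SCA sketch: if prompt attention A_t drops below tau_a, rebuild the
    context as [prompt + recent tail]; otherwise leave it unchanged."""
    if a_t >= tau_a:
        return prompt_ids + generated_ids           # no intervention
    return prompt_ids + generated_ids[-tail_keep:]  # re-anchor on the prompt

# Attention has collapsed below threshold -> prompt is re-prepended,
# only the most recent 128 generated tokens are kept.
ctx = soft_context_anchor([1, 2, 3], list(range(10, 310)), a_t=0.005)
```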
PAR · Periodic attention reset
Scheduled context rebuild
At a fixed cadence k, the context is rebuilt as [prompt + recent tail]. PAR produces bumps in attention around reset boundaries, acting as a preventive nudge against gradual decay before thresholds are breached. Reset every 50 tokens with tail_keep = 128 in reported experiments.
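PAR is the scheduled counterpart of the same rebuild: instead of waiting for a threshold crossing, it fires every k steps. k = 50 and tail_keep = 128 are the reported experimental settings; the function shape is again an assumption.

```python
def periodic_attention_reset(prompt_ids, generated_ids, step, k=50, tail_keep=128):
    """PAR sketch: every k decode steps, rebuild the context as
    [prompt + recent tail], regardless of the current signal values."""
    if step == 0 or step % k != 0:
        return prompt_ids + generated_ids           # between resets: unchanged
    return prompt_ids + generated_ids[-tail_keep:]  # scheduled rebuild

ctx = periodic_attention_reset([1, 2, 3], list(range(200)), step=50)
```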
ERD · Entropy-regularized decoding
Dynamic temperature adjustment
At each step, temperature T ∈ [Tmin, Tmax] is adjusted to track a target entropy H*: if entropy is too low, increase T; if too high, decrease it. ERD curbs entropy collapse and indirectly flattens attention decay while leaving representation dynamics largely unchanged.
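One step of the ERD controller can be sketched as a simple proportional nudge toward the target entropy H*; the step size and temperature bounds below are illustrative, not the paper's settings.

```python
def erd_temperature(entropy, target_h, temp, step_size=0.05, t_min=0.5, t_max=1.5):
    """ERD sketch: one temperature update tracking a target entropy H*.
    step_size and the [t_min, t_max] bounds are illustrative."""
    if entropy < target_h:
        temp += step_size   # too confident: soften the distribution
    elif entropy > target_h:
        temp -= step_size   # too erratic: sharpen it
    return max(t_min, min(t_max, temp))  # clamp to [T_min, T_max]

t = erd_temperature(entropy=0.4, target_h=1.2, temp=1.0)  # -> 1.05
```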
PAUSE · Self-reflection
Chain-of-thought checkpoints
On a fixed cadence or when entropy or drift breach thresholds, the model briefly pauses generation to perform a targeted self-check. Grounding the model's next generation in a re-evaluation of its task context counteracts both attention decay and drift accumulation.
Results — Falcon-7B-Instruct · 4-bit NF4 · HotpotQA
Method | Mean Fatigue Index ↓ | Change vs. baseline | Latency (s)
Baseline | 0.36 | — | 213.5
ERD | 0.31 | −0.05 | 212.5
PAUSE | 0.31 | −0.05 | 228.0
SCA | 0.32 | −0.04 | 225.1
PAR | 0.34 | −0.02 | 222.4

All interventions reduce mean FI with modest latency overhead. ERD achieves the best FI reduction with negligible added latency. Decoding defaults: top-p = 0.95, T = 1.0, max new tokens = 120.


Papers
1
International Conference on Machine Learning (ICML 2026)
Cognitive Fatigue in Autoregressive Transformers: Formalization and Measurement
Riju Marwah · Ritvik Garimella · Vishal Pallagani · Atishay Jain · Michael Stewart · Amit Sheth
Formalizes cognitive fatigue as a runtime state variable grounded in three token-level signals. Introduces the Fatigue Index with five explicit axioms and validates it across nine models (1B–13B) on HotpotQA, TriviaQA, and SQuAD under long-context, positional, and precision stress conditions.
2
Association for the Advancement of Artificial Intelligence (AAAI 2026) · Demo
Chatsparent: An Interactive System for Detecting and Mitigating Cognitive Fatigue in LLMs
Riju Marwah · Vishal Pallagani · Ritvik Garimella · Amit Sheth
An interactive system that streams fatigue signals live alongside model outputs and enables four retrain-free interventions (SCA, PAR, ERD, PAUSE). Turns passive chatbot interaction into a diagnostic experience that exposes model dynamics and improves long-horizon reliability without retraining.

Citation
ICML 2026 — Full paper
@inproceedings{marwah2026cognitivefatigue,
  title     = {Cognitive Fatigue in Autoregressive Transformers: Formalization and Measurement},
  author    = {Marwah, Riju and Garimella, Ritvik and Pallagani, Vishal and Jain, Atishay and Stewart, Michael and Sheth, Amit},
  booktitle = {Proceedings of the 43rd International Conference on Machine Learning (ICML)},
  year      = {2026}
}
AAAI 2026 — Demo paper
@inproceedings{marwah2026chatsparent,
  title     = {Chatsparent: An Interactive System for Detecting and Mitigating Cognitive Fatigue in {LLMs}},
  author    = {Marwah, Riju and Pallagani, Vishal and Garimella, Ritvik and Sheth, Amit},
  booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
  year      = {2026}
}