SOTAVerified

Context Channel Capacity: An Information-Theoretic Framework for Understanding Catastrophic Forgetting

2026-03-08

Ran Cheng


Abstract

Catastrophic forgetting remains a central challenge in continual learning (CL), yet the field lacks a unified information-theoretic explanation for why some architectures forget catastrophically while others do not. We introduce Context Channel Capacity (C_ctx), the mutual information between a CL architecture's context signal and its generated parameters, and prove that zero forgetting requires C_ctx ≥ H(T), where H(T) is the task identity entropy. We establish an Impossibility Triangle -- zero forgetting, online learning, and finite parameters cannot be simultaneously satisfied by sequential state-based learners -- and show that conditional regeneration architectures (HyperNetworks) bypass this triangle by redefining parameters as function values rather than states. We validate this framework across 8 CL methods on Split-MNIST (1,130+ experiments over 86 days, 4 seeds each), showing that C_ctx perfectly predicts forgetting behavior: methods with C_ctx = 0 (NaiveSGD, EWC, SI, LwF, CFlow) exhibit catastrophic forgetting (6--97%), while methods with C_ctx ≥ 1 (HyperNetwork) achieve zero forgetting (98.8% ACC). We further propose Wrong-Context Probing (P5), a practical diagnostic protocol for measuring C_ctx, and extend the framework to CIFAR-10 via a novel Gradient Context Encoder that closes the oracle gap from 23.3pp to 0.7pp. A systematic taxonomy of 15+ closed research directions -- including the Hebbian null result (frozen random features outperform learned features), CFlow's θ_0-memorizer phenomenon, and the S_N symmetry barrier to column specialization -- provides the community with precisely diagnosed negative results. Our central design principle: architecture over algorithm -- the context pathway must be structurally unbypassable.
