arXiv 2512.23675
End-to-End Test-Time Training for Long Context
By Arnuv Tandon, Karan Dalal, et al.
Published 2025-12-29
We formulate long-context language modeling as a problem in continual learning rather than architecture design. Under this formulation, we only use a standard architecture -- a Transformer with sliding-window attention. However, our model continues learning at test time via next-token prediction on the given context, compressing the context it reads into its weights. In addition, we improve the model's initializatio…
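To make the idea of learning at test time via next-token prediction concrete, here is a minimal sketch under loud assumptions. It is not the paper's method: it swaps the Transformer with sliding-window attention for a toy bigram model (a table of logits), and the names `adapt_on_context`, `next_token_loss`, and the learning rate are hypothetical. It only illustrates how reading a context with gradient steps on the next-token loss stores information about that context in the weights.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_loss(weights, tokens):
    """Average next-token cross-entropy of the toy bigram model on `tokens`."""
    losses = [-math.log(softmax(weights[prev])[nxt])
              for prev, nxt in zip(tokens, tokens[1:])]
    return sum(losses) / len(losses)

def adapt_on_context(weights, context, lr=0.5):
    """One pass of SGD on next-token prediction, updating `weights` in place.

    Assumption: weights[prev][j] is the logit for token j following token prev.
    The gradient of cross-entropy w.r.t. the logits is (probs - one_hot(target)).
    """
    for prev, nxt in zip(context, context[1:]):
        probs = softmax(weights[prev])
        for j in range(len(probs)):
            grad = probs[j] - (1.0 if j == nxt else 0.0)
            weights[prev][j] -= lr * grad
    return weights

VOCAB_SIZE = 4
weights = [[0.0] * VOCAB_SIZE for _ in range(VOCAB_SIZE)]  # uninformed start
context = [0, 1, 2, 3] * 8  # a repetitive "document" given at test time

loss_before = next_token_loss(weights, context)
adapt_on_context(weights, context)
loss_after = next_token_loss(weights, context)
print(f"loss before: {loss_before:.3f}, after: {loss_after:.3f}")
```

With all-zero logits the model starts at the uniform-prediction loss, log 4, and the loss on the same context is lower after the pass, because the pattern in the context now lives in the weights rather than in an attention cache.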