arXiv 2512.23675

End-to-End Test-Time Training for Long Context

By Arnuv Tandon, Karan Dalal, et al.

Published 2025-12-29

We formulate long-context language modeling as a problem in continual learning rather than architecture design. Under this formulation, we only use a standard architecture -- a Transformer with sliding-window attention. However, our model continues learning at test time via next-token prediction on the given context, compressing the context it reads into its weights. In addition, we improve the model's initializatio…
