arXiv 2511.02817
Oolong: Evaluating Long Context Reasoning and Aggregation Capabilities
By Amanda Bertsch, Adithya Pratapa, et al.
Published 2025-11-04
As model context lengths continue to grow, concerns about whether models effectively use the full context length have persisted. While several carefully designed long-context evaluations have recently been released, these evaluations tend to rely on retrieval from one or more sections of the context, which allows nearly all of the context tokens to be disregarded as noise. This represents only one type of task that…