arXiv 2511.02817

Oolong: Evaluating Long Context Reasoning and Aggregation Capabilities

By Amanda Bertsch, Adithya Pratapa, et al.

Published 2025-11-04

As model context lengths continue to grow, concerns about whether models effectively use the full context length have persisted. While several carefully designed long-context evaluations have recently been released, these evaluations tend to rely on retrieval from one or more sections of the context, which allows nearly all of the context tokens to be disregarded as noise. This represents only one type of task that…
