arXiv 2511.02817

Oolong: Evaluating Long Context Reasoning and Aggregation Capabilities

By Amanda Bertsch, Adithya Pratapa, et al.

Published 2025-11-04

As model context lengths continue to grow, concerns about whether models effectively use the full context length have persisted. While several carefully designed long-context evaluations have recently been released, these evaluations tend to rely on retrieval from one or more sections of the context, which allows nearly all of the context tokens to be disregarded as noise. This represents only one type of task that…
