arXiv 2603.01666

Beyond the Grid: Layout-Informed Multi-Vector Retrieval with Parsed Visual Document Representations

By Yibo Yan, Mingdong Ou, et al.

Published 2026-03-02

Harnessing the full potential of visually-rich documents requires retrieval systems that understand not just text, but intricate layouts, a core challenge in Visual Document Retrieval (VDR). The prevailing multi-vector architectures, while powerful, face a crucial storage bottleneck that current optimization strategies, such as embedding merging, pruning, or using abstract tokens, fail to resolve without compromisin…
