arXiv 2604.10167

Visual Late Chunking: An Empirical Study of Contextual Chunking for Efficient Visual Document Retrieval

By Yibo Yan, Mingdong Ou, et al.

Published 2026-04-11

Multi-vector models dominate Visual Document Retrieval (VDR) due to their fine-grained matching capabilities, but their high storage and computational costs present a major barrier to practical deployment. In this paper, we propose ColChunk, a plug-and-play framework that introduces multimodal late chunking to construct efficient, contextualized multi-vectors. Unlike existing pruning or fixed-token approaches, ColCh…
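As a rough illustration of the general idea named in the abstract, the sketch below pools contextualized patch embeddings into a smaller set of chunk vectors after encoding (late chunking) and scores them with the MaxSim late-interaction rule used by multi-vector retrievers. It is not the ColChunk method: the abstract is truncated before it describes how chunks are formed, so the fixed-size mean pooling here is only a placeholder assumption, and the random arrays stand in for the output of a real visual document encoder. The names `late_chunk`, `maxsim_score`, and `l2_normalize` are hypothetical helpers.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def late_chunk(token_embeddings, chunk_size):
    """Mean-pool already-contextualized embeddings into consecutive chunks.

    Pooling happens after the encoder has seen the whole page, so every
    chunk vector carries page-level context. Fixed-size grouping is an
    assumption made for this sketch only.
    """
    n_tokens = token_embeddings.shape[0]
    chunks = [
        token_embeddings[start:start + chunk_size].mean(axis=0)
        for start in range(0, n_tokens, chunk_size)
    ]
    return l2_normalize(np.stack(chunks))


def maxsim_score(query_vectors, doc_vectors):
    """Late-interaction score: best document match per query vector, summed."""
    similarities = query_vectors @ doc_vectors.T
    return float(similarities.max(axis=1).sum())


# Random stand-ins for encoder outputs (assumed shapes: 1024 patches, 128 dims).
rng = np.random.default_rng(0)
patch_embeddings = l2_normalize(rng.normal(size=(1024, 128)))
query_embeddings = l2_normalize(rng.normal(size=(16, 128)))

chunk_vectors = late_chunk(patch_embeddings, chunk_size=64)

full_score = maxsim_score(query_embeddings, patch_embeddings)
chunk_score = maxsim_score(query_embeddings, chunk_vectors)
print(patch_embeddings.shape, "->", chunk_vectors.shape)
```

With these assumed sizes, the stored representation for a page shrinks from 1024 vectors to 16, which is the kind of storage and compute saving the abstract is concerned with; how much retrieval quality survives depends on the chunking strategy, which this sketch does not try to reproduce.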
