arXiv 2406.11251

Unifying Multimodal Retrieval via Document Screenshot Embedding

By Xueguang Ma, Sheng-Chieh Lin, et al.

Published 2024-06-17

In the real world, documents are organized in different formats and varied modalities. Traditional retrieval pipelines require tailored document parsing techniques and content extraction modules to prepare input for indexing. This process is tedious, error-prone, and loses information. To this end, we propose Document Screenshot Embedding (DSE), a novel retrieval paradigm that regards document screenshots as…
