arXiv 2406.11251
Unifying Multimodal Retrieval via Document Screenshot Embedding
By Xueguang Ma, Sheng-Chieh Lin, et al.
Published 2024-06-17
In the real world, documents are organized in different formats and varied modalities. Traditional retrieval pipelines require tailored document parsing techniques and content extraction modules to prepare input for indexing. This process is tedious, error-prone, and lossy. To this end, we propose Document Screenshot Embedding (DSE), a novel retrieval paradigm that regards document screenshots as…