arXiv 2407.01449

ColPali: Efficient Document Retrieval with Vision Language Models

By Manuel Faysse, Hugues Sibille, et al.

Published 2024-06-27

Documents are visually rich structures that convey information not only through text but also through figures, page layouts, tables, and even fonts. Because modern retrieval systems rely mainly on the textual information they extract from document pages to index documents (often through lengthy and brittle processes), they struggle to exploit key visual cues efficiently. This limits their capabilities in many practical document retrie…
