arXiv 2602.03992
Nemotron ColEmbed V2: Top-Performing Late Interaction Embedding Models for Visual Document Retrieval
By Gabriel de Souza P. Moreira, Ronay Ak, et al.
Published 2026-02-03
Retrieval-Augmented Generation (RAG) systems are widely used in generative applications, where they ground language models by injecting external knowledge. Companies are trying to use their large catalogs of documents (e.g., PDFs, presentation slides) in such RAG pipelines, whose first step is retrieval. Dense retrieval is a popular approach, in which embedding models are used to generate a dense…
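To make the retrieval step concrete, here is a minimal sketch that contrasts single-vector dense retrieval with the late interaction scoring named in the title. It assumes the ColBERT-style MaxSim operator that "late interaction" usually denotes; the excerpt above does not specify the paper's exact scoring. All function names, document ids, and vectors are hypothetical toy values, not outputs of any real model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def dense_rank(query_vec, doc_vecs):
    """Single-vector dense retrieval: one embedding per query and per document."""
    scores = {doc_id: cosine(query_vec, vec) for doc_id, vec in doc_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


def maxsim(query_tokens, doc_tokens):
    """Assumed late interaction score: for each query token vector, take its
    best-matching document token vector, then sum over query tokens."""
    return sum(max(cosine(q, d) for d in doc_tokens) for q in query_tokens)


def late_interaction_rank(query_tokens, docs):
    """Rank documents by their MaxSim score against the query token vectors."""
    scores = {doc_id: maxsim(query_tokens, toks) for doc_id, toks in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy embeddings (2-dimensional, invented for illustration).
QUERY_VEC = [1.0, 0.0]
DOC_VECS = {"slide_deck": [0.9, 0.1], "invoice_pdf": [0.1, 0.9]}

QUERY_TOKENS = [[1.0, 0.0], [0.0, 1.0]]
DOC_TOKENS = {
    "slide_deck": [[1.0, 0.0], [0.0, 1.0]],
    "invoice_pdf": [[1.0, 0.0], [1.0, 0.0]],
}

if __name__ == "__main__":
    print(dense_rank(QUERY_VEC, DOC_VECS))
    print(late_interaction_rank(QUERY_TOKENS, DOC_TOKENS))
```

The design difference this sketch illustrates is that the single-vector variant compresses each document into one embedding, while the late interaction variant keeps one vector per token (or image patch) and defers the comparison to query time, which typically costs more storage per document.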