arXiv 2602.03992

Nemotron ColEmbed V2: Top-Performing Late Interaction Embedding Models for Visual Document Retrieval

By Gabriel de Souza P. Moreira, Ronay Ak, et al.

Published 2026-02-03


Retrieval-Augmented Generation (RAG) systems have become popular in generative applications, enhancing language models by injecting external knowledge. Companies have been trying to leverage their large catalogs of documents (e.g., PDFs, presentation slides) in such RAG pipelines, whose first step is the retrieval component. Dense retrieval has been a popular approach, where embedding models are used to generate a dense…
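The excerpt above is truncated and does not spell out how "late interaction" differs from single-vector dense retrieval. The sketch below shows the generic ColBERT-style MaxSim score that the term usually refers to; it is not the paper's implementation. Under this scheme each query token is matched to its most similar document embedding, and those best-match similarities are summed. In ColPali-style visual retrievers the document embeddings typically come from page-image patches. Everything here is an illustrative assumption: the function names (`maxsim_score`, `single_vector_score`, `rank`) are hypothetical, and the toy 2-D vectors stand in for the high-dimensional embeddings a real model would produce.

```python
import math
from typing import Dict, List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def single_vector_score(query: Vector, doc: Vector) -> float:
    """Single-vector dense retrieval: one embedding per query and per document."""
    return dot(normalize(query), normalize(doc))


def maxsim_score(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """Late interaction (ColBERT-style MaxSim, assumed here for illustration):
    for each query token embedding, keep its best cosine match among the
    document's embeddings, then sum those maxima over all query tokens."""
    doc_norm = [normalize(d) for d in doc_tokens]
    total = 0.0
    for q in query_tokens:
        qn = normalize(q)
        total += max(dot(qn, d) for d in doc_norm)
    return total


def rank(query_tokens: List[Vector], corpus: Dict[str, List[Vector]]) -> List[str]:
    """Return document ids sorted by descending MaxSim score."""
    scores = {doc_id: maxsim_score(query_tokens, toks) for doc_id, toks in corpus.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy example: two query-token embeddings and two hypothetical pages,
# each represented by several embeddings instead of a single vector.
query = [[1.0, 0.0], [0.0, 1.0]]
corpus = {
    "page_a": [[1.0, 0.0], [0.0, 1.0]],  # covers both query tokens
    "page_b": [[1.0, 0.0], [1.0, 0.0]],  # matches only the first query token
}
print(rank(query, corpus))  # ['page_a', 'page_b']
```

Because every query token picks its own best match, a page that covers all parts of the query (`page_a`, score 2.0) outranks one that matches only part of it (`page_b`, score 1.0). The trade-off is storage: the index must keep many vectors per page rather than one.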
