arXiv 2511.05271
DeepEyesV2: Toward Agentic Multimodal Model
By Jack Hong, Chenxiao Zhao, et al.
Published 2025-11-07
Agentic multimodal models should not only comprehend text and images, but also actively invoke external tools, such as code execution environments and web search, and integrate these operations into reasoning. In this work, we introduce DeepEyesV2 and explore how to build an agentic multimodal model from the perspectives of data construction, training methods, and model evaluation. We observe that direct reinforceme…