arXiv 2511.05271
DeepEyesV2: Toward Agentic Multimodal Model
By Jack Hong, Chenxiao Zhao, et al.
Published 2025-11-07
Agentic multimodal models should not only comprehend text and images, but also actively invoke external tools, such as code execution environments and web search, and integrate these operations into reasoning. In this work, we introduce DeepEyesV2 and explore how to build an agentic multimodal model from the perspectives of data construction, training methods, and model evaluation. We observe that direct reinforceme…