arXiv 2511.05271
DeepEyesV2: Toward Agentic Multimodal Model
By Jack Hong, Chenxiao Zhao, et al.
Published 2025-11-07
Agentic multimodal models should not only comprehend text and images, but also actively invoke external tools, such as code execution environments and web search, and integrate these operations into reasoning. In this work, we introduce DeepEyesV2 and explore how to build an agentic multimodal model from the perspectives of data construction, training methods, and model evaluation. We observe that direct reinforceme…