arXiv 2511.05271

DeepEyesV2: Toward Agentic Multimodal Model

By Jack Hong, Chenxiao Zhao, et al.

Published 2025-11-07

Agentic multimodal models should not only comprehend text and images, but also actively invoke external tools, such as code execution environments and web search, and integrate these operations into reasoning. In this work, we introduce DeepEyesV2 and explore how to build an agentic multimodal model from the perspectives of data construction, training methods, and model evaluation. We observe that direct reinforceme…
