arXiv 2505.14362

DeepEyes: Incentivizing "Thinking with Images" via Reinforcement Learning

By Ziwei Zheng, Michael Yang, et al.

Published 2025-05-20

Wiki summary

Large Vision-Language Models excel at multimodal understanding but struggle to deeply integrate visual information into their predominantly text-based reasoning processes, a key challenge in mirroring human cognition. To address this, we introduce DeepEyes, a model that learns to "think with images", trained end-to-end with reinforcement learning without requiring pre-collected reasoning data for cold-start supervis…
