arXiv 2505.14362
DeepEyes: Incentivizing "Thinking with Images" via Reinforcement Learning
By Ziwei Zheng, Michael Yang, et al.
Published 2025-05-20
Large Vision-Language Models excel at multimodal understanding but struggle to deeply integrate visual information into their predominantly text-based reasoning processes, a key challenge in mirroring human cognition. To address this, we introduce DeepEyes, a model that learns to "think with images", trained end-to-end with reinforcement learning without requiring pre-collected reasoning data for cold-start supervis…