arXiv 2508.01119

The Promise of RL for Autoregressive Image Editing

By Saba Ahmadi, Rabiul Awal, et al.

Published 2025-08-01

Wiki summary

We explore three strategies to enhance performance on a wide range of image editing tasks: supervised fine-tuning (SFT), reinforcement learning (RL), and Chain-of-Thought (CoT) reasoning. To study all three components in one consistent framework, we adopt an autoregressive multimodal model that processes textual and visual tokens in a unified manner. We find RL combined with a large multimodal LLM verifier…
