arXiv 2407.03471

Learning Action and Reasoning-Centric Image Editing from Videos and Simulations

By Benno Krojer, Dheeraj Vattikonda, et al.

Published 2024-07-03

An image editing model should be able to perform diverse edits, from replacing objects and changing attributes or style to performing actions or movement, which require many forms of reasoning. Current general instruction-guided editing models have significant shortcomings with action and reasoning-centric edits. Object, attribute, or stylistic changes can be learned from visually static datasets. On the other…
