arXiv 2504.13936

ViMo: A Generative Visual GUI World Model for App Agents

By Dezhao Luo, Bohan Tang, et al.

Published 2025-04-15

App agents, which autonomously operate mobile Apps through Graphical User Interfaces (GUIs), have gained significant interest in real-world applications. Yet, they often struggle with long-horizon planning, failing to find the optimal actions for complex tasks with longer steps. To address this, world models are used to predict the next GUI observation based on user actions, enabling more effective agent planning. H…
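As a rough illustration of how a world model can support agent planning, the sketch below performs a one-step lookahead: for each candidate action it asks a world model for the predicted next observation, scores that prediction, and picks the best-scoring action. This is not ViMo's method. The names `pick_action`, `world_model`, and `score` are hypothetical, and the toy model and scorer stand in for components the abstract does not describe.

```python
from typing import Callable, Sequence, TypeVar

Obs = TypeVar("Obs")  # a GUI observation (hypothetical representation)
Act = TypeVar("Act")  # a user action (hypothetical representation)


def pick_action(
    observation: Obs,
    candidates: Sequence[Act],
    world_model: Callable[[Obs, Act], Obs],
    score: Callable[[Obs], float],
) -> Act:
    """Return the candidate whose predicted next observation scores highest."""
    if not candidates:
        raise ValueError("no candidate actions")
    # Predict the next observation for each action, then rank by score.
    return max(candidates, key=lambda a: score(world_model(observation, a)))


# Toy stand-ins (assumptions for illustration only).
def toy_model(obs: str, act: str) -> str:
    return f"{obs}->{act}"


def toy_score(next_obs: str) -> float:
    return 1.0 if next_obs.endswith("open_settings") else 0.0


best = pick_action("home", ["scroll", "open_settings"], toy_model, toy_score)
```

In this toy run, `best` is the action whose predicted observation the scorer prefers. A real agent would replace the string stand-ins with screen representations and a learned predictor.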
