arXiv 2311.05997

JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models

By Zihao Wang, Shaofei Cai, et al.

Published 2023-11-10

Achieving human-like planning and control with multimodal observations in an open world is a key milestone toward more functional generalist agents. Existing approaches can handle certain long-horizon tasks in an open world. However, they still struggle when the number of open-world tasks is potentially infinite, and they lack the capability to progressively improve task completion as game time progresses. We introduce…