arXiv 2311.05997
JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models
By Zihao Wang, Shaofei Cai, et al.
Published 2023-11-10
Achieving human-like planning and control with multimodal observations in an open world is a key milestone for more functional generalist agents. Existing approaches can handle certain long-horizon tasks in an open world. However, they still struggle when the number of open-world tasks is potentially infinite, and they lack the ability to progressively improve task completion as game time progresses. We introduce…