arXiv 2505.03729

Visual Imitation Enables Contextual Humanoid Control

By Arthur Allshire, Hongsuk Choi, et al.

Published 2025-05-06

How can we teach humanoids to climb staircases and sit on chairs using the surrounding environment context? Arguably, the simplest way is to just show them: casually capture a human motion video and feed it to humanoids. We introduce VIDEOMIMIC, a real-to-sim-to-real pipeline that mines everyday videos, jointly reconstructs the humans and the environment, and produces whole-body control policies for humanoid robots t…
