arXiv 2509.00328

Mechanistic interpretability for steering vision-language-action models

By Bear Häon, Kaylene Stocking, et al.

Published 2025-08-30

Vision-Language-Action (VLA) models are a promising path to realizing generalist embodied agents that can quickly adapt to new tasks, modalities, and environments. However, methods for interpreting and steering VLAs fall far short of classical robotics pipelines, which are grounded in explicit models of kinematics, dynamics, and control. This lack of mechanistic insight is a central challenge for deploying learned p…
