arXiv 2509.00328

Mechanistic interpretability for steering vision-language-action models

By Bear Häon, Kaylene Stocking, et al.

Published 2025-08-30

Vision-Language-Action (VLA) models are a promising path to realizing generalist embodied agents that can quickly adapt to new tasks, modalities, and environments. However, methods for interpreting and steering VLAs fall far short of classical robotics pipelines, which are grounded in explicit models of kinematics, dynamics, and control. This lack of mechanistic insight is a central challenge for deploying learned p…
