arXiv 2510.07077

Vision-Language-Action Models for Robotics: A Review Towards Real-World Applications

By Kento Kawaharazuka, Jihoon Oh, et al.

Published 2025-10-08

Amid growing efforts to leverage advances in large language models (LLMs) and vision-language models (VLMs) for robotics, Vision-Language-Action (VLA) models have recently gained significant attention. By unifying vision, language, and action data at scale, three modalities that have traditionally been studied separately, VLA models aim to learn policies that generalise across diverse tasks, objects, embodiments, and environments.…