arXiv 2510.07077

Vision-Language-Action Models for Robotics: A Review Towards Real-World Applications

By Kento Kawaharazuka, Jihoon Oh, et al.

Published 2025-10-08

Amid growing efforts to leverage advances in large language models (LLMs) and vision-language models (VLMs) for robotics, Vision-Language-Action (VLA) models have recently gained significant attention. By unifying vision, language, and action data at scale, which have traditionally been studied separately, VLA models aim to learn policies that generalise across diverse tasks, objects, embodiments, and environments.…
