arXiv 2506.15953

ViTacFormer: Learning Cross-Modal Representation for Visuo-Tactile Dexterous Manipulation

By Liang Heng, Haoran Geng, et al.

Published 2025-06-19

Dexterous manipulation is a cornerstone capability for robotic systems aiming to interact with the physical world in a human-like manner. Although vision-based methods have advanced rapidly, tactile sensing remains crucial for fine-grained control, particularly in unstructured or visually occluded settings. We present ViTacFormer, a representation-learning approach that couples a cross-attention encoder to fuse high…
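As a rough illustration of what cross-attention fusion between two modalities can look like, the following NumPy sketch lets a set of visual tokens attend to a set of tactile tokens. It is not the paper's architecture: the token counts, the feature dimension, the single-head design, the random projection matrices, and the names `cross_attention`, `visual_tokens`, and `tactile_tokens` are all assumptions made for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head cross-attention: each query token attends over the context tokens.

    queries: (n_q, dim) tokens from one modality (here, hypothetical visual tokens)
    context: (n_c, dim) tokens from the other modality (here, hypothetical tactile tokens)
    Returns the fused tokens (n_q, dim) and the attention weights (n_q, n_c).
    """
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    # Scaled dot-product scores between every query and every context token.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


# Hypothetical sizes, chosen only for illustration.
rng = np.random.default_rng(0)
dim = 16
visual_tokens = rng.normal(size=(8, dim))   # assumed: 8 visual tokens
tactile_tokens = rng.normal(size=(5, dim))  # assumed: 5 tactile tokens

# Random projections stand in for learned weights.
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) / np.sqrt(dim) for _ in range(3))

fused, weights = cross_attention(visual_tokens, tactile_tokens, w_q, w_k, w_v)
print(fused.shape, weights.shape)
```

Each row of `weights` is a probability distribution over the tactile tokens, so every fused visual token is a weighted mixture of projected tactile features. A learned model would typically add multiple heads, residual connections, and normalization on top of this core operation.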
