arXiv 2511.04670

Cambrian-S: Towards Spatial Supersensing in Video

By Shusheng Yang, Jihan Yang, et al.

Published 2025-11-06

We argue that progress in true multimodal intelligence calls for a shift from reactive, task-driven systems and brute-force long context towards a broader paradigm of supersensing. We frame spatial supersensing as four stages beyond linguistic-only understanding: semantic perception (naming what is seen), streaming event cognition (maintaining memory across continuous experiences), implicit 3D spatial cognition (inf…

View the original paper on arXiv