arXiv 2510.02425

Words That Make Language Models Perceive

By Sophie L. Wang, Phillip Isola, et al.

Published 2025-10-02

Large language models (LLMs) trained purely on text ostensibly lack any direct perceptual experience, yet their internal representations are implicitly shaped by multimodal regularities encoded in language. We test the hypothesis that explicit sensory prompting can surface this latent structure, bringing a text-only LLM into closer representational alignment with specialist vision and audio encoders. When a sensory…
