arXiv 2602.00462
LatentLens: Revealing Highly Interpretable Visual Tokens in LLMs
By Benno Krojer, Shravan Nayak, et al.
Published 2026-01-31
Transforming a large language model (LLM) into a Vision-Language Model (VLM) can be achieved by mapping the visual tokens from a vision encoder into the embedding space of an LLM. Intriguingly, this mapping can be as simple as a shallow MLP transformation. To understand why LLMs can so readily process visual tokens, we need interpretability methods that reveal what is encoded in the visual token representations at e…