arXiv 2504.01002
Token embeddings violate the manifold hypothesis
By Michael Robinson, Sourya Dey, et al.
Published 2025-04-01
Fully understanding the behavior of a large language model (LLM) requires understanding its input token space. If this space differs from our assumptions, our comprehension of the LLM, and the conclusions we draw about it, will likely be flawed. We elucidate the structure of the token embeddings both empirically and theoretically. We present a novel statistical test assuming that the neighborhood around each token has a relativel…