arXiv 2410.19750
The Geometry of Concepts: Sparse Autoencoder Feature Structure
By Yuxiao Li, Eric J. Michaud, et al.
Published 2024-10-10
Sparse autoencoders have recently produced dictionaries of high-dimensional vectors corresponding to the universe of concepts represented by large language models. We find that this concept universe has interesting structure at three levels: 1) The "atomic" small-scale structure contains "crystals" whose faces are parallelograms or trapezoids, generalizing well-known examples such as (man-woman-king-queen). We find…
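As a rough illustration of the parallelogram and trapezoid structure mentioned in the abstract, the sketch below compares the two difference vectors of a four-point analogy such as (man, woman, king, queen). It is not the paper's method. The 3-dimensional vectors, the function names, and the choice of cosine similarity plus a length ratio as the test are all assumptions made for illustration. Real sparse autoencoder features would be high-dimensional decoder vectors.

```python
import math

def sub(u, v):
    """Element-wise difference u - v of two equal-length vectors."""
    return [a - b for a, b in zip(u, v)]

def norm(u):
    """Euclidean length of a vector."""
    return math.sqrt(sum(a * a for a in u))

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))

def quad_shape(a, b, c, d):
    """Compare the difference vectors (b - a) and (d - c).

    Returns (cosine similarity, length ratio). A cosine near 1 with a
    ratio near 1 suggests a parallelogram; a cosine near 1 with a
    different ratio suggests a trapezoid (parallel sides, unequal length).
    """
    e1, e2 = sub(b, a), sub(d, c)
    return cosine(e1, e2), norm(e2) / norm(e1)

# Hypothetical toy vectors, invented for this example only.
man = [1.0, 0.0, 0.2]
woman = [1.0, 1.0, 0.2]
king = [3.0, 0.0, 0.9]
queen = [3.0, 1.0, 0.9]
queen_stretched = [3.0, 2.0, 0.9]  # same direction, twice the offset

para_cos, para_ratio = quad_shape(man, woman, king, queen)
trap_cos, trap_ratio = quad_shape(man, woman, king, queen_stretched)
print(para_cos, para_ratio)
print(trap_cos, trap_ratio)
```

In this toy setup the first quadruple has equal, parallel difference vectors (a parallelogram), while the second has parallel difference vectors of different lengths (a trapezoid). With real feature vectors the match would only be approximate, so a tolerance threshold would be needed.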