arXiv 2410.19750

The Geometry of Concepts: Sparse Autoencoder Feature Structure

By Yuxiao Li, Eric J. Michaud, et al.

Published 2024-10-10

Sparse autoencoders have recently produced dictionaries of high-dimensional vectors corresponding to the universe of concepts represented by large language models. We find that this concept universe has interesting structure at three levels: 1) The "atomic" small-scale structure contains "crystals" whose faces are parallelograms or trapezoids, generalizing well-known examples such as (man-woman-king-queen). We find…
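
The parallelogram structure mentioned in the abstract can be illustrated with a minimal sketch. It assumes the usual reading of (man-woman-king-queen), namely that the difference vectors man - woman and king - queen roughly coincide. The function name `parallelogram_residual` and the three-dimensional toy vectors are hypothetical and chosen only for illustration; they are not taken from the paper or from any sparse autoencoder dictionary.

```python
import numpy as np


def parallelogram_residual(a, b, c, d):
    """Relative size of (a - b) - (c - d).

    A value of 0 means the four points form an exact parallelogram;
    larger values mean the two difference vectors disagree more.
    """
    a, b, c, d = (np.asarray(v, dtype=float) for v in (a, b, c, d))
    diff = (a - b) - (c - d)
    scale = np.linalg.norm(a - b) + np.linalg.norm(c - d)
    return float(np.linalg.norm(diff) / scale) if scale > 0 else 0.0


# Hypothetical toy vectors (not real model features):
# axis 0 plays the role of "gender", axis 1 of "royalty", axis 2 is unrelated.
man = np.array([1.0, 0.0, 0.2])
woman = np.array([-1.0, 0.0, 0.2])
king = np.array([1.0, 1.0, 0.3])
queen = np.array([-1.0, 1.0, 0.3])

residual = parallelogram_residual(man, woman, king, queen)

# Shifting one corner breaks the parallelogram and raises the residual.
skewed_queen = queen + np.array([0.5, 0.0, 0.0])
skewed_residual = parallelogram_residual(man, woman, king, skewed_queen)

print(residual, skewed_residual)
```

With real feature vectors the residual would rarely be exactly zero, so in practice one would compare it against a tolerance chosen for the data at hand.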
