arXiv 2505.24473

Train One Sparse Autoencoder Across Multiple Sparsity Budgets to Preserve Interpretability and Accuracy

By Nikita Balagansky, Yaroslav Aksenov, et al.

Published 2025-05-30

Wiki summary

Explore the paper's summary, context, and related research on Papiers.

Sparse Autoencoders (SAEs) have proven to be powerful tools for interpreting neural networks by decomposing hidden representations into disentangled, interpretable features via sparsity constraints. However, conventional SAEs are constrained by the fixed sparsity level chosen during training; meeting different sparsity requirements therefore demands separate models and increases the computational footprint during bo…

View the original paper on arXiv