arXiv 2601.10387

The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models

By Christina Lu, Jack Gallagher, et al.

Published 2026-01-15

Large language models can represent a variety of personas but typically default to a helpful Assistant identity cultivated during post-training. We investigate the structure of the space of model personas by extracting activation directions corresponding to diverse character archetypes. Across several different models, we find that the leading component of this persona space is an "Assistant Axis," which captures th…
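As a rough illustration of what "the leading component of this persona space" could mean, the sketch below takes one activation vector per persona and finds their first principal direction. It assumes that "leading component" refers to a principal component, which the excerpt above does not state. The data is synthetic, and every name (`leading_component`, `hidden_axis`, `persona_vectors`) is hypothetical rather than taken from the paper.

```python
import numpy as np


def leading_component(persona_vectors):
    """Return the unit-norm first principal direction of the rows,
    plus the fraction of total variance it explains."""
    X = np.asarray(persona_vectors, dtype=float)
    # Center so the direction describes variation between personas,
    # not their shared mean activation.
    centered = X - X.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, ordered by singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = float(s[0] ** 2 / np.sum(s ** 2))
    return vt[0], explained


# Synthetic stand-in for persona activations (an assumption, not real data):
# each persona sits somewhere along one hidden direction, plus small noise.
rng = np.random.default_rng(0)
dim, n_personas = 64, 40
hidden_axis = rng.normal(size=dim)
hidden_axis /= np.linalg.norm(hidden_axis)
coords = rng.normal(scale=5.0, size=(n_personas, 1))
noise = rng.normal(scale=0.5, size=(n_personas, dim))
persona_vectors = coords * hidden_axis + noise

axis, explained = leading_component(persona_vectors)
# The sign of a principal direction is arbitrary, so compare by absolute cosine.
alignment = abs(float(axis @ hidden_axis))
print(f"alignment with hidden axis: {alignment:.3f}")
print(f"variance explained by leading component: {explained:.3f}")
```

In this toy setup the recovered direction should line up closely with the planted one. With real model activations, the persona vectors would have to come from the model itself, a step this sketch does not attempt.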
