arXiv 2604.07729

Emotion Concepts and their Function in a Large Language Model

By Nicholas Sofroniew, Isaac Kauvar, et al.

Published 2026-04-09

Wiki summary

Large language models (LLMs) sometimes appear to exhibit emotional reactions. We investigate why this is the case in Claude Sonnet 4.5 and explore implications for alignment-relevant behavior. We find internal representations of emotion concepts, which encode the broad concept of a particular emotion and generalize across contexts and behaviors it might be linked to. These representations track the operative emotion…
