arXiv 2604.07729

Emotion Concepts and their Function in a Large Language Model

By Nicholas Sofroniew, Isaac Kauvar, et al.

Published 2026-04-09

Large language models (LLMs) sometimes appear to exhibit emotional reactions. We investigate why this is the case in Claude Sonnet 4.5 and explore implications for alignment-relevant behavior. We find internal representations of emotion concepts, which encode the broad concept of a particular emotion and generalize across contexts and behaviors it might be linked to. These representations track the operative emotion…
