arXiv 2505.14685

Language Models use Lookbacks to Track Beliefs

By Nikhil Prakash, Natalie Shapira, et al.

Published 2025-05-20

How do language models (LMs) represent characters' beliefs, especially when those beliefs may differ from reality? This question lies at the heart of understanding the Theory of Mind (ToM) capabilities of LMs. We analyze LMs' ability to reason about characters' beliefs using causal mediation and abstraction. We construct a dataset, CausalToM, consisting of simple stories where two characters independently change the…
