arXiv 2505.14685

Language Models use Lookbacks to Track Beliefs

By Nikhil Prakash, Natalie Shapira, et al.

Published 2025-05-20

How do language models (LMs) represent characters' beliefs, especially when those beliefs may differ from reality? This question lies at the heart of understanding the Theory of Mind (ToM) capabilities of LMs. We analyze LMs' ability to reason about characters' beliefs using causal mediation and abstraction. We construct a dataset, CausalToM, consisting of simple stories where two characters independently change the…
