arXiv 2505.14685
Language Models use Lookbacks to Track Beliefs
By Nikhil Prakash, Natalie Shapira, et al.
Published 2025-05-20
How do language models (LMs) represent characters' beliefs, especially when those beliefs may differ from reality? This question lies at the heart of understanding the Theory of Mind (ToM) capabilities of LMs. We analyze LMs' ability to reason about characters' beliefs using causal mediation and abstraction. We construct a dataset, CausalToM, consisting of simple stories where two characters independently change the…