arXiv 2505.14685

Language Models use Lookbacks to Track Beliefs

By Nikhil Prakash, Natalie Shapira, et al.

Published 2025-05-20

Wiki summary


How do language models (LMs) represent characters' beliefs, especially when those beliefs may differ from reality? This question lies at the heart of understanding the Theory of Mind (ToM) capabilities of LMs. We analyze LMs' ability to reason about characters' beliefs using causal mediation and abstraction. We construct a dataset, CausalToM, consisting of simple stories where two characters independently change the…
