arXiv 2510.07686

Stress-Testing Model Specs Reveals Character Differences among Language Models

By Jifan Zhang, Henry Sleight, et al.

Published 2025-10-09

Large language models (LLMs) are increasingly trained from AI constitutions and model specifications that establish behavioral guidelines and ethical principles. However, these specifications face critical challenges, including internal conflicts between principles and insufficient coverage of nuanced scenarios. We present a systematic methodology for stress-testing model character specifications, automatically iden…