arXiv 2510.07686

Stress-Testing Model Specs Reveals Character Differences among Language Models

By Jifan Zhang, Henry Sleight, et al.

Published 2025-10-09

Large language models (LLMs) are increasingly trained from AI constitutions and model specifications that establish behavioral guidelines and ethical principles. However, these specifications face critical challenges, including internal conflicts between principles and insufficient coverage of nuanced scenarios. We present a systematic methodology for stress-testing model character specifications, automatically iden…
