arXiv 2212.08073
Constitutional AI: Harmlessness from AI Feedback
By Yuntao Bai, Saurav Kadavath, et al.
Published 2022-12-15
As AI systems become more capable, we would like to enlist their help to supervise other AIs. We experiment with methods for training a harmless AI assistant through self-improvement, without any human labels identifying harmful outputs. The only human oversight is provided through a list of rules or principles, and so we refer to the method as 'Constitutional AI'. The process involves both a supervised learning and…