arXiv 2212.08073
Constitutional AI: Harmlessness from AI Feedback
By Yuntao Bai, Saurav Kadavath, et al.
Published 2022-12-15
As AI systems become more capable, we would like to enlist their help to supervise other AIs. We experiment with methods for training a harmless AI assistant through self-improvement, without any human labels identifying harmful outputs. The only human oversight is provided through a list of rules or principles, and so we refer to the method as 'Constitutional AI'. The process involves both a supervised learning and…