arXiv 2212.08073
Constitutional AI: Harmlessness from AI Feedback
By Yuntao Bai, Saurav Kadavath, et al.
Published 2022-12-15
As AI systems become more capable, we would like to enlist their help to supervise other AIs. We experiment with methods for training a harmless AI assistant through self-improvement, without any human labels identifying harmful outputs. The only human oversight is provided through a list of rules or principles, and so we refer to the method as 'Constitutional AI'. The process involves both a supervised learning and…