arXiv 2212.08073

Constitutional AI: Harmlessness from AI Feedback

By Yuntao Bai, Saurav Kadavath, et al.

Published 2022-12-15

As AI systems become more capable, we would like to enlist their help to supervise other AIs. We experiment with methods for training a harmless AI assistant through self-improvement, without any human labels identifying harmful outputs. The only human oversight is provided through a list of rules or principles, and so we refer to the method as 'Constitutional AI'. The process involves both a supervised learning and…