arXiv:2212.08073

Constitutional AI: Harmlessness from AI Feedback

By Yuntao Bai, Saurav Kadavath, et al.

Published 2022-12-15

Wiki summary

As AI systems become more capable, we would like to enlist their help to supervise other AIs. We experiment with methods for training a harmless AI assistant through self-improvement, without any human labels identifying harmful outputs. The only human oversight is provided through a list of rules or principles, and so we refer to the method as 'Constitutional AI'. The process involves both a supervised learning and…
