arXiv 2510.06652
Aligning Large Language Models via Fully Self-Synthetic Data
By Shangjian Yin, Zhepei Wei, et al.
Published 2025-10-08
Traditional reinforcement learning from human feedback (RLHF) for large language models (LLMs) relies on expensive human-annotated datasets. Reinforcement learning from AI feedback (RLAIF) also incurs significant costs: it requires collecting diverse prompts and corresponding responses, and it often needs external reward models or proprietary models such as GPT-4 to annotate preference pairs. In this work,…