arXiv 2510.06652

Aligning Large Language Models via Fully Self-Synthetic Data

By Shangjian Yin, Zhepei Wei, et al.

Published 2025-10-08

Traditional reinforcement learning from human feedback (RLHF) for large language models (LLMs) relies on expensive human-annotated datasets. Reinforcement learning from AI feedback (RLAIF) is also costly: it requires collecting diverse prompts and corresponding responses, and it often depends on external reward models or proprietary models such as GPT-4 to annotate preference pairs. In this work,…
