arXiv 2502.16182
IPO: Your Language Model is Secretly a Preference Classifier
By Shivank Garg, Ayush Singh, et al.
Published 2025-02-22
Reinforcement learning from human feedback (RLHF) has emerged as the primary method for aligning large language models (LLMs) with human preferences. While it enables LLMs to achieve human-level alignment, it often incurs significant computational and financial costs due to its reliance on training external reward models or human-labeled preferences. In this work, we propose Implicit Preference Optimization (IPO), a…