arXiv 2502.16182

IPO: Your Language Model is Secretly a Preference Classifier

By Shivank Garg, Ayush Singh, et al.

Published 2025-02-22

Reinforcement learning from human feedback (RLHF) has emerged as the primary method for aligning large language models (LLMs) with human preferences. While it enables LLMs to achieve human-level alignment, it often incurs significant computational and financial costs due to its reliance on training external reward models or human-labeled preferences. In this work, we propose Implicit Preference Optimization (IPO), a…
