arXiv 2502.16182

IPO: Your Language Model is Secretly a Preference Classifier

By Shivank Garg, Ayush Singh, et al.

Published 2025-02-22


Reinforcement learning from human feedback (RLHF) has emerged as the primary method for aligning large language models (LLMs) with human preferences. While it enables LLMs to achieve human-level alignment, it often incurs significant computational and financial costs due to its reliance on training external reward models or human-labeled preferences. In this work, we propose Implicit Preference Optimization (IPO), a…
