arXiv 2403.10704

Parameter Efficient Reinforcement Learning from Human Feedback

By Hakim Sidahmed, Samrat Phatale, et al.

Published 2024-03-15

Wiki summary

While Reinforcement Learning from Human Feedback (RLHF) effectively aligns pretrained Large Language and Vision-Language Models (LLMs and VLMs) with human preferences, its computational cost and complexity hamper its wider adoption. To alleviate some of the computational burden of fine-tuning, parameter-efficient methods such as LoRA were introduced. In this work, we empirically evaluate the setup of Parameter Effici…
