arXiv 2403.10704
Parameter Efficient Reinforcement Learning from Human Feedback
By Hakim Sidahmed, Samrat Phatale, et al.
Published 2024-03-15
While Reinforcement Learning from Human Feedback (RLHF) effectively aligns pretrained Large Language Models and Vision-Language Models (LLMs and VLMs) with human preferences, its computational cost and complexity hamper its wider adoption. To alleviate some of the computational burden of fine-tuning, parameter-efficient methods such as LoRA were introduced. In this work, we empirically evaluate the setup of Parameter Effici…
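The efficiency argument for LoRA is that the pretrained weight matrix stays frozen and only a low-rank update is trained. The following is a minimal pure-Python sketch of that update as LoRA is commonly formulated, y = xW + (alpha/r)·(xA)B. It is not the paper's implementation. The helper names `init_lora`, `lora_forward` and `trainable_params` are hypothetical, and the 4096 x 4096 layer with rank 8 is an illustrative assumption.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive product of an (n x m) matrix and an (m x p) matrix."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def add(a: Matrix, b: Matrix, scale: float = 1.0) -> Matrix:
    """Element-wise a + scale * b for two matrices of the same shape."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def init_lora(d_in: int, d_out: int, r: int, seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Hypothetical helper: A (d_in x r) gets small random values and B (r x d_out)
    starts at zero, so the adapter initially leaves the frozen layer's output unchanged."""
    rng = random.Random(seed)
    a = [[rng.gauss(0.0, 0.01) for _ in range(r)] for _ in range(d_in)]
    b = [[0.0] * d_out for _ in range(r)]
    return a, b


def lora_forward(x: Matrix, w: Matrix, a: Matrix, b: Matrix, alpha: float) -> Matrix:
    """y = x W + (alpha / r) * (x A) B. W stays frozen; only A and B would be trained."""
    r = len(b)
    return add(matmul(x, w), matmul(matmul(x, a), b), scale=alpha / r)


def trainable_params(d_in: int, d_out: int, r: int) -> Tuple[int, int]:
    """Trainable parameter counts: full fine-tuning of W vs. a rank-r LoRA adapter."""
    return d_in * d_out, r * (d_in + d_out)


if __name__ == "__main__":
    full, lora = trainable_params(4096, 4096, 8)
    print(full, lora, lora / full)  # 16777216 65536 0.00390625
```

For this illustrative layer, the rank-8 adapter trains 1/256 of the parameters that full fine-tuning would update. Because B is initialized to zero, the adapted layer starts out identical to the pretrained one.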