arXiv 2403.10704

Parameter Efficient Reinforcement Learning from Human Feedback

By Hakim Sidahmed, Samrat Phatale, et al.

Published 2024-03-15


While Reinforcement Learning from Human Feedback (RLHF) effectively aligns pretrained Large Language Models and Vision-Language Models (LLMs and VLMs) with human preferences, its computational cost and complexity hamper its wider adoption. To alleviate some of the computational burden of fine-tuning, parameter-efficient methods such as LoRA were introduced. In this work, we empirically evaluate the setup of Parameter Effici…
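To show why LoRA lightens the fine-tuning burden, here is a minimal sketch of the standard LoRA idea. The pretrained weight W (d_out x d_in) stays frozen, and only a low-rank update B A is trained, with B of shape d_out x r and A of shape r x d_in, scaled by alpha / r. This is a generic illustration and not the authors' implementation. The function names (`lora_param_counts`, `lora_forward`) and the pure-Python matrices are hypothetical choices made to keep the example self-contained.

```python
def lora_param_counts(d_in: int, d_out: int, rank: int) -> tuple[int, int]:
    """Return (full fine-tuning params, LoRA trainable params) for one linear layer.

    Hypothetical helper: full fine-tuning updates all of W (d_out x d_in), while
    LoRA trains only A (rank x d_in) and B (d_out x rank).
    """
    full = d_in * d_out
    lora = rank * (d_in + d_out)
    return full, lora


def matmul(a, b):
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    """Transpose a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def lora_forward(x, w, a, b, alpha, rank):
    """Compute y = x W^T + (alpha / rank) * x A^T B^T.

    W is frozen; only A and B would receive gradient updates during training.
    Shapes: x (n x d_in), w (d_out x d_in), a (rank x d_in), b (d_out x rank).
    """
    base = matmul(x, transpose(w))                          # frozen pretrained path
    delta = matmul(matmul(x, transpose(a)), transpose(b))   # low-rank trainable path
    scale = alpha / rank
    return [[base[i][j] + scale * delta[i][j] for j in range(len(base[0]))]
            for i in range(len(base))]


if __name__ == "__main__":
    # Illustrative layer size (an assumption, not taken from the paper).
    full, lora = lora_param_counts(4096, 4096, rank=8)
    print(full, lora)  # 16777216 65536
```

For a 4096 x 4096 layer at rank 8, LoRA trains 65,536 parameters in place of 16,777,216, which is 256 times fewer. Because B is usually initialized to zeros, the adapted layer starts out identical to the frozen pretrained layer.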
