arXiv 2507.19457

GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning

By Lakshya A Agrawal, Shangyin Tan, et al.

Published 2025-07-25

Large language models (LLMs) are increasingly adapted to downstream tasks via reinforcement learning (RL) methods like Group Relative Policy Optimization (GRPO), which often require thousands of rollouts to learn new tasks. We argue that the interpretable nature of language can often provide a much richer learning medium for LLMs, compared with policy gradients derived from sparse, scalar rewards. To test this, we i…
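For context on the "sparse, scalar rewards" the abstract contrasts with language feedback: GRPO reduces each rollout to one scalar reward and normalizes it against the other rollouts sampled for the same prompt. The sketch below covers only that group-relative advantage step. It assumes binary pass/fail rewards and uses the population standard deviation. The function name `group_relative_advantages` and the `eps` parameter are hypothetical, taken from neither the paper nor any library.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style outcome advantage (illustrative sketch, not a library API).

    Each rollout's scalar reward is centered on the group mean and scaled by
    the group's standard deviation (population std is an assumption here;
    implementations differ).
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps avoids division by zero when every rollout received the same reward
    return [(r - mean) / (std + eps) for r in rewards]


# Two successes and two failures in a group of four rollouts.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
# A group where every rollout failed: all advantages are zero.
print(group_relative_advantages([0.0, 0.0, 0.0]))
```

The first call yields values very close to `[1, -1, -1, 1]`. The second shows the cost of sparse rewards: when every rollout in a group earns the same score, every advantage is zero and that group contributes no gradient. The only signal carried back is one number per rollout.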
