arXiv 2507.19457

GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning

By Lakshya A Agrawal, Shangyin Tan, et al.

Published 2025-07-25

Large language models (LLMs) are increasingly adapted to downstream tasks via reinforcement learning (RL) methods like Group Relative Policy Optimization (GRPO), which often require thousands of rollouts to learn new tasks. We argue that the interpretable nature of language can often provide a much richer learning medium for LLMs, compared with policy gradients derived from sparse, scalar rewards. To test this, we i…
