arXiv 2509.24372

Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning

By Xin Qiu, Yulu Gan, et al.

Published 2025-09-29

Fine-tuning pre-trained large language models (LLMs) for downstream tasks is a critical step in the AI deployment pipeline. Reinforcement learning (RL) is arguably the most prominent fine-tuning method and has contributed to the development of many state-of-the-art LLMs. In contrast, evolution strategies (ES), which once showed comparable performance to RL on models with a few million parameters, were neglected due to the pessi…
