arXiv 2502.14499

MLGym: A New Framework and Benchmark for Advancing AI Research Agents

By Deepak Nathani, Lovish Madaan, et al.

Published 2025-02-20

We introduce Meta MLGym and MLGym-Bench, a new framework and benchmark for evaluating and developing LLM agents on AI research tasks. This is the first Gym environment for machine learning (ML) tasks, enabling research on reinforcement learning (RL) algorithms for training such agents. MLGym-Bench consists of 13 diverse, open-ended AI research tasks from domains such as computer vision, natural language p…
