arXiv 2510.24684
SPICE: Self-Play In Corpus Environments Improves Reasoning
By Bo Liu, Chuanyang Jin, et al.
Published 2025-10-28
Self-improving systems require environmental interaction for continuous adaptation. We introduce SPICE (Self-Play In Corpus Environments), a reinforcement learning framework where a single model acts in two roles: a Challenger that mines documents from a large corpus to generate diverse reasoning tasks, and a Reasoner that solves them. Through adversarial dynamics, the Challenger creates an automatic curriculum at t…