arXiv 2510.24684

SPICE: Self-Play In Corpus Environments Improves Reasoning

By Bo Liu, Chuanyang Jin, et al.

Published 2025-10-28


Self-improving systems require environmental interaction for continuous adaptation. We introduce SPICE (Self-Play In Corpus Environments), a reinforcement learning framework where a single model acts in two roles: a Challenger that mines documents from a large corpus to generate diverse reasoning tasks, and a Reasoner that solves them. Through adversarial dynamics, the Challenger creates an automatic curriculum at t…
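The two-role loop described above can be illustrated with a toy sketch in which one object plays both the Challenger and the Reasoner. Everything in it is hypothetical and for illustration only: the integer "corpus", the summing task, the `ToyModel` class, and the Challenger reward `4 * p * (1 - p)`, which simply peaks when the Reasoner's pass rate `p` is 0.5. The abstract (truncated here) does not specify these details. The small parameter nudges stand in for the reinforcement-learning updates a real system would perform.

```python
import random
from dataclasses import dataclass

# Hypothetical toy corpus: each "document" is a list of integers.
CORPUS = [[3, 1, 4, 1, 5, 9, 2, 6], [2, 7, 1, 8, 2, 8, 1, 8], [1, 6, 1, 8, 0, 3, 3, 9]]

@dataclass
class Task:
    doc: list
    k: int       # difficulty: how many leading numbers must be summed
    answer: int  # gold answer derived from the document, so it is verifiable

@dataclass
class ToyModel:
    """One policy playing both roles; all fields are hypothetical stand-ins."""
    skill: float = 2.0       # Reasoner: reliably handles up to about `skill` terms
    difficulty: float = 1.0  # Challenger: current target task size

    def challenge(self, rng):
        # Challenger role: mine a document and turn it into a task with a gold answer.
        doc = rng.choice(CORPUS)
        k = max(1, min(len(doc), round(self.difficulty)))
        return Task(doc, k, sum(doc[:k]))

    def reason(self, task, rng):
        # Reasoner role: success probability drops once k exceeds the current skill.
        if task.k <= self.skill:
            p_correct = 1.0
        else:
            p_correct = max(0.0, 1.0 - 0.5 * (task.k - self.skill))
        guess = sum(task.doc[:task.k])
        return guess if rng.random() < p_correct else guess + 1  # wrong answer on failure

def challenger_reward(pass_rate):
    # Hypothetical shaping: 1.0 at a 50% pass rate, 0.0 when always or never solved.
    return 4.0 * pass_rate * (1.0 - pass_rate)

def self_play(steps=50, attempts=8, seed=0):
    rng = random.Random(seed)
    model = ToyModel()
    for _ in range(steps):
        task = model.challenge(rng)
        rewards = [1.0 if model.reason(task, rng) == task.answer else 0.0
                   for _ in range(attempts)]
        pass_rate = sum(rewards) / attempts
        # Stand-in for RL updates: the Reasoner improves most on partly solved tasks;
        # the Challenger raises difficulty when tasks are mostly solved, lowers it otherwise.
        model.skill += 0.1 * challenger_reward(pass_rate)
        model.difficulty += 0.5 if pass_rate > 0.5 else -0.5
        model.difficulty = max(1.0, min(8.0, model.difficulty))
    return model
```

Calling `self_play()` returns the final model. Because the Challenger lowers difficulty after failures and raises it after successes, its target task size follows the Reasoner's current skill instead of staying fixed, which is the automatic-curriculum idea in miniature.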
