arXiv 2505.19640

Interleaved Reasoning for Large Language Models via Reinforcement Learning

By Roy Xie, David Qiu, et al.

Published 2025-05-26

Long chain-of-thought (CoT) significantly enhances the reasoning capabilities of large language models (LLMs). However, the extensive reasoning traces lead to inefficiencies and an increased time-to-first-token (TTFT). We propose a novel training paradigm that uses reinforcement learning (RL) to guide reasoning LLMs to interleave thinking and answering for multi-hop questions. We observe that models inherently possess the…
