arXiv 2505.21097

Thinker: Learning to Think Fast and Slow

By Stephen Chung, Wenyu Du, et al.

Published 2025-05-27

Recent studies show that the reasoning capabilities of Large Language Models (LLMs) can be improved by applying Reinforcement Learning (RL) to question-answering (QA) tasks in areas such as math and coding. With a long context length, LLMs may learn to perform search, as indicated by the self-correction behavior observed in DeepSeek R1. However, this search behavior is often imprecise and lacks confidence, resulting…
