arXiv 2505.21097
Thinker: Learning to Think Fast and Slow
By Stephen Chung, Wenyu Du, et al.
Published 2025-05-27
Recent studies show that the reasoning capabilities of Large Language Models (LLMs) can be improved by applying Reinforcement Learning (RL) to question-answering (QA) tasks in areas such as math and coding. With a long context length, LLMs may learn to perform search, as indicated by the self-correction behavior observed in DeepSeek R1. However, this search behavior is often imprecise and lacks confidence, resulting…