arXiv 2505.21097

Thinker: Learning to Think Fast and Slow

By Stephen Chung, Wenyu Du, et al.

Published 2025-05-27

Recent studies show that the reasoning capabilities of Large Language Models (LLMs) can be improved by applying Reinforcement Learning (RL) to question-answering (QA) tasks in areas such as math and coding. With a long context length, LLMs may learn to perform search, as indicated by the self-correction behavior observed in DeepSeek R1. However, this search behavior is often imprecise and lacks confidence, resulting…
