arXiv 2505.09343
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
By Chenggang Zhao, Chengqi Deng, et al.
Published 2025-05-14
The rapid scaling of large language models (LLMs) has unveiled critical limitations in current hardware architectures, including constraints in memory capacity, computational efficiency, and interconnection bandwidth. DeepSeek-V3, trained on 2,048 NVIDIA H800 GPUs, demonstrates how hardware-aware model co-design can effectively address these challenges, enabling cost-efficient training and inference at scale. This p…