arXiv 2505.09343
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
By Chenggang Zhao, Chengqi Deng, et al.
Published 2025-05-14
The rapid scaling of large language models (LLMs) has unveiled critical limitations in current hardware architectures, including constraints in memory capacity, computational efficiency, and interconnection bandwidth. DeepSeek-V3, trained on 2,048 NVIDIA H800 GPUs, demonstrates how hardware-aware model co-design can effectively address these challenges, enabling cost-efficient training and inference at scale. This p…