arXiv 2505.09343

Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures

By Chenggang Zhao, Chengqi Deng, et al.

Published 2025-05-14

The rapid scaling of large language models (LLMs) has unveiled critical limitations in current hardware architectures, including constraints in memory capacity, computational efficiency, and interconnection bandwidth. DeepSeek-V3, trained on 2,048 NVIDIA H800 GPUs, demonstrates how hardware-aware model co-design can effectively address these challenges, enabling cost-efficient training and inference at scale. This p…