arXiv 2506.14202
DiffusionBlocks: Block-wise Neural Network Training via Diffusion Interpretation
By Makoto Shing, Masanori Koyama, et al.
Published 2025-06-17
Wiki summary
End-to-end backpropagation requires storing activations throughout all layers, creating memory bottlenecks that limit model scalability. Existing block-wise training methods offer a means to alleviate this problem, but they rely on ad-hoc local objectives and remain largely unexplored beyond classification tasks. We propose DiffusionBlocks, a principled framework for transforming transformer-based networks into genuinely independent…