arXiv 2506.14202
DiffusionBlocks: Block-wise Neural Network Training via Diffusion Interpretation
By Makoto Shing, Masanori Koyama, et al.
Published 2025-06-17
End-to-end backpropagation requires storing activations throughout all layers, creating memory bottlenecks that limit model scalability. Existing block-wise training methods offer means to alleviate this problem, but they rely on ad-hoc local objectives and remain largely unexplored beyond classification tasks. We propose DiffusionBlocks, a principled framework for transforming transformer-based networks into genuinely independent…