arXiv 2402.12550
Multilinear Mixture of Experts: Scalable Expert Specialization through Factorization
By James Oldfield, Markos Georgopoulos, et al.
Published 2024-02-19
The Mixture of Experts (MoE) paradigm provides a powerful way to decompose dense layers into smaller, modular computations often more amenable to human interpretation, debugging, and editability. However, a major challenge lies in the computational cost of scaling the number of experts high enough to achieve fine-grained specialization. In this paper, we propose the Multilinear Mixture of Experts (μMoE) layer to add…