arXiv 2402.12550

Multilinear Mixture of Experts: Scalable Expert Specialization through Factorization

By James Oldfield, Markos Georgopoulos, et al.

Published 2024-02-19

The Mixture of Experts (MoE) paradigm provides a powerful way to decompose dense layers into smaller, modular computations that are often more amenable to human interpretation, debugging, and editability. However, a major challenge lies in the computational cost of scaling the number of experts high enough to achieve fine-grained specialization. In this paper, we propose the Multilinear Mixture of Experts (μMoE) layer to add…
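As a rough illustration of why factorization can ease the scaling cost described above, the NumPy sketch below compares a dense soft MoE layer, whose expert weights grow as N × O × I, with a variant whose (N, O, I) expert tensor is stored in a rank-R CP form and never materialized. The CP parameterization, the softmax gate, and all function names (`dense_soft_moe`, `cp_soft_moe`, `cp_to_dense`) are assumptions made for this example; it is a hedged sketch, not the authors' reference implementation.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def dense_soft_moe(x, gate_w, experts):
    """Soft MoE: y = sum_n a_n * (W_n @ x), experts stored densely as (N, O, I)."""
    a = softmax(gate_w @ x)                      # (N,) expert coefficients
    return np.einsum("n,noi,i->o", a, experts, x)

def cp_soft_moe(x, gate_w, A, B, C):
    """Same layer with a hypothetical rank-R CP expert tensor.

    Assumes W[n, o, i] = sum_r A[n, r] * C[o, r] * B[i, r]; W is never built.
    """
    a = softmax(gate_w @ x)                      # (N,) expert coefficients
    # Contract gate coefficients and input with their own factors, then mix per rank.
    return C @ ((A.T @ a) * (B.T @ x))           # (O,)

def cp_to_dense(A, B, C):
    # Rebuild the full (N, O, I) expert tensor, used here only to check equivalence.
    return np.einsum("nr,or,ir->noi", A, C, B)

rng = np.random.default_rng(0)
N, I, O, R = 64, 32, 16, 8                       # experts, input dim, output dim, rank
x = rng.standard_normal(I)
gate_w = rng.standard_normal((N, I))
A = rng.standard_normal((N, R))
B = rng.standard_normal((I, R))
C = rng.standard_normal((O, R))

y_cp = cp_soft_moe(x, gate_w, A, B, C)
y_dense = dense_soft_moe(x, gate_w, cp_to_dense(A, B, C))
print(np.allclose(y_cp, y_dense))                # True
print(N * O * I, R * (N + O + I))                # 32768 896
```

Because the gate coefficients and the input are each contracted with their own factor matrix before being combined, the forward pass costs on the order of R(N + O + I) rather than N·O·I, so adding experts grows the layer only through the N × R factor in this sketch.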