arXiv 2604.11615

CUTEv2: Unified and Configurable Matrix Extension for Diverse CPU Architectures with Minimal Design Overhead

By Jinpeng Ye, Chongxi Wang, et al.

Published 2026-04-13

Matrix extensions have emerged as an essential feature in modern CPUs to address the surging demands of AI workloads. However, existing designs often incur substantial hardware and software design overhead. Tight coupling with the CPU pipeline complicates integration across diverse CPUs, while fine-grained synchronous instructions hinder the development of high-performance kernels. This paper proposes a unified and…
