arXiv 2510.15870

OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM

By Hanrong Ye, Chao-Han Huck Yang, et al.

Published 2025-10-17

Advancing machine intelligence requires developing the ability to perceive across multiple modalities, much as humans sense the world. We introduce OmniVinci, an initiative to build a strong, open-source, omni-modal LLM. We carefully study the design choices across model architecture and data curation. For model architecture, we present three key innovations: (i) OmniAlignNet for strengthening alignment between visi…