arXiv 2510.15870
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
By Hanrong Ye, Chao-Han Huck Yang, et al.
Published 2025-10-17
Advancing machine intelligence requires developing the ability to perceive across multiple modalities, much as humans sense the world. We introduce OmniVinci, an initiative to build a strong, open-source, omni-modal LLM. We carefully study the design choices across model architecture and data curation. For model architecture, we present three key innovations: (i) OmniAlignNet for strengthening alignment between visi…