arXiv 2510.15870
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
By Hanrong Ye, Chao-Han Huck Yang, et al.
Published 2025-10-17
Advancing machine intelligence requires developing the ability to perceive across multiple modalities, much as humans sense the world. We introduce OmniVinci, an initiative to build a strong, open-source, omni-modal LLM. We carefully study the design choices across model architecture and data curation. For model architecture, we present three key innovations: (i) OmniAlignNet for strengthening alignment between visi…