arXiv 2510.15870
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
By Hanrong Ye, Chao-Han Huck Yang, et al.
Published 2025-10-17
Advancing machine intelligence requires developing the ability to perceive across multiple modalities, much as humans sense the world. We introduce OmniVinci, an initiative to build a strong, open-source, omni-modal LLM. We carefully study the design choices across model architecture and data curation. For model architecture, we present three key innovations: (i) OmniAlignNet for strengthening alignment between visi…