arXiv 2510.15870

OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM

By Hanrong Ye, Chao-Han Huck Yang, et al.

Published 2025-10-17

Advancing machine intelligence requires developing the ability to perceive across multiple modalities, much as humans sense the world. We introduce OmniVinci, an initiative to build a strong, open-source, omni-modal LLM. We carefully study the design choices across model architecture and data curation. For model architecture, we present three key innovations: (i) OmniAlignNet for strengthening alignment between visi…
