arXiv 2510.15870

OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM

By Hanrong Ye, Chao-Han Huck Yang, et al.

Published 2025-10-17

Wiki summary

Advancing machine intelligence requires developing the ability to perceive across multiple modalities, much as humans sense the world. We introduce OmniVinci, an initiative to build a strong, open-source, omni-modal LLM. We carefully study the design choices across model architecture and data curation. For model architecture, we present three key innovations: (i) OmniAlignNet for strengthening alignment between visi…
