arXiv 2510.08377
UniVideo: Unified Understanding, Generation, and Editing for Videos
By Cong Wei, Quande Liu, et al.
Published 2025-10-09
Unified multimodal models have shown promising results in multimodal content generation and editing but remain largely limited to the image domain. In this work, we present UniVideo, a versatile framework that extends unified modeling to the video domain. UniVideo adopts a dual-stream design, combining a Multimodal Large Language Model (MLLM) for instruction understanding with a Multimodal DiT (MMDiT) for video gene…