arXiv 2510.08377
UniVideo: Unified Understanding, Generation, and Editing for Videos
By Cong Wei, Quande Liu, et al.
Published 2025-10-09
Unified multimodal models have shown promising results in multimodal content generation and editing but remain largely limited to the image domain. In this work, we present UniVideo, a versatile framework that extends unified modeling to the video domain. UniVideo adopts a dual-stream design, combining a Multimodal Large Language Model (MLLM) for instruction understanding with a Multimodal DiT (MMDiT) for video gene…