arXiv 2410.20502

ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation

By Zongyi Li, Shujie Hu, et al.

Published 2024-10-27

Text-to-video models have recently undergone rapid and substantial advancements. Nevertheless, due to limitations in data and computational resources, achieving efficient generation of long videos with rich motion dynamics remains a significant challenge. To generate high-quality, dynamic, and temporally consistent long videos, this paper presents ARLON, a novel framework that boosts diffusion Transformers with auto…
