arXiv 2502.12632

MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation

By Sihyun Yu, Meera Hahn, et al.

Published 2025-02-18

Wiki summary

Diffusion models are successful at synthesizing high-quality videos but are limited to generating short clips (e.g., 2-10 seconds). Synthesizing sustained footage (e.g., over minutes) remains an open research question. In this paper, we propose MALT Diffusion (using Memory-Augmented Latent Transformers), a new diffusion model specialized for long video generation. MALT Diffusion (or just MALT) handles long vid…
