arXiv 2512.07829
One Layer Is Enough: Adapting Pretrained Visual Encoders for Image Generation
By Yuan Gao, Chen Chen, et al.
Published 2025-12-08
Visual generative models (e.g., diffusion models) typically operate in compressed latent spaces to balance training efficiency and sample quality. In parallel, there has been growing interest in leveraging high-quality pre-trained visual representations, either by aligning them inside VAEs or directly within the generative model. However, adapting such representations remains challenging due to fundamental mismatche…