arXiv 2305.16397
Are Diffusion Models Vision-And-Language Reasoners?
By Benno Krojer, Elinor Poole-Dayan, et al.
Published 2023-05-25
Text-conditioned image generation models have recently shown immense qualitative success using denoising diffusion processes. However, unlike discriminative vision-and-language models, these diffusion-based generative models are non-trivial to subject to automatic fine-grained quantitative evaluation of high-level phenomena such as compositionality. Towards this goal, we introduce two innovations. First, we t…