arXiv 2306.08818
Pragmatic Inference with a CLIP Listener for Contrastive Captioning
By Jiefu Ou, Benno Krojer, et al.
Published 2023-06-15
We propose a simple yet effective and robust method for contrastive captioning: generating discriminative captions that distinguish target images from very similar alternative distractor images. Our approach is built on a pragmatic inference procedure that formulates captioning as a reference game between a speaker, which produces possible captions describing the target, and a listener, which selects the target give…
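The abstract frames captioning as a reference game: a speaker proposes candidate captions, and a listener scores how well each caption picks out the target image over a distractor. The sketch below reranks speaker candidates by interpolating the speaker's log-probability with the log-probability a listener assigns to the target. The abstract is cut off before it specifies how the two scores are combined, so this is a minimal sketch under stated assumptions, not the authors' method. The linear interpolation with weight `lam` is an assumed, generic RSA-style choice. The `similarities` matrix stands in for CLIP-style image-text scores. The function names `listener_prob` and `pragmatic_rerank` are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def listener_prob(sim_row: Sequence[float], target: int, temperature: float = 1.0) -> float:
    """Hypothetical listener: P_L(target image | caption).

    sim_row[i] is an assumed image-text similarity (e.g. a CLIP-style logit)
    between one caption and image i; the listener normalizes over all images.
    """
    return softmax([s / temperature for s in sim_row])[target]


def pragmatic_rerank(
    speaker_logprobs: Sequence[float],        # log P_S(caption | target), one per candidate
    similarities: Sequence[Sequence[float]],  # similarities[c][i]: caption c vs. image i
    target: int,
    lam: float = 0.5,                         # assumed weight on the listener term
) -> List[int]:
    """Return candidate indices sorted from best to worst pragmatic score.

    Assumed score: (1 - lam) * log P_S + lam * log P_L(target | caption).
    lam = 0 keeps the speaker's ranking; larger lam favors discriminative captions.
    """
    scores = []
    for lp, row in zip(speaker_logprobs, similarities):
        listener_term = math.log(listener_prob(row, target))
        scores.append((1 - lam) * lp + lam * listener_term)
    return sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)


# Toy example: image 0 is the target, image 1 a near-identical distractor.
# Caption 0 is fluent but generic, caption 1 is less likely but discriminative,
# caption 2 is generic and less likely than caption 0. All numbers are made up.
speaker = [-1.0, -3.0, -2.0]
sims = [[25.0, 25.0], [28.0, 20.0], [15.0, 15.0]]
print(pragmatic_rerank(speaker, sims, target=0, lam=0.0))
print(pragmatic_rerank(speaker, sims, target=0, lam=0.9))
```

With `lam=0.0` the generic caption 0 wins on speaker fluency alone. With `lam=0.9` the discriminative caption 1 moves to the top, because the listener assigns it nearly all of the probability mass for the target while captions 0 and 2 leave the listener at chance.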