arXiv 2306.08818

Pragmatic Inference with a CLIP Listener for Contrastive Captioning

By Jiefu Ou, Benno Krojer, et al.

Published 2023-06-15

We propose a simple yet effective and robust method for contrastive captioning: generating discriminative captions that distinguish target images from very similar alternative distractor images. Our approach is built on a pragmatic inference procedure that formulates captioning as a reference game between a speaker, which produces possible captions describing the target, and a listener, which selects the target give…
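The reference game described above can be illustrated with a small reranking sketch. This is not the authors' implementation: the scoring rule (a weighted sum of listener and speaker log-probabilities), the `weight` parameter, the function names, and the toy scores are all assumptions made for illustration. In a real system the image scores would come from an image-text model and the speaker log-probabilities from a captioning model.

```python
import math


def listener_log_prob(image_scores, target_index=0):
    """Log-probability that a listener picks the target image.

    image_scores holds one caption-image similarity score per candidate
    image (target plus distractors); a softmax turns them into a choice
    distribution. The scores here are hypothetical placeholders.
    """
    max_score = max(image_scores)
    log_norm = max_score + math.log(
        sum(math.exp(s - max_score) for s in image_scores)
    )
    return image_scores[target_index] - log_norm


def rerank(candidates, weight=0.5):
    """Pick the caption with the best combined score.

    candidates: list of (caption, speaker_log_prob, image_scores).
    weight: assumed trade-off between discriminativeness (listener)
    and fluency/fidelity (speaker); 0.0 ignores the listener entirely.
    """
    def combined(candidate):
        _, speaker_lp, image_scores = candidate
        return (weight * listener_log_prob(image_scores)
                + (1.0 - weight) * speaker_lp)

    return max(candidates, key=combined)[0]


# Toy example: index 0 is the target image, index 1 is a distractor.
candidates = [
    ("a dog on grass", -2.0, [5.0, 5.0]),                    # fits both images
    ("a dog with a red collar on grass", -2.5, [6.0, 2.0]),  # fits only the target
]
best = rerank(candidates, weight=0.5)
```

With the listener weighted in, the more specific caption wins because it separates the target from the distractor; with `weight=0.0` the speaker's preferred, less discriminative caption is chosen instead.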
