arXiv 2310.02567
Improving Automatic VQA Evaluation Using Large Language Models
By Oscar Mañas, Benno Krojer, et al.
Published 2023-10-04
Eight years after the visual question answering (VQA) task was proposed, accuracy remains the primary metric for automatic evaluation. VQA Accuracy has been effective so far in the independent and identically distributed (IID) evaluation setting. However, our community is undergoing a shift towards open-ended generative models and out-of-distribution (OOD) evaluation. In this new paradigm, the existing VQA Accuracy metric is overly stringent and underestimates the performance of VQA…