arXiv 2310.02567

Improving Automatic VQA Evaluation Using Large Language Models

By Oscar Mañas, Benno Krojer, et al.

Published 2023-10-04


Eight years after the visual question answering (VQA) task was proposed, accuracy remains the primary metric for automatic evaluation. VQA Accuracy has been effective so far in the independent and identically distributed (IID) evaluation setting. However, our community is shifting towards open-ended generative models and out-of-distribution (OOD) evaluation. In this new paradigm, the existing VQA Accuracy metric is overly stringent and underestimates the performance of VQA…
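The following minimal Python sketch makes that stringency concrete. It implements VQA Accuracy as commonly defined: a prediction scores min(#matching human answers / 3, 1), and that score is averaged over the ten leave-one-annotator-out subsets of the human answers. The `normalize` helper is a hypothetical, simplified stand-in for the official evaluation script's answer processing, and the example answers are invented for illustration. The sketch shows how an open-ended answer that is correct but phrased differently from every annotator receives zero credit.

```python
from typing import List


def normalize(answer: str) -> str:
    # Hypothetical, simplified normalization: lowercase, trim whitespace and a
    # trailing period. The official VQA script also handles punctuation,
    # articles, and number words.
    return answer.strip().lower().rstrip(".").strip()


def vqa_accuracy(prediction: str, human_answers: List[str]) -> float:
    """VQA Accuracy: min(#matches / 3, 1), averaged over leave-one-out subsets."""
    if not human_answers:
        raise ValueError("need at least one human answer")
    pred = normalize(prediction)
    gts = [normalize(a) for a in human_answers]
    scores = []
    for i in range(len(gts)):
        # Drop annotator i and count exact string matches among the rest.
        others = gts[:i] + gts[i + 1:]
        matches = sum(1 for a in others if a == pred)
        scores.append(min(matches / 3.0, 1.0))
    return sum(scores) / len(scores)


# Ten invented human answers for one question.
humans = ["dog"] * 6 + ["puppy"] * 3 + ["golden retriever"]

print(vqa_accuracy("dog", humans))                     # 1.0
print(vqa_accuracy("Puppy.", humans))                  # ≈ 0.9
print(vqa_accuracy("golden retriever", humans))        # ≈ 0.3
print(vqa_accuracy("a golden retriever dog", humans))  # 0.0
```

Because matching is purely string-based, the last answer is plausibly correct yet scores 0.0. Verbose outputs from generative models run into exactly this kind of mismatch.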
