arXiv 2407.13766
Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark
By Tsung-Han Wu, Giscard Biamby, et al.
Published 2024-07-18
Large Multimodal Models (LMMs) have made significant strides in visual question-answering for single images. Recent advancements like long-context LMMs have allowed them to ingest larger, or even multiple, images. However, the ability to process a large number of visual tokens does not guarantee effective retrieval and reasoning for multi-image question answering (MIQA), especially in real-world applications like ph…