arXiv 2506.22139

Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs

By Shaojie Zhang, Jiahui Yang, et al.

Published 2025-06-27

Multimodal Large Language Models (MLLMs) have demonstrated significant success in visual understanding tasks. However, adapting these models to video comprehension remains challenging because of the large volume of data and the temporal complexity involved. Existing Video-LLMs that rely on uniform frame sampling often struggle to capture the crucial query-related spatiotemporal clues in videos. In this paper, we introduce Q-…
