arXiv 2506.22139
Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs
By Shaojie Zhang, Jiahui Yang, et al.
Published 2025-06-27
Multimodal Large Language Models (MLLMs) have demonstrated significant success in visual understanding tasks. However, adapting these models to video comprehension remains challenging because of the large data volume and temporal complexity of video. Existing Video-LLMs that rely on uniform frame sampling often fail to capture the spatiotemporal clues most relevant to the query. In this paper, we introduce Q-…