arXiv 2506.22139
Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs
By Shaojie Zhang, Jiahui Yang, et al.
Published 2025-06-27
Multimodal Large Language Models (MLLMs) have demonstrated significant success in visual understanding tasks. However, challenges persist in adapting these models for video comprehension due to the large volume of data and temporal complexity. Existing Video-LLMs that use uniform frame sampling often struggle to effectively capture the crucial query-related spatiotemporal clues in videos. In this paper, we introduce Q-…