arXiv 2506.22139
Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs
By Shaojie Zhang, Jiahui Yang, et al.
Published 2025-06-27
Multimodal Large Language Models (MLLMs) have demonstrated significant success in visual understanding tasks. However, adapting these models to video comprehension remains challenging because of the large volume of data and the temporal complexity involved. Existing Video-LLMs that rely on uniform frame sampling often struggle to capture the spatiotemporal clues in a video that are most relevant to the query. In this paper, we introduce Q-…