arXiv 2506.22139

Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs

By Shaojie Zhang, Jiahui Yang, et al.

Published 2025-06-27

Multimodal Large Language Models (MLLMs) have demonstrated significant success in visual understanding tasks. However, adapting these models for video comprehension remains challenging because of the large volume of data and the temporal complexity of video. Existing Video-LLMs that rely on uniform frame sampling often struggle to capture the crucial query-related spatiotemporal clues in videos. In this paper, we introduce Q-…
