arXiv 2312.17432
Video Understanding with Large Language Models: A Survey
By Yolo Yunlong Tang, Jing Bi, et al.
Published 2023-12-29
With the burgeoning growth of online video platforms and the escalating volume of video content, the demand for proficient video understanding tools has intensified markedly. Given the remarkable capabilities of large language models (LLMs) in language and multimodal tasks, this survey provides a detailed overview of recent advancements in video understanding that harness the power of LLMs (Vid-LLMs). The emergent c…