arXiv 2312.17432

Video Understanding with Large Language Models: A Survey

By Yolo Yunlong Tang, Jing Bi, et al.

Published 2023-12-29

With the burgeoning growth of online video platforms and the escalating volume of video content, the demand for proficient video understanding tools has intensified markedly. Given the remarkable capabilities of large language models (LLMs) in language and multimodal tasks, this survey provides a detailed overview of recent advancements in video understanding that harness the power of LLMs (Vid-LLMs). The emergent c…