arXiv 2406.04325

ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

By Lin Chen, Xilin Wei, et al.

Published 2024-06-06

We present the ShareGPT4Video series, aiming to facilitate the video understanding of large video-language models (LVLMs) and the video generation of text-to-video models (T2VMs) via dense and precise captions. The series comprises: 1) ShareGPT4Video, 40K GPT4V-annotated dense captions of videos of various lengths and sources, developed through a carefully designed data filtering and annotation strategy. 2) ShareCap…
