arXiv 2212.03191

InternVideo: General Video Foundation Models via Generative and Discriminative Learning

By Yi Wang, Kunchang Li, et al.

Published 2022-12-06

Foundation models have recently shown excellent performance on a variety of downstream tasks in computer vision. However, most existing vision foundation models focus only on image-level pretraining and adaptation, which limits them on dynamic and complex video-level understanding tasks. To fill the gap, we present general video foundation models, InternVideo, by taking advantage of both generative and discrimi…
