arXiv 2203.12602
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
By Zhan Tong, Yibing Song, et al.
Published 2022-03-23
Pre-training video transformers on extra large-scale datasets is generally required to achieve premier performance on relatively small datasets. In this paper, we show that video masked autoencoders (VideoMAE) are data-efficient learners for self-supervised video pre-training (SSVP). We are inspired by the recent ImageMAE and propose customized video tube masking with an extremely high ratio. This simple design make…
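The tube masking mentioned in the abstract can be illustrated with a minimal sketch. It assumes the video has already been split into a grid of patches per frame, and it draws one random spatial mask that is reused for every frame, so a masked position stays masked along the whole time axis. The function name `tube_mask`, its parameters, the 14x14 grid, and the 0.9 ratio are illustrative assumptions, not the authors' code or settings.

```python
import random


def tube_mask(num_frames, grid_h, grid_w, mask_ratio=0.9, seed=None):
    """Build a tube mask as a num_frames x (grid_h * grid_w) list of booleans.

    True marks a masked patch. The same spatial positions are masked in
    every frame, which is what makes the mask a set of "tubes" through time.
    All names and defaults here are hypothetical, chosen for illustration.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    num_patches = grid_h * grid_w
    num_masked = int(mask_ratio * num_patches)
    # Sample the masked spatial positions once, without replacement.
    masked_positions = set(rng.sample(range(num_patches), num_masked))
    frame_mask = [i in masked_positions for i in range(num_patches)]
    # Repeat the identical spatial mask for every frame.
    return [list(frame_mask) for _ in range(num_frames)]


mask = tube_mask(num_frames=8, grid_h=14, grid_w=14, mask_ratio=0.9, seed=0)
visible_per_frame = mask[0].count(False)
```

With these assumed values each frame keeps only a small fraction of its patches visible, and the visible positions are the same in every frame, so an encoder that processes only visible patches sees far fewer tokens than the full video contains.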