arXiv 2304.07193

DINOv2: Learning Robust Visual Features without Supervision

By Maxime Oquab, Timothée Darcet, et al.

Published 2023-04-14

The recent breakthroughs in natural language processing for model pretraining on large quantities of data have opened the way for similar foundation models in computer vision. These models could greatly simplify the use of images in any system by producing all-purpose visual features, i.e., features that work across image distributions and tasks without finetuning. This work shows that existing pretraining methods,…