arXiv 2103.00020
Learning Transferable Visual Models From Natural Language Supervision
By Alec Radford, Jong Wook Kim, et al.
Published 2021-02-26
State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability, since additional labeled data is needed to specify any other visual concept. Learning directly from raw text about images is a promising alternative that leverages a much broader source of supervision. We demonstrate that the simple…