arXiv:2103.00020

Learning Transferable Visual Models From Natural Language Supervision

By Alec Radford, Jong Wook Kim, et al.

Published 2021-02-26

State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability since additional labeled data is needed to specify any other visual concept. Learning directly from raw text about images is a promising alternative which leverages a much broader source of supervision. We demonstrate that the simple…
