arXiv 2103.00020

Learning Transferable Visual Models From Natural Language Supervision

By Alec Radford, Jong Wook Kim, et al.

Published 2021-02-26

Wiki summary

State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability since additional labeled data is needed to specify any other visual concept. Learning directly from raw text about images is a promising alternative which leverages a much broader source of supervision. We demonstrate that the simple…
